Origination, Expansion, Evolutionary Trajectory, and Expression Bias of AP2/ERF Superfamily in Brassica napus

The AP2/ERF superfamily, one of the most important transcription factor families, plays crucial roles in response to biotic and abiotic stresses. So far, a comprehensive evolutionary inference of its origination and expansion has not been available. Here, we identified 515 AP2/ERF genes in B. napus, a neo-tetraploid forming ~7500 years ago, and found that 82.14% of them were duplicated in the tetraploidization. A prominent subgenome bias was revealed in gene expression, tissue-specific, and gene conversion. Moreover, a large-scale analysis across plants and alga suggested that this superfamily could have been originated from AP2 family, expanding to form other families (ERF, and RAV). This process was accompanied by duplicating and/or alternative deleting AP2 domain, intragenic domain sequence conversion, and/or by acquiring other domains, resulting in copy number variations, alternatively contributing to functional innovation. We found that significant positive selection occurred at certain critical nodes during the evolution of land plants, possibly responding to changing environment. In conclusion, the present research revealed origination, functional innovation, and evolutionary trajectory of the AP2/ERF superfamily, contributing to understanding their roles in plant stress tolerance.


INTRODUCTION
The AP2/ERF superfamily is one of the largest groups of transcription factors in plants, and plays important roles in resistance of abiotic and biotic stress (Licausi et al., 2013). Based on the AP2 domain number, AP2/ERF superfamily can be divided into ERF, AP2, RAV, and Soloist families. The ERF genes encode proteins with a single AP2 domain, while AP2 genes encode protein with two AP2 domains (Nakano et al., 2006;Licausi et al., 2010). According to DNA binding domain sequences, the ERF family can be further divided into two subfamilies, ERF and DREB subfamilies. Besides a single AP2 domain, the RAV family have an additional B3 domain (Hu and Liu, 2011). In addition, there are genes divergent from the ERF and RAV families, though containing the AP2 domains, and here they are named as Soloist according to the previous report (Sakuma et al., 2002).
The DREB subfamily activates dehydration/cold-regulated genes by interacting with DRE/CRT elements, and therefore enhances tolerance to multiple abiotic stresses (Lata and Prasad, 2011). For example, DREB2 genes are involved in dehydrationand high-salinity-responsive gene expression in transgenic Arabidopsis (Nakashima et al., 2000). The EsDREB2B gene cloned from Eremosparton songoricum is shown to be able to enhance tolerance to multiple abiotic stresses in yeast and transgenic tobacco (Li et al., 2014). The Suaeda salsa SsDREB gene enhances abiotic stress tolerance in transgenic tobacco . ERF subfamily is involved in signal pathways of stress, pathogen, and disease-related stimuli (Cheng et al., 2013;Schmidt et al., 2013;Shoji et al., 2013;Zhu et al., 2014). In transgenic plants, the over-expression of ERF genes has been reported in O. sativa , A. thaliana (Wang et al., 2015a), Solanum lycopersicum, and Nicotiana tabacum (Zhang and Huang, 2010), leading to salt and drought tolerance. Most ERF family genes improve abiotic tolerance without causing undesirable growth phenotypes (Xu et al., 2011). Besides, CRL5, an AP2 gene, promotes crown root initiation in rice (Kitomi et al., 2011). CRL5 can also affect sepal abscission, leaf shape, and plant height in B. napus, maize, and water lily Luo et al., 2012;Yan et al., 2012). CaRAV1 cloned from Capsicum annuum increases tolerance to osmotic stress and high salinity in Arabidopsis, and RAV-1-HY15 gene in B. napus can be induced by cold, PEG, and NaCl treatments (Lee et al., 2010;Zhuang et al., 2011b). Therefore, it is important to identify all AP2/ERF genes to reveal mechanisms underlying stress signal transmission, and finally manipulate AP2/ERF protein regulation to improve plant stress resistance.
As an important oilseed crop grown worldwide, the genome of B. napus was recently sequenced and assembled (Chalhoub et al., 2014). B. napus (AACC genome), an allopolyploid, is originated by hybridization between B. rapa (AA genome), and B. oleracea (CC genome) only ∼7500 years ago (Chalhoub et al., 2014). The availability of these Brassica genomes, together with those of endicot relatives, A. thaliana, P. trichocarpa, and V. vinifera etc (Tuskan et al., 2006;Jaillon et al., 2007;Lamesch et al., 2012;Cheng et al., 2014;Liu et al., 2014;Parkin et al., 2014), provides us an opportunity to understand the formation and evolution of AP2/ERF superfamily and may help clarify molecular mechanisms responsible for abiotic and biotic stress responses.

Identification and Characterization of AP2/ERF Superfamily Genes
Pfam database was used to identify genes from AP2/ERF superfamily (Finn et al., 2014), and AP2 domain has Pfam accession number PF00847.16. Genes containing AP2 were defined as AP2/ERF superfamily, and further verified using SMART (Letunic et al., 2012). Gene structures were checked by GSDS (http://gsds.cbi.pku.edu.cn/; Hu et al., 2015). Chromosomal distribution of B. napus genes was displayed using an in-house-developed Perl script. The number of exon and intron were showed by Circos (http://circos.ca/; Krzywinski et al., 2009). The AP2 domains of the protein sequences were used to construct phylogenetic trees. Phylogenetic analyses were conducted using MEGA 6.0 (Tamura et al., 2013). Neighbourjoining (NJ) trees were constructed with a bootstrap value of 1000 replications to assess the reliability of the resulting trees. In addition, the maximum-likelihood phylogenetic trees (ML) were constructed using JTT model with the bootstrap value of 1000 by PhyML program (Guindon et al., 2010).

Identification of Collinear Blocks and Gene Conversion
Firstly, whole-genome protein sequences from all species were searched against themselves using BLASTP with an E < 1 × 10 −5 (Song et al., 2014a). MCScanX was then used to detect collinear blocks with default parameter setting according to a previous report . Using output results of MCScanX, we extracted the AP2/ERF genes located in the collinear blocks. We checked gene conversion by referring to results of a previous analysis at the whole-genome scale (Chalhoub et al., 2014).

Selective Pressure and Divergence Time Estimation
To estimate the divergence time between collinear AP2/ERF gene pairs, alignment of the protein sequences were performed using ClustalW2, and then were translated into CDS alignment. These softwares were implemented by BioPerl. We estimated the synonymous (Ks) and non-synonymous (Ka) nucleotide substitutions rate between the collinear genes using the method developed by Nei and Gojobori, implemented in KaKs_calculator (Wang et al., 2010). Divergence time was therefore inferred using the formula T = Ks/2R, where R is 1.5 × 10 −8 synonymous substitutions per site per year (Koch et al., 2000). We applied likelihood ratio (LR) tests of positive selection based on the ML methods and codon substitution models. Based on previously reported methods (Mondragon-Palomino et al., 2002;Mondragon-Palomino and Gaut, 2005), we implemented Codeml from PAML package and analyzed groups I, VII, and AP2 family to infer ω, the ratio of the non-synonymous to synonymous distances (Yang, 1997;Yang et al., 2000). We employed complete deletion method when analysing alignments with gaps, and eliminated sequences that contained 40% of their length or more as InDels. We detected variation in ω among sites by employing a likelihood ratio test between M0 and M1, and M7 and M8 models.

Expression Pattern Analysis
To analyze gene expression patterns, we used the Illumina RNAseq data reported previously (Chalhoub et al., 2014), containing two tissues (root, leaf) of B. napus, and three replicates. The cluster was analyzed using Cluster3.0, and Reads Per Kilobase per Million mapped reads (RPKM) log2-transformed values. Heat maps were then constructed using TreeView (http:// jtreeview.sourceforge.net/) for visualization of the clustering results. The boxplots of the gene expression were drawn by using R program, and the χ 2 -test was performed to detect significant differences between subgenomes or different tissues. The cases with expression change larger than two-fold, and the corresponding P < 0.05 were used to identify differentially expressed genes (DEGs).

Identification of AP2/ERF genes in B. napus
In all considered plants, a total of 1956 AP2/ERF genes were identified (Figure 1). They were renamed according to plant names, chromosomal locations, and the family types, facilitating easy identification of their sources, distribution, and comparison (Table S1). In B. napus, 515 distinct AP2/ERF genes were identified, with 256 genes (BnaA###) from AA-subgenome and 258 (BnaC###) from CC-subgenome. Gene BnaUnng01150D failed to be assigned to any subgenomes, and was renamed as BnaSERF-001. According to conserved domain similarity to Arabidopsis genes, we classified B. napus AP2/ERF superfamily into four families, including ERF, AP2, RAV, and Soloist (Table S2). The ERF family contains DREB subfamily and ERF subfamily, and both of which can be further divided into several groups I-X, with groups I to V belonging to DERB subfamily, and VI to X belonging to ERF subfamily (Figure 1). There are 435 genes in ERF family, featuring a single AP2/ERF domain, and 58 genes in AP2 family, featuring tandem repetitive AP2/ERF motif (32) and high similarity to the Arabidopsis AP2 family. A single AP2 domain was reported in Arabidopsis, e.g., AT2G39250, AT2G41710, and AT3G54990, from AP2 family (Nakano et al., 2006). There were 19 genes in RAV family, featuring a single AP2/ERF DNA binding domain and a B3 domain. Three genes (BnaCSoloist-001, BnaCSoloist-002, and BnaASoloist-001) were identified to be Soloist ones, for sharing high similarity to Arabidopsis Soloist (AT4G13040) and divergent from the ERF family.
B. napus has the most AP2/ERF genes among the plants under consideration, about fewer than twice of those in B. rapa (281) or B. oleracea (281), showing likely about 9% gene losses in each subgenome after hybridization, over twice of that in P. trichocarpa (202), and over three times of those detected in O. sativa (156), A. thaliana (146), and other plants considered. The gene number of each ERF, AP2, RAV, and Soloist family in B. napus also exceeds those in other plants. There are much fewer AP2/ERF genes (<20) in Green alga than higher plants. For example, Ostreococcus lucimarinus and Micromonas pusilla RCC299 each have 8 AP2/ERF genes, indicating a vast expansion of the superfamily during the evolution of higher plants. Four gene families exist in nearly all the higher plants examined, excepting S. moellendorffii and P. patens, where the Soloist family was absent. No RAV or DREB gene was detected in the five Green alga species. Notably, we found that there were more AP2 genes than other families in Green alga, different from that in higher plants.
Many genes form merely small gene clusters on chromosomes. As to the criteria that two or more genes were located on the chromosome within 200 Kb ( Figure S1), we found 42 and 32 clusters in the AA and CC-subgenomes, respectively. Most clusters contained only 2 or 3 genes, and the largest cluster contained 5 genes. In AA-subgenome, 26 clusters have 2 genes, involving 61.9% of genes in total; and 14 clusters have 3 genes, involving 33.3% of genes. In CC-subgenome, the percentage of 2 gene clusters was over 93.7% (30), and 3 gene clusters was only 6.3% (2), which was significantly different with the AA-subgenome. Interestingly, we identified 2 clusters, which contained 4, and 5 genes in AA-subgenome, while there was no cluster contained more than 3 genes in CC-subgenome. Furthermore, there were more genes located in clusters in AAsubgenome (103 genes) than in CC-subgenome (66 genes; Figure  S2A, Table S3).

Gene Structure and Conservative Motif
Members of the same functional group mostly have similar intron/exon structure ( Figure S3). Among 515 AP2/ERF genes, 309 (60.0%) of them have one exon and 445 (86.4%) genes had 1∼3 exons, (Table S4), similar to GRAS and AP2/ERF superfamily in B. rapa (Song et al., 2013(Song et al., , 2014b. For instance, most genes from groups III, VI, VIII, and IX have only one exon, and the length of them are very similar. Overall, genes from the AP2 family have more exons and introns than those from families ERF and RAV ( Figure S4). Some genes contained abnormally long introns, such as BnaAERF-170 in group II, BnaCERF-168 in group V, BnaAERF-089 in group VII, and BnaCAP2-015, and BnaCAP2-023 in group AP2. Some exceptional genes, such as BnaARAV-006, BnaCERF-001, and BnaAERF-085, contain 8 or more exons. In addition, all 58 AP2 genes had 5∼11 exons, which was significant different from ERF and RAV families ( Figure S5). For example, the AP2 genes BnaAAP2-013, BnaAAP2-027, and BnaCAP2-012 had 11 exons and 10 introns. Furthermore, we found evidence that some newly duplicated genes in the superfamily have no introns, though their highly similar homologs have introns, evidencing their formation through retrotransposon activity.

Subgenome-Biased Gene Conversion
Using the gene conversion dataset previous reported (Chalhoub et al., 2014), we conducted the gene conversion analyses between AA-and CC-subgenome for B. napus AP2/ERF genes. A total of 68 gene pairs showed likely gene conversion ( Figure S2A, Table  S5). Among these gene pairs, 356 conversion sites were identified to occur with CC-subgenome as donor, while 267 conversion sites with AA-subgenome as donor, showing a significant bias between subgenomes (P = 3.6e-04).
To reveal the relations of gene conversion and physical clusters, we conducted comparative analyses of these two gene sets. Fifty-two (20.6%) AP2/ERF genes were shared between these two gene sets in the B. napus genome ( Figure S2B). Among these genes, 29 (20.4%) genes located in the AA-subgenome, and 23 (20.7%) genes located CC-subgenome ( Figures S2C,D).

Homologous Genes
Not surprisingly, among all 14 species, B. oleracea and B. rapa have the most orthologous gene pairs with B. napus than other species (Figure 2A, Table S6). The number of orthologous gene pairs between B. oleracea and B. napus was 429, very close to The ratio indicated that the one gene has the one or more orthologous genes in each species. (C) The Venn diagram shows the number of common and specific clusters and AP2/ERF genes in five Green alga species. The first number in the brackets represents the number of cluster, and the second number represents the number of genes. (D) The Venn diagram shows the number of common and specific clusters and AP2/ERF genes in eight angiosperms species. (E) The Venn diagram shows the number of common and specific clusters and AP2/ERF genes in B. napus and other 14 species used in this study. The abbreviations represent the species as follows: Bna,B. napus;Bra,B. rapa;Bol,B. oleracea;Ath,A. thaliana;Ptr,P. trichocarpa;Vvi,V. vinifera;Osa,O. sativa;Atr,A. trichopoda;Smo,S. moellendorffii;Ppa,P. patens;Cre,C. reinhardtii;Vca,V. carteri;Csu,Mpu,M. pusilla RCC299;Olu,O. lucimarinus. that between B. rapa and B. napus (425), about nearly twice of that between A. thaliana and B. napus (236). There are only 5 and 9 orthologous AP2/ERF gene pairs detected between two green algas, Coccomyxa subellipsoidea C-169 and O. lucimarinus, and B. napus, respectively, (Figure 2A). We found that each Arabidopsis AP2/ERF gene had one to eight B. napus orthologous genes, demonstrating that some AP2/ERF genes in B. rapa and B. oleracea were subjected to copy number increase likely due to the recursive polyploidizations.
To check how different duplication events have contributed to the expansion of the superfamily, we identified collinear paralogous genes ( Figure 2B, Table S7). More paralogous gene pairs were observed in B. napus (146), P. trichocarpa (118), and P. patens (86) than in the other species. However, there were only 20, 23, and 6 paralogous AP2/ERF gene pairs in B. rapa, B. oleracea, and A. thaliana, respectively, ( Figure S6). This phenomenon indicated that several AP2/ERF genes might be lost or subjected to fast sequence divergence after polyploidization. In green alga, there were much fewer paralogous gene pairs than in the higher plants. It might be due to the fact that the genomes of green alga did not undergo whole-genome duplication as most of the higher plants did.
Furthermore, we identified AP2/ERF genes clusters using MCL algorism according to the previous report (Xu et al., 2013;Song et al., 2015). Firstly, we checked gene clusters among the 5 green alga ( Figure 2C, Table S8). A total of 8 clusters were detected, containing 28 AP2/ERF genes. Secondly, we checked gene clusters among 8 angiosperms ( Figure 2D, Table S9). Totally, 266 clusters were detected, which contained 1586 AP2/ERF genes. A total of 24 clusters contained 381 genes, which were shared by all the 8 angiosperms. Last but not the least, we checked gene clusters among B. napus and other 14 species ( Figure 2E, Table S10). A total of 285 clusters were detected, containing 1726 AP2/ERF genes. However, there was no cluster shared by all these species.
By checking gene collinearity within a genome, we found that more than 60% AP2/ERF genes in four Brassicaceae species were located in large collinear blocks with >100 collinear genes ( Figure 4A, Table 2, Figures S7-S9), showing their duplication during polyploidization. In B. napus, we identified 586 collinear AP2/ERF gene pairs, ( Table 3, Table S12), two times more than those in B. rapa (273) and B. oleracea (252). Among these collinear gene pairs, 415 AP2/ERF genes were detected in B. napus, followed by B. rapa (243), B. oleracea (238), and P. trichocarpa (212). In many other species, there were fewer than 100 genes in collinearity. Especially, in A. trichopoda and 5 green alga, no collinear AP2/ERF gene pair was found. Though non-Brassicaceae plants have much shorter intragenomic collinear blocks, we found that the percentage of AP2/ERF genes located in the collinear blocks was larger than the genome-wide average, showing that polyploidization contributed to the expansion of the superfamily.
We identified collinear gene pairs between B. napus and other plants ( Table S13). The collinear AP2/ERF gene pairs were only detected between B. napus and 7 angiosperms. A total of 837 collinear AP2/ERF gene pairs were detected between B. napus and B. rapa, followed by B. oleracea (808), A. thaliana (442), and P. trichocarpa (177).
Sequence divergence analysis supported the polyploidizations have contributed to expansion of the superfamily. Among 586 collinear AP2/ERF gene pairs of B. napus, Ks values were from 0.0057 to 1.5340, and the corresponding divergence time from 0.19 to 51.13 MYA ( Figure 4B, Table S12). To more directly demonstrate the divergence of collinear AP2/ERF genes, we plotted Ks distribution for each species and between each two of them (Figures 4C,D), and found two obviously peaks in B.  napus, and one peak in each of the other species (Figure 4C, Figure S10). For B. napus, the first peak might be formed due to the hybridization between B. rapa and B. oleracea, and the second peak, shared by B. rapa and B. oleracea, might be formed due to the Brassica triplication events 5∼9 MYA.

Phylogenetic and Evolutionary Analysis
As to the constructed phylogenetic tree of AP2/ERF superfamily, genes can be divided into 4 distinct clades (Figure 5A), corresponding to the ERF (ERF subfamily and DREB subfamily), AP2, RAV, and Soloist families, respectively, congruent with previous studies (Sakuma et al., 2002;Nakano et al., 2006). The DREB and ERF family can be further divided into 10 groups (groups I to V for DREB, and VI to X for ERF family). We explored the origination and evolution of this superfamily in higher and lower plants. Several groups were not detected in the lower plants. For example, no gene was found in DREB groups I to V, and RAV family in five Green alga species (Figure 1, Figure S11). This supports a hypothesis that DREB family genes in land plants acquired these groups from other genes during the very early stage of their origination. To evaluate this hypothesis, we managed to reconstruct the phylogenetic tree. As to ERF subfamily, genes from VII and VIII were found in both green alga and land plants, while genes from VI, IX, X, and VI-L groups found only in land plants but not in alga. Notably, the group VII genes existed in five green alga species. These facts support that the ERF subfamily may have been firstly originated and expanded from the group VII. The DREB subfamily and RAV family were found in P. patens and other land plants, but not in green alga. The phylogeny analysis implied that the expansion of DREB subfamily was the most likely from its group I, after acquiring an AP2 domain ( Figure 5A, Figure S11). The RAV family might be originated from the AP2 family after loss one AP2 domain and acquired a new B3 domain from an unknown source. Therefore, we speculate that the AP2 domain in higher plants for groups I to VI, IX or RAV family was acquired from other groups through gene-gene merge, which is a normal avenue to produce new genes.

Frequent Drive of Positive Selection
Strong positive selection was observed on the major nodes leading to origination and divergence of higher plants. To uncover whether and when natural selection had acted on the evolution of AP2/ERF superfamily, we performed selection pressure analyses in group I (DREB subfamily), group VII (ERF subfamily), and AP2 family, respectively, ( Figure 5B). A whole-scale analysis of all genes and families was not done in that too many genes were computationally infeasible for software PAML. In group I, significantly more non-synonymous than synonymous substitutions was detected, showing strong positive selection before the divergence of higher plants. In addition, significantly strong positive selection (ω = 443.6120) was detected before the divergence of the examined eudicot plants. In group VII, interestingly, more positive selected nodes were found during the divergence of five green alga than the higher plants, in which, actually, only 1 examined gene (VviERF-017) was inferred to be positively selected (ω = 513.4823). This seemingly suggested that group VII genes have been subjected to much selective pressure during the evolution of alga. As to the AP2 family, which was inferred as to be the likely ancestor node of the superfamily, was also subjected to positive selection during its further divergence with higher plants. Besides, positive selection (ω = 2.3207) was also observed on the branch before the split of the higher plants and green alga. For the higher plants, the positive selection was detected (ω = 139.3114) on the branches leading to the divergence of SmoAP2-003, PpaAP2-007, AtrAP2-007, and OsaAP2-002, but not on the branches leading to AthAP2-008, BnaAAP2-003, and VviAP2-004.

AP2 Family Evolution Mechanism in Plants
Among the AP2/ERF superfamily, most family genes had one AP2 domain except of the AP2 family genes, which had one or more, and mostly two, AP2 domains. For example, three genes (VviAP2-017, VcaAP2-001, and CreAP2-004) had three AP2 domains. Even more, seven AP2 domains were detected in the gene CsuAP2-007 of C. subellipsoidea C-169. To uncover the evolution mechanism of AP2 domain loss or duplication in plants, we constructed the phylogenetic tree using the AP2 domain of the representative AP2 family in each species examined ( Figure 6A). We found, for genes that have two domains, that the two domains (denoted as R1 and R2 in order on genes), could often be divided into two branches in the higher plants, and most R1 domains form a group, and R2 domains form another. At the meanwhile, domains from single-domain genes appeared in an intervening manner with the R1 or R2 domains. For example, three Brassica two-domain genes BnaAAP2-002, BraAP2-001, and BolAP2-001 are grouped with a single-domain gene BraAP2-003.
Here, as to copy number variation of AP2 domains, we found two major evolutionary trajectories of AP2 family ( Figure 6B). (i) Alternative domain loss; this occurred in both lower and higher plants, with some lost R1 and the others R2 (Figure 6B-model I and II). (ii) Duplication, loss and likely conversion; this occurred to the grape gene VviAP2-017 and alga gene MpuAP2-002. The grape gene has three domains closely grouped together with the first domain (R1) from a two-domain gene. This can be explained by recursive duplication of the R1 domain with R2 domain lost at some stage, or alternatively explained by conversion of R2 domain to R1 (Figure 6B-III). The alga gene CsuAP2-007 has 7   Based on the phylogenetic trees, we inferred these alga species had at least 4 domains in their common ancestor, and likely R4 in CsuAP2-007 was the most ancient one. A series of domain duplication, and likely conversion between domains, may have contributed to the gene's evolution (Figure 6B-IV). Interestingly, we found that R1 and R3 domains of C. reinhardtii and V. carteri genes likely exchanged their locations, as compared to C. subellipsoidea C-169. Besides, they all lost the R4 domain, and these changes might have occurred before their split.

Gene Expression Profiling
We analyzed expression level of AP2/ERF genes in leaf and root of B. napus. This dataset contained three replicates for leaf and root, and the expression level was calculated and normalized to RPKM (Table S14). Our analyses showed a good correlation among three replicates ( Figure 7A). Totally, 504 AP2/ERF genes were expressed in at least one tissue, and 11 genes were not expressed in both. In root, 325 genes were expressed with the RPKM values larger than 1.0. The RPKM value of three genes (BnaAERF-141, BnaCERF-210, BnaAERF-054) was more than 100. In leaf, 206 genes were expressed with the RPKM values larger than 1.0, and only 1 gene (BnaCERF-049) with the RPKM larger than 50. A total of 185 genes with RPKM >1 were detected in both tissues. Interestingly, the RPKM value of BnaCERF-049 were achieved 95.007 in root, and 50.099 in leaf, demonstrating that it may be important in B. napus. We compared the expression patterns of genes from the same family or group ( Figure S12). The results showed that the expression pattern of several genes was different with other genes in the same group. For example, we found several genes were down-regulated in leaf, while others were up-regulated in leaf for AP2 family (Figure 7B). Although the expression pattern was divergent in the whole gene family, the expression pattern of genes from the same clade in phylogenetic tree was similar, showing relative consistence between evolutionary closeness and functional similarity.
We analyzed the difference of the each AP2/ERF genes expression between tissues (Figures 8A,B). Grossly, we found that the average gene expression in root was higher than that in leaf (χ 2 -test, P = 8.525e-06), showing obvious tissue specificity. For ERF family, the average gene expression was significant different between leaf and root in the whole-genome scale, and in each of AA-subgenome (χ 2 -test, P = 1.545e-04) and CC-subgenome (χ 2 -test, P = 9.694e-05). For AP2 and RAV family, no significant difference was found between root and leaf.
In addition, we checked whether there is any subgenome bias with genes expression (Figures 8C,D). In leaf, the average expression level of genes from ERF family in CC-subgenome was significantly higher than those in AA-subgenome (χ 2 -test, P = 0.0284), while no significant difference was found between subgenomes for AP2 and RAV families. In root, the average expression level of genes from ERF family in the CC-subgenome was significantly lower than those in AA-subgenome (χ 2 -test, P = 0.0137), while no significant difference was observed between subgenomes for AP2 and RAV families.
Here, we identified and characterized the genomic structural, compositional, and expressional features of AP2/ERF genes in B. napus. Moreover, by performing evolutionary and phylogenetic analysis, we inferred their origination, formation, and evolution during the origination and evolution of land plants. These efforts can serve as a first step in comprehensive functional characterization of AP2/ERF genes by reverse genetic approaches and molecular genetics research.

Exploring AP2/ERF Genes Function in B. napus
The functions of most AP2/ERF genes have been well characterized in Arabidopsis. A comparison of sequence homologs between B. napus and Arabidopsis, might aid in understanding the function of these AP2/ERF genes in B. napus. We checking genes within the same taxonomic group on the phylogenetic tree, which could have similar functions. We identified 4 AP2/ERF genes  in B. napus, which clustered   together with the AtCBF1-3 ( Figure S13A), functionally relating to cold tolerance Lata and Prasad, 2011; Table S15). For AtCBF4 gene, functionally relating to drought and ABA response (Haake et al., 2002), two homologous genes were identified in B. napus ( Figure S13A; Table S15). Similarly, 4, 5, and 2 homologous genes in B. napus were clustered, respectively, together with AtDREB2A-2C ( Figure S13B), functionally relating to drought, salt, heat, or cold tolerance (Table S15; Sakuma et al., 2006;Lim et al., 2007;Djafi et al., 2013).

Subgenome Bias in B. napus
As a very young neo-tetraploid, B. napus may have formed for only ∼7500 years (Chalhoub et al., 2014), by hybridizing the genomes of B. rapa and B. oleracea. This provided a precious opportunity to understand how genes or gene families were affected in young polyploids. Here, we found different gene groups from the AP2/ERF superfamily, were much preserved after the formation of the tetraploid, showing that the genome structure has been much stable, i.e., a very low rate of gene loss, which was often proposed to be wide-spread during the early stage of a neo-polyploid, contributing to fast diversification of new plants (Jiao et al., 2011;Paterson et al., 2012;Liu et al., 2014;Woodhouse et al., 2014). Actually, in some artificial/synthetic tetraploids, chromosomal DNA and gene loss rates can be 15% during the first generations (Ozkan et al., 2001(Ozkan et al., , 2002. Therefore, the finding here shows that an appreciable span of genome stability should occur for polyploids. Some may have very instable genomes, but others may have very stable ones. The B. napus genome should be like the latter. A very stable genome in whole-scale may permit considerable subgenome interaction. It has been reported that illegititmate recombination may occur between subgenomes, leading to crossing-over or gene conversion (Wang et al., 2009b;Wang and Paterson, 2011;Guo et al., 2014). Gene conversion transfers genetic information in a unidirectional manner between genes. Divergence between subgenomes would decrease illegitimate recombination sharply (Peters et al., 2009). Especially, chromosomal rearrangements may be a critical restricting factor (Wang et al., 2009b). The two subgenomes of B. napus are much similar in chromosome numbers and compositions, permitting appreciable illegitimate recombination to occur. Homoeologous exchanges were inferred between the two subgenomes, and conversion could explain 86% differences between subgenomes (Chalhoub et al., 2014). Significant bias was observed between subgenomes that nearly 1.3 times more conversions occurred from the subgenome AA to subgenome CC than the other direction. Here, we found that 68 genes from AP2/ERF superfamily was much affected by gene conversion, and the conversion events occurred biasedly from subgenome AA to CC, showing a similar trend as to the whole-genome finding.
Subgenome bias was also observed for gene expression. In leaves, the ERF family genes from subgenome CC significantly higher expressed than those from subgenome AA, whereas in roots, an opposite finding was revealed. This revealed a tissue-related gene expression bias between two subgenomes. In contrast, other genes from the superfamily did not show this kind of bias. A biased gene conversion and gene profiling showed the relative dominance between two subgenomes. Previously, maize was shown to have two subgenomes (Schnable et al., 2011), which merged together ∼26 millions of years ago to produce the ancestral tetraploid (Wang et al., 2015b). However, one subgenome shows dominance over the other one in gene retention and gene expression (Schnable et al., 2011). Here, in a very young neo-tetraploid, B. napus, though two subgenomes may have preserved much of their ancestral genes, we observed biased gene expression and gene conversion as to a gene superfamily. This shows that subgenome bias may be popular effect between hybridized genomes in polyploids. As to the cause of subgenome bias or dominance, there have been hypothesis that differential genomic methylation might have played a role (Woodhouse et al., 2010). However, after thousands or millions of years, methylation pattern might have not been able to transfer through so many generations and have changed considerably, therefore there has been little evidence to support such a hypothesis.

Origination and Expansion
As to our phylogenetic analysis, the superfamily genes may have firstly originated from the AP2 family, and through recursive duplications of domain AP2, losses of domain AP2, and acquiring other domains, such as B3, to develop novel genes. This shows interesting evolutionary trajectories of building new genes by using existing domains. Many AP2 genes have two AP2 domains, which should be initially produced in the alga, which always have multiple AP2 domains. This phenomenon of duplicated domains was frequently observed, producing novel genes, and enriching, enhancing, and expanding gene functions (Wang et al., 2009a), which have recursively contributed to the resistance functions of plants.
The production, expansion, and deletion of tandem duplicated domains should be resulted from unequal DNA crossing-over during meiosis, which is a fundamental DNA recombination mechanism resulting in DNA variations (Keren et al., 2010). However, the mechanisms by which new protein domains arise and diversify are difficult to test experimentally (Black, 2003). It is hypothesized that ancient domains arose by fusion of short peptide ancestors and that they are further diversified by fusion with other domains. Recently, researchers tested how duplicated domains formed through biological experiment (Cahn et al., 2016). They tested the possibility whether the class II ketol-acid reductoisomerase (KARI) have been produced from an ancestral class I KARI by duplication of the C-terminal domain and corresponding loss of obligate dimerization. Eventually, they constructed a novel class II KARI by duplicating the C-terminal domain of a hyperthermostable class I KARI.

ETHICS STATEMENT
The study was approved by the North China University of Science and Technology, and Hebei Agricultural University, China. All patients provided written informed consent.

AUTHOR CONTRIBUTIONS
The study was conceived by XS, XW, JZ. XS, XM, JW, TL, YL, LW, WG, DG, ZW, and CL. contributed to data collection and bioinformatics analysis. XS, XM, XW, and JZ. participated in preparing and writing the manuscript. All authors contributed to revising the manuscript. All authors had read and approved the final manuscript.