Comparative Analysis of Genomic and Transcriptome Sequences Reveals Divergent Patterns of Codon Bias in Wheat and Its Ancestor Species

The synonymous codons usage shows a characteristic pattern of preference in each organism. This codon usage bias is thought to have evolved for efficient protein synthesis. Synonymous codon usage was studied in genes of the hexaploid wheat Triticum aestivum (AABBDD) and its progenitor species, Triticum urartu (AA), Aegilops tauschii (DD), and Triticum turgidum (AABB). Triticum aestivum exhibited stronger usage bias for G/C-ending codons than did the three progenitor species, and this bias was especially higher compared to T. turgidum and Ae. tauschii. High GC content is a primary factor influencing codon usage in T. aestivum. Neutrality analysis showed a significant positive correlation (p<0.001) between GC12 and GC3 in the four species with regression line slopes near zero (0.16–0.20), suggesting that the effect of mutation on codon usage was only 16–20%. The GC3s values of genes were associated with gene length and distribution density within chromosomes. tRNA abundance data indicated that codon preference corresponded to the relative abundance of isoaccepting tRNAs in the four species. Both mutation and selection have affected synonymous codon usage in hexaploid wheat and its progenitor species. GO enrichment showed that GC biased genes were commonly enriched in physiological processes such as photosynthesis and response to acid chemical. In some certain gene families with important functions, the codon usage of small parts of genes has changed during the evolution process of T. aestivum.


INTRODUCTION
In protein synthesis, triplet codons in the mRNA are translated into amino acids to form polypeptide chains. From 2 to 6 synonymous codons can be assigned to a specific amino acid, except for methionine and tryptophan, which are encoded only by AUG and UGG, respectively. Such synonymous codons are not used equally but are instead biased to some optimal codon (Sirihongthong et al., 2019). Codon usage bias, also known as codon bias, can Frontiers in Genetics | www.frontiersin.org 2 August 2021 | Volume 12 | Article 732432 vary among species and even within genes of the same organism (Mukhopadhyay et al., 2007;Bali and Bebok, 2015). Codon usage bias reflects a mutation-selection balance, which can be affected by mutation, translational selection, and genetic drift in a population (Shah and Gilchrist, 2011;Duan et al., 2021). Therefore, understanding the codon usage bias can reveal the effects of long-term evolution on plant genomes. Long-term studies account of the non-random variation in codon usage have shown that it is not simply a neutral process driven by mutational bias and genetic drift. There is substantial evidence that codon usage bias in some organisms is under selection pressure for translational efficiency (Zalucki et al., 2007;Szövényi et al., 2017;LaBella et al., 2019). Theories suggested that the neutralist model and selective model are not mutually exclusive and that codon usage might reflect a balance between mutational and selection effects (Duret and Mouchiroud, 1999). The "selection-mutation-drift" model was proposed to describe this balance. This model proposes that codon usage bias in genes with high expression is primarily the result of selection at the translational level and that in genes with low expression the bias is mainly caused by either mutation or genetic drift with the selective effect being relatively weak (Sharp and Li, 1987;Bulmer, 1991;Mukhopadhyay et al., 2007). Codon usage bias in unicellular organisms, such as Escherichia coli and Saccharyomyces cerevisiae, is adequately described by the selection-mutation-drift model. In contrast, codon usage bias is more complex in multicellular eukaryotic organisms. Lavner and Kotlar (2005) found codon bias tends to be high in some weakly expressed genes in mammalian eukaryotes. In such genes, non-optimal codons may have been selected to reduce the expression by reducing the elongation rate during gene translation. In mammals and plants, GC-biased gene conversion (gBGC) has also been proposed as a main driving force in the evolution of base composition, especially in grasses such as the genus Oryza and in some non-grass monocot species (Muyle et al., 2011;Clément et al., 2014;Mazumdar et al., 2017).
Codon usage bias in evolution research has been reported on model organisms, including E. coli (Krisko et al., 2014), Dengue virus (Manokaran et al., 2019), S. cerevisiae, Drosophila melanogaster (Kliman et al., 2003), and humans (Sauna and Kimchi-Sarfaty, 2011), as well as studies assessing codon usage in plant evolution research (Rensing et al., 2005;Liu et al., 2010;Sahoo et al., 2019;Chi et al., 2020). Plant species are broadly diverse in their gene expression, physiology, and stress response in varied environments. Therefore, the knowledge of codon usage and codon-pair context patterns in plants and the underlying evolutionary forces will advance understanding of the molecular mechanisms of environmental adaptation and biological diversity. Shen et al. (2020) analyzed codon usage patterns in eight citrus species based on coding sequence data, drawing the conclusion that few differences in codon features among citrus species and the genomes of citrus species were conserved. Researches showed that Arabidopsis genes show relatively little variation in codon usage, whereas an extreme degree of heterogeneity in codon usage patterns within the rice genome, which is highly correlated with differences in GC content between the genes (Wang and Hickey, 2007). Paralogs with high and low GC contents in rice and other cereals were analyzed and provided evidence for selectively driven codon usage, which represented a potential evolutionary process for the origin of genes with a high GC content in Gramineae (Guo et al., 2007).
With three closely related A, B, and D genomes, hexaploid wheat (Triticum aestivum L.) is the most widely cultivated crop on earth and is a model for allopolyploidy in plants . The progenitor species of the A genome is diploid wild einkorn wheat, Triticum urartu (AA; Ling et al., 2018). The ancestor of the D genome is the wild diploid grass Aegilops tauschii (DD; Jia et al., 2013), which spontaneously hybridized with cultivated tetraploid wheat Triticum turgidum (AABB; Avni et al., 2017) resulting in hexaploid wheat (AABBDD; Appels et al., 2018). Genome sequencing and assembly of wheat and its genome donor species have been finished. The aim of the present study is to assess which evolutionary forces have a significant impact on codon usage bias in hexaploid wheat and its genomic donors.

Sequence Data
The CDS genomic sequences of T. urartu accession G1812, wild tetraploid wheat (T. turgidum) accession "Zavitan, " Ae. tauschii accession AL8/78, and common wheat cultivar "Chinese Spring" were downloaded from the Ensembl Plants. 1 The CDS sequences were filtered using Python and the following criteria: (1) the gene is a complete CDS, (2) the length of the CDS sequence is more than 300 bp, and (3) the CDS begins with ATG and ends with a termination codon (Liu et al., 2020). The leaf transcriptome data in common wheat and its ancestors were collected from the NCBI 2 under accessions SRR3274670 (T. urartu), SRR11292316 (T. turgidum), SRR11292312 (Ae. tauschii), and SRR9647023 (T. aestivum). In addition, the sequences of genes for grain size (GS; Li and Yang, 2017), the CCT gene family (Zheng et al., 2017), and the prolamin genes (Ling et al., 2018) identified from previous studies were used. The collected fasta sequences of genes from different families were aligned to the local protein database of other species using BALSTX, which was downloaded from URGI database 3 with the expected value (E-value) of the lowest. Meantime, the conserved domains of the gene family were identified using HMMER 3.0.

Codon Bias Analysis
The codon usage of the T. aestivum genome and its ancestor species were analyzed using the software program CodonW 1.4.2 4 (Peden, 2000). The parameters for the codon usage calculation were listed as following: Relative Synonymous Codon Usage (RSCU), Effective Number of Codon (ENC), GC content (GC), codon first and second and the third position for GC content (GC1, GC2, and GC3), the average GC content of the first and second codons (GC12) and the content of each base in the third position of the synonymous codon (A3s, T3s, G3s, and C3s).
(1) RSCU: Among the synonymous codons, the codon with the larger RSCU value has a higher usage probability (Wright, 1990). RSCU values were estimated using the following formula: (2) ENC: A value below 35 indicates high codon usage bias and values above 50 indicates low bias (Sharp and Li, 1987). ENC were calculated by the following formula:  Effective Number of Codon-plot and neutrality plot analysis were used to analyze factors affecting the codon preference of T. aestivum and its ancestor species. In ENC-plot analysis, when ENC value is within the allowable range of the standard curve, the codon preference is caused by base mutation, otherwise the codon preference is caused by natural selection (Cai et al., 2006).
Neutral plot analysis was used to investigate the effect of mutational and selection pressure on the codon usage pattern. The neutrality plot was mapped with the scatterplot to analyze the correlation between GC12 and GC3. When the regression coefficient was close to 0, the correlation was low, indicating that the GC composition for the three positions of the codon differed, the GC content is highly conserved, and the codon preference for the species is mainly affected by natural selection (Su et al., 2009;Nasrullah et al., 2015). Otherwise, the codon bias of species is more affected by mutations.
PR2-plot was used to analyze whether the third base of a codon is biased according to the bias rule (Sueoka, 2001). The bias rule states that if A = T and G = C between the two complementary strands of DNA then there is no mutation and selection bias. If otherwise, then the codon is biased.

Correspondence Analysis and Determination of Codon
Correspondence analysis (COA) was generated to analyze codon usage changes in different genes. Correlation analysis is used to determine the main factors affecting codon bias (Jia et al., 2015). In COA, all the genes were distributed in multidimensional vector spaces using the RSCU COA function of codonW. The first axis represented the most different codon usage changes with sequential decreases represented in the second, third, and fourth axes.
To determine optimal codons of T. aestivum genome and its ancestor species, genes with the 5% highest and lowest ENC values were screened to establish a low preference group and a high preference group. Among genes from two groups, the RSUC difference was greater than 0.08, and the codon with the RSUC value greater than 1 was determined to be the optimal codon (Liu et al., 2020).

Major Factors of Variations in Synonymous Codon Usages Analysis
To determine the factors affecting the codon bias of the T. aestivum genome and its ancestor species, we analyzed the number of tRNA anti-codons and the length of genes. Firstly, CDS sequences of common wheat and its ancestor species were divided into four length categories: <1,000 bp, 1,000-2,000 bp, 2,000-3,000 bp, and >3,000 bp. Secondly, codon bias analysis for genes of different length was done using CodonW 1.4.2 and the numbers of genes and values of GC3s were calculated. Anti-codons of common wheat and its ancestor species were identified using tRNAscan-SE 2.0 (Chan and Lowe, 2019).

Gene Function
The rawdata of common wheat and its ancestors were used for quality control using fastp (Chen et al., 2018). To analyze the relationship between codon preference and function of expressed genes, the TPM of CDS sequences from common wheat and its ancestors were calculated using Salmon v1.3.0 (Patro et al., 2017). To determine differences in the function of genes with different codon bias, gene ontogeny (GO) enrichment were conducted separately using genes with different codon bias. For gene annotation, coding sequence regions of each gene in wheat and its ancestor species reference genomes were compared against the default database using eggNOGmapper (Huerta-Cepas et al., 2017. The CDS sequences of the TPM > 1 of all species were divided into high GC bias genes and high AT bias genes according to GC content of each species. The CDS sequences with different codon bias were calculated via GO enrichment analysis using the clusterProfiler R package (Yu et al., 2012).

Effective Number of Codons Value and GC Contents
The codon usage characteristics of the T. aestivum genome and those of its ancestor species, T. urartu, T. turgidum, and Ae. tauschii, were investigated using CodonW1. showed more codon usage bias than the A and B genome from T. turgidum. GC3s frequencies in T. aestivum (63.7%) and T. urartu (62.0%) were higher than those in T. turgidum (52.6%) and Ae. tauschii (54.6%; Figure 1; Supplementary Table 1). Codon usage bias has been shown to correlate with GC composition (Wan et al., 2004). The mean GC contents were examined for the whole coding sequences of the four species and the higher GC content partly explained the bias for G/C-ending codons seen in T. aestivum and T. urartu. PR2-bias plot analysis showed a non-proportional use of A/T and C/G in the four genomes, with T3>A3 and C3>G3 (Supplementary Figure 1). Neutrality plot results showed that the correlation between GC12 and GC3 was statistically significant (p < 0.001) in T. aestivum (r = 0.650), Ae. tauschii (r = 0.596), T. turgidum (r = 0.551), and T. urartu (r = 0.517). The slopes of the regression line were close to zero (0.16-0.20, Supplementary Figure 2), indicating that selective pressure should be the dominant factor shaping codon usage pattern in T. aestivum and its ancestor species.

Correspondence Analysis
In the present study, COA based on RSCU was expected to further identify the major factors affecting codon usage frequencies and synonymous codon preferences observed in the coding sequences of the four genomes. Genes from the four species were widely distributed along the two explanatory axes, suggesting a strong codon usage bias in these coding sequences (Figure 2). The first and second explanatory axis accounted for 40.2/4.3, 39.1/3.6, 47.9/4.1, and 40.6%/3.7% in T. urartu, T. turgidum, T. aestivum, and Ae. tauschii, respectively. Thus, the first axis reflects the primary factor that explains the differences in codon usage among all genes and genes with high GC and high AT were separated along this axis. The corresponding distribution of synonymous codons shows the separation of C/G-ending codons (except AGG that encodes arginine) and A/U-ending codons along this same axis. High G/C genes from T. urartu and T. turgidum were distributed on the right side of axis 1. The separation of genes on the second axis appears to be largely due to frequency differences in C-ending and G-ending codons among the GC rich genes; however, genes with more A-ending and T-ending codons tended to cluster at the center of axis 2. Correlation analysis showed that the third base preference of coding genes positioned on axis 1 of the COA were significantly related (Supplementary Table 2), indicating that base composition is the main factor affecting codon usage and the COA could be further used in optimal codon identification.
The optimal codons of common wheat and its ancestor species were identified based on the RSCU value (Supplementary Table 3) and the number of optimal codons in T. urartu, T. turgidum, T. aestivum, and Ae. tauschii were 21, 14, 22, and 16, respectively. All the optimal codons showed G/C at the third codon position, indicating that wheat and its ancestor species have similar optimal codon preferences for codons with G/C-endings instead of A/U-endings.

Variations in Synonymous Codon Usages in Triticum Genomes
To investigate the effect of coding sequences length on codon usage, coding sequences were classified into four groups: <1,000 bp, 1,000-2,000 bp, 2,000-3,000 bp, and >3,000 bp, and the average GC3s values computed for genes from each species (Figure 3). Bias for G/C-ending synonymous codon usage was much stronger in genes coding for short vs. long proteins (Supplementary Table 4). Among coding sequences longer than 1,000 bp, more GC-ending synonymous codons were observed in T. aestivum than in its ancestor species. However, among sequences <1,000 bp, T. urartu sequences showed the greatest bias for GC-ending codons. Thus, in longer genes T. aestivum showed the greatest tendency to use G/C at the third codon position.
In humans, GC distribution and codon preference have been proved to be related to gene density (Versteeg et al., 2003). In the present study, substantially higher gene density occurred in the distal regions of chromosome arms and genes distributed in these regions exhibited higher GC3s values. In contrast, the centromere regions showed lower gene density and the GC3s values of genes in these regions were low (Figure 4).

Corresponding tRNA Abundance
Biased usage of synonymous codons can also be constrained by the amount of isoacceptor tRNA to increase translation efficiency (Duret and Mouchiroud, 1999). The frequency of each synonymous codon encoding 21 amino acids was estimated, and we further investigated the amount of corresponding tRNA in the four species (Supplementary Table 5). Significant differences were found among the species for codon frequency and corresponding tRNA abundance. For example, T. urartu and T. aestivum showed high correlations (r = 0.667 and 0.724, p < 0.01) for the amino acids with two synonymous codons (Tyr, Phe, Lys, His, Glu, Gln, Cys, Asp, and Asn; Figures 5A,B). In contrast, amino acid with more than four synonymous codons (Val, Thr, Ser, Pro, Leu, Gly, Arg, and Ala), T. turgidum and Ae. tauschii showed high correlations (r = 0.511 and 0.488, p < 0.01; Figures 5C,D).

Gene Ontogeny Enrichment of GC/AT Biased Genes
Based on GO categories, the functional enrichment of codon usage biased genes from the four species was analyzed, and the top 10 GO terms with values of p < 0.05 for genes from each species were identified (Figure 6). G/C-ending genes performed different functions compared to A/T-ending genes. The GC3s biased genes from T. aestivum and its ancestor species were commonly enriched in genes related to photosynthesis ( Figure 6A). Also enriched were transmembrane transport (72/625), and response to acid chemical (71/625) in T. urartu, photosynthesis (46/802) and photosynthesis light reaction (43/802) in T. turgidum, response to acid chemical (158/1,214) and response to inorganic substance (113/1,214) in T. aestivum, and photosynthesis (46/751) and photosynthesis light reaction (40/751) in Ae. tauschii (Supplementary Table 6). AT3s biased genes likely perform similar functions in T. aestivum and T. urartu ( Figure 6B) as these genes were enriched in GO terms such as cellular location, macromolecule localization, and protein location. A/T ending genes from Ae. tauschii were mainly enriched in response to extracellular stimulus (50/836) and response to nutrient levels (47/836; Supplementary Table  7). The GO enrichment (Supplementary Tables 8 and 9) demonstrated that the evolutionary features reliably reflect the relative function of different codon biased genes.

Codon Preference in Important Functional Gene Families
The present study shows that synonymous codons had bias at the whole genome level in T. aestivum and its ancestor species. We next sought to investigate if codon usage preference changed during evolution for gene families with important functions, including grain size (GS) genes, the CCT gene family regulating flowering, and the quality related prolamin gene family. Based on 36 grain size related genes identified in previous studies in Triticum (Li and Yang, 2017), similar GS genes from the A genome of T. aestivum, T. turgidum, and T. urartu were isolated and the GC3s variation of GS genes during polyploidization was calculated (Supplementary Table 10). A difference in the GC3s value between the homologous genes of >0.1 was classified as being codon biased. Triticum urartu is the progenitor of the A subgenome of tetraploid T. turgidum and hexaploid T. aestivum (Ling et al., 2018). From T. urartu to T. aestivum, seven GS genes were identified with different preferences in codon usage patterns ( Table 1). Five of the seven genes, TRIUR3_05970-T1, TRIUR3_33526-T1, TRIUR3_34310-T1, TRIUR3_08952-T1, and TRIUR3_09477-T1, showed codon usage bias from the time T. urartu evolved into T. turgidum ( Table 2).
The RSCU of each codon was measured for CCT and prolamin genes to investigate the variation in codon usage bias of these genes across the four genomes. Consistent with the results at the genome level, the G/C-ending codons in most of the CCT family genes were overrepresented (RSCU > 1), except CCT24 which showed obvious A/U bias in the third position of synonymous codons. Among the species, the codon usage in T. aestivum CCT genes was most like that for genes in the D genome ancestor Ae. tauschii (Figure 7). Euclidean clustering of RSCU for codons in each gene showed that among the 26 CCT genes in T. aestivum, 15 were like Ae. tauschii, eight were like T. turgidum, and the last three were like T. urartu (Figure 8). For the prolamin genes, some optimal codons identified at the genome level (Supplementary Table 5) were underrepresented (RSCU < 1) including GUG encoding Val, UAC encoding Try, CCG encoding Pro, UUC encoding Phe, CUC encoding Leu, CAG encoding Glu, GAC encoding Asp, and CGC encoding Arg (Figure 9).

DISCUSSION
Triticum aestivum has a close phylogenetic relationship with its ancestor species. However, due to the long-term evolutionary processes and the various environments, where the plants evolved, the codon usage patterns among the species differ. In the present study, we analyzed the codon bias patterns in T. aestivum and its progenitor species and found that T. aestivum has similar preference in codon usage with T. urartu and that it differs most from T. turgidum. This phenomenon is mainly explained by the  Table 1), which considered to be the strongest determinant of codon usage variation caused by mutational processes across species (Muyle et al., 2011;Plotkin and Kudla, 2011).

tRNA Abundance Supports the Role of Selection for Translational Efficiency
Codon bias is determined by a balance between mutation, drift, and selection for optimal translational efficiency and/or accuracy (De La Torre et al., 2015). In many cases, codon usage correlates with the cognate tRNA abundance with highly used codons generally having higher intracellular tRNA concentrations than low-use codons (Zalucki et al., 2009). This fact indicates that tRNA abundance may be the main selection pressure for synonymous codon usage (Mohanta et al., 2020). In the present study of Triticum, more codon-anticodon interaction occurred in amino acids with two synonymous codons in T. aestivum and T. urartu and in amino acids with four or six synonymous codons in T. turgidum and Ae. tauschii. This result also provides evidence for the similar codon usage bias in T. aestivum and T. urartu. In contrast, for codons encoding glutamine from T. aestivum and T. urartu, corresponding tRNA copy numbers for the optimal codon (CAG) were much less than for the non-optimal codon (CAA), suggesting that the primary factor influencing codon-usage for these amino acids is not translation efficiency (Kanaya et al., 2001). Simultaneously, we considered the influence of codon swinging on tRNA abundance and codon usage preference. Crick (1966) proposed the wobble hypothesis based on stereochemistry to explain this phenomenon: when the first anticodon is A or C, only one codon can be recognized; when the first anticodon is G or U, two codons can be recognized, where G can identify C and U, where U can identify A and G. Due to the wobble phenomenon, a tRNA anticodon can be combined with more than one mRNA codon. We conjecture that UUG oscillates when recognizing CAA, so that more UUG can recognize the optimal codon CAG. We also found the same phenomenon in Ser. Wobble rules say that tRNAs with a GCU anticodon can pair with AGU, which explains the apparent  Table 5). In a recent research of 128 species of the plants ( Mohanta et al., 2020), high and low frequencies of wobble base-pairing were occurred at the G:U base pairing to meet translational demand.
Triticum aestivum Showed More GC3 Codon Bias in Genes With Optimal Length Duret and Mouchiroud (1999) found that codon usage bias is negatively correlated with protein length in Caenorhabditis elegans, D. melanogaster, and Arabidopsis thaliana. Gene length showed a negative correlation with GC3 content in non-grass monocot species Frontiers in Genetics | www.frontiersin.org 9 August 2021 | Volume 12 | Article 732432 (Mazumdar et al., 2017). The same phenomenon was found in the three progenitor species, as coding sequences shorter than 1,000 bp had the highest GC3 codon bias (Figure 3). In T. aestivum, though, synonymous codon bias increased among coding sequences of 1,000-2,000 bp compared to short sequences (<1,000 bp). In the T. aestivum genome, we found that most coding sequences were  between 1,000 and 2,000 bp (N = 53, 051), which is consistent with the result for T. aestivum transcript sequences reported by Wang et al. (2019). All the optimal codons identified in the Triticeae in the present study preferred G/C-endings at the genome wide level, and the GC3 content of coding sequences shorter than 2,000 bp was higher than that for longer genes. This result supports the idea that in T. aestivum sequences of less than 2,000 bp are the optimal length for translation, especially for important functional genes, and that the use of optimal codons increases their translational efficiency and accuracy. The optimal gene length of T. aestivum may be longer than that of its ancestral species, and perhaps the process of evolving into a complex hexaploid species enabled T. aestivum to translate longer proteins with minimal mismatches.

Particular Gene Families Showed Different Codon Usage Preference Compared With the Whole Genome
Genes for particular physiological processes, such as photosynthesis, response to salt stress/acid chemical/cold stress, and seed germination, were biased to GC-ending codons. In contrast, A/U-ending biased genes tended to be associated with the function of basic biological processes, like RNA processing, protein localization, protein transport, and membrane organization (Supplementary Tables 8 and 9). Some important genes always maintained conserved codon usage bias. For example, we identified 1,755 disease resistance genes in Ae. tauschii (Supplementary Table 11; Jia et al., 2013) and compared the GC3s contents with the corresponding homoeologous genes in D genome of T. aestivum. Only 19 genes had a GC3 difference of more than 0.1, an exceptionally low level (Supplementary Table 12).
CCT family genes are important for the flowering process, which enables the adaptation for reproductive success in various geographical environments (Yang et al., 2013). RSCU values of 59 synonymous codons in the CCT gene family from the four species were investigated and most CCT genes from T. aestivum (15/26) were clustered together with genes from Ae. tauschii, indicating that CCT genes from T. aestivum and Ae. tauschii were similar in codon usage frequency. We further measured the GC3 values of CCT genes and found that T. aestivum and Ae. tauschii CCTs showed similar G/C preference at the third position of codons (Supplementary Figure 3A; Supplementary Table 13), which was consistent with the RSCU results. Previous studies showed that the rapid evolution of the CCT gene family promotes the adaptability of Ae. tauschii (Zheng et al., 2017), the similar codon usage bias implies that CCT genes from T. aestivum maintained the functions from its D progenitor, Ae. tauschii, presumably to enhance environment adaptability. The prolamin sequences of the four species are highly similar, but there are obvious differences in the usage of codons, especially the optimal codons (Supplementary Figure 3B; Supplementary Table 14). As a donor of the D genome of wheat, Ae. tauschii provided wheat with better adaptability and quality traits (Jia et al., 2013).
For codon optimization, not only must the codon preference and tRNA abundance be considered, but also other factors including local mRNA folding, codon coordination, and codon correlation. By both natural evolution and genetic engineering, these properties can be altered to gain additional transcriptional regulatory ability for appropriate gene expression under specific cell conditions.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary Material.

AUTHOR CONTRIBUTIONS
CY collected datasets and performed the bioinformatics analysis. CY, QZ, and YW wrote original draft. XZ and JuZ conducted funding acquisition, project administration, and manuscript review and editing. JiZ and LQ plotted the figures. BW and SY prepared the supplementary data. All authors contributed to the article and approved the submitted version.

SUPPLEMENTARY MATERIAL
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2021.732432/ full#supplementary-material FIGURE 9 | RSCU analysis showing synonymous codon usage preference of the Prolamin gene families in common wheat and its ancestor species. Columns correspond to the 59 nondegenerate, non-stop codons. Rows correspond to special family genes in the four species. Blue cells indicate low bias codons and red cells indicate high bias codons.