E183K Mutation in Chalcone Synthase C2 Causes Protein Aggregation and Maize Colorless

Flavonoids give plants their rich colors and play roles in a number of physiological processes. In this study, we identified a novel colorless maize mutant showing reduced pigmentation throughout the whole life cycle by EMS mutagenesis. E183K mutation in maize chalcone synthase C2 (ZmC2) was mapped using MutMap strategy as the causal for colorless, which was further validated by transformation in Arabidopsis. We evaluated transcriptomic and metabolic changes in maize first sheaths caused by the mutation. The downstream biosynthesis was blocked while very few genes changed their expression pattern. ZmC2-E183 site is highly conserved in chalcone synthase among Plantae kingdom and within species’ different varieties. Through prokaryotic expression, transient expression in maize leaf protoplasts and stable expression in Arabidopsis, we observed that E183K and other mutations on E183 could cause almost complete protein aggregation of chalcone synthase. Our findings will benefit the characterization of flavonoid biosynthesis and contribute to the body of knowledge on protein aggregation in plants.


INTRODUCTION
Flavonoids, which mainly consist of such compounds as chalcones, flavones, flavonols, anthocyanins, and proanthocyanins (Williams and Grayer, 2004), are a group of secondary metabolites that play important roles in multiple physiological processes. Anthocyanins and some other pigments are responsible for the red-blue color in flowers, seeds and other tissues, which could help plants attract pollinators or seed dispersers (Schaefer et al., 2004). Flavonoids have antioxidant functions by inhibiting the generation or reducing the levels of reactive oxygen species (ROS) (Agati et al., 2012); they are involved in UV protection (Schmelzer et al., 1988;Li et al., 1993;Bieza and Lois, 2001) and tolerance to other abiotic stresses Ren et al., 2019). Recent studies have observed associations between the biological activities of flavonoids and a variety of health benefits, including eyesight improvement (Ghosh and Konishi, 2007), cardiovascular health (Wallace et al., 2016), and cancer prevention (Lin et al., 2017).
Due to the success of isolating mutants with alterations in pigmentation (Koornneef, 1990;Holton and Cornish, 1995), the metabolites and key genes involved in flavonoid biosynthesis have been studied intensively in plants, including maize (Winkel-Shirley, 2001;Falcone Ferreyra et al., 2012;Eloy et al., 2017). In the upstream phenylpropanoid biosynthesis, phenylalanine is converted to 4-coumaroy-CoA by phenylalanine ammonia lyase (PAL), cinnamate 4-hydroxylase (C4H), and 4-coumarate CoA ligase (4CL). In the first two steps of flavonoid biosynthesis, 4coumaroy-CoA is converted to naringenin chalcone by chalone synthase (CHS) and chalcone isomerase (CHI). Naringenin can be catalyzed into flavonols, anthocyanins, or flavones through different branches and enzymes. In addition to these structural genes, biosynthesis is controlled by transcriptional regulators, including R2R3 MYB, basic helix-loop-helix (bHLH) and WD repeat (WDR) proteins and their complexes (Jin et al., 2000;Gonzalez et al., 2008;Feller et al., 2011;Xu et al., 2015;Strygina et al., 2019;Sun et al., 2020). Considering the important functions of flavonoids and the biosynthesis as a colorful model for genetics and cell biology (Winkel-Shirley, 2001), it is inspired to discover novel genes and mutations affecting flavonoid biosynthesis.
Artificial mutants serve as essential materials for forward and reverse genetics. In Arabidopsis and rice, T-DNA insertion mutagenesis has been widely used (Krysan et al., 1999;Jeon et al., 2000). The success obtained with this model cannot be easily replicated in maize owing to the difficulty of largescale transgene manipulation. However, other efficient methods, including irradiation, Mu transposon or chemical reagents, have been successfully implemented in maize (McCarty et al., 2005;Jia et al., 2016;Lu et al., 2018).
Traditional map-based cloning of mutants relies on repeatedly screening putative markers and evaluating the entire population, which is time-consuming and labor-intensive. Recently, a significant cost reduction was achieved by combining the bulked segregant analysis (BSA) strategy (Michelmore et al., 1991) with high-throughput sequencing. Individuals with extreme phenotypes are selected from each tail of the population and mixed together as a pool. Genotyping is only performed on chosen pools instead of the entire population. Distinct sequencing technologies (e.g., whole genome resequencing, RNA-seq, and exome capture sequencing) can be implemented, and different calculation methods were developed, such as the SNP-index (Abe et al., 2012), ED values (Hill et al., 2013) and G statistics (Magwene et al., 2011). Furthermore, population construction strategies evolved. Such methods as NGM (Austin et al., 2011), SHOREmap (Schneeberger et al., 2009), and BSRseq (Liu et al., 2012) use an F2 or BC1 population derived from two lines with distant ecotypes. Abe et al. (2012) developed the MutMap strategy for chemical-induced mutation, in which the mutant line is crossed to the wild-type line with the same ecotype. These strategies have been implemented to accelerate the process of mutant mapping by direct genome sequencing in Arabidopsis (Cuperus et al., 2010;Austin et al., 2011), rice (Abe et al., 2012;Takagi et al., 2015), and maize (Lu et al., 2018;Tran et al., 2020).
In the process of gene expression, from DNA to a functional protein or RNA, modification or regulation frequently occurs and plays indispensable roles. Posttranslational regulatory steps include posttranslational modifications, protein degradation, and aggregation. Intracellular protein degradation is the breakdown of abnormal or superfluous proteins into smaller polypeptides or amino acids. Some misfolded proteins may escape from degradation, instead accumulating and clumping together to form aggregates (Tyedmers et al., 2010). One of the best-known examples of protein aggregation is sickle cell anemia (Clancy, 2008). Genetic mutation (E6V) occurs in the oxygen-carrying protein hemoglobin (HbS). Under low-oxygen conditions, HbS polymerizes and forms fibrous precipitates. As a consequence, the shape of red blood cells changes from a normal flexible and round shape to a rigid and sickle-like. Recently, the accumulation and aggregation of abnormal proteins was found to be a common feature of human neurodegenerative diseases, such as Alzheimer's disease, Parkinson's disease, Huntington's disease, and amyotrophic lateral sclerosis (Boeynaems et al., 2018;Guo et al., 2018). Different from the brilliant picture in the animal field, studies about protein aggregation in plants remain rare. Although there are reports showing that stressful conditions could induce protein unfolding and aggregation to comprise cell survival (Nakajima and Suzuki, 2013;McLoughlin et al., 2019), there is little knowledge about how genetic mutations affect protein aggregation in plants, owing to the lack of related mutants.
In this study, combing ethyl methane sulfonate (EMS) mutagenesis and MutMap strategy, we identified and did gene mapping for a novel colorless maize mutant. The causal mutation proved to cause protein aggregation and dysfunction. Also, the effect of causal mutation on transcriptomic and metabolic levels were investigated.

Maize Growth and Sampling
For whole lifetime phenotypic observation, maize materials were grown in the field with regular management. In all other cases, materials were grown to three leaf stages in a greenhouse. For anthocyanin measurement, fresh tissues were used. For DNA, RNA, and metabolome extraction, samples were immediately frozen in liquid nitrogen and stored at −80 • C.

Anthocyanin Measurement
Anthocyanin was extracted with a methanol-HCl method (Lee and Wicker, 1991;An et al., 2017). Briefly, 0.1-0.2 g weighted tissues were grounded in liquid nitrogen and then soaked in 1 ml 1% HCl-methanol under darkness for 2 h with occasionally shaking for mixture. Optical density (OD) was measured using a UV spectrophotometer at 530, 620, and 650 nm. Anthocyanin concentration was calculated as [(OD530-OD620)−0.1 × (OD650-OD620)] × v/m, where m represents the weight of tissues (g), v is the volume (1 ml) of extraction buffer.

Mutant Mapping
Mutant mapping was taken with a modified MutMap strategy (Abe et al., 2012). An F2 population was derived from the B73 and colorless mutant lines. The two parents and MP (pool of over 200 mutant-type F2 individuals) were subjected to whole genome sequencing. Library preparation and sequencing were performed by Biomaker (Beijing, China) using Illumina Hiseq4000 (PE150) following a standard protocol.
Bioinformatics analysis after sequencing was implemented following a commonly used approach. The raw fastq format files were filtered using NGSGCToolkit (Patel and Jain, 2012) to remove adaptors and low-quality reads. Clean reads were aligned to the B73_RefGen_v4 reference genome (Jiao et al., 2017) achieved from Ensembl-Plants 1 using BWA (Li and Durbin, 2009). Picard 2 was used to assess the alignment and mark PCR duplicates. GATK (McKenna et al., 2010) and samtools  were used for variant calling. Finally, snpEff (Cingolani et al., 2012) was used for variant effect annotation.
Variants supported by more than two reads were treated as real variants. Homozygous variant was defined as one allele supported by at least four reads, while the other allele supported by no more than one reads, allowing the tolerance of sequencing errors. Variants that were homozygous and unique to the mutant line were used as markers for gene mapping. Variant-index was calculated as alternative allele frequency (read count for alternative allele/total read count) in mutant-type pool (MP). Lowess fitness was performed variant-index for smoothing using the "lowess" function in R (R Core Team, 2019).

Sanger Sequencing Validation
For variant validation, the PCR products were directly sequenced using the corresponding PCR primers (Supplementary Table 7, primer #1). For plasmid construction, sequencing was implemented to confirm the accuracy of the construct using common primers in the vector. All sanger sequencings were performed by Sangon (Shanghai, China). 4peaks 3 were used for the visualization of sequencing peaks. Sequence alignment was performed using pairwise sequence alignment tools available at EMBL-EBI 4 .

RNA-seq
RNA-seq was performed on a pool of B73-type F2 progenies (two biological replicates, BP1 and BP2) and a pool of mutant-type F2 progenies (two biological replicates, MP1 and MP2). Each pool consists of 20 individuals' first sheaths at the three-leaf stage. cDNA library preparation and sequencing were performed by Sangon (Shanghai, China) using Illumina HiseqXTen platform following a standard protocol. Bioinformatics analysis followed a common approach. Quality control was performed on raw reads using Trimmomatic (Bolger et al., 2014). Clean reads were further aligned to the reference genome B73_RefGen_v4 using Hisat2 (Kim et al., 2015). The R package "DESeq2" (Love et al., 2014) was used for differential expression analysis. Differentially expressed genes (DEGs) were defined as |log 2 FoldChange| ≥1 and qvalue ≤ 0.05. Gene annotation was achieved from Ensembl-Plants BioMart 6 .

Protein Conservation Analysis
For conservation analysis among species, ZmC2 was blast against Reference proteins (refseq_protein) database in NCBI 10 and 1,582 proteins with 358-405 amino acid lengths were selected. Multiple sequence alignment (MSA) was performed using Clustal-Omega 11 (Madeira et al., 2019). The MSA results were further analyzed and visualized using Mview 12 . Phylogenetic tree was visualized using Evolview v2 (He et al., 2016). For conservation analysis within species, variants information of ZmC2, ZmWHP1, OsCHS1, and AtCHS were achieved from Ensembl-Plants 13 .
The expression pattern for ZmC2 and ZmWHP1 in 23 tissues spanning multiple stages of maize development (Walley et al., 2016) were achieved from Expression Atlas 14 (Papatheodorou et al., 2020).

Gene Cloning and Plasmids Construction
A longer precursor containing the full CDS of CHS (ZmC2 or ZmWHP1) was cloned using cDNA as template (Supplementary Table 7, primer #4, #5). Point mutations at E183 site including E183K, E183R, and E183D were induced using specific primers containing designed mutations (Supplementary Table 7, primer #6-8) following the overlap PCR strategy (Ho et al., 1989). For prokaryotic expression, the CDS was cloned into the pPH vector (preserved in our lab) using the double-digest system (NdeI and XhoI restriction enzymes) to generate the fusion protein His6tag:CHS (Supplementary Table 7, primer #9, #10). The R12E point mutation and A2_P20 deletion mutation (A2_P20del) were induced using specific primers (Supplementary Table 7, primer #11, #12). For transient expression in maize leaf protoplasts, CDS was cloned into the pSATN1-GW vector using the Gateway Cloning System (Invitrogen) to generate the CHS:GFP-tag fusion protein. For transformation into Arabidopsis thaliana, CDS was cloned into pEarleyGate101 vector using the Gateway system to express CHS:YFP-tag fusion protein (Supplementary Table 7, primer #13-15).

Prokaryotic Expression, Protein Purification, and Western Blot
For prokaryotic expression, the constructed plasmid was transformed into JM109 Escherichia coli cells. Prokaryotic expression was induced for 16-20 h at 18 • C after adding IPTG. Cytolysis was performed using repeated freeze-thaw and lysozyme. The soluble His-tag fusion proteins were purified using a Ni-NTA column.
Western blotting was performed using a Western Blot Kit (Mouse) with a PVDF membrane (Order No. C600393, Sangon, Shanghai, China) following the instructions of the manufacturer. Proteins were transferred from gel to PVDF membrane using a semidry method. The first antibody used was an anti-His mouse monoclonal antibody, and the second antibody used was HRPconjugated goat anti-mouse IgG. ECL luminescence reagent was used as the substrate of HRP, and X-film was used in the darkroom to capture the luminescence.

Maize Leaf Protoplasts Extraction, Plasmids Transformation, and GFP Observation
Maize protoplast isolation and transformation were carried out as previously described in A. thaliana (Zhai et al., 2009) with some modifications. In brief, B73 was cultivated under darkness for approximately 10 days until the two-leaf stage. The second leaves were sliced vertical to the leaf vein into 1mm thicknesses and used for protoplast extraction. Plasmid DNA was transformed into protoplasts using the PEG-Ca 2+ method. Between 12 and 24 h after transformation, protoplasts were scanned for GFP signals using an inverted fluorescence microscope (Nikon, Japan).

Arabidopsis thaliana Transformation and YFP Observation
Constructed plasmids were transformed into Agrobacterium tumefaciens strain GV3101. Transformation of Arabidopsis was done following the vacuum infiltration method (Bechtold et al., 1993). Transformed T1 seedlings were selected with 0.1% glufosinate ammonium (Basta), followed by PCR and RT-PCR detection (Supplementary Table 7, primer#2). AtACT2 [actin 2 (A. thaliana); TAIR accession At3g18780] was used as internal control (primer #3). Leaves of positive lines were subjected to protoplasts extraction and YFP observation following a similar procedure as "maize leaf protoplasts extraction and GFP observation."

Phenotypic Observation of a Colorless Maize Mutant
A novel colorless mutant was screened from an EMS-induced mutant library for the maize B73 inbred line. Throughout the life cycle, this mutant showed reduced level of pigmentation, especially in sheath, aerial roots, cobs, etc., which are tissues expected to contain notably abundant pigment ( Figure 1A).
The mutant was crossed with wild-type B73. Compared to B73 with red/purple leaf sheaths and high anthocyanin content, the mutant line showed green leaf sheaths with almost zero anthocyanin content, and the F1 hybrids showed intermediate color type and anthocyanin content (Figures 1B,C). Investigation of sheath color in the F2 population showed a 3:1 segregation ratio of colored individuals to colorless individuals. All these results indicate that the colorless phenotype is controlled by a single recessive allele.

Mutant Mapping
To identify the causal mutation, we used MutMap pipeline and an F2 population derived from B73 and the colorless mutant (Figure 2A). Over 200 F2 offspring showing mutanttype phenotype were mixed together as a MP. Whole genome sequencing was implemented on two parents and MP, generating approximately 232 million clean reads for B73, 333 million clean reads for the mutant, and 1,180 million clean reads for MP (Supplementary Table 1). Over 95% of these reads were mapped to maize reference genome (B73_RefGen_v4), and the average depth for the two parents was 14.2x and 19.4x, while 61.7x for MP. Cov_ratio_5x (position covered by at least five reads) for three samples were all over 96%. The high coverage and depth make it reliable for the detection of variants and calculation of allele frequencies.
In total, 176,899 variants (137,925 SNPs and 38,974 indels) were identified for the three samples (Supplementary Figure 1A). Over 90% of the variants were shared by all three samples and were treated as natural variants during cultivar and multiplication. After comparing the two parents, 5,654 variants (5,364 SNPs and 290 indels) that were unique and homozygous in the mutant were filtered and further used as markers for gene mapping. Of all these variants, there were 2,730 G-> A and 2,465 C-> T substitutions, together occupying 91.9% (Supplementary Figure 1B).
Variant-index was calculated as the allele frequency of the alternative alleles in MP for these markers. Subsequent smoothing showed that there is only one peak across the whole genome and located at Chr4:150 M-250 M (Figure 2B). Summit of the peak is Chr4:195678521. In the close up look of this candidate region, there is a G > A mutation with missense effect at Chr4:196895508, corresponding to g.2306G > A | p.E183K mutation in gene Zm00001d052673 (chalcone synthase C2, ZmC2) ( Figure 2C and Supplementary Table 2). To validate this mutant site, the surrounding region was cloned and subjected to Sanger sequencing. The Sanger sequencing results were consistent with the results of the whole genome sequencing, as the genotype for B73 or B73-type F2 progeny is G/G, that of the mutant or mutant-type F2 progeny is A/A, that of F1 or F1-type F2 progeny is G/A (Figure 2D).

Transgene Validation of ZmC2-WT and ZmC2-E183K in Arabidopsis
There is a single homolog for ZmC2 in A. thaliana, that is At5g13930 (Chalcone and stilbene synthase family protein, AtCHS), and the alignment identity for them is 82.7% (Supplementary Figure 2). The available transparent testa 4 (tt4) mutant contains a T-DNA insertion in the second exon of AtCHS ( Figure 3A). To explore the effect of E183K mutation on the activity of ZmC2. The two types of ZmC2, wild-type (-WT) and mutant-type (-E183K) were transformed into tt4 mutant separately (Figure 3B). For each allele, more than two independent lines were gained (Figures 3C,D). In comparison, Col-0 could accumulate strong red pigmentation in its stem and seeds, while tt4 mutant shows no pigmentation in these tissues. Transformation with ZmC2-WT could recover the mutant phenotype to distinct levels, while ZmC2-E183K cannot. These findings indicate that the E183K mutation in ZmC2 leads to its dysfunction and maize colorless.

Transcriptomic and Metabolic Changes Caused by ZmC2-E183K Mutation in Maize Sheath
To investigate the effect of ZmC2-E183K on flavonoid biosynthesis and other related mechanisms, RNA-seq (two biological replicates) and widely targeted metabolome (three biological replicates) were performed on the first sheaths of B73type F2 progeny (BP) and those of mutant-type F2 progeny (MP).
RNA-seq generated 26.4-31.2 million pairs of reads for each sample, over 80% of the reads could be mapped to the reference genome in proper pairs (Supplementary Table 3). A total of 45,855 genes were detected in at least one sample. Seventeen genes were determined as DEGs (cut-off: | log2FoldChange| ≥ 1 and q-value ≤ 0.05), where 12 genes were up-regulated, and 5 were down-regulated. Zm00001d017186 (Shikimate O-hydroxycinnamoyltransferase) is the only DEG with slight down-regulation related to flavonoid biosynthesis ( Figure 4A).
In flavonoid biosynthesis (kegg-zma00941), it is obvious that most of the detected metabolites in the downstream of ZmC2 involved reaction (EC: 2.3.1.74) showed significant decrease ( Figure 4D). Also, detected metabolites in the downstream pathways, including anthocyanin biosynthesis (kegg-zma00942) and flavone and flavonol biosynthesis (keg-zma00944) showed significant decrease (Supplementary  Figures 3, 4). In phenylpropanoid biosynthesis, the upstream of flavonoid biosynthesis, several phenolic acids were upregulated. Of them, p-coumaric acid, caffeic acid, ferulic acid, and scopoletin are in the same path to generate scopolin, indicating this may be the major shunt (Supplementary Figure 5). In summary, ZmC2-E183K mutation caused blocking of downstream pathways and the shunt of metabolites in the upstream pathways without affecting gene expression.

Conservation Analysis of Chalcone Synthase C2 and the E183 Site
Multiple alignment was performed on ZmC2 and its 1581 homologs. Of them, 441 proteins belong to Bacteria (Kingdom), 1 belongs to Metazoa (Kingdom), 167 belong to Plantae (Kingdom)-Magnoliopsida (Class)-Poales (Order), and the rest belong to the other orders or classes in Plantae (Kingdom). Principal components analysis ( Figure 5A) and phylogenetic analysis ( Figure 5B) showed that these proteins could be grouped into three clades (A, B, and C). Multiple alignment was further performed on ZmC2 and proteins within each clade ( Figure 5C). Proteins in clade C were all from Bacteria (Kingdom) and very different from ZmC2 at E183 site and surrounding region, while proteins from Plantae (Kingdom) were all in clade A/B and they were highly conserved at ZmC2-E183 site and surrounding region. ZmC2 is in a cluster of 49 proteins belonging to Poales (Order) (Figure 5B). The most similar protein for ZmC2 was ZmWHP1 [chalcone synthase WHP1 (Z. mays); NCBI accession: NP_001149022.1] with 97.25% identity. These two maize chalcone synthase (CHS) was in the same branch with CHS from Setaria italica, Panicum hallii, and seven CHS from Sorghum bicolor. Furthermore, we explored the conservation of chalcone synthase within species but among different varieties. Publicly available variation information was used for ZmC2, ZmWHP1, OsCHS1, and AtCHS ( Figure 5D). There were few missense variations exist in these proteins, and none of them was at ZmC2-E183 site. These findings suggested that the corresponding ZmC2-E183 site in chalcone synthase have been kept conserved during evolution in Plantae (Kingdom) and during cultivar or domestication in different varieties within species.
Given that the highly identical ZmWHP1 cannot complement the dysfunction of ZmC2, we explored the expression pattern of both genes. From the RNA-seq results on the first sheaths of MP_vs_BP, we found that ZmC2 had much higher (about 100 times) transcription level than ZmWHP1. Besides, ZmC2-E183K mutation did not cause a differential regulation of ZmWHP1 ( Figure 5E). Moreover, we used a publicly available RNA-seq data on 23 tissues spanning the vegetable and reproductive stages of maize development (Walley et al., 2016). In most tissues, the expression of ZmC2 was much higher than ZmWHP1 ( Figure 5F). All these findings showed that ZmC2 and ZmWHP1 are independently regulated, and ZmC2 plays the major role in flavonoid biosynthesis.

E183K Mutation Causes Protein Aggregation of ZmC2
To evaluate the effect of the E183K mutation on protein structure or enzyme activity, ZmC2-WT and ZmC2-E183K were fused with His-tag to perform prokaryotic expression followed by protein extraction and purification. Western blot analysis showed that ZmC2-WT proteins exist in total protein, precipitated and soluble component after lysis and they could bind to and eventually be eluted from the Ni-NTA column. In contrast, ZmC2-E183K proteins could not be purified because they were almost completely precipitated/aggregated and barely existed in the supernatant after lysis (Figure 6A).
To further explore if ZmC2-E183K also aggregates in plant cells, the subcellular localization strategy was taken. The GFP (green fluorescent protein)-tag was fused at the C-terminus of ZmC2-WT and ZmC2-E183K, and the reconstructed plasmids were transferred into maize leaf protoplasts for transient expression. ZmC2-WT proteins existed as evenly distributed in the cytoplasm without aggregation (type I) or with some FIGURE 4 | Transcriptomic and metabolic changes in maize first sheath due to ZmC2-E183K mutation. RNA-seq and widely targeted metabolome were performed on the first sheaths of B73-type F2 progeny (BP) and those of mutant-type F2 progeny (MP). Seedlings were grown to three-leaf stage in a greenhouse. For each type of sample, multiple individuals were pooled together. RNA-seq was performed with two biological replicates, and metabolome with three biological replicates.   (Walley et al., 2016). The expression information is achieved from Expression Atlas (https://www.ebi.ac.uk/gxa/home; Papatheodorou et al., 2020). NA means data not available.
Another convincing evidence for the aggregation of ZmC2-E183K was gained through Arabidopsis transformation. ZmC2-WT and ZmC2-E183K were fused with YFP (yellow fluorescent protein)-tag at their C-terminus and stably expressed in Arabidopsis tt4 mutant. ZmC2-WT fused proteins were evenly distributed in the cytoplasm (type I) in independent transgenic lines, while ZmC2-E183K significantly aggregated into some points (type III) (Figure 6C).
Similar phenomena were observed for ZmWHP1, the most identical protein of ZmC2. In prokaryotic expression, ZmWHP1-WT proteins were detected as soluble and precipitated, while ZmWHP1-E183K proteins completely aggregated/precipitated ( Figure 6A); when transiently expressed in maize leaf protoplasts, ZmWHP1-WT showed evenly distributed or with some aggregation, while ZmWHP1-E183K totally aggregated ( Figure 6B). Besides, additional point mutations were deployed on E183 of ZmC2, including ZmC2-E183R and ZmC2-E183D. E and D are both acidic amino acids with similar structures, while R and K are both basic amino acids. All mutant-type proteins showed almost completely aggregated in prokaryotic expression and transient expression in maize protoplasts (Figures 6A,B). As E and D are both acidic amino acids and their side-chain are -(CH 2 ) 2 -COOH and -CH 2 -COOH respectively, the E183D mutation leading to protein aggregation may indicate a restrictive structural requirement, i.e., chain-length of the acidic side chain. In summary, E183K and other mutations on E183 could cause protein aggregation of chalcone synthase, leading to dysfunction.

A Proposed Aggregation Model for ZmC2-E183K
Structure modeling of ZmC2 indicated the wild-type proteins exist as homodimers, and the side chain of E183 has polar contacts with A21, K179 in the same molecule and R12, N184 in the paired molecule. Of them, A21 and R12 may be important for the conformation of N-terminal helix ( Figure 7A). An aggregation model was proposed as following (Figures 7B,C). In wild-type chalcone synthase, the N-terminal helix could bend to cover the paired molecule, generating hydrophilic surface. Thus, the homodimer is stable to exist in cytoplasm or lysis. When mutation happens on E183, related polar contacts were broken; the N-terminal helix fails to cover the paired molecule and buried hydrophobic surface exposed. Thus, the dimer is not stable any more. The hydrophobic interactions drive proteins to aggregate and protect the hydrophobic surface inside.
According to this proposed model, the N-terminal helix is essential for generating stable dimer. We performed artifact damage on the N-terminal helix (ZmC2-A2_P20del, deletion mutation from A2 to P20). ZmC2-A2_P20del proteins were almost totally precipitated/aggregated in prokaryotic expression and transient expression in maize protoplasts ( Figure 7D). As E183 has polar contacts with R12, we performed a complementary mutation (R12E) for ZmC2-E183R and achieved ZmC2-E183R_R12E with double mutations. From both prokaryotic expression and transient expression in maize protoplasts, we observed that the complementary mutation could at least partially rescue the aggregation of ZmC2-E183R ( Figure 7E). All these indicate the importance of N-terminal helix and E183-R12 interaction for the stability of ZmC2 proteins.

Consistence of Phenotype and the Protein Aggregation Model
The protein aggregation model is consistent with the observed phenotype (Figure 1). In B73 line or B73-type F2 progenies with rich color, there are only ZmC2-WT proteins, which are dimers and functional. In the mutant line or mutant-type F2 progenies which are colorless, there are only ZmC2-E183K proteins, which are aggregated and dysfunctional. According to "co-co assembly" mechanism (Bertolini et al., 2021), ZmC2-WT genes would generate ZmC2-WT homodimers directly, while ZmC2-E183K genes generate ZmC2-E183K aggregates directly, thus ZmC2-WT/ZmC2-E183K proteins (unknown status) only occupy a tiny proportion or even do not exist. In F1 or F1-type F2 progenies, there are half ZmC2-WT homodimers (functional) and half ZmC2-E183K aggregates (dysfunctional), which is completely consistent with the observed mid-colored phenotype. However, there may be other mechanisms for the aggregation instead of our proposed hydrophilic and/or hydrophobic model. Furthermore, in vivo experiments are in need to determine ZmC2-E183R, ZmC2-E183D, and ZmC2-E183R_R12E proteins are functional or not.
Protein aggregation raises strong interest in animal fields due to its involvement in human diseases (Clancy, 2008;Boeynaems et al., 2018;Guo et al., 2018). Previous knowledge in the plant field was limited to that stressful conditions could induce the aggregation of misfolded proteins (Nakajima and Suzuki, 2013;McLoughlin et al., 2019). Our work is the first to reveal the effect of genetic variations on protein aggregation in the plant field. However, additional characteristics of the mutation and protein aggregation are still under investigation.

Comparison of C2-E183K and C2-Idf Mutants
In maize, previous studies on chalcone synthase focused on the C2-Idf mutant (Brink and Greenblatt, 1954). C2-Idf is a dominant mutant; homozygous C2-Idf plants and seeds do not show pigmentation, while heterozygotes show significantly reduced pigmentation (Dooner, 1983;Della Vedova et al., 2005). At the RNA level, ZmC2 transcripts were significantly reduced to ∼20% in heterozygotes or nearly zero in homozygotes compared to the wild-type (Franken et al., 1991;Della Vedova et al., 2005). Della Vedova et al. (2005) demonstrated that there are three copies of ZmC2 in C2-Idf mutant instead of a single copy, one of them may generate natural anti-sense transcript, leading to the dominant inhibitory effect through RNA silencing. Eloy et al. (2017) reported that C2-Idf mutants display strongly reduced levels of apigenin-and tricin-related flavonoids, resulting in impeded incorporation of tricin into the lignin polymer. In this study, we identified the E183K mutation as a novel loss-of-function allele for ZmC2. Unlike C2-Idf, C2-E183K is a recessive mutant, heterozygotes show intermediate pigmentation level (Figure 1B). E183K has no effect on the transcript expression of ZmC2 and other genes in related biosynthesis pathways ( Figure 4D and Supplementary Figures 3-5). Instead, it causes protein aggregation of ZmC2, leading to dysfunction. Our metabolome results showed that in the first sheaths of C2-E183K mutants, the downstream flavonoids biosynthesis was blocked, while the metabolites in the biosynthesis of scopolin were up-regulated (Supplementary Figure 5). This novel loss-of-function mutant provides a new chance to investigate how CHS affects these biosynthesis pathways. As flavonoids are involved in multiple physiological processes, it is also of interest to explore the performance of the mutant under biotic and abiotic stresses in the future. Furthermore, a combination of the colorless mutant as acceptor material, ZmC2 as a marker gene and the pollen mediated transformation may provide a simple and effective way for maize transgenic transformation and screening ( Figure 7F).

Advantage of Combining EMS Mutagenesis, MutMap Strategy, and Whole-Genome Sequencing
In this study, combining EMS mutagenesis, MutMap strategy and whole-genome sequencing, we identified the causal gene and mutation directly without further fine mapping. EMS mutagenesis has the advantage that most of the time, EMS induces C to T changes, resulting in C > T or G > A substitutions. The small size of SNPs or indels makes them easily detectable by current high-throughput sequencing technology. The other advantage of EMS mutagenesis is that multiple mutations are jointly and randomly distributed across the whole genome, which makes it possible to cover most genes and to establish as markers. A significant advantage of MutMap is the improvement of mapping population construction. In other methods, the mapping population is constructed by crossing two lines with distinct ecotypes. However, the high density of variations between the ecotypes makes it challenging to pinpoint the causal mutation. In MutMap, the mutant is crossed to the wild-type with the same ecotype, and variants induced by mutagenesis are taken as markers. Although density is relatively low, the markers are still sufficient for association and linkage analysis. Superior to RNA-Seq or exon-capture sequencing, wholegenome sequencing has the power to cover those variants exist in intergenic or regulatory regions. In our study, the colorless mutant (B73 ecotype) was crossed with B73. Only 5654 variants homozygous and unique in the mutant were taken as markers, and over 90% of them are C > T or G > A substitution (Figure 2 and Supplementary Figure 1). The EMS mutagenesis, MutMap strategy and whole-genome sequencing fully utilize the advantages of each other, and their combinations will accelerate and facilitate the process for forward genetics.

DATA AVAILABILITY STATEMENT
The raw data for whole genome sequencing are available at NCBI-SRA (PRJNA593329). Raw RNA-seq data are available at NCBI-SRA (PRJNA593349). Other relevant data are within the article and Supplementary Material. The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary Material.

AUTHOR CONTRIBUTIONS
YY, HD, and HL planned and designed the research. HD, HL, YX, SS, SL, XS, HKL, NJ, XW, and ZZ performed the experiments, analyzed the data, conducted the field work, etc. HD, HL, and YY wrote the manuscript. All authors contributed to the article and approved the submitted version.