Combining association with linkage mapping to dissect the phenolamides metabolism of the maize kernel

Phenolamides are important secondary metabolites in plant species. They play important roles in plant defense responses against pathogens and insect herbivores, protection against UV irradiation and floral induction and development. However, the accumulation and variation in phenolamides content in diverse maize lines and the genes responsible for their biosynthesis remain largely unknown. Here, we combined genetic mapping, protein regulatory network and bioinformatics analysis to further enhance the understanding of maize phenolamides biosynthesis. Sixteen phenolamides were identified in multiple populations, and they were all significantly correlated with one or several of 19 phenotypic traits. By linkage mapping, 58, 58, 39 and 67 QTLs, with an average of 3.9, 3.6, 3.6 and 4.2 QTLs for each trait were mapped in BBE1, BBE2, ZYE1 and ZYE2, explaining 9.47%, 10.78%, 9.51% and 11.40% phenotypic variation for each QTL on average, respectively. By GWAS, 39 and 36 significant loci were detected in two different environments, 3.3 and 2.8 loci for each trait, explaining 10.00% and 9.97% phenotypic variation for each locus on average, respectively. Totally, 58 unique candidate genes were identified, 31% of them encoding enzymes involved in amine and derivative metabolic processes. Gene Ontology term analysis of the 358 protein-protein interrelated genes revealed significant enrichment in terms relating to cellular nitrogen metabolism, amine metabolism. GRMZM2G066142, GRMZM2G066049, GRMZM2G165390 and GRMZM2G159587 were further validated involvement in phenolamides biosynthesis. Our results provide insights into the genetic basis of phenolamides biosynthesis in maize kernels, understanding phenolamides biosynthesis and its nutritional content and ability to withstand biotic and abiotic stress.


Introduction
Maize (Zea mays L.) is the world's most widely grown crop for staple foods, animal feed, biofuel and other industrial raw materials.By 2050, it is estimated that the human population will reach 9 billion (FAO, 2009).Therefore, increasing maize yield while providing additional nutritional value is essential to meet the growing nutritional needs of a large global population (Casas et al., 2014;Jin et al., 2017).
In recent years, the rapid development of metabolomics and the use of different populations for genetic mapping have provided us an unprecedented insight into the regulation of the abundance of multiple chemical components in plants (Wen et al., 2016).Based on the spatiotemporal distribution characteristics of PAs with the natural genetic diversity of plants, several new PA biosynthetases were successfully identified in rice and maize (Wen et al., 2014;Dong et al., 2015).Two spermidine N-hydroxycinnamoyl transferases, LOC_Os12g27220 and LOC_Os12g27254, responsible for the biosynthesis of spermidine-containing PAs in rice have been recently reported by GWAS (Dong et al., 2015).Several additional rice genes were associated with the condensation of putrescine and agmatime with hydroxycinnamoyl-CoA substrates in GWAS experiments (Chen et al., 2014b).
Previously, comprehensive metabolic profiling using liquid chromatography tandem mass spectrometry (LC-MS/MS) was carried out in mature maize kernels from association panel and RIL populations (Wen et al., 2014(Wen et al., , 2015(Wen et al., , 2016)).Combined linkage analysis and GWAS were carried out on the resultant datasets, which led to the identification of a variety of loci involved in multiple biosynthetic pathways (Wen et al., 2014(Wen et al., , 2016)).Here, we combined genetic mapping, metabolite profiling from these previous studies and protein regulatory network analysis to further enhance the understanding of the maize phenolamides pathway.Correlation between phenolamides and agronomic performance (Yang et al., 2014), GWAS, linkage mapping, protein regulatory network, and bioinformatics analysis of candidate genes were conducted in the current study.These results provide new insights for understanding phenolamides biosynthesis and its nutritional content and ability to withstand biotic and abiotic stress.

Genetic materials and field trials
The metabolic data used in this study were obtained from genetic materials, including an association mapping panel with 368 lines (referred to as AMP hereafter) for GWAS and two recombinant inbred line populations (RILs; BB, F 9 RIL B73/ By804, and ZY, F 10 RIL Zong3/Yu87-1) for linkage analysis as described previously (Wen et al., 2014;Pan et al., 2016), BB and ZY RIL populations was derived from a single F 1 plant and was developed through self-pollination and single seed descent for nine and ten generations, respectively.Maize kernels of AMP were planted in Yunnan (Kunming, E 102°30′, N 24°25′, referred to as AMPE1) and Chongqing (E 106°50′, N 29°25′, referred to as AMPE2) in March 2011, the 197 BB RIL population was planted in Hainan (Sanya; E 109°519, N 18°259) in October 2010 (referred to as BBE1) and Henan (Zhengzhou; E 113°429, N 34°44′) in June 2011 (referred to as BBE2), and the 197 lines of the ZY RIL population were planted in Yunnan (Kunming; E 102°309, N 24°2 59; referred to as ZYE1) and Henan (Zhengzhou; E 113°429, N 34°4 49; referred to as ZYE2) in March and June 2011, respectively.An incomplete block design was used for the field trials of all the inbred lines, including AMP and two RIL populations, and a single replicate was conducted in each environment.All lines were selfpollinated, and five ears were harvested from each plot at maturity and air-dried and shelled.For each line, ears from five plants were harvested at the same maturity and bulked.Twelve well-grown kernels were randomly selected from the harvested ears and bulked for grinding (Wen et al., 2014;Deng et al., 2020).

Metabolic data, genotype and expression data
Samples from each line of AMP and RIL populations were extracted before analysis using an LC-ESI-MS/MS system, more details information were provided in previous study (Wen et al., 2014).The genotype data was used in present study obtained from the Maizego database (http://www.maizego.org/Resources.html)consisted of 1.25 million SNP (B73_RefGen_v2) that covered the whole maize genome, with a minimum allele frequency ≥ 0.05 (Liu et al., 2017).The two RIL populations were also genotyped by the Illumina MaizeSNP50 BeadChip, and high-density linkage maps were constructed with 2496 and 3071 unique bins for BB and ZY, respectively (Pan et al., 2016;Xiao et al., 2016).The expression data of 28 769 genes were obtained by RNA sequencing from kernel of five immature ears of 368 maize inbred lines were collected 15 days after self-pollination for RNA extraction (Fu et al., 2013;Li et al., 2013).

Genetic mapping
A genome-wide association study (GWAS) was conducted for maize kernel phenolamides.To test the statistical associations between genotype and phenotype, a mixed linear model was used to account for the population structure and relative kinship (Li et al., 2013).Considering the maker number in the present study is 1.25 million, many of them should be in linkage disequilibrium.The effective number of independent markers (N) was calculated using the GEC software tool (Li et al., 2012).Suggestive (1/N) P value thresholds were set to control the genome-wide type 1 error rate.The suggestive value was 2.04E-06 for the whole population and was used as the cutoff (Deng et al., 2017).The P value of each SNP was calculated using Tassel3.0.For all traits, the lead SNP (SNP with the lowest p value) at an associated locus and its corresponding candidate genes in or near (within 100 kb up-and downstream of the lead SNP) known genes were reported.If the associated SNPs were not in or near an annotated phenolamides metabolism gene, the closest of the lead SNP candidate gene was considered the most likely candidate gene (Deng et al., 2017).The physical locations of the SNPs were based on B73 RefGen_v2.
Linkage mapping was conducted using composite interval mapping (CIM) implemented in Windows QTL Cartographer V2.5 (Zeng and Kao, 1999;Wang et al., 2010) for all phenolamides measured in the maize kernels of the two RIL populations.The methods followed the Windows QTL Cartographer V2.5 user manual.Zmap (Model 6) with a 10 cM window and a walking speed of 0.5 cM was used.For each trait, a uniform threshold for significant QTLs was determined by 500 permutations (p = 0.05).The parameter was set as default.A 2.0 LOD-drop confidence interval was used for each QTL as described.
Expression mapping (eQTL) analysis used the same method described above for GWAS.The association analysis between the genome-wide SNPs and the identified candidate gene expression level was performed.

Data analysis
The line mean-based broad-sense heritability (H 2 ) for each trait was calculated as H 2 =s2 g/(s2 g+s2 e/n), where s2 g is genetic variance, s2 e is error variance, and n is the number of environments.The estimates of s2 g and s2 e were obtained by the mixed linear model, treating genotype and environment as random effects (R Core Team, 2012).For each metabolite, the BLUP value for each line across environments was used to reduce environmental noise based on the mixed linear model implemented in the R package 'LME4' (R Core Team, 2012).The Pearson correlation between different phenolamides and between phenolamides and other agronomic traits of this association panel (Yang et al., 2014) were calculated in subpopulations using the R function COR.TEST (www.r-project.org).Cytoscape v3.9.1 (http:// www.cytoscape.org/download.php) was used for visualization.

Natural variations in phenolamides in maize kernels
Using high-throughput liquid chromatography tandem mass spectrometry (LC-MS/MS), we assessed the variation in phenolamides content in dry matured maize kernels, which included two recombinant inbred line populations (RIL), B73/ BY804 (BB) and ZONG3/YU87-1 (ZY), and an association panel (AMP) harvested from multiple environments (simply called AMPE1, AMPE2 for AMP, BBE1, BBE2 for BB RIL population, and ZYE1 and ZYE2 for ZY RIL population, which are described in detail in "Materials and Methods").In a previous study, 748 and 735 metabolites were detected in AMPE1 and AMPE2, respectively (Wen et al., 2014), and the chemical structures of 184 metabolites were identified or annotated in BB and ZY RIL populations (Wen et al., 2015).In the current study, we extract the profile of phenolamides from these previous datasets, which includes 16 phenolamides.Among them,16,16,15,16,15,and 16 phenolamides were found in AMPE1, AMPE2, BBE1, BBE2, ZYE1, and ZYE2, respectively, and 15 phenolamides were detected in all six environments (Table 1).

Correlation analysis with other traits
Phenolamides are an important class of metabolites in maize kernels.To explore how the kernel phenolamides are coordinated with other phenotypic traits.Pearson correlation coefficients were calculated using the COR.test in R to detect the statistical correlations between kernel phenolamides content and 23 other phenotypic traits previously measured in the same association panel.These 23 phenotypic traits included the morphological attributes plant height (PH), ear height (EH), ear leaf width and length (LW, LL), tassel main axis length (TML), tassel branch number (TBN), and leaf number above ear (ULN); yield-related traits ear length and diameter (EL, EW), cob diameter (CD), kernel number per row (RKN), row number per ear (ERN), hundred kernel weight (HW), cob weight (CW), kernel width (KW), kernel length (KL) kernel thickness (KT); maturity traits days to heading, anthesis, and silking (HD, PS, ST); disease resistance maize rough dwarf virus (MRDV) and sugarcane mosaic virus (SCMV); and cob color (CC).

Linkage mapping for phenolamides levels in the two RIL populations
Two RIL populations (BB and ZY) were genotyped with a highdensity SNP array (Pan et al., 2016) and were used for QTL mapping for phenolamides.For the BB population, 58 and 58 QTLs were mapped for 15 and 16 traits in BBE1 and BBE2, respectively, with an average of 3.9 and 3.6 QTLs per trait, respectively (Table 2, Supplementary Table 6).Only nine QTLs were detected for the six common phenolamides in BBE1 and BBE2 (Supplementary Table 6).Each QTL explained 4.79%-20.20%(BBE1) and 2.75%-76.38%(BBE2) of the phenotypic variation, with averages of 9.47% and 10.78% (Table 1, Supplementary Table 6), respectively.Forty-one QTLs that explained greater than 10% of the phenotypic variation (R 2 = 10.41%-76.38%)were identified in two experiments.
For the ZY RIL population, 39 and 67 QTLs were detected for 11 and 16 phenolamides traits in ZYE1 and ZYE2, respectively, with averages of 3.6 and 4.2 QTLs, respectively (Table 2, Supplementary Table 6).Only four QTLs were detected for the three common phenolamides in ZYE1 and ZYE2 (Supplementary Table 6).Each QTL explained 6.59%-17.54%(ZYE1) and 5.73%-29.98%(ZYE2) of the phenotypic variation, with averages of 9.51% and 11.40% (Table 1, Supplementary Table 6), respectively.Forty-three QTLs that explained greater than 10% of the phenotypic variation (R 2 = 10.02%-29.98%)were identified in two experiments (Supplementary Table 6).For the same trait, only 15 QTLs were detected in more than one population, implying that different low-frequency QTLs existed in different genetic backgrounds.We analyzed the resolution of QTL mapping, and the results showed a 19.82% (44/222) QTL interval less than 1 Mb and a 67.57% (150/222) QTL interval less than 5 Mb (Supplementary Figure 2, Supplementary Table 6).There are many QTLs explaining more than 10%, but only 2 and 6 more than 20% (for BB and ZY respectively).

GWAS for phenolamides levels
GWAS was performed using an association panel including 368 maize diverse inbred lines (Wen et al., 2014) and 1.25 million highquality single nucleotide polymorphisms (SNPs) with minor allele frequency (MAF) >0.05 (Fu et al., 2013;Liu et al., 2017).A total of 73 loci were identified by GWAS at a significance level of p ≤ 2.04 × 10 −6 in two experiments (AMPE1, AMPE2) (Table 2).Briefly, 38 and 35 loci were identified for 12 phenolamides in AMPE1 and 13 phenolamides in AMPE2, with an average of 3.2 and 2.7 loci for each trait, respectively, and only five of these loci were conserved for the same phenolamides in both experiments.Each locus could explain phenotypic variation (R 2 ) ranging from 6.97% to 23.10% and 7.10% to 17.71%, with means of 10.07% and 10.01%, respectively.Twenty-four loci with effects greater than 10% were identified in two environments (Supplementary Table 7).Detailed information on the GWAS results, including the p value and R 2 of each locus, physical position and minor allele frequency (MAF) of the lead SNP, annotation, eQTL, and correlation between phenotype and expression of the most likely candidate gene, is provided in Supplementary Table 7.The colocalization of QTLs and/or significant loci for the same trait identified across different environments or different populations is summarized.In total, 10 trait-loci combinations that are 8 QTLs corresponding to five traits were detected in more than one environment or population (AMP, BBRIL, ZYRIL) in this study (Figure 3, Supplementary Table 6, 7).Detailed analyses of the candidate genes underlying these loci will provide useful further information concerning the phenolamides biosynthetic pathway.

Prediction and annotation of candidate genes
Subsequently, limited overlaps were found between the loci (10/ 73) identified by GWAS and the QTLs identified by linkage mapping for the same trait in the present study.A total of 58 unique candidate genes corresponding to 73 trait-locus associations identified in two experiments were annotated, and other potential candidate genes within 200 kb (100 kb upstream and downstream of the lead SNPs) of the 73 loci are also listed in Supplementary Table 7.Among these candidate genes, 46 genes that may affect phenolamides were found in different developmental stages of maize kernel (Figure 4).Based on the current database, among the 58 candidate genes, those encoding enzymes involved in amine and derivative metabolic processes accounted for 31%, the enzymes involved in other biological processes accounted for 19%, transcription factors accounted for 10%, and the unknown functions accounted for 17% (Supplementary Figure 3).
Expression QTLs (eQTL, n = 368) were identified for a plurality of these candidate genes (55.2%, or 32/58) using the previous RNA-sequencing data of immature kernels (Fu et al., 2013).Significant correlations (p < 0.05, n = 335-339) between the expression level of the candidate genes with eQTLs identified and the phenotypic variation of the corresponding phenolamides were Correlation coefficient based network of all phenolamides in each experiment for AMP and both BB and ZY populations.|r| ≥ 0.3 for correlation coefficient between two phenolamides was used to construct the network.
found in 14 cases (24.1%) (Supplementary Table 7), which suggests that some of these loci affect phenotypic variation via transcriptional regulation.

Protein-protein interaction network analysis
Proteins usually regulate the growth and development of plants in complex interrelated networks.To understand the metabolism of phenolamide-related traits in maize, protein-protein interaction networks were constructed of 46 highly expressed candidate genes through the STRING database (https://string-db.org/).Then, 358 genes were detected that were associated with 43 candidate genes (Supplementary Figure 4, Supplementary Table 8).Gene Ontology (GO) term analysis of the protein-protein interrelated genes revealed significant enrichment in terms relating to cellular nitrogen metabolism, amine metabolism, amino acid and derivative metabolism, organic acids and other processes (Supplementary Figure 5).

Discussion
Phenolamides are important secondary metabolites in plant species.They are important for defense responses against pathogens, insect herbivores, sulfur starvation, salt stress, protection against UV irradiation and floral induction and development (Demkura et al., 2010;Onkokesung et al., 2012;Gaquerel et al., 2013;Figon et al., 2021;Fang et al., 2022;Xu et al., 2022).Overexpression of endogenous tyramine hydroxycinnamoyltransferase increased its resistance to Pseudomonas syringae in tomato (Campos et al., 2014), the ectopic expression of the AtACT in torenia plants rendered them more resistant to Botrytis cinerea (Muroi et al., 2012).Meanwhile, the accumulation of p-coumaroylagmatine, p-coumaroylputrescine, and caffeoylputrescine reduced spore germination of P. infestans on the potato leaf surface (Dobritzsch et al., 2016).Through jasmonate-mediated activation of defense-related genes and accumulation of aromatic phenolamides in Qingke increased the resistance to powdery mildew (Xu et al., 2022).In this study, we found that the n0183 (N-(coumaroyl-O-hexoside)spermidine) showed a significant positive correlation (p = 0.0217; R = 0.131) with sugarcane mosaic virus, and this result showed that n0183 might increase resistance to sugarcane mosaic virus.In addition, the levels of n0130, n1377, n1376, n1394 and n0979 exhibited significant positive correlations with HD, PS and ST (Figure 2, Supplementary Table 5), suggesting that increasing their contents might increase days to heading, anthesis, and silking.
Linkage analysis is a classical method for dissecting the genetic basis that underlies quantitative traits.Fine mapping based on the primary mapping results remains a conventional strategy.With the rapid development of sequencing technology, we could obtain an increasing number of molecular markers.However, due to the limited combinations and the narrow genetic background of the parents, linkage mapping is usually not very effective for complex quantitative traits.GWAS is characterized by a high density of SNPs and a large population, which can effectively solve the problem of low diversity and detection rate, but a large number of false-positive results will confuse the truly relevant sites and reduce the detection ability (Zhang et al., 2022).Currently, as an efficient approach, the combination of the GWAS approach and linkage analysis can help  us quickly identify candidate genes.To date, only few studies have focused on the genetic architecture of maize kernel PAs (Wen et al., 2014).
In the present study, we focused on the phenolamides that were found in mature kernels harvested from an association panel and two RIL populations grown across multiple environments.GWAS and linkage mapping were used to dissect the genetic basis of PA content in mature maize kernels from the aforementioned populations.Combining linkage mapping and GWAS for 16 PA traits revealed 58, 58, 39 and 67 QTLs and 39 and 36 significant loci, respectively.Only a few QTLs (15/222) could be identified in multiple RIL populations, and only 10 trait-loci combinations that were 8 QTLs corresponding to five traits were detected in more than one environment or population (Supplementary Tables 6, 7).Similar results have also been reported in other metabolite studies in maize (Wen et al., 2014;Deng et al., 2017Deng et al., , 2015Deng et al., , 2016)).These results implied that QTLs affecting PA composition were genetic background dependent.In this study, 73 loci were detected in AMPE1 and AMPE2 with 1.25 million SNPs, and only 23/73 loci colocalized with 1.06 million high-quality SNPs identified in the corresponding environment in a previous report (Wen et al., 2014).Therefore, a high-density map increased the QTL detection power and resolution (Liu et al., 2017).A protein-protein network was constructed based on the genes identified by GWAS (Supplementary Figure 4), and the interacting proteins were found.These proteins are enriched in terms relating to cellular nitrogen metabolism, amine metabolism, amino acid and derivative metabolism, organic acids and other processes (Supplementary Figure 5).Further studies are needed to fully explore the genetic control of phenolamides biosynthetic pathways.

Conclusion
An association panel and two RIL populations were used to identify candidate genes for 16 phenolamide traits in multiple environments.A total of 58, 58, 39 and 67 QTLs, explaining 9.47%, 10.78%, 9.51% and 11.40% of the phenotypic variation for each QTL on average, were mapped in BBE1, BE2, ZYE1 and ZYE2, respectively.Thirty-nine and 36 significant loci, explaining 10.00% and 9.97% of the phenotypic variation for each locus on average, were identified in two different environments.GRMZM2G066142, GRMZM2G066049, GRMZM2G165390 and GRMZM2G159587 were further validated using bioinformatics approaches.These findings provide insights into the genetic basis of phenolamide biosynthesis in maize kernels, understanding phenolamide biosynthesis and its nutritional content and ability to withstand biotic and abiotic stress.

Data availability statement
The original contributions presented in the study are included in the article/Supplementary Material.Further inquiries can be directed to the corresponding author.

FIGURE 3
FIGURE 3Chromosomal distribution of phenolamides loci and QTLs identified in this study.QTL regions (represented by the confidence interval for linkage mapping and the 100kb up-and downstream of the lead SNP for association mapping) across the maize genome responsible for phenolamides levels from the different populations are shown as midnight blue (AMPE1), green (AMPE2), cyan (BBE1), yellow (BBE2), red (ZYE1) and brown (ZYE2) boxes, respectively.The x axis indicates the genetic positions across the maize genome in Mb.Heatmap under the x axis illustrates the density of amino acid loci and QTLs across the genome.

FIGURE 4
FIGURE 4Heatmap of the expression profiles of candidate genes.The scale bars represent standardized gene levels.S0-S38 indicate days after pollination of maize seed.
FIGURE 5Validation of association analysis using QTL Interval.(A) LOD curves of QTL mapping for level of n1048 in maize kernels on chromosome 10.(B) Scatterplot of association results between SNPs in the confidence interval and the level of n1048.Association analysis was performed using the mixed linear model controlling for the population structure (Q) and kinship (K).(C) The candidate genes of 200kb in the confidence interval.(D) Box plot for n1048 (red) from AMPE1 and expression of GRMZM2G066142 (sky blue) and GRMZM2G066049 (sky blue).(E) Plot of correlation between the n1048 level in AMPE1 and the normalized expression level of the GRMZM2G066142 and GRMZM2G066049, red triangle and blue spot represented the lead SNP GG and AA, respectively.The r value is based on a Pearson correlation coefficient.The p value is calculated using the Student's-t test.

TABLE 1
Detailed information of 16 phenolamides detected in this study.

TABLE 2
Summary of significant loci-trait associations identified by GWAS and QTL identified by linkage mapping.