Genome-Wide Association Study Identifies Candidate Genes for Starch Content Regulation in Maize Kernels

Kernel starch content is an important trait in maize (Zea mays L.) as it accounts for 65–75% of the dry kernel weight and positively correlates with seed yield. A number of starch synthesis-related genes have been identified in maize in recent years. However, many loci underlying variation in starch content among maize inbred lines still remain to be identified. The current study is a genome-wide association study that used a set of 263 maize inbred lines. In this panel, the average kernel starch content was 66.99%, ranging from 60.60 to 71.58% over the three study years. These inbred lines were genotyped with the SNP50 BeadChip maize array, which is comprised of 56,110 evenly spaced, random SNPs. Population structure was controlled by a mixed linear model (MLM) as implemented in the software package TASSEL. After the statistical analyses, four SNPs were identified as significantly associated with starch content (P ≤ 0.0001), among which one each are located on chromosomes 1 and 5 and two are on chromosome 2. Furthermore, 77 candidate genes associated with starch synthesis were found within the 100-kb intervals containing these four QTLs, and four highly associated genes were within 20-kb intervals of the associated SNPs. Among the four genes, Glucose-1-phosphate adenylyltransferase (APS1; Gene ID GRMZM2G163437) is known as an important regulator of kernel starch content. The identified SNPs, QTLs, and candidate genes may not only be readily used for germplasm improvement by marker-assisted selection in breeding, but can also elucidate the genetic basis of starch content. Further studies on these identified candidate genes may help determine the molecular mechanisms regulating kernel starch content in maize and other important cereal crops.


INTRODUCTION
Maize (Zea mays L.) is one of the most widely grown crops in the world, serving as the staple food for more than 900 million people. Maize seeds (i.e., kernels) provide a rich source of carbohydrates, mostly in the form of starch, which accounts for 65-75% of kernel dry weight. Therefore, the seed yield of maize is largely determined by the efficient biosynthesis and storage of starch (Boyer and Hannah, 2001). Accordingly, improving the kernel starch content could yield higher value products in the corn processing industry (Goldman et al., 1993).
Starch biosynthesis is a complicated process. In brief, sucrose is transported to the sink (e.g., kernels) and converted to uridine diphosphate (UDP)-glucose and fructose by sucrose synthase encoded by the Sh1 gene (Carlson and Chourey, 1996), and then UDP-glucose serves as the primary precursor for starch biosynthesis. In the endosperm, starch biosynthesis is coordinated by a series of enzymes, including ADP-glucose pyrophosphorylase (AGPase), soluble starch synthase (SS), starch branching enzyme (SBE), starch debranching enzyme (SDBE), and plastidial starch phosphorylase (Pho1), whereas amylose is synthesized by AGPase and granule-bound starch synthase (GBSS; Jeon et al., 2010). AGPase, mainly present in maize endosperm, embryo, and leaf tissue, is a heterotetramer comprised of two structurally related dimerized polypeptides designated as the large subunits (LS) and the small subunits (SS), which are encoded by the Shrunken2 (Sh2) and Brittle2 (Bt2) genes, respectively (Huang et al., 2014). Overexpression of related genes, such as Bt2, Sh2, and Shrunken1 (Sh1), can induce significant increases of the starch content in maize endosperm (Hannah et al., 2012;Jiang et al., 2013). In contrast, mutations occurred at the starch accumulation genes can usually cause starch content decreases. For example, the null mutant of AGPase LS maize endosperm results in 25-30% less starch content relative to the wild type (Tsai and Nelson, 1966). Additionally, the two transcripts encoding AGPase LS, namely agpsemzm and agpllzm, contributed to approximately 7% of the total endosperm starch content (Huang et al., 2014).
Kernel starch content is a typical quantitative trait controlled by a large number of quantitative trait loci (QTL). Despite recent progress in understanding starch biosynthesis and accumulation at molecular levels, QTLs underlying phenotypic variation in starch content across maize lines are still to be defined (Willmot et al., 2006;Dudley et al., 2007;Chen et al., 2012). Additionally, inconsistencies in the number and locations of the detected QTLs for kernel starch content have been reported using traditional QTL markers (Li et al., 2009;Wang et al., 2010;Yang et al., 2013). These inconsistent results are probably caused by the limited population size and genetic diversity within mapping populations, and the resultant type I or type II errors are often further magnified by the use of traditional molecular markers, which can only capture a small portion of the genetic diversity of the species. Genomewide association studies (GWAS) with SNP arrays are an alternative to the traditional QTL-mapping approaches using markers such as SSRs, AFLPs, RFLPs, and SRAPs. Compared with traditional QTL mapping approaches, association mapping has exhibited several advantages in exploring the genetic mechanism underlying the complex agronomic traits. First, the populations used in association mapping can be collections of individual varieties, inbred lines, or landraces with varying traits. Diversified historical recombination events among different lineages make it possible to discover linked markers associated with causative genes (Xue et al., 2013;Wen et al., 2014). Second, the application of SNP arrays allows the detection of 10s of 1000s of loci simultaneously. Mutations or standing variation may influence the expression and functionality of genes and ultimately lead to different agronomically important characters. The main objectives of this study were to identify QTLs controlling kernel starch content in maize through GWAS using newly developed robust maize SNP50 array (Ganal et al., 2011) and to predict the possible candidate genes responsible for starch synthesis and accumulation.

Plant Material
Accurate quantification of the target trait is a prerequisite for molecular mapping and a biased population structure can lead to false positive associations (i.e., type I error). In this study, a population comprised of a global core collection of 263 maize inbred lines was used to represent a wide range of diversity, among them, 71 inbred lines are from tropical and subtropical zones, while the remainder are from temperate regions. In order to ensure the normal flowering and pollination of the tropical germplasm, we choose to conduct this study in the tropical regions of Sanya. All the inbred lines were planted in a field in Ledong (Hainan Province, 18.75 • N, 109.17 • E) in Nov. 2011. The average annual rainfall total was 1,181 mm. The average annual temperature is 23 • C. The average annual sunshine total is 1039.6 h. The accumulated temperature (in excess of 10 • C) is 9,300.7 • C. In the field plot, 24 seeds of each maize inbred line were planted in a randomized complete block design with three replicates (across growing seasons). In one block, eight seeds of each maize inbred line were planted in a row, leaving a 20-cm gap between each line. During the growing seasons, plants were irrigated and given extra doses of pollen by hand. The mature seeds of each inbred line were harvested in bulk and used for starch content analysis.

Determination of Starch Content
For three consecutive years, the maize kernels of 263 lines were harvested and analyzed each year to minimize variation across different years. The subsampled kernels from the bulk harvest were used for starch content measurement with a near-infrared analyzer (model: MATRIX-1, Bruker Corporation, Karlsruhe, Germany) according to the method described by Dudley and Lambert (1992). The uniform kernels of maize were filled into the sample cup, and the excess corn was scraped off with a ruler. Every sample from every year was analyzed three times to generate technical replicates, and the average values of the three replicates were calculated for further analyses. The data was analyzed using the ANOVA method as implemented in SPSS 18.0 (IBM Corp., Armonk, NY, USA). Broad-sense heritability was also calculated according to the method developed by Knapp et al. (1985) with the formula is the genetic variance, σ 2 GE is the genotype × environment interaction variance, σ 2 e is the error variance, n is the number of environments, and b is the number of replications in each experiment. A frequency map was made using Origin 8.0

Association Mapping Analysis
A modified CTAB procedure was followed for DNA extraction in this study (Healey et al., 2014), and the DNA quality and concentration were verified by gel-electrophoresis and spectrophotometer before genotyping. The panel consisting of 263 inbred lines was genotyped with the Illumina MaizeSNP50 chip (Illumina, Inc., San Diego, CA, USA) according to the method described by Ganal et al. (2011), which was also applied in previous studies (Xue et al., 2013). The kinship matrix was calculated using the Loiselle algorithm (Loiselle et al., 1995) to determine relatedness among the sampled individuals. With the two matrixes, a mixed linear model (MLM) was applied to analyze the dataset using the GWAS software GAPIT (Genome Association and Prediction Integrated Tool-R package; Lipka et al., 2012), and principal components (PCs) were used to control for population structure (quantile-quantile plots showed over fitting of a PC + K model). These methods were used for genome-wide association mapping with 52,370 SNPs (MAF > 0.05).

Gene Ontology Analysis
The available genome sequence of the maize line B73 was used as the reference genome for candidate gene analyses (Schnable et al., 2009). The approximately 120-bp SNP probe sequences (Illumina, Inc.) were used as queries to BLAST against MaizeGDB (Lawrence et al., 2004) 1 . A 100-kb window was selected in order to fall within the estimated 100-kb window of linkage disequilibrium (LD) decay that occurs in our association panel. The genes within this window size were identified in MaizeGDB according to the positions of the closest flanking significant SNPs (P < 0.0001) or supporting intervals (Maize genetics and genomics database, 2015) 2 . The functions of corresponding genes were predicted using the Blast2Go program (Conesa et al., 2005). 1 http://archive.maizegdb.org/ 2 http://gbrowse.maizegdb.org/gb2/gbrowse/maize_v2/

Variation in Starch Content among the Maize Inbred Lines
The starch contents of the 263 maize lines (Supplementary Table S1) were measured using the NIRS method. In this panel, the average kernel starch content was 66.99% with a range from 60.60% to 71.58% (Table 1). Additionally, the starch content showed a high heritability of 88.30%. The majority of inbred lines produced kernels that were of 65-70% starch content exhibited a normal distribution (r 2 = 0.88; Figure 1). Starch content varied significantly among different germplasms (P ≤ 0.05; Table 1). These results suggested that the selected population was suitable for the association analysis of kernel starch content.

Linkage Disequilibrium in the Association Panel
All 52,370 SNPs (MAF > 0.05) were used as input data to calculate the genome-wide LD in the present panel. A rapid decline in LD was observed with increasing physical distance on all chromosomes (Chr), but the decay rate varied among Chr (Figure 2). LD was reached within 30-45 kb on Chr 1, 50-60 kb on Chr 4 and 5, and 80-150 kb on the remaining Chr. The mean length of LD decay across all Chr was 80-100 kb (r 2 = 0.1). At a cut-off of r 2 = 0.2, the mean length of LD decay decreased rapidly to 5 kb.

Genome-Wide Association Study and SNP Discovery
In the present study, population structure was controlled for by using the PC matrix, with six PC axes explaining about 6.9% of the variance. Quantile-quantile plots (Figure 3) showed that the MLM model with six PC axes effectively accounted for the population substructure of starch content. A total of four SNPs (Supplementary Table S2) significantly associated (P < 0.0001) with starch content were detected on Chr 1, 2, and 5 (Figure 4), among which SNP SYN1878 was the most significant (P = 2.9 × 10 −5 ). These results are consistent with the quantitative nature of starch content, which is known to be controlled by a large number of genes with small effects.
The overall LD decay across the genome of this panel was 100 kb; as such, a 200-kb region flanking the left and right side of each SNP was defined as a QTL, which were then used to identify   genes from the maize genome browser publically available at www.maizegdb.org. A total of 77 candidate genes were found in these associated regions (listed in Supplementary Table S2). The genes within this 200-kb window size were identified through MaizeGDB according to the positions of the closest flanking SNPs or supporting intervals 3 . Based on the gene annotation of the B73 maize genome, some genes that are likely involved in starch synthesis were subjected to gene ontology (GO) analysis. The Blast2Go program was used to predict the functions of these genes ( Table 2), and at least two of these genes or their orthologs were known functional genes involved in the starch synthesis pathway: (GRMZM2G163437, which encodes the ADP glucose pyrophosphorylase SS, and GRMZM2G450163, which encodes 6-phospho-fructokinase).

NIRS for Kernel Starch Content Analysis
Traditional starch content measurement protocols employ kernel powdering and starch hydrolysis to measure starch content according to the percentage of hydrolyzed soluble sugar glucose in the dry kernel (Yemm and Willis, 1954;Li et al., 2011). This method is not only laborious and time-consuming, but also damages the integrity of maize kernels that could have value in breeding programs. On the contrary, NIRS is a fast, 3 http://gbrowse.maizegdb.org/gb2/gbrowse/maize_v2/  Gene ontology (GO) terms are categorized and denoted as cellular components (C), molecular functions (F), or biological processes (P) according to their superscripts.
Frontiers in Plant Science | www.frontiersin.org reliable, and non-destructive method for the evaluation of such breeding traits, especially using bulk materials (Plumier et al., 2013). The reliability of NIRS for measuring maize kernel starch content has been previously confirmed by Fang (2011), who found a remarkable positive correlation (r 2 = 0.961) between the values measured by NIRS and those obtained by traditional chemical methods. In our study, the NIRS method also yielded reliable results as demonstrated by the small deviations (<5% in most cases) among the biological replicates (Supplementary Table S1).

Population Structure for Association Mapping of Kernel Starch Content
In previous studies, kernel starch content was often studied together with protein or oil content (Wassom et al., 2008;Li et al., 2009;Yang et al., 2013) and most genetic materials used in these previous experiments were derived from high oil maize lines. To conduct a proper GWAS on kernel starch content in maize, it was necessary to test an unbiased population with a sufficient number of individuals from among diverse genetic backgrounds. In our study, the association panel was comprised of tropical, subtropical, and temperate lines from regions including South Asia, North and South America, Europe, and the Middle East, representing the core diversity of the broader species population that has been characterized previously (Yan et al., 2009;Xue et al., 2013). Our study particularly focused on results obtained in the Hainan seed base because the primary agronomic features of these lines have been preliminarily evaluated in various maize breeding programs. Owing to previously demonstrated significant genotype-by-environment effects (Visioni et al., 2013), the interaction between genotype and environmental conditions was considered in the design of this study, and starch content data from 3 years (2011, 2012, and 2013) was used to minimize the effect of growing environment on starch content.

Maize Kernel Starch Content Is Genetically Controlled by Many Small Effect QTLs
The complex genetic architecture of starch content has been demonstrated through long-term selection experiments using inbred lines (Moose et al., 2004). The continued phenotypic response of kernel composition provides convincing evidence that starch content is controlled by a number of small effect QTLs as demonstrated by other studies using near isogenic lines (Goldman et al., 1993;Clark et al., 2006;Dudley, 2008;Wassom et al., 2008;Li et al., 2009;Cook et al., 2012). The advent of high-density DNA marker linkage maps in many plant species has provided an opportunity to identify QTLs (Liu et al., 2008;Sun et al., 2008;Li et al., 2009). Through this GWAS, we were able to detect substantially more reliable QTLs. The information provided in this study can serve as the starting point for functional gene studies to clarify the genetic mechanisms underlying starch content diversity among different maize lines.

Potential Genes Involved in Kernel Starch Biosynthesis and Accumulation Revealed by GWAS
The maize SNP50 BeadChip maize array used in this study contains a total of 56,110 SNPs. Given the 2,300-Mb genome size of maize, there is approximately 50 kb of total sequence for each SNP marker used in this study. Therefore, a significantly associated SNP may represent an approximately 100-kb region covering the 50 kb upstream and downstream along each chromosome. Thanks to the publically available maize genome browser, we were able to browse and retrieve the coding gene information from the associated QTLs .
Efforts were made to utilize functionally characterized maize genes for starch content regulation. Four structural genes influence starch synthesis in maize: Sh2, Bt2, opaque-2 (O2), and Sh1 (Goldman et al., 1993). Some functional genes involved in metabolic reactions were also detected in this study. For example, through GO analysis, the Glucose-1-phosphate adenylyltransferase (APS1) gene (gene ID GRMZM2G163437) was identified as a closely linked gene; this protein presumably catalyzes the alpha-D-glucose-1-phosphate to ADP-glucose (ADPG) pathway and plays an important role in starch synthesis (Li et al., 2009). Another identified gene (gene ID GRMZM2G450163) encodes 6-phospho-fructokinase, which is involved in the Embden-Meyerhof Pathway (EMP). The EMP is important to the energy cycle as it converts fructose to glucose-1-phosphate. A transposable element (gene ID GRMZM2G128149) was also identified. As the maize genome is heavily duplicated and full of gene insertions owing to transposable elements, it is not surprising to see that this particular transposon might be involved in starch content regulation. Similarly, Liu et al. (2014) reported a transposable element insertion that disturbed the starch synthase gene SSIIb in maize and thus altered starch content.

FUTURE PERSPECTIVES
The GWAS presented here was able to uncover associations between SNPs and kernel starch content in maize. Although this methodology only provides a statistical link between traits and genomic sequences, such information can be a solid starting point for functional genetic studies. The gene candidates identified by our GWAS will be profoundly enhanced as additional molecular evidence (e.g., via transgenic approaches) becomes available. Furthermore, SNPs within candidate genes can also be used to further test the contributions of these genes to traits like kernel starch content.

AUTHOR CONTRIBUTIONS
This study was conceived by JT and NL. GWAS and GO analyses were conducted by YX and ZG. Collections of maize were performed by WL. NL wrote the manuscript. All authors read and approved the final manuscript.