A Genome-Wide Association Study Reveals a Rich Genetic Architecture of Flour Color-Related Traits in Bread Wheat

Flour color-related traits, including brightness (L*), redness (a*), yellowness (b*) and yellow pigment content (YPC), are very important for end-use quality of wheat. Uncovering the genetic architecture of these traits is necessary for improving wheat quality by marker-assisted selection (MAS). In the present study, a genome-wide association study (GWAS) was performed on a collection of 166 bread wheat cultivars to better understand the genetic architecture of flour color-related traits using the wheat 90 and 660 K SNP arrays, and 10 allele-specific markers for known genes influencing these traits. Fifteen, 28, 25, and 32 marker–trait associations (MTAs) for L*, a*, b*, and YPC, respectively, were detected, explaining 6.5–20.9% phenotypic variation. Seventy-eight loci were consistent across all four environments. Compared with previous studies, Psy-A1, Psy-B1, Pinb-D1, and the 1B•1R translocation controlling flour color-related traits were confirmed, and four loci were novel. Two and 11 loci explained much more phenotypic variation of a* and YPC than phytoene synthase 1 gene (Psy1), respectively. Sixteen candidate genes were predicted based on biochemical information and bioinformatics analyses, mainly related to carotenoid biosynthesis and degradation, terpenoid backbone biosynthesis and glycolysis/gluconeogenesis. The results largely enrich our knowledge of the genetic basis of flour color-related traits in bread wheat and provide valuable markers for wheat quality improvement. The study also indicated that GWAS was a powerful strategy for dissecting flour color-related traits and identifying candidate genes based on diverse genotypes and high-throughput SNP arrays.


INTRODUCTION
Bread wheat (Triticum aestivum L.) is among the most important food crops and is one of the most traded commodities in world markets (Curtis and Halford, 2014). Developing cultivars with appropriate end-use quality is the primary objective of all wheat breeding programs. Flour color plays a significant role in the end-use quality of wheat, particularly for Asian noodles and steamed bread, since it affects consumer acceptance, market value and human nutrition (Zhai et al., 2016a). Color measurements were expressed as tristimulus parameters (L * , a * , and b * ). L * is a measure of flour brightness, ranging from 0 (black) to 100 (white); a * measures redness when positive or greenness when negative; and b * describes the yellow-blue color value, and is positive for yellowness and negative for blueness (Hutchings, 1999). Yellow pigment content (YPC) is the most important determinant of flour yellowness caused mainly by accumulation of carotenoids in the grain (Mares and Campbell, 2001). It is also a very important quality criterion for pasta made from durum wheat. Understanding the genetic basis of flour color-related traits (L * , a * , b * , and YPC) is necessary for improving wheat quality by marker-assisted selection (MAS).
During recent decades, numerous quantitative trait loci (QTL) for flour color-related traits have been identified using biparental populations (Mares and Campbell, 2001;Patil et al., 2008;Zhang and Dubcovsky, 2008;Zhang et al., 2009;Blanco et al., 2011;Roncallo et al., 2012;Crawford and Francki, 2013a;Colasuonno et al., 2014;Zhai et al., 2016b). Cloning genes relevant to flour color and developing functional markers have become major research focus (Mares and Campbell, 2001;He et al., 2008He et al., , 2009Howitt et al., 2009;Zhang et al., 2011;Crawford and Francki, 2013b). As observed in previous studies flour color is a complex trait, and knowledge of the genetic control is still limited due to use of low density marker platforms and low resolution in bi-parental mapping studies. Phytoene synthase 1 (PSY1) catalyzes the first committed step in carotenoid biosynthesis, and it is generally accepted as the most important regulatory node, significantly correlated with carotenoid accumulation and determined flour color (r = 0.8) (Zhai et al., 2016c). Thus, more effort is needed to further dissect the genetics of flour color-related traits.
Genome-wide association study (GWAS), based on linkage disequilibrium (LD), is an effective complementary strategy to QTL mapping in dissecting associations between genotype and phenotype in germplasm collections (Yu and Buckler, 2006;Zhu et al., 2008). GWAS has a number of advantages over traditional linkage mapping, including the use of germplasm populations, potential for increased QTL resolution, and a wide sampling of molecular variation (Buckler and Thornsberry, 2002;Flint-Garcia et al., 2005;Waugh et al., 2009). With the development of high-throughput genotyping platforms, GWAS has been increasingly used to identify loci responsible for complex traits in plants, including wheat (Brachi et al., 2010;Zhao et al., 2011;Wang et al., 2012;Marcotuli et al., 2015;Tadesse et al., 2015). A potential disadvantage is identification of spurious associations. The mixed linear model (MLM) method is an effective method to avoid spurious associations as it simultaneously accounts for population structure (Q) and genetic relatedness (K) between individuals (Yu and Buckler, 2006;. The wheat Illumina 90 K SNP array made a dramatic improvement in the number of gene-based markers, and widely used to detect QTL for important traits and identify candidate genes (Ain et al., 2015;Sun et al., 2017). Nevertheless, the genetic distances between markers were large on wheat chromosomes using the 90 K SNP array, particularly a low coverage in the D genome, reducing the power of marker identification and the precision of QTL mapping (Liu et al., 2017). Recently, the wheat Axiom 660 K SNP array was developed, providing higher marker density, higher resolution and better coverage of wheat genome .
The objectives of the present study were to: (1) identify the genetic basis underlying flour color-related traits in a collection of 166 bread wheat accessions using the wheat 90 and 660 K SNP arrays, and 10 allele-specific markers for known relevant genes; and (2) propose candidate genes affecting flour color-related traits based on biochemical information and bioinformatics analyses, with the ultimate aim of facilitating molecular breeding. Finally, the genetic determinants of flour color-related traits could be identified more accurately by GWAS using diverse genotypes and high-throughput SNP arrays.

Plant Material and Field Trials
The association panel used in this study was a genetically diverse collection, comprising 166 elite bread wheat cultivars mainly from Yellow and Huai Valley of China, but also Italy, Argentina, Japan, Australia, and Turkey. Information about the accessions, including cultivar names, origins and subpopulation identity, was described in Table S1.
Flour color parameters [brightness (L * ), redness (a * ) and yellowness (b * )] were measured with a Minolta colorimeter (CR-310, Minolta Camera Co., Ltd., Osaka, Japan) using the Commission Internationale de l'Éclairage (CIE) L * a * b * color system, respectively (Oliver et al., 1992). YPC was assessed according to Zhai et al. (2016b). Briefly, 1 g of flour sample was extracted with 5 ml of water-saturated n-butanol, along with shaking on an orbital incubator for 1 h at room temperature. After centrifuging at 5,000 rpm for 10 min, the absorbance was measured at 436.5 nm. YPC was expressed as µg.g −1 using a correction coefficient of 0.301 (American Association for Cereal Chemistry, 2000). Each sample was assayed twice, and a third assay was performed if the difference between two repeats was more than 10%. The mean values were used for statistical analysis.
Statistical analysis was carried out using SAS 9.2 software (SAS Institute, Inc., Cary, NC, USA, http://www.sas.com). Briefly, basic statistics such as means, standard deviation, and minimum and maximum values were calculated with the UNIVARIATE procedure. Analysis of variance was performed using the PROC MIXED procedure, where environments were treated as fixed effects, and genotypes, genotype × environment interaction and replicates nested in environments as random effects. Broad-sense heritability (h 2 ) was calculated following Zhai et al. (2016b). For each trait, a best linear unbiased predictor (BLUP) for each accession of each trait was calculated across four environments and used for further analyses. Pearson's correlation coefficient (r) among the flour color-related traits was calculated using the PROC CORR procedure based on BLUP values.

Genotypic Characterization
Genomic DNA was extracted from seedling leaves using the CTAB (cetyltrimethylammonium bromide) method (Doyle and Doyle, 1987). All 166 accessions were genotyped by both the Illumina wheat 90 K (comprising 81,587 SNPs) and Affymetrix wheat 660 K (containing 630,517 SNPs) SNP arrays by Capital Bio Corporation, Beijing, China (http://www.capitalbiotech. com/). SNP allele clustering and genotype calling were performed using the polyploid version of GenomeStudio software (Illumina, http://www.illumina.com). The default clustering algorithm was initially used to classify each SNP call into three allele clusters. Manual curation was then performed for more accurate genotyping. Markers with a minor allele frequency (MAF) < 5%, and more than 20% missing data were removed from the data matrix. After stringent filtration 259,922 SNP markers from the wheat 90 and 660 K SNP arrays were integrated into a common physical map for GWAS. The physical positions of SNP markers were obtained from the International Wheat Genome Sequencing Consortium (IWGSC RefSeq v1.0, http:// www.wheatgenome.org/) and used in association analysis.

Population Structure and Linkage Disequilibrium Analysis
A subset of 2,000 polymorphic SNP markers, distributed evenly across all the wheat chromosomes, was selected for population structure analysis using STRUCTURE (Pritchard et al., 2000). We applied the admixture model with correlated allele frequencies to assess numbers of hypothetical subpopulations ranging from K = 2 to 12 using 10,000 burn-in iterations followed by 100,000 MCMC (Markov Chain Monte Carlo) replicates. For each K, five independent runs were carried out. According to Evanno et al. (2005), the 166 accessions were structured into three subpopulations as described in Liu et al. (2017).
Pairwise LD was calculated as squared correlation coefficients (r 2 ) among alleles, and significance was computed by 1,000 permutations using TASSEL (Bradbury et al., 2007). LD was calculated separately for loci on the same chromosome (intrachromosomal pairs) and unlinked loci (inter-chromosomal pairs). The extent and distribution of LD were graphically displayed by plotting intra-chromosomal r 2 -values for loci in significant LD at P ≤0.001 against the physical distance, and a locally weighed polynomial regression (LOESS) curve was fitted using XLSTAT (Addinsoft, Paris, France). The critical r 2 -values beyond which LD is due to true physical linkage was calculated by taking the 95th percentile of the square root transformed r 2values of unlinked markers (Breseghello and Sorrells, 2006). The intersection of the LOESS curve with the line of the critical r 2 was estimated to see how fast LD decay occurs.

Marker-Trait Association Analysis
Marker-trait association analysis was performed separately for each environment and BLUP values across four environments using the MLM in TASSEL. Association mapping model evaluations were based on visual observations of the quantilequantile (Q-Q) plots, which are the plots of observed -log 10 (Pvalue) vs. expected -log 10 (P-value) under the null hypothesis that there was no association between marker and phenotype. A false discovery rate (FDR) of 0.05 was used as a threshold for significant association (Benjamini and Hochberg, 1995). The association of a marker with a trait was represented by its R 2value, an estimate of the percentage of variance explained by the marker. To provide a complimentary summary of declared putative MTAs, Manhattan plots were generated using a script written in R software (http://www.r-project.org/).
To assess the pyramiding effect of favorable alleles of MTAs for flour color-related traits identified in this study, the BLUP value for each trait in each accession was regressed against the number of favorable alleles using the line chart function in Microsoft Excel 2016.

Prediction of Candidate Genes for Flour Color-Related Traits
To assign putative biological functions of significant SNP markers associated with flour color-related traits, the flanking sequences of SNPs were blasted against the NCBI (http://www. ncbi.nlm.nih.gov/), European Molecular Biology Laboratory (http://www.ebi.ac.uk/) and European Nucleotide Archive (http://www.ebi.ac.uk/ena) public databases following Zhai et al. (2016b).
In addition, chromosomal locations and physical distances of 40 known genes influencing flour color-related traits, mainly involved in terpenoid backbone biosynthesis and carotenoid biosynthesis and degradation, were evaluated using IWGSC RefSeq v1.0. If the physical distances between MTAs identified in this study and known genes were less than LD decay, the MTAs were considered to be co-located with these known genes. Following Colasuonno et al. (2017), these 40 known genes were blasted against the available dataset of SNP marker sequences (Wang et al., 2014). Markers aligned with more than 80% similarity were considered to be within the coding sequences of known genes. Gene ontology annotation for the candidate genes was also conducted using EnsemblePlants (http://plants.ensembl. org/index.html).

Phenotypic Variation
There was a wide range of phenotypic variation for flour colorrelated traits among the 166 accessions, particularly for a * , b * and YPC where nearly 3 to 16-fold differences were observed ( Table 1). Across all four environments, the mean L * was 90.29, ranging from 87.29 to 92.06; a * averaged −0.86, ranging from −1.75 to −0.11; b * varied from 5.29 to 14.56 and averaged 8.83; and YPC averaged 1.18 µg.g −1 , with a range of 0.58-2.95 µg.g −1 .
The frequency distribution of each trait in each environment is provided in Figure S1.
High broad-sense heritabilities (h 2 ≥0.89) were observed for these traits, indicating that genetic effects played a determinant role for each trait ( Table 1). Analysis of variance indicated that the effects of genotypes, environments, and their interactions were significant for flour color-related traits, and genotype had a larger effect (Table S3). Pearson's correlation coefficients among all flour color-related traits were mostly significant (P < 0.001), except for L * × a * ( Table S4). b * was negatively correlated with L * (r = −0.71) and a * (r = −0.68). YPC was negatively correlated with L * (r = −0.28) and a * (r = −0.89), and positively correlated with b * (r = 0.83).

Population Structure and Linkage Disequilibrium
According to Evanno et al. (2005), K was plotted against the number of subgroups K. The maximum value of K occurred at K = 3, indicating that a K-value of 3 was the most probable prediction for the number of subpopulations (Liu et al. 2017).
In general, subgroup 1 comprised 62 cultivars primarily derived from Shandong province and abroad; subgroup 2 consisted of 54 varieties predominantly from Henan, Shaanxi and Anhui provinces; and subgroup 3 included 50 cultivars mainly from Henan and Hebei provinces. For the whole genome the threshold r 2 , calculated as the 95th percentile of the distribution of r 2 of unlinked markers, was 0.082, and thus all values of r 2 > 0.082 were probably due to physical linkage. The intersection between the threshold r 2 and the LOESS curve was at 8 Mb, which was considered as the LD decay rate of the population. Similarly, the LD decay was 6, 4 and 11 Mb for A, B and D sub-genomes, respectively.

Marker-Trait Association
As shown in the Q-Q plots (Figure S2), the observed -log 10 (P)values were closed to the expected distribution, suggesting that the MLM model (Q + K) was appropriate for association analysis of flour color-related traits in this study. One hundred MTAs for flour color-related traits were detected, and 78 were identified in all four environments ( Table 2). The highest number of MTAs was detected for YPC (32), followed by a * (28), b * (25) and L * (15). The associations between markers and flour color-related traits are shown by Manhattan plots in Figure 1 and Figure S3.

In Silico Prediction of Candidate Genes
The putative biological functions of 1,543 significant SNP markers associated with flour color-related traits identified in the present study were assigned (data not shown). Briefly, AX_109030196, associated with a * , b * and YPC, corresponds to premnaspirodiene oxygenase (PO) gene involved in sesquiterpenoid and triterpenoid biosynthesis ( Table 3). For the MTA on chromosome 1D (97,314,998-97,315,098 Mb), two candidate genes for a * and YPC were identified; one is a putative gene (IWB35120) encoding the dihydrolipoyl dehydrogenase (DLD) enzyme, and the other (IWB31766) corresponds to leghemoglobin reductase (LBR), both involved in glycolysis/gluconeogenesis. Either one or even both genes are potential candidates for the observed MTA in this genomic region. AX_111471334, corresponding to phosphomannomutase (PMM) gene, is a potential candidate gene for the MTA on chromosome 3B (146,971,117-146,971,187 Mb) for YPC. IWB36240, significant for a * and YPC, corresponds to 4hydroxy-3-methylbut-2-en-1-yl diphosphate synthase (MDPS) gene involved in terpenoid backbone biosynthesis.

Detection of QTL by GWAS
GWAS has become an efficient tool for genetic dissection of complex traits. Among the diverse accessions analyzed in this study, substantial phenotypic variation in flour color-related traits indicated a wide range of genetic diversity (Table 1, Figure S1). Based on diverse genotypes and high-throughput SNP arrays, GWAS was a powerful strategy for dissecting flour color-related traits and identifying candidate genes.
Spurious associations are often the result of structured relationships within the population and may be reduced by taking population structure into account (Pritchard et al., 2000;. Therefore, assessment of population structure is important prior to conducting a GWAS. As described in Liu et al. (2017) the core collection in this study was structured into three subpopulations, largely in agreement with their geographical origin. This pattern can be caused by several factors, including disproportional usage of a limited number of founders in developing regional populations and enrichment of alleles associated with regional adaptation by local breeding programs.
As shown in Q-Q plots (Figure S2), MLM greatly reduced spurious associations. Consistency across environments was used as an additional criterion for MTAs significant at FDR < 0.05 to reduce the risk of false marker-trait associations. Finally, the 100 MTAs for flour color-related traits were identified in at least three environments, which will be useful for metabolomic studies of flour color-related traits. The more robust markers can then be implemented in breeding programs to ensure that increasingly stringent color requirements imposed by industry are met through early screening of breeding lines. Importantly, the effects of these MTAs and consequent values for selection in breeding programs require validation in bi-parental populations.

Correlations Among Flour Color-Related Traits
Pearson correlation coefficients among all flour color-related traits showed that b * was negatively correlated with brightness (r = −0.71) and redness (r = −0.68) and yellow pigment content was negatively correlated with redness (r = −0.89) and positively correlated with yellowness (r = 0.83). As expected the strong phenotypic correlations among flour color-related traits were generally based on a large number of shared MTAs ( Table 2). b * had five and 13 MTAs in common with L * and a * , respectively. YPC shared 27 and 13 same genetic regions with a * and b * , respectively. Of course, some loci were also identified specific for L * , a * , b * and YPC, respectively, indicating that flour  color-related traits could be controlled by common major-effect loci, but also modified by numerous trait-specific loci.
To improve flour color-related traits, using multi-trait markers in MAS may increase QTL pyramiding efficiency. Most Chinese foods, such as steamed buns and noodles, typically require a white or creamy flour type (higher L * ) with lower b * and YPC. In this situation, these multi-trait regions could be important targets to reduce b * and YPC and increase L * simultaneously.
Because no QTL for L * was previously mapped to chromosome 7A (2), the two MTAs in this work may be new loci associated with this trait. Similarly, MTA for a * located on chromosome 2A (629,632,[627][628][629]632,697 Mb) and MTA for YPC on chromosome 7D (635,560,079-635,560,149 Mb) were not matched to any previously identified QTL. Hence these loci are likely to be new and should be attractive candidate regions on which to focus further in dissecting the genetic architecture of flour color-related traits in wheat. Notably, MTAs on chromosomes 1A (7,653,653,223 Mb) and 2B (418,276,276,723 Mb) showed stronger association with a * than Psy-A1 and Psy-B1 in at least three environments (data not shown). Similarly, 11 MTAs on chromosomes 1A (7,653,153-7,653,223 Mb), 1D,2B (418,276,276,723 Mb), 4D, 5B (379722302-379722372 Mb), 5D, 6A (594,420,538-594,420,608 Mb), 6B (2), 6D, and 7B (481,839,758-481,839,828 Mb) explained much more phenotypic variation of YPC than Psy-A1 and Psy-B1 in at least three environments (data not shown). These demonstrated that GWAS is a powerful mapping approach for identifying genomic regions underlying variation in flour color based on diverse genotypes and high-throughput SNP arrays; they also confirmed previously detected QTL and revealed novel QTL that were not found in bi-parental populations previously.

Candidate Genes
Knowledge of the functions of significant SNP markers provides a very useful tool for identification of candidate genes for traits under investigation. Based on a comparative genomics approach, annotations of flanking sequences of significant SNP markers predicted that some genomic regions encode proteins that are important components of pathways linked to carotenoid biosynthesis and degradation ( Table 3). For example, PO (AX_109030196) participated in sesquiterpenoid and triterpenoid biosynthesis; DLD (IWB35120), LBR (IWB31766) and PMM (AX_111471334) were involved in glycolysis/gluconeogenesis; and MDPS (IWB36240) played a role in terpenoid backbone biosynthesis. These genes can be considered potential candidate genes for flour color-related traits.
Over the past decades, the gene discovery in bread wheat has been largely limited due to the absence of reference genome sequences. However, recent advances in highthroughput genotyping platforms and publicly available wheat genome sequences offer researchers new opportunities to achieve their goals. Based on the reference wheat genome sequences from the IWGSC RefSeq v1.0, 40 known genes influencing flour colorrelated traits were assigned to chromosome locations ( Table S5). Eleven of the 40 known genes were located close to 23 MTAs for flour color-related traits ( Table 2), including four terpenoid backbone biosynthesis genes (FPPS2, IPPS, DXR and MK), five genes involved in carotenoid biosynthesis (ZISO, LYCB, PSY-D1, BCH2 and NCED4), and two related to carotenoid degradation (LOX2 and POD). Therefore, the co-linearity of 11 known genes and 23 MTAs suggested that these might be candidate genes for flour color-related traits. In fact, all these candidate genes should be confirmed in bi-parental population mapping or by reverse genetic approaches.
Based on gene ontology analysis, all these 16 candidate genes are related to four main biological processes. Briefly, PMM is involved in GDP-mannose biosynthetic process; MDPS, FPPS2, IPPS, DXR and MK participate in the isoprenoid/terpenoid biosynthetic process; ZISO, LYCB, PSY and BCH2 involve in the carotenoid biosynthetic/carotene process; and PO, DLD, LBR, NCED4, LOX2 and POD are related to the oxidation-reduction process; these are corresponding to carotenoid precursor supply, carotenoid biosynthesis and carotenoid degradation, respectively (Rodríguez-Concepción et al., 2001;Qin et al., 2012;Zeng et al., 2015).

Effects of Favorable Alleles on Flour Color-Related Traits
To assess the pyramiding effect of favorable alleles of MTAs for flour color-related traits, we examined the number of favorable alleles in each accession, and the BLUP value for each trait was regressed against the number of favorable alleles. The favorable alleles of significantly associated SNPs showed additive effects on brightness, redness, yellowness and yellow pigment content (Figure 2). As the number of favorable alleles increased, L * , b * and YPC values also increased and a * decreased, and linear regressions (R 2 ) between numbers of favorable alleles and phenotype were 0.94, 0.72, 0.91 and 0.87, respectively. Higher b * and YPC are desirable for yellow alkaline noodles and human health, while most Chinese foods typically require a white or creamy flour type with lower b * and YPC. In this situation, cultivars with minimum (Wennong5, Lumai21, Jinan17 and Funo) or maximum (Huaimai18, Zhongmai875, Shan715, Lumai11 and Jinmai61) numbers of favorable alleles for b * and YPC can be used as breeding parents to achieve defined color and nutritional properties of end-use products.

CONCLUSIONS
We performed a genome-wide association analysis on 166 bread wheat cultivars using the wheat 90 and 660 K SNP arrays and 10 allele-specific markers, and identified 100 MTAs for flour color-related traits. Broad comparison of MTAs identified in this study with QTL in previous reports indicated many common loci conditioning flour color-related traits, and four MTAs detected were new, including MTAs on chromosome 7A (2) for L * , chromosome 2A (629,632,627-629,632,697 Mb) for a * and chromosome 7D (635,560,079-635,560,149 Mb) for YPC. Two and 11 loci explained much more phenotypic variation of a * and YPC than phytoene synthase 1 gene (Psy1), respectively. Based on biochemical information and bioinformatics analyses 16 predicted candidate genes were related to carotenoid biosynthesis and degradation, terpenoid backbone biosynthesis and glycolysis/gluconeogenesis. We will confirm these candidate genes in bi-parental populations and do some gene function analysis in the future.
The genomic regions associated with flour color-related traits identified in this study bring new insights to understanding the genetic basis of these traits, and new markers are useful for wheat quality improvement by MAS. Moreover, the candidate genes may serve as promising targets for study of the molecular mechanisms underlying flour color-related traits in wheat. This study also confirmed that GWAS is a powerful approach to validate known genes for complex traits and identify novel loci.