A Combined Association Mapping and Linkage Analysis of Kernel Number Per Spike in Common Wheat (Triticum aestivum L.)

Kernel number per spike (KNPS) in wheat is a key factor that limits yield improvement. In this study, we genotyped a set of 264 cultivars and a RIL population derived from the cross Yangmai 13/C615 using the 90K wheat iSelect SNP array. We detected 62 significantly associated signals for KNPS at 47 single nucleotide polymorphism (SNP) loci through genome-wide association analysis of data obtained from multiple environments. These loci were on 19 chromosomes, and the phenotypic variation attributable to each one ranged from 1.53 to 39.52%. Twelve (25.53%) of the loci were also significantly associated with KNPS in the RIL population grown in multiple environments. For example, BS00022896_51-2A TT, BobWhite_c10539_201-2D AA, Excalibur_c73633_120-3B GG, and Kukri_c35508_426-7D TT were significantly associated with KNPS in all environments. Our findings demonstrate the effective integration of association mapping and linkage analysis for KNPS, and support KNPS as a target trait for marker-assisted selection and genetic fine mapping.


INTRODUCTION
Yield improvement is an ongoing endeavor in wheat breeding. Wheat yield is determined by spike number per unit area, kernel number per spike (KNPS), and thousand kernel weight. Increased yield mainly depends on increased KNPS when the other two parameters are unchanged (Slafer and Andrade, 1989; Fischer, 2008, 2011; Dobrovolskaya et al., 2015). Thus, molecular interpretation of the inheritance mechanism of KNPS is of significance for marker-assisted selection and molecularly designed wheat breeding.
The major methods for genetic dissection of complex traits in crop species include family-based linkage mapping and association mapping of germplasm collections (Mackay and Powell, 2007; Cadic et al., 2013). Association mapping has three advantages compared to conventional linkage mapping: (1) it saves the time and cost of constructing suitable segregating populations, and by using existing populations it can draw on a wide diversity of materials; (2) it is able to detect multi-allelic variation, and thus helps to identify the most favorable alleles contributing to a target trait in a single analysis; and (3) its higher resolution is more powerful for fine mapping of quantitative trait loci (QTLs; Breseghello and Sorrells, 2006; Atwell et al., 2010). Research across many crops has shown that association analysis is a promising method for mining favorable alleles, despite some limitations. For example, association analysis is less efficient than linkage analysis for studying species with low genetic diversity (Zhao et al., 2007; Myles et al., 2009). However, association analysis and linkage mapping are complementary methods, and their combination can be used for cross-validation (Nordborg and Weigel, 2008). Thus, an integrated application of both methods is more efficient in discovering and validating QTLs in crop species (Zhang et al., 2011).
With recent developments in wheat gene chip technology and reduced sequencing costs, single nucleotide polymorphism (SNP) markers have been extensively adopted due to their high density, representativeness, stable inheritance, and capability of automatic detection (Allen et al., 2011; Cavanagh et al., 2013). In particular, the 90K SNP GoldenGate chip based on the Illumina platform has been widely applied in detection of polymorphisms in both tetraploid and hexaploid wheat (Akhunov et al., 2009; Lai et al., 2012). The iSelect wheat 90K SNP chip has been used to discover yield-related QTLs in wheat. For example, an F8-generation RIL population derived from the cross Zhou 8425B/Chinese Spring was used by Gao et al. (2015) to identify 24 yield-related QTLs, of which five loci (QGC-W.caas-7AL, QNDVIS.caas-7AL, QGC-S.caas-3AS, QCTD-A.caas-5BS, and QCTD-10.caas-5BS) were detected simultaneously in multiple environments. Through genome-wide association analyses, Zanke et al. (2015) detected 58 loci significantly associated with thousand kernel weight and distributed on all chromosomes except 4D and 5D, and Ain et al. (2015) detected 44 loci significantly associated with yield traits (grain number per spike, thousand grain weight, grain yield, biological yield, and harvest index) in germplasm sets. As these genome-wide association studies were based on genetic resource collections and focused on mining of yield-related genes or loci, the individual findings still need validation by linkage analysis in segregating populations.
In this study, a set of 264 wheat cultivars and a RIL population of 198 lines derived from the cross Yangmai 13/C615 were genotyped using iSelect wheat 90K SNP high-density chips. Using KNPS phenotypic data collected in multiple environments, we confirmed the presence of KNPS-related loci and identified favorable alleles through an integration of association mapping and linkage analysis. The findings provide useful information for marker-assisted selection of KNPS in wheat.
Yangmai 13, bred by Lixiahe Agricultural Institute in Jiangsu province, has been extensively promoted in the Lower and Middle Yangtze River Valley winter wheat region, and C615 is a synthetic wheat accession [durum cultivar CEAT × Ae. squarrosa (895)] introduced from the International Maize and Wheat Improvement Center (CIMMYT), Mexico. The average KNPS for Yangmai 13 exceeds that of C615 (57.9 vs. 47.0; Table 1). The F7 RIL population of 198 lines was developed by single seed descent from an initial F2 population.

Phenotyping
The cultivar set was planted in 2013-2014 and 2014-2015 at Jingzhou in Hubei province and Yangzhou in Jiangsu, and in 2015-2016 at Xinxiang in Henan; these environments were designated 14JZ, 14YZ, 15JZ, 15YZ, and 16XX, respectively. The field trials were grown as thrice-replicated randomized blocks. Lines in each replicate were planted in 3-row plots at a density of 40 kernels/133 cm row, and a row spacing of 25 cm. Plant densities were thinned to around 30 at the seedling stage. Thirty spikes of each line were randomly selected from the middle row and used to score KNPS. Field management followed local practices. The Yangmai 13/C615 RIL population was planted in 2014-2015 at Yangzhou, and in 2015-2016 at Yangzhou and Jingzhou. These environments were named 15YZ, 16JZ, and 16YZ, respectively. The field experiments were designed as randomized blocks with three replicates. Plot sizes, plant densities, and spike sampling were similar to those described for the cultivars.

Genotyping and Data Analysis
Genomic DNA was extracted from the test materials using the CTAB method (Sharp et al., 1989). Statistical analyses were conducted in SPSS 21.0 (http://www.brothersoft.com/ibm-spssstatistics-469577.html). The mean KNPS was computed by a best linear unbiased predictor (BLUP) method (Bernardo, 1996a,b; Bernardo et al., 1996). SNP markers were detected by the Biotechnology Center, Department of Plants, University of California, USA, using the Illumina SNP genotyping platform and BeadArray Microbead Chip (Cavanagh et al., 2013). SNP allele clustering and genotype calling were conducted in GenomeStudio v2011.1. Chromosome positions of SNP markers are provided in Cavanagh et al. (2013).
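The BLUP step can be illustrated with a minimal sketch. It assumes a balanced one-way random-effects model (y_ij = mu + g_i + e_ij) with variance components estimated by ANOVA mean squares; this is a simplified stand-in, not the SPSS procedure used in the study. The function name and input layout are hypothetical.

```python
from statistics import mean

def blup_line_means(data):
    """Empirical BLUP of line means under y_ij = mu + g_i + e_ij.

    data: dict mapping line name -> list of plot values.
    Assumes balanced data (same number of observations per line).
    """
    n = len(next(iter(data.values())))          # observations per line
    if any(len(v) != n for v in data.values()):
        raise ValueError("balanced data required for this sketch")
    g = len(data)                               # number of lines
    line_means = {k: mean(v) for k, v in data.items()}
    grand = mean(list(line_means.values()))

    # ANOVA mean squares between and within lines
    ms_between = n * sum((m - grand) ** 2 for m in line_means.values()) / (g - 1)
    ms_within = sum(
        (y - line_means[k]) ** 2 for k, v in data.items() for y in v
    ) / (g * (n - 1))

    # Method-of-moments variance components (genetic variance truncated at 0)
    var_g = max((ms_between - ms_within) / n, 0.0)
    var_e = ms_within

    # Shrinkage factor applied to each line's deviation from the grand mean
    shrink = var_g / (var_g + var_e / n) if var_g > 0 else 0.0
    return {k: grand + shrink * (m - grand) for k, m in line_means.items()}
```

Each predicted value is the raw line mean pulled toward the grand mean; the pull is stronger when the residual variance is large relative to the genetic variance.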
Genetic diversity of SNP markers was analyzed in PowerMarker 3.25 (Liu and Muse, 2005). Genetic structure of the cultivar set was evaluated in Structure 2.3.2 using 3,656 SNP markers distributed on all 21 wheat chromosomes (Pritchard et al., 2000). The number of subpopulations was determined by the ΔK method (Evanno et al., 2005). Genome-wide association analysis of KNPS with SNP markers used a Q+K mixed model (Yu et al., 2005; Zhang et al., 2010) implemented in TASSEL 5.0 (Bradbury et al., 2007; http://www.maizegenetics.net/). SNP loci with allele frequencies lower than 0.05 were not considered. The significance threshold for association signals was set at 1/(number of SNP markers), i.e., P < 1/20,037 = 4.99 × 10⁻⁵, equivalent to −log10 P > 4.30. The genetic effects of favorable alleles at associated loci in the cultivar set and in the RIL population were tested via ANOVA in SPSS 21.0.
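The marker filter and significance threshold described above reduce to simple arithmetic, sketched below. The 0.05 cut-off is assumed here to apply to the minor allele frequency, and the function names and marker names are hypothetical.

```python
import math

def maf_filter(minor_allele_freqs, min_freq=0.05):
    """Keep markers whose minor allele frequency is at least min_freq.

    minor_allele_freqs: dict mapping marker name -> minor allele frequency.
    """
    return {m: f for m, f in minor_allele_freqs.items() if f >= min_freq}

def significance_threshold(n_markers):
    """Return (P cut-off, -log10 P cut-off) for a 1/n_markers threshold."""
    p_cut = 1.0 / n_markers
    return p_cut, -math.log10(p_cut)

# Hypothetical marker names, for illustration only
kept = maf_filter({"snp_a": 0.31, "snp_b": 0.02, "snp_c": 0.05})

# 20,037 markers, as reported in the text
p_cut, logp_cut = significance_threshold(20037)
print(f"P < {p_cut:.2e}, -log10(P) > {logp_cut:.2f}")
# prints "P < 4.99e-05, -log10(P) > 4.30"
```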

Phenotypic Assessment of the Cultivar Population and RILs
Across the five environments (14JZ, 14YZ, 15JZ, 15YZ, 16XX) and the best linear unbiased predictions (BLUP), the coefficient of variation of KNPS in the cultivar population ranged from 9.58 to 14.84%. Between-environment correlation coefficients for KNPS ranged from 0.521 to 0.874 (P < 0.001) for the cultivars and from 0.513 to 0.833 (P < 0.001) for the RIL population (Table S2). Although the coefficient of variation of KNPS in the RIL population (7.58-10.94%) was lower than in the cultivar population, the variation was still substantial (Table 1).
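The two summary statistics reported here can be sketched as follows, assuming the coefficient of variation is the sample standard deviation divided by the mean and the between-environment correlation is Pearson's r. The function names are hypothetical.

```python
import math
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Coefficient of variation in percent (sample SD / mean * 100)."""
    return 100.0 * stdev(values) / mean(values)

def pearson_r(x, y):
    """Pearson correlation between paired values, e.g., line means of
    KNPS for the same lines in two environments."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

For example, `coefficient_of_variation([40, 50, 60])` gives 20.0, since the sample standard deviation is 10 and the mean is 50.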

Allelic Diversity and Genetic Structure Analysis
The genetic diversity of the cultivar population was analyzed using 22,325 SNP markers. The major allele frequency (MAF) varied from 0.500 to 0.998 (mean 0.785), polymorphism information content (PIC) varied between 0.004 and 0.375 (mean 0.238), and gene diversity varied between 0.004 and 0.500 (mean 0.294; Table S3), indicating this set of cultivars has high genetic diversity at the SNP level.
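The reported ranges are consistent with the standard definitions of these statistics for bi-allelic markers. The sketch below assumes gene diversity is 1 − Σp² (without small-sample correction) and PIC follows the Botstein formula; PowerMarker's exact estimators may differ slightly. The function name is hypothetical.

```python
def diversity_stats(allele_freqs):
    """Return (major allele frequency, gene diversity, PIC) for one locus.

    allele_freqs: list of allele frequencies at the locus, summing to 1.
    """
    major = max(allele_freqs)
    # Gene diversity (expected heterozygosity): 1 - sum of squared frequencies
    gene_diversity = 1.0 - sum(p * p for p in allele_freqs)
    # PIC subtracts 2 * p_i^2 * p_j^2 for every pair of distinct alleles
    cross = 0.0
    for i in range(len(allele_freqs)):
        for j in range(i + 1, len(allele_freqs)):
            cross += 2.0 * allele_freqs[i] ** 2 * allele_freqs[j] ** 2
    return major, gene_diversity, gene_diversity - cross
```

For a bi-allelic SNP with equal allele frequencies, `diversity_stats([0.5, 0.5])` returns gene diversity 0.500 and PIC 0.375, the upper bounds reported above; with a major allele frequency of 0.998, both statistics round to 0.004, the lower bounds.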
To reduce false associations, the genetic structure (Q-value) and between-individual relationship coefficients (K-value) of the cultivar population were determined. The population was divided into two subpopulations (Figure 1A), and ΔK was maximal at K = 2, further validating this subdivision (Figure 1B).
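The ΔK criterion of Evanno et al. (2005) can be sketched as below. This version takes the absolute second-order difference of the mean ln P(D) across replicate Structure runs and divides it by the standard deviation at that K; the log-likelihood values in the usage lines are hypothetical and serve only to show the calculation.

```python
from statistics import mean, stdev

def evanno_delta_k(lnp_runs):
    """Compute delta K for each interior K.

    lnp_runs: dict mapping K -> list of ln P(D) values from replicate runs.
    Delta K is undefined for the smallest and largest K tested, and this
    sketch requires non-zero variation among runs at each interior K.
    """
    m = {k: mean(v) for k, v in lnp_runs.items()}
    delta = {}
    for k in sorted(lnp_runs):
        if k - 1 in m and k + 1 in m:
            second_diff = abs(m[k + 1] - 2 * m[k] + m[k - 1])
            delta[k] = second_diff / stdev(lnp_runs[k])
    return delta

# Hypothetical ln P(D) values from three replicate runs per K
runs = {
    1: [-1000, -1002, -998],
    2: [-800, -801, -799],
    3: [-790, -795, -785],
    4: [-785, -790, -780],
}
delta = evanno_delta_k(runs)
best_k = max(delta, key=delta.get)   # K with the largest delta K
```

With these illustrative values, ΔK peaks at K = 2 because the log-likelihood rises sharply from K = 1 to K = 2 and then plateaus.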

Favorable Alleles and Their Genetic Effects
The genetic effects of favorable alleles at the 47 associated loci (Table 3) ranged from 0.45 to 3.68, indicating positive effects on KNPS. GENE-4456_153-7B TT had the largest effect (2.80 kernels per spike, 14JZ; 2.24 kernels, 14YZ; 3.25 kernels, 15JZ; 2.51 kernels, 15YZ; 3.68 kernels, 16XX) and was detected in all environments (Table 3). Moreover, the frequency of favorable alleles at associated loci varied from 8.33 to 92.82%. The frequencies of 19 favorable alleles exceeded 50% and showed clearly skewed distributions, suggesting strong prior selection in breeding programs.

Overlapping between Association Signals and Linkage Analysis
To validate the effectiveness of associated loci in the germplasm set, we used the same iSelect wheat 90K SNP chip to scan the RIL population. Among the 47 associated loci found earlier, 16 (34.04%) were polymorphic between Yangmai 13 and C615. Furthermore, the genetic effects of the favorable alleles identified in the cultivar population were tested by ANOVA in the RIL population. Twelve loci (25.53% of all associated loci) were significantly associated with KNPS in multiple environments (P < 0.05). In particular, four favorable alleles, BS00022896_51-2A TT, BobWhite_c10539_201-2D AA, Excalibur_c73633_120-3B GG, and Kukri_c35508_426-7D TT, were significantly associated with KNPS in all environments. BS00022896_51-2A TT had a large genetic effect on KNPS (2.09 kernels, 15YZ; 1.05 kernels, 16JZ; 1.28 kernels, 16YZ; 1.22 kernels, BLUP; Table 4). Moreover, the favorable alleles at the 12 SNP loci were identical in both populations, indicating that these alleles had consistent effects in both populations.
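The allele-effect test can be sketched as a one-way ANOVA comparing lines that carry the favorable allele with lines that carry the alternative allele. The sketch assumes the genetic effect is the difference in mean KNPS between the two groups; it returns the F statistic only, and a P-value would come from the F distribution with the returned degrees of freedom (for example via a routine such as `scipy.stats.f_oneway`). The function name is hypothetical.

```python
from statistics import mean

def allele_effect_and_f(favorable, other):
    """One-way ANOVA for two allele classes.

    favorable, other: lists of KNPS values for lines carrying the
    favorable allele and the alternative allele, respectively.
    Returns (effect, F, df_between, df_within), where effect is the
    difference between the two group means.
    """
    groups = [favorable, other]
    all_values = [v for g in groups for v in g]
    grand = mean(all_values)
    group_means = [mean(g) for g in groups]

    # Between-group and within-group sums of squares
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return group_means[0] - group_means[1], f_stat, df_between, df_within
```

For instance, `allele_effect_and_f([50, 52, 54], [46, 48, 50])` returns an effect of 4 kernels and F = 6.0 with 1 and 4 degrees of freedom.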
BobWhite_c10539_201 was associated with KNPS in all environments in the cultivar population (Figure 3A, Table 2), and the favorable allele was AA. The frequency of this allele was 70.45% in the cultivar population, a clearly skewed distribution that indicates strong selection during modern breeding (Figure 3B). In the cultivar population, cultivars carrying this allele had higher KNPS in each environment (14JZ, 14YZ, 15JZ, 15YZ, 16XX, BLUP), with increases of 1.39, 1.50, 1.15, 1.19, 2.92, and 1.20, respectively (Figure 3C). Moreover, in the RIL population the average KNPS of lines carrying the AA allele was significantly higher in all environments than that of lines carrying other alleles (P < 0.01; Figure 3D).

Utilization of the 90K Wheat iSelect SNP Chip
SNPs are the most abundant form of variation in genomes and represent ancient, stable point mutations. Genotyping using SNPs can be standardized, and the differences can be as simple as a single base pair. Moreover, SNP chips are highly integrated, and can be combined with analysis software and relevant breeding information (Maccaferri et al., 2015). Thus, SNP markers can be used in a molecular breeding platform. Currently, applications of SNP chips in studies on QTLs in wheat are still at a preliminary stage. For instance, Thompson et al. (2015) used 9K SNP chips to scan a wheat RIL population derived from [...]
The 90K wheat iSelect SNP chip used in this study consisted of 81,587 SNP markers, of which 34,039 were polymorphic in the cultivar population, a polymorphism frequency of 41.72%. In the RIL population, 7,320 markers were polymorphic, a polymorphism frequency of 8.97%. Since the marker sequences of this SNP chip are known, sequence alignment can be used to evaluate marker effectiveness. Future work will enable construction of a high-density genetic linkage map for the RIL population.

Integration of GWAS and Bi-Parental Linkage Analysis
Association analysis and traditional linkage mapping can be used in a complementary manner for gene identification and validation (Nordborg and Weigel, 2008). Using germplasm and RIL populations of soybean, Korir et al. (2013) detected five loci associated with aluminum resistance in both populations. The combination of the two methods improved the efficiency of screening for aluminum resistance candidate genes in soybean. Li et al. (2014) detected 22 seed weight and silique length-related QTLs in rape using a bi-parental population. Loci uq.A09-1 and uq.A09-3 were significantly associated in a germplasm population grown in multiple environments. Maccaferri et al. (2016) identified three major QTL clusters controlling root length and mass, including RSA_QTL_cluster_5#, RSA_QTL_cluster_6#, and RSA_QTL_cluster_12#, in two RIL populations and a germplasm set of durum wheat. The sequences surrounding these QTL will permit functional marker development and gene cloning. To validate association loci detected in a cultivar set, we used the same SNP markers to scan the Yangmai 13/C615 RIL population. Among the 47 associated loci, 12 (25.53%) were significantly associated (P < 0.05) with KNPS in the RIL population grown in a different set of environments. In particular, BS00022896_51-2A TT, BobWhite_c10539_201-2D AA, Excalibur_c73633_120-3B GG, and Kukri_c35508_426-7D TT were significantly associated with KNPS in all environments (Table 4).
Among the 47 associated loci, 35 (74.47%) were not validated in the RIL population. The main reason for this was the restricted bi-allelic polymorphism in a single segregating population. New strategies for combining linkage mapping and association analysis have been reported. For example, nested association mapping (NAM) is considered the most effective method to explain the genetic basis of quantitative traits in species with low linkage disequilibrium (LD). NAM scans the genome more effectively and economically, and helps to connect variation at the molecular level with variation in complex phenotypic traits (Saade et al., 2016).
In this study, using the iSelect wheat 90K SNP chip, we combined association mapping based on a germplasm set with linkage analysis of a RIL population to discover and validate QTLs for KNPS in wheat. Our findings provide a basis for cloning candidate genes for KNPS. Moreover, the more important loci discovered here can be preferentially targeted in marker-assisted selection for high yield. Thus, the combination of association analysis and linkage mapping, the development and application of powerful statistical models, and the application of high-density SNP markers will promote research on the genetics of complex quantitative traits in crop species.