The Genetic Architecture of Grain Yield in Spring Wheat Based on Genome-Wide Association Study

Uncovering the genetic architecture for grain yield (GY)–related traits is important for wheat breeding. To detect stable loci for GY-related traits, a genome-wide association study (GWAS) was conducted in a diverse panel, which included 251 elite spring wheat accessions mainly from the Northeast of China. In total, 52,503 single nucleotide polymorphisms (SNPs) from the wheat 55 K SNP arrays were used. Thirty-eight loci for GY-related traits were detected and each explained 6.5–16.7% of the phenotypic variations among which 12 are at similar locations with the known genes or quantitative trait loci and 26 are likely to be new. Furthermore, six genes possibly involved in cell division, signal transduction, and plant development are candidate genes for GY-related traits. This study provides new insights into the genetic architecture of GY and the significantly associated SNPs and accessions with a larger number of favorable alleles could be used to further enhance GY in breeding.

improvement of yield potential. Also, MAS is a key technique to increase the yield of wheat (He et al., 2010;Rasheed et al., 2016).
The effectiveness and reliability of MAS depends on the number of available genes and tightly linked markers for target traits. Until now, more than 70 genes have been cloned in wheat, among which 40 are associated with GY and related traits (Cui et al., 2014;Rasheed et al., 2016;Nadolska-Orczyk et al., 2017;Li et al., 2018;Wang et al., 2019). For all the cloned genes, about 150 functional markers or kompetitive allele-specific PCR (KASP) were developed (Rasheed et al., 2016). Besides this, more than 100 loci identified by genome-wide association study (GWAS) or biparental linkage mapping for GY-related traits are reported (Cui et al., 2014;Gao et al., 2017;Würschum et al., 2017;Li et al., 2018). However, identifying the novel genes or loci for GY is still important for wheat production.
Single nucleotide polymorphisms (SNP) provide an effective way to identify candidate genes for various traits (Zhu et al., 2008;Wang et al., 2014;Rasheed et al., 2016). Recently, the wheat 55, 90, and 660 K SNP arrays are gradually replacing simple sequence repeats (SSR) and diversity array technology (DArT) markers in genetic analysis in yield, disease resistance, end-use quality, and biotic or abiotic stress tolerance-related traits Li et al., 2016;Liu et al., 2016Liu et al., , 2017Valluru et al., 2017;Quan et al., 2021). Linkage analysis based on biparental populations and association mapping based on natural populations are two main ways to uncover the genetic analysis of complex traits (Zhu et al., 2008;Liu et al., 2017). Compared with linkage analysis, association mapping is based on linkage disequilibrium (LD) and offers an effective and reliable approach to uncover the genetic architecture of complex traits (Zhu et al., 2008;Liu et al., 2017). GWAS uses the natural germplasms, including wild types, landraces, released cultivars, and improved accessions, as materials and bypasses the time of developing biparental populations (Sela et al., 2014;Shi et al., 2017). Furthermore, traditional linkage analysis focuses on specific traits, whereas association mapping can be used to analyze various traits based on the same genotype data (Zhu et al., 2008). Nowadays, GWAS is commonly applied in genetic analysis of complex traits in wheat, such as GY-related traits, disease-related traits (stripe rust, leaf rust, or powdery mildew), and biotic and abiotic stress (Cui et al., 2011(Cui et al., , 2014Azadi et al., 2015;Beyer et al., 2019;Quan et al., 2021).
In China, Heilongjiang and Jilin are the main zones for spring wheat. In this study, 251 spring wheat accessions mainly originating from Heilongjiang andJilin (1930-2020s) were selected. The aims of this study were to (1) detect the loci for GY and related traits in spring wheat, and (2) search for candidate genes for GY-related traits for further study.

Plant Materials and Field Trials
The diverse panel used in the present study contained 251 varieties mainly from Heilongjiang or Jilin province of China (Supplementary Table 1). The diverse panel was grown at Haerbin and Keshan in Heilongjiang province during the 2019 and 2020 cropping seasons. A randomized complete block design with three replicates was employed in field trials. Each plot comprised four 2.0-m rows spaced 20 cm apart with 40 seeds in each row. Agronomic management was performed according to local practices at each location.

Phenotyping and Statistical Analysis
Six phenotypic traits related to GY were evaluated in all four environments, including the GY, spike number per unit area (SNU), SN, spike length (SL), KNS, and TKW. The middle two lines of plants were harvested in each plot at physiological maturity, and GY was measured after the seed water content dried to 14% and was expressed as kg ha −1 . The investigation of the other five traits and statistical analysis were according to Li et al. (2018). BLUP estimation for six traits among four environments was calculated using the MIXED procedure (PROCMIXED) in SAS v9.3 (SAS Institute) 1 following the formula: Of these, y is the observed phenotype, Xb is the fixed effects (environment), Zu is the random effect (genotype), and e is the residual effect.

Genotyping, Population Structure, and Linkage Disequilibrium
The 251 accessions were genotyped using the wheat 55 K SNP arrays. SNP markers with missing data > 20% and minor allele frequency (MAF) < 0.05 were removed to avoid spurious alleles. The physical position for subsequent GWAS analysis followed the IWGSC RefSeq v2.1 2 .
Polymorphic and evenly distributed markers (one marker/LD block) were used to conduct population structure analysis. In this study, after filtering by MAF and missing data, 3000 polymorphic SNP markers (about 5 Mb for the whole genome of common wheat according to previous studies) evenly distributed on 21 chromosomes were analyzed in Structure v2.3.4 (Pritchard et al., 2000) 3 .
To verify the result, PCA and neighbor-jointing (NJ) trees were also estimated using the software Tassel v5.0 (Breseghello and Sorrells, 2006), respectively. After filtering using the Tassel v5.0, 5000 evenly distributed SNP markers were chosen to calculate LD for whole genomes using the full matrix and sliding window options in Tassel v5.0 (Breseghello and Sorrells, 2006). The details for population structure and LD decay analysis followed Liu et al. (2017). Variance components were used to calculate broad sense heritability (h b 2 ) of GY-related traits as h b 2 = σ g 2 /(σ g 2 + σ ge 2 /r + σ ε 2 /re), where σ g 2 , σ ge 2 , and σ ε 2 represent the genotype, genotype × environment interaction, and residual error variances, respectively, and e and r are the numbers of environments and replicates per environment, respectively.

Genome-Wide Association Mapping
To control background variation and eliminate the spurious marker-trait associations (MTAs), the mixed linear model (MLM, PCA + K model) in Tassel v5.0 was used in consideration of the kinship matrix and population structure as follows: Of these, y is the vector of observed phenotype, µ is the mean, x is the genotype, β is the effect of the SNP, u is the random effects due to genetic relatedness with Var (u) = σ 2 g K and Var (e) = σ 2 e, and K is the kinship matrix across all genotypes. The kinship matrix was treated as a random-effect factor and calculated by the software Tassel v5.0, whereas the PCA was considered as a fixedeffect factor and inferred by the Tassel V5.0 in MLM analysis. Due to Bonferroni-Holm correction for multiple testing (α = 0.05) being too conserved, and no significant MTAs were detected with this criterion, markers with an adjusted -log10 (P-value) ≥ 3.0 were regarded as significant markers. Manhattan plots and Q-Q plots were drawn based on R language (R 3.6.5).

The Identification of Candidate Genes
Loci existing in two or more environments are considered stable. To identify candidate genes, the flanking sequences corresponding to the SNP markers (including the SNPs located in the LD decay interval for peak markers) significantly associated with GY-related traits are used in BLASTn and BLASTx searches against NCBI 4 and ENA databases. Also, the annotation information for IWGSC 2.1 was also used to identify candidate genes.

Marker Coverage and Genetic Diversity of the Physical Map
After filtering unqualified markers, 52,503 polymorphic SNPs were employed for construction of physical map and GWAS analysis (Supplementary Table 3 and Supplementary Figure 2). Among the polymorphic SNP markers, 34.9, 35.6, and 29.6% were from the A, B, and D genomes, respectively, indicating that the D genome has the lowest polymorphism (Supplementary Table 3). Of these, the chromosome 6B had more SNPs at 2691, whereas the 4D possessed only 1118 SNPs. The total length of the physical map is 14,058.8 Mb with an average marker density of 0.273 Mb per marker.

Population Structure and Linkage Disequilibrium
Population analysis indicates that all 251 wheat accessions were divided into three subgroups: subgroups I, II, and III. Of these, subgroup I contained 126 varieties mainly from Heilongjiang province ranging from the 1950s to 1980s, such as Kehong, Kehan 8, Kejian, and Xinshuguang 5; subgroup II had 75 varieties mainly from Heilongjiang province ranging from the 1990s to 2010s, such as Kechun 2, Kechun 9, Longmai 2, and Longmai 8; and subgroup III comprised 50 varieties mainly from Jilin province of China and foreign counties (United States, Canada, and Japan) (Figure 1). Furthermore, the average LD for the whole genome was 8 Mb (Supplementary Figure 3).

Candidate Genes
In this study, six candidate genes for GY traits were identified ( Table 3). Two cytokinin ribosides (TraesCS2B02G397600 and TraesCS3B02G281000) were identified in the LD decay of the loci on 2B (592.8 Mb) and . Another gene encoding an E3 ubiquitin transferase (TraesCS3A02G344600) was identified in the LD decay of the loci on chromosome 3A (596.0 Mb). For the loci on chromosome 3A (696.4-700.9 Mb) and 6A (2.5-4.0 Mb), candidate gene (TraesCS3A02G459800) for F-box proteins and serine/threonine-protein kinases (TraesCS7B02G328800) were identified, respectively. Besides this, the gene TraesCS6A02G301800 encoding trehalose 6-phosphatase (T6P) was identified as the candidate gene for the loci 7B (586.4 Mb).

DISCUSSION
The 251 spring wheat accessions were divided into three subgroups (Figure 1), and the characterization of the subgroups was largely consistent with geographic origins, released years, and pedigrees. For example, most of the accessions from Heilongjiang province ranging from the 1950s to 1980s belonged to subgroup 1, the accessions from Heilongjiang province ranging from the 1990s to 2010s belonged to subgroup 2, and subgroup 3 mainly included the accessions from the Jilin province and foreign counties. A significant population structure existed in the diverse panel, and previous studies indicate that the lack of appropriate correction for population structure can lead to spurious MTAs (Zhu et al., 2008). Thus, to eliminate spurious MTAs, an MLM model with subpopulation data (Q) (fixed-effect factors) and kinship matrix (random-effect factor) were conducted. Also, the LD decay influenced several factors, such as population structure, allele frequency, recombination rate, and selection, and seriously affects the precision of association mapping. In this panel, the LD decay for the whole genome was about 8 Mb, consistent with previous reports , and indicates that the number of markers is enough for the subsequent association analysis.

Comparison With the QTL or Gene in Previous Studies
The genes or loci associated with GY-related traits were extensively reported previously. In this study, association of GY and related traits were performed based on the wheat high-density physical map. GY is a typical quantitative inheritance complex trait and significantly influenced by various environments (Gao et al., 2015;Li et al., 2018). The GY-related loci (gene) is distributed on all 21 chromosomes in wheat (Kumar et al., 2007;Reif et al., 2011;Lee et al., 2014;Lopes et al., 2015;Sukumaran et al., 2015;Gao et al., 2017;Li et al., 2018). Azadi et al. (2015) report a QTL for GY-related traits on chromosome 1A, which is tightly linked with the SSR marker gwm357 and located between the two GY QTL mapped by Cuthbert et al. (2008) and Huang et al. (2004). Also, a locus for KNS at chromosome 1A (around 26.8-40.5 Mb) is identified Li et al., 2019). According to the IWGSC V2.0 and the consensus map by Maccaferri et al. (2015), the loci in 1A identified by Azadi et al. (2015) and Li et al. (2019) Li et al. (2019), the locus at 2A (32.3-33.9 Mb) was overlapped with the QTL for GY at 2A (34.7 Mb) in this study; the locus at 3A (22.9-39.3 Mb) were nearly with the QTL for GY at 3A (10.9-21.9 Mb), whereas the locus at 3A (700.1-705.1 Mb) coincides with the locus for KNS on 3A (696.4-700.9 Mb) in our study. Besides this, Azadi et al. (2015) and Gao et al. (2015) identified a locus at the 3A chromosome nearly the 3A locus (696.4-700. Among the 38 loci for GY and related traits, 10 loci discussed above (1A: 28. 2A: 30.2 34.7 Mb, Mb) should be the same as the QTL reported in previous studies. The stable loci validated by both GWAS and QTL mapping between ours and previous studies indicate that they are widespread in varieties and may be more powerful and stable in various varieties. Moreover, the methods of GWAS used in the present study are proven to be reliable and efficient in detecting loci for GY and related traits. Different QTL should be used in different regions. According to the results from association analysis, most of the loci (28) identified in this study existed on two or more environments (including BLUP), indicating high stability, and could be applied in various regions; these loci are only identified in an individual environment (30.2 Mb on 2A, 544.2-544.9 Mb on 5D, 615.2 Mb on 6A, 291.1 Mb on 6D, and 580.8-581.0 Mb on 7B) could be used in a specific environment, whereas those loci (536.4-536.8 Mb on chromosome 6A) only detected in BLUP may provide new insights into the genetic mechanism of GY-related traits.

Candidate Gene Analysis
To identify the candidate genes for GY-related traits, the flanking sequences of SNP markers (including the markers from the LD decay interval) significantly associated with GYrelated traits were used as queries to BLAST against the National Center for Biotechnology Information (NCBI) and European Nucleotide Archive (ENA) databases. In total, six candidate genes were identified ( Table 3) for further research. Of these, two cytokinin ribosides (TraesCS2B02G397600 and TraesCS3B02G281000) were identified in the LD decay of the loci on chromosome 2B (592.8 Mb) and . The cytokinin is a positive regulator of shoot growth and negative regulator of root growth (Han et al., 2014) and has been strongly implicated in many aspects affecting GY-related traits, particularly the kernel number and size (Yamburenko et al., 2017). Besides this, cytokinin also plays crucial roles in the response to biotic and abiotic stressors (Cortleven et al., 2019). TraesCS3A02G344600, coded an E3 ubiquitin transferase, which were identified for the loci on chromosome 3A (596.0 Mb). Previous studies report that the E3-Ubiquitin protein ligases are a large protein family that is important in plant growth and development (Craig et al., 2009;Park et al., 2010). Wang et al. (2020) indicates that a E3 ligase gene, TaSDIR1-4A, contributes to the determination of grain size in common wheat. For the loci on chromosome 3A (696.4-700.9 Mb) and 6A (2.5-4.0 Mb), candidate genes for F-box proteins (TraesCS3A02G459800) and serine/threonine-protein kinases (TraesCS7B02G328800) were identified, respectively. F-box proteins are a large protein family and plays crucial roles in cell-cycle progression, transcriptional regulation, flower formation, signal transduction, and many other cellular processes in plants (Hong et al., 2012;Kaur et al.,  2017; Marcotuli et al., 2018;Guérin et al., 2021). Furthermore, AX-108951749 on 2B and IWA2223 on 5AL encodes the serine/threonine-protein kinases. The serine/threonine-protein kinases play crucial roles in cell-cycle progression, transcriptional regulation, flower formation, and signal transduction (Rehman et al., 2019;Jia et al., 2020). The gene TraesCS6A02G301800 encodes trehalose 6-phosphatase (T6P) on the loci 7B (585.6 Mb), which regulates carbon assimilation and sugar status in plants.
In addition, previous studies report that T6P has also been demonstrated to play an essential role in plant development under abiotic stress (Diab et al., 2013;Gahlaut et al., 2019).

Potential Implications in Wheat Breeding
The conventional breeding approach has led to improved GY on wheat, and breeding selection is time-consuming and not very efficient (He et al., 2010). Significant additive effects are reported between GY traits and a number of favorable alleles, indicating that pyramiding favorable alleles will improve GY traits (Li et al., 2018). The markers associated with GY detected should facilitate MAS. Loci with pleiotropic and consistent effects across each environment should be amenable to MAS. Besides this, the loci validated between ours and previous studies by QTL-mapping or GWAS indicated these loci are stable in various varieties and should be applied in further study. In this study, accessions with agronomic characters and more favorable alleles, such as Kefeng 6, Beimai 15, Kehan 18, Longfumai 6, Longfumai 196, Kechun 7, Longmai 23, Kechun 1, Longfumai 4, and Longmai 13, are recommended as parental lines for improvement of GY-related traits in wheat breeding.

CONCLUSION
In this study, association mapping for GY and related traits (SNU, SL, SN, KNS, and TKW) were conducted in a diverse panel, including 251 spring wheat varieties mainly from China. In total, 38 loci were identified, and each explained 6.5-16.7% of the phenotypic variations. Of these, 12 are overlapped with known genes or QTL, and 26 are likely to be new. Besides this, six candidate genes for GY-related traits involved in grain development, plant hormone signal transduction, and starch biosynthesis were identified. The stable loci and associated markers and varieties with favorable traits and alleles could be used in further wheat breeding. These new loci provide a new sight in genetic architecture of GY-related traits.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material, further inquiries can be directed to the corresponding author/s.

AUTHOR CONTRIBUTIONS
YL and LY designed the research. JT analyzed the physiology data. YL and WY drafted the manuscript. YL, YS, JC, and CT performed the experiment. HZ and LY revised the manuscript. All authors have read, edited, and approved the current version of the manuscript.

FUNDING
This work was funded by the "Key Projects of the Ministry of Science and Technology: Mutation Breeding of Main Crops" (2016YFD0102101). The Heilongjiang Agricultural Innovation Project (HNK2019CX04-06).