Genome-wide association study of yield-related traits in common wheat (Triticum aestivum L.) under normal and drought treatment conditions

The primary goal of modern wheat breeding is to develop new high-yielding and widely adaptable varieties. We analyzed four yield-related agronomic traits in 502 wheat accessions under normal conditions (NC) and drought treatment (DT) conditions over three years. The genome-wide association analysis identified 51 yield-related and nine drought-resistance-related QTL, including 13 for the thousand-grain weight (TGW), 30 for grain length (GL), three for grain width (GW), five for spike length (SL) and nine for stress tolerance index (STI) QTL in wheat. These QTL, containing 72 single nucleotide polymorphisms (SNPs), explained 2.23 – 7.35% of the phenotypic variation across multiple environments. Eight stable SNPs on chromosomes 2A, 2D, 3B, 4A, 5B, 5D, and 7D were associated with phenotypic stability under NC and DT conditions. Two of these stable SNPs had association with TGW and STI. Several novel QTL for TGW, GL and SL were identified on different chromosomes. Three linked SNPs were transformed into kompetitive allele-specific PCR (KASP) markers. These results will facilitate the discovery of promising SNPs for yield-related traits and/or drought stress tolerance and will accelerate the development of new wheat varieties with desirable alleles.


Introduction
Wheat (Triticum aestivum L.), one of the most widely cultivated and most important cereal crops, provides approximately 20% of the dietary calories and proteins for human nutrition Li et al., 2022a). However, climate stress and depleting fresh water for agricultural irrigation have severely affected wheat production, and drought stress is a major threat to wheat yield (Ahmed et al., 2022). Wheat is particularly susceptible to drought-induced stress throughout the growth period, and therefore, mining drought-stable QTL is vital for increasing wheat yield.
The rapid development of sequencing and high-density marker genotyping technologies has made GWAS (genomewide association study) an effective method of identifying the quantitative variation of complex crop traits (Nordborg and Weigel, 2008;Hayes, 2013). The GWAS technique has high resolution and accuracy and has been widely used to study agronomic traits (Pang et al., 2020) and biotic and abiotic resistance (Kang et al., 2020;Shan et al., 2022) in wheat. In addition, GWAS performs genome-scale sequencing of numerous cultivars with diverse genetic backgrounds, thus, accelerating the genetic dissection of complex traits in crops (Juliana et al., 2022;Shan et al., 2022).
In this study, 502 Chinese wheat accessions were genotyped using the wheat 15K SNP array. The TGW, GL, GW, SL and STI (stress tolerance index) phenotypes were analyzed under normal (NC) and drought treatment (DT) conditions. Next, GWAS was performed using multi-year experimental data to identify high confidence loci associated with yield-related traits in a hexaploidy wheat genome under the two conditions. The findings will improve the understanding of the genetic mechanism of wheat yield-related and drought resistance traits and provide new genetic loci for breeding new high-yielding and stress-resistant wheat varieties.

Materials and methods
Materials and field experiment Five hundred and two hexaploidy wheat collections in China were used in this experiment (Supplementary Table 1). The wheat collections were mainly from the Winter Wheat Region of the Yellow and Huai Valleys, which accounted for approximately 75% of the wheat produced in China. This study was established in the Dishang Experimental Station of the Institute of Cereal and Oil Crops, Hebei Academy of Agriculture and Forestry Sciences, Shijiazhuang, Hebei Province, China, located at 38°N and 114°83′E. The field experiments were conducted over three consecutive years (2018 -2019, 2019 -2020, 2020 -2021). The average rainfalls in the October -May season of 2018 -2019, 2019 -2020, and 2020 -2021 were 101, 131, and 75 mm, respectively (Supplementary Figure 1). Plants under normal conditions (NC) were irrigated twice at the jointing and booting stages in 2018 -2019 (NC19), 2019 -2020 (NC20), and 2020 -2021 (NC21). In contrast, drought-treated plants were not irrigated throughout the growth period in 2018 -2019 (DT19), 2019 -2020 (DT20), and 2020 -2021 (DT21). All plants were grown in rows 3 by 0.22 m in length and width at a sowing rate of 84 seeds per row in randomized complete blocks. Trials were fertilized and maintained free from weeds, insects, and diseases.

Phenotyping and statistical analysis
Five randomly selected spikes were sampled for manual measurement, and then to generate one average value. The grain traits (TGW, GL, and GW) were recorded using a scaled-camera-assisted phenotyping system (Wanshen Detection Technology Co., Ltd., Hangzhou, China). The stress tolerance index of each accession was calculated using the formula: STI = (TGW.DT × TGW.NC)/(X.NC 2 ), where TGW.NC and TGW.DT were the TGW for each accession under normal and drought treatment conditions, respectively; and X.NC was the average TGW of all accessions under normal conditions (Fernandez, 1992).
All statistical analysis was conducted in R v4.1.0 (Yu et al., 2019)  year effect, and the residual error, respectively. y and r are the number of years and replications, respectively.

Population structure, kinship, and linkage disequilibrium
The phylogenetic tree was generated as previously described . The principal component analysis (PCA) was constructed using PLINK v1.9, and the percentage of variance explained by the top 10 PCs was plotted to determine the optimal number of PCs. Furthermore, the relative kinship matrix (K-matrix) between accessions was calculated using TASSEL v5.2.73 (Bradbury et al., 2007). Linkage disequilibrium (LD) was estimated as the squared allele frequency correlation (r 2 ) using PLINK v1.9. The LD decay distance was obtained by constructing a scatterplot of r 2 values against SNP pairs distance and fitting the points to a smooth curve using the R package "ggplot2".

Genome-wide association mapping
The mixed linear model (MLM) (PCA+K) was used for GWAS using the GAPIT package (Wang and Zhang, 2021) in R software. The first five principal components were included in the GWAS model. The mixed linear model accounts for false positives caused by the population structure and relative kinship. For this reason, the P values with cutoff set at 0.05 was highly restrictive. Considering the potential risk of type II error, and combining the GWAS results of the three years, the -log 10 (P value) ≥ 3.0 were regarded as significant marker-trait associations (MTAs), as shown in Manhattan plots . Important P value distributions (observed P values plotted against expected P values) are shown in Q-Q plots.

KASP maker development
The SNP makers highly associated with grain and spike traits were selected and converted to KASP (kompetitive allelespecific PCR) markers. The KASP markers were used for PCR amplification of wheat DNA in 3 mL final volumes containing 40 -50 ng of genomic DNA, 0.75 mL of 2 × KASP Master Mix (LGC Genomics, product code: KBS-1050-112), and 0.0417 mL of assay primer mix (12 mM of each allele-specific primer and 30 mM of the common primer). The cycling conditions followed: 94°C for 15 min, ten cycles of 94°C for 20 s, touchdown starting at 61°C for 20 s (decreasing 0.6°C per cycle), 26 cycles of 94°C for 20 s, and 55°C for 1 min. The endpoint fluorescence data were visualized with a microplate reader (Omega, BMG LABTECH, Germany) and analyzed by the Kluster-Caller software (LGC Genomics). The Student's ttests were used to compare allelic effects of the phenotypic data and the effectiveness of the associated SNPs using KASP markers. The primers used in this research were listed in Supplementary Table 2.  Table 1). The correlations were analyzed between the three crop seasons for TGW, GL, GW, and SL (Supplementary Figure 3). For all the detected phenotypes, positive correlations were found among different crop seasons under both normal and drought treatment conditions. Therefore, we compared the phenotypes under the different conditions based on the average values. All genotypes exhibited significant differences for each trait, with the coefficients of variation (CV) ranging from 3.83% to 16.39%. Except for GL, the normal and drought treatment conditions were significantly different (P < 0.001) for TGW, GW, and SL. The average TGW under the normal conditions was 48.62 g (ranging from 31.31 to 65.76 g), whereas the average TGW under the drought treatment conditions was 46.68 g (ranging from 32.40 to 63.12 g). Correspondingly, the average STI was 0.97 (ranging from 0.54 to 1.45). Traits GW and SL decreased from 3.62 to 3.55 mm and 8.22 to 7.75 cm under the DT conditions compared to the normal conditions. In contrast, the average GL differed slightly (0.03 cm) between the two conditions (Table 1). Therefore, drought treatment environmentally stressed the plants, reducing their spike and grain yield. All detected traits showed high broad sense heritability (h 2 , 76.16 -95.66%) under the two conditions. We analyzed the correlations between the four traits ( Figure 2). Under the normal conditions, any two of TGW, GL, and GW were positively correlated (r = 0.61*** between TGW and GL, r = 0.78*** between TGW and GW, and r =  0.35*** between GL and GW). Moreover, SL was positively correlated with TGW and GL (0.098* and 0.18***) respectively and uncorrelated with GW ( Figure 2A). The correlations between TGW, GL, and GW were similar under NC and DT conditions. Additionally, SL was weakly correlated with GL (r = 0.085*) but uncorrelated with TGW and GW under DT ( Figure 2B).

Statistical analysis of genotypes
All 502 accessions were genotyped using the wheat iSelect 15K array. After filtering (missing rate > 20% and minor allele frequency < 0.05), 13,705 high-quality SNPs were retained and used for genetic analysis. These SNPs were distributed on all 21 wheat chromosomes, with 4916, 5326, and 3463 SNPs in the A, B, and D sub-genomes. Chromosome 3B had the most SNPs (1024) but the shortest average distance between SNPs (833 kb). However, 4D had the least (264 SNPs and 2056 kb) (Supplementary Figure 4).

Population structure and LD analysis
We used different approaches to cluster the accessions into different groups and assessed the similarity of the results. The neighbor-joining tree divided the panel into three major groups ( Figure 3A). Similarly, PCA classified the panel into three groups despite the numerous admixtures. These results revealed a weak population structure since the first three principal components collectively explained only approximately 19.31% of the total variance, i.e., 9.73, 5.37, and 4.21% for PC1, PC2, and PC3, respectively ( Figure 3B). Overall, the above methods classify the populations into three subgroups.
The filtered SNP data were used to calculate LD. The average r 2 values for the A, B, and D sub-genomes and the whole genome gradually decreased with increasing pairwise distance ( Figure 3C). When the cutoff threshold for r 2 was half of its maximum value, the A sub-genome had the longest LD decay distance (approximately 12 Mb) compared to the B and D subgenomes (> 10 Mb). The LD decay distance for the entire genome was approximately 12 Mb.

GWAS of traits under two conditions
We performed GWAS for all the traits with the MLM in different environments under both conditions. We defined the significant and repetitive SNPs in at least two environments as true and reliable association loci. The purpose was to eliminate the influence of the environmental background and explain the true components of genetic variation. The Q-Q and Manhattan plots based on the average values of the individual traits are shown in Figures 4, 5. The GWAS identified 666 and 821 SNPs in different environments under NC and DT conditions, respectively, including 67 significant SNPs in 51 QTL from the multiple environments (Tables 2, 3 and Supplementary Table 3). In addition, 311 SNPs were detected for STI in different environments. Among them, 18 high confidence SNPs in nine QTL for STI were detected under more than one environment (Table 4 and Supplementary Table 4).

FIGURE 2
Phenotypic performances, distribution, and correlation coefficients for the 502 wheat accessions using the average phenotypic data under normal (A) and drought treatment conditions (B). The frequency distribution of the phenotypic data for each trait is shown in the histograms.  Table 2, and Supplementary Table 3). One important QTL, qNTGW-2B, had five significant SNPs in multiple environments. qNTGW-2B was mapped to the physical region of 666.47 -671.74 Mb on chromosome 2B, accounting for 2.84 -3.40% of the phenotypic variation. Two QTL associated with TGW were detected on chromosome 4A. The first, qNTGW-4A.1, had four SNPs mapped to the physical position of 605.75 -605.78 Mb and explained 2.54 -3.44% of the phenotypic variation. The second qNTGW-4A.2, located at position of 633.30 -635.76 Mb, was also represented by four SNPs and explained approximately 2.72 -3.03% of the phenotypic variation. Besides, chromosome 7D also had two QTL (qNTGW-7D.1 and qNTGW-7D.  Table 3, and Supplementary Table 3). Among these, qDTGW-5A physically was mapped to the physical region of 546.52 -547.33 Mb on chromosome 5A. This QTL was represented by two SNPs and explained 2.33 -3.72% of the phenotypic variation under drought treatment conditions. Another QTL, qDTGW-5D, at 292.94 Mb on chromosome 5D and qDTGW-6A at 328.13 Mb on chromosome 6A explained 2.44 and 3.08% of the phenotypic variation, respectively. Furthermore, two QTL represented by SNPs AX-109375057 and AX-108770812 were located on chromosomes 4A and 7D under both conditions. qDTGW-4A on chromosome 4A explained 3.26% of the phenotypic variation under the DT conditions, whereas qDTGW-7D on chromosome 7D explained 2.69% of the phenotypic variation.  Finally, five QTL (qNGL-5B.1, qNGL-5B.2, qNGL-5B.3, qNGL-5D.1, and qNGL-5D.2) located on chromosomes 5B and 5D were represented by five SNPs and accounted for 2.59 -3.85% of the phenotypic variation.

Grain width
Under the normal conditions, one QTL, qNGW-4A, was detected at 38.36 Mb on chromosome 4A and accounted for

Stress tolerance index
For STI, nine high confidence QTL were detected under more than one environment. These QTL involved 18 SNPs distributed on six chromosomes (including 1B, 2A, 4A, 4B, 6A and 7D) ( Figure 5E, Supplementary Figure 9,

Additive effects of the superior alleles on phenotypes
The marker alleles associated with high TGW, GL, GW, and SL were considered "superior alleles", and the marker alleles with the opposite effects as "inferior alleles" to ease the description of allelic effects. The allelic effect was verified using the highest -log 10 (P) SNPs within the detected QTL in the association population by examining whether the differences in phenotypic values (grouped by polymorphism) reached a significant level (Figure 6). The number of superior alleles in each wheat line was calculated based on representative SNPs for the loci detected using the average values for grain and spike traits across all environments. The patterns of relationship were similar for the detected traits,  Average phenotypic values of accessions with different alleles in multi-environment significant loci under normal and drought treatment conditions for thousand-grain weight (A, B), grain length (C, D), grain width (E, F) and spike length (G, H). The P value was determined by the two tailed Student's t-test. *P < 0.05, **P < 0.01. Inf represents the inferior allele and sup represents the superior allele.

KASP development
Three significant SNP markers (AX-109290429, AX-110418888, and AX-109369427) on chromosomes 2A, 3B, and 5A were identified in multiple environments were converted into KASP markers. The KASP markers could accurately classify different alleles. The results showed that different alleles carrying FAM and HEX fluorescence were distributed around the X and Y axes, respectively (Supplementary Figure 10). These KASP marker-based allelic typing results were consistent with those on the 15K SNP array.

Discussion
The drought stress reduces wheat yields Wheat is particularly vulnerable to drought stress during growth. This study investigated the effects of drought stress on yield-related traits of wheat. Under the drought treatment conditions, TGW, GW, and SL were reduced by 4, 2, and 6% compared to the normal conditions ( Figure 1 and Table 1). However, the broad sense heritability of TGW, GL, GW, and SL was relatively high (76.16 -95.66%) under both conditions, consistent with previous studies (Sukumaran et al., 2018;Wolde et al., 2019;Ji et al., 2021;Wang et al., 2022). Furthermore, the association panel showed significant phenotypic variation in TGW, GL, GW, and SL under normal and drought conditions. These results highlight the potential of these germplasms to identify the alleles associated with plant adaption to harsh environments for improving elite wheat yield.

Identifying and comparing SNPs with previous findings
The association results were validated using MLM, which considers population structure and kinship to define the significant SNPs in over two environments under normal and drought conditions. Therefore, 37 and 39 significant SNPs (in 51 QTL) were considered reliable MTAs in multiple environments associated with the yield-related traits under NC and DT conditions (Tables 2, 3). Eighteen MTAs (in nine QTL) associated with STI were identified in more than one environment (Table 4). Furthermore, we identified the QTL for the stability of grain and spike traits by comparing the MTAs detected under normal and drought treatment conditions simultaneously with those detected only under normal conditions.
A total of 15 QTL containing 17 SNPs associated with GL were detected on chromosomes 1B, 2A, 2D, 3B, 3D, 4B, 5B, and 5D under normal conditions. Among these loci, qNGL-3B.2, associated with GL, was marked at 785.43 -793.05 Mb on chromosome 3BL. Previous studies have also identified QTL that controls GL at adjacent positions (Kumar et al., 2016). We identified three QTL on chromosome 5B, which explained 9.34% of phenotypic variation in GL. The first QTL, qNGL-5B.1, on chromosome 5BS was mapped at a similar position with Qgl.caas-5BS, associated with GL (Yang et al., 2020). The second QTL was located at the neighboring chromosomal region of the grain-length-related QTL, QKL.caas-5BS, detected in the Chinese bread wheat recombinant inbred population (Li et al., 2018). The third QTL, qNGL-5B.3, on chromosome 5BL shared a similar region with QKl5B.2-16 for GL . Notably, the SNP AX-109290429 on chromosome 2AL was significantly associated with GL, TGW and STI, suggesting that this locus is highly valuable for these traits. Position-based comparisons of the SNPs identified in this study with previous reports identified another ten new and preliminary loci associated with GL on chromosomes 1B, 2D, 3B, 3D, 4B, and 5D.
Moreover, there were several QTL for SL detected on chromosome 6B in the previous studies Gao et al., 2015;Deng et al., 2017;Li et al., 2018;Cao et al., 2019). We also detected a credible QTL at the physical position of 528.70 Mb on chromosome 6B. However, this QTL for SL was not previously reported, suggesting that it may be new.

Drought treatment conditions
The SNPs detected under DT differed from those under the normal conditions (Table 3 and Supplementary Table 3) owing to the effects of the drought stress. We also detected five QTL represented by six SNPs for TGW under DT. Most notably, two of these SNPs (AX-109375057 and AX-108770812) were associated with TGW.NC, TGW.DT and STI, suggesting that these SNPs' effects on TGW and STI were extremely important and unaffected by environmental changes (Tables 2-4 and  Supplementary Tables 3, 4). Moreover, the QTL qDTGW-5A for TGW was located at adjacent chromosomal regions (Li et al., 2018;Cao et al., 2020). Furthermore, qDTGW-6A was marked at an approximately equivalent chromosomal region in a previous study (Wang et al., 2011). Comparisons between qDTGW-5D discovered in this research with other QTL identified in previous studies showed that qDTGW-5D, located on chromosome 5D, was a newly detected QTL regulating TGW.
For GL, nine QTL were detected under DT conditions, and six others under NC and DT conditions. Previously, qDGL-7A for GL was only detected under DT conditions in the region near SNP S7A_717859384, associated with GL (Gill et al., 2022). For GW, we detected two QTL on chromosome 6A under DT. These QTL were marked at a near location similar with a previously detected QTL for GW (Li et al., 2022b). We discovered four QTL for SL on chromosome 2AS, 2DS, 3DL, and 5BL under DT. Of these, qDSL-2D, qDSL-3D, and qDSL-5B were located at similar regions with QTL for SL from previous studies (Ma et al., 2007;Heidari et al., 2011;Wang et al., 2011;Wu et al., 2014;Yu et al., 2014;Deng et al., 2017;Chai et al., 2019;Gill et al., 2022;Li et al., 2022a). Notably, qDSL-2A for SL was first discovered in the current study by comparing this study with previous research.
A total of nine high confidence QTL for STI were detected, out of which, seven QTL (qDRI-1B.2, qDRI-2A, qDRI-4A.1, qDRI-4A.2, qDRI-6A, qDRI-7D.1, qDRI-7D.2) were co-located with QTL for TGW under NC or DT conditions. Therefore, these seven co-located QTL can be used to synergistically improve the yield and drought resistance of wheat. To the best of our knowledge, only a few QTL for STI, which was calculated from TGW, have been detected in wheat (Negisho et al., 2022). Nevertheless, they were located at different chromosomal regions with these QTL for STI detected in this study.
Many agronomic traits are complex quantitative traits that are susceptible to environmental factors. Therefore, this study detected eight stable MTAs (two for TGW and STI, six for GL) under normal and drought conditions. The MTAs detected are important for both normal and drought treatment conditions, providing insights into the genetic basis of the stability of agronomic traits. These MTAs can be used to develop useful markers for genetic improvement in wheat breeding.

Validation of multi-environment significant SNPs in the association population
This study identified numerous SNPs closely associated with TGW, GL, GW, SL and STI. Among these, three SNP markers were developed as KASP markers, making the genotype identification process more efficient, inexpensive, fast, convenient, and greatly accelerating the process of molecular marker-assisted selection. Under normal conditions, accessions with superior alleles for TGW at eight SNP loci showed TGW values of 49.29 to 50.17 g. In contrast, accessions with inferior alleles at these loci showed TGW values of 46.45 -48.54 g. Under DT conditions, accessions with superior alleles for TGW at five SNP loci showed TGW values of 47.36 -49.03 g compared to 44.93 -46.55 g for accessions with inferior alleles. Nevertheless, the frequency of superior alleles of SNP AX-109334618 on chromosome 6A was only about 5%, indicating that this SNP has not been fully utilized. Furthermore, this SNP was associated with not only TGW but also STI, suggesting that it is worth considering as a major genetic locus for improving TGW and drought resistance ( Figures 6A, B).
Under normal conditions, accessions with superior alleles for GL at 15 SNP loci showed significantly higher GL values (6.87 -7.07 mm) than those with inferior alleles (6.68 -6.84 mm). Moreover, under DT conditions, the GL values of accessions with superior alleles for this trait at 15 SNP loci ranged from 6.89 to 7.12 mm compared to 6.64 -6.87 mm for accessions with inferior alleles at the same loci. However, superior alleles at some loci, such as AX-109290429, AX-109425314, AX-110418888, AX-109500294, and AX-110521341, were present in < 12% of the accessions, suggesting that these loci require more attention when studying the improvement of GL ( Figures 6C, D). Under normal conditions, accessions with a superior allele at AX-95629274 showed 3.65 mm GW values than the 3.60 mm for accessions with an inferior allele at this locus. Furthermore, accessions with superior alleles at two SNP loci showed higher GW values (~3.53 mm) than accessions with inferior alleles (~3.56 mm) under DT conditions ( Figures 6E, F).
However, accessions with the superior allele of SNP AX-109337721 showed significantly higher SL values (~8.17 cm) than those with the inferior allele (~8.52 cm) under normal conditions. Notably, the superior allele of this SNP was < 14% frequent among accessions, indicating that this SNP is underutilized and a probably significant genetic locus for improving SL. Finally, the SL values of accessions with superior alleles for this trait at the four SNP loci were 7.88 -8.01 cm compared to 7.55 -7.70 cm for accessions with inferior alleles under DT conditions. Nevertheless, the frequencies of superior alleles for AX-108747720, AX-111027124, and AX-111124661 were < 26%, indicating these loci require more attention when improving SL ( Figures 6G, H).
The TGW of the detected varieties from different decades increased with increasing frequency of superior alleles under both conditions. Furthermore, increased TGW was strongly selected for breeding before the 2010s. However, after 2020, the average TGW of the accessions in this study decreased slightly, as did the corresponding number of superior alleles in both conditions (Supplementary Figure 11). Altogether, the elite alleles at these loci detected under two conditions can be integrated into wheat varieties using molecular breeding to improve wheat yield.

Conclusion
In summary, this study identified high confidence loci for yield-related and drought-resistance-related traits in normal and drought-stressed common wheat by GWAS and developed three KASP markers associated with these traits, which is a major step to accelerate the breeding of high-yielding and droughtresistant varieties.

Data availability statement
The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding author.

Author contributions
HL and YJZ conceived and designed the experiments. JZ, LS and HG performed most of the experiments and wrote the manuscript. MH, LM, XC and JW provided technical assistance and conducted the collection and maintenance of wheat germplasm. YZ, QL and PW involved in discussion and participated in field trials. All authors contributed to the article and approved the submitted version.