Genome-Wide Association Mapping for Yield and Related Traits Under Drought Stressed and Non-stressed Environments in Wheat

Understanding the genetics of drought tolerance in hard red spring wheat (HRSW) in northern USA is a prerequisite for developing drought-tolerant cultivars for this region. An association mapping (AM) study for drought tolerance in spring wheat in northern USA was undertaken using 361 wheat genotypes and Infinium 90K single-nucleotide polymorphism (SNP) assay. The genotypes were evaluated in nine different locations of North Dakota (ND) for plant height (PH), days to heading (DH), yield (YLD), test weight (TW), and thousand kernel weight (TKW) under rain-fed conditions. Rainfall data and soil type of the locations were used to assess drought conditions. A mixed linear model (MLM), which accounts for population structure and kinship (PC+K), was used for marker–trait association. A total of 69 consistent QTL involved with drought tolerance-related traits were identified, with p ≤ 0.001. Chromosomes 1A, 3A, 3B, 4B, 4D, 5B, 6A, and 6B were identified to harbor major QTL for drought tolerance. Six potential novel QTL were identified on chromosomes 3D, 4A, 5B, 7A, and 7B. The novel QTL were identified for DH, PH, and TKW. The findings of this study can be used in marker-assisted selection (MAS) for drought-tolerance breeding in spring wheat.


INTRODUCTION
Wheat (Triticum aestivum L.) is a major crop worldwide contributing about 20% of calories to the human population. Current genetic and genomic improvements in wheat have helped increase its production; however, further improvements are essential to increasing wheat productivity to feed the world's population, which is projected to reach over nine billion by 2050 (Hertel, 2011;Sapkota et al., 2019). Wheat production is often reduced by several biotic and abiotic stresses including drought and heat. Plant breeding has improved crop resistance to both biotic and abiotic stresses; nevertheless, the progress is slow and the yield gap between stress-prone areas and favorable production regions of major crops, including wheat, is high (Edae et al., 2014). Therefore, breeding efforts are being focused on dissecting the genetics of abiotic stresses including drought in wheat to develop knowledge and resources to speed up the development of climate-resilient wheat cultivars.
Drought poses a major threat to crop yield, highlighting the urgent need to develop drought-tolerant cultivars ). The majority of countries worldwide experience drought problems, even those in humid regions as they often have dry spells at some point. Obviously, drought is more severe in arid areas with minimal rainfall (Sun et al., 2006). North Dakota is the biggest producer of hard-red spring wheat (HRSW) in the USA (North Dakota Wheat Commission, 2016). The state, especially the semiarid western half, experiences frequent droughts (Climate Change and the Economy, 2008). Consequently, HRSW, a major cash crop for ND and the USA, is regularly affected by drought in this region. Developing and releasing drought-tolerant HRSW cultivars is critical to countering ND drought conditions, but this cannot be done without understanding the genetics of drought tolerance for HRSW in northern USA.
Quantitative trait locus (QTL) analysis allows genetic dissection, which can be a sound approach for understanding the molecular basis of drought tolerance in HRSW. In the past, several QTL mapping studies for drought tolerance in wheat were conducted (Kirigwi et al., 2007;Alexander et al., 2012;Ibrahim et al., 2012a,b). These studies have used different types of markers, including SSRs, EST-STS, and DArTs. However, almost all of these studies were based on low-resolution molecular maps consisting of only 102-690 markers. The number of markers in the previous studies seems insufficient to saturate the wheat genome due to its large size of 17 gigabase pairs (Brenchley et al., 2012). Also, drought tolerance is a quantitative trait adopting different mechanisms (Blum, 1988) and should have several QTL distributed throughout the whole genome. A highresolution map can provide a more complete genetic dissection of drought tolerance and also a successful application of associated molecular markers through marker-assisted selection (MAS) programs. The Infinium iSelect 90K assay (Wang et al., 2014), with more than 81,000 gene-associated SNPs to assess polymorphism in bread wheat, provides a better means to identify SNPs tightly linked to drought tolerance.
Biparental QTL mapping, even when using high-density linkage maps, suffers some limitations. The biparental population has fewer recombination events and therefore has low resolution. By comparison, association mapping (AM) exploits a broader population and multiple alleles and has a better resolution of the QTL (Yu and Buckler, 2006). A few AM studies on drought tolerance conducted in the past have used a small number of markers (Dodig et al., 2012;Edae et al., 2013Edae et al., , 2014Ballesta et al., 2020;Maulana et al., 2020), which seems insufficient to explore the variation in wheat, efficiently. Dodig et al. (2012) used 46 SSR markers, and Edae et al. (2013) used 78 DArT markers. Maulana et al. (2020) used greenhouse for phenotyping for the association study, whereas Ballesta et al. (2020) used field experiments for their study but showed the association only for chromosome 4A. Furthermore, to our knowledge, no genetic studies were conducted on HRSW germplasm to elucidate the QTL associated with drought-related traits in the region. Therefore, the present study was undertaken to identify genes/QTL associated with yield and agronomic traits evaluated under drought conditions in field experiments using an association mapping approach combined with high-density SNP marker assay.

Plant Materials
In 2012, a germplasm panel comprising of 350 HRSW inbred lines developed by the HRSW breeding program at North Dakota State University (NDSU) and different cultivars with varying drought tolerances was used for this study (Supplementary Table 1). Eleven more accessions were added for the experiments conducted in 2013 and 2014 (Supplementary Table 2). These lines were developed over time from different crosses, and pedigree selections for different purposes such as drought tolerance, disease resistance, quality, and yield were used for this study.  (2014), respectively, during the growing season (North Dakota Agricultural Weather Network, 2015). The total rainfall in the experimental sites during the growing period was considered to assess the drought condition ( Table 1). The water holding capacity of the experimental sites was achieved from the soil type (Frazen, 2003). In 2012, the experiment was conducted in a randomized complete block design (RCBD) with two replicates, whereas a simple lattice design was used in 2013 and 2014. The plots had an area of 2.44 × 1.22 m and seven rows with a 15.24 cm gap between them in 2012 and 2013. The plot size in 2014 was 2.44 × 1.42 m, but the number of rows was still seven with a larger gap (17.78 cm) between them.

Field Experiments and Data Collection
The phenotypic data was collected on DH, PH, YLD, TW, and TKW. The heading date (DH) was recorded when more than 50% of the plants in the plot were starting to flower. Plant height (PH) was measured in the middle of the plot from plant base to tip excluding the awn. Yield per plot (YLD) was converted to yield/ha for further analysis. Similarly, kg/0.5-pint cup was converted to kg/m 3 as the test weight (TW) for further analysis. A thousand kernels were counted using a seed counter and were weighed for thousand kernel weight (TKW).

Single-Nucleotide Polymorphism Genotyping
Genomic DNA was isolated from lyophilized young leaves of each genotype using the DNeasy Plant Mini Kit (Qiagen, Valencia, CA, cat. no. 69106). The quality of the DNA was checked on 0.8% agarose gel. The NanoDrop 1000 spectrophotometer (NanoDrop Technologies Inc., Wilmington, DE) was used to check the DNA concentration. The accessions of the AM panel were genotyped using the Illumina 90K iSelect wheat SNP assay (Wang et al., 2014) in the Small Grains Genotyping Lab at USDA-ARS, Fargo, ND. The Illumina iSelect 90K assay produced data for 81,587 SNPs. The analyses of SNP genotyping, clustering of the SNP alleles, and calling of the genotypes were performed with Genome Studio v2011.1 (www.illumina.com). The minimum number of points used in the cluster was 10 (Wang et al., 2014). Monomorphic SNPs and SNPs having more than 20% missing genotypic data and 10% heterozygosity were excluded. The best linear unbiased prediction (iBLUP) method (Yang et al., 2014b) was used to impute the missing genotypic data for the remaining SNPs. The polymorphic SNPs selected after filtering based on the above mentioned criteria were screened for their positions on the chromosomes based on the wheat consensus genetic map (Wang et al., 2014).
Phenotypic Data Analysis (ANOVA, Descriptive Statistics, and Frequency Distribution) The ANOVA Proc MIXED procedure was used (SAS Institute, 2004) to analyze the phenotypic data from 2012, whereas for 2013 and 2014, the Proc LATTICE was used. The accessions of the AM panel were considered as fixed effects, and environments and blocks were considered as random effects in the ANOVA Proc MIXED procedure. The mean values were separated using the F-protected least significant difference (LSD) value at the P ≤ 0.05 level of significance. CORR procedure of SAS (SAS Institute, 2004) was used to calculate Pearson correlations between traits for each environment. The phenotypic data with a low coefficient of variance (CV) value and significant differences among entries were used for further analysis. The locations that did not show significant differences for most of the traits and with a high CV were not included for further analysis and reporting.

Marker-Trait Association Analysis
Population structure was calculated using markers with pairwise R 2 < 0.5 for all pairwise comparisons. To assign the subpopulation membership for each genotype, STRUCTURE software version 3.2 was used (Pritchard et al., 2000). We used an admixture model with independent allele frequencies, a burnin of 100,000, and an MCMC replication of 500,000 for K = 1-10 with five replications. The delta k calculated from the STRUCTURE software was used to select the optimum number of subpopulations. The number of subpopulations (k) was plotted against the delta k calculated using the STRUCTURE software. Pairwise linkage disequilibrium (LD) between markers in the null model was calculated as the squared allele frequency correlation (R 2 ) in the R-package (Lipka et al., 2012) after filtering for minor allele frequency (MAF) ≥ 5%. Genome-wide LD decay was estimated by plotting R 2 against the corresponding pairwise genetic distance (cM) (Wang et al., 2014). AM analysis was conducted using the software TASSEL v.5.0 (Bradbury et al., 2007). The mixed linear model (MLM) with PC + Kinship (K) was used for AM, where the genotypic data were filtered for minor allele (≤ 5%) frequency. A total of 14,816 filtered SNPs were used for further AM study. The initial cutoff point for marker-trait association (MTA) was considered at p ≤ 0.001. Then, this cutoff was subjected to Bonferroni correction (Yang et al., 2014a) to get the threshold (p ≤ 3.4 * 10 −6 ). Only the markers identified to be associated in at least two environments were reported.

Candidate Gene Analysis
For candidate gene analyses, the sequences of the markers showing MTAs were obtained from the T3/Wheat database (Blake et al., 2016) and their physical positions were extracted using the BLAST search against Chinese_Spring_IWGSC_RefSeq1.0 (Appels et al., 2018, Alaux et al., 2018 to identify the most proximal gene. The physical position of each marker was utilized to identify if it represents the perfect marker in the annotated gene space of Chinese Spring. For markers anchored in the non-gene space, flanking genes were obtained by manual IWGSC RefSeq1.0 (Blake et al., 2019) genome scanning in the GrainGenes Genome Browser (https:// wheat.pw.usda.gov/GG3/genome_browser). For each MTA, the linked gene set can be extracted from the GrainGenes database. Finally, annotation of identified genes was included to predict their function. In silico expression analysis was carried out in the Wheat Expression Browser expVIP (Ramírez-González et al., 2018, Borrill et al., 2016 dataset for drought and heat stress and PEG to stimulate drought.

Phenotypic Analyses
Significant differences among genotypes were found in the environments of Casselton Table 2). The rest of the phenotypic data was not analyzed further. The seeds of Minot 2013 could not be cleaned due to Fusarium head blight infection, and hence, YLD, TW, and TKW could not be reported for that environment. Also, TKW for Minot 2014 and Hettinger 2014 was not reported. The phenotypic analyses of the data showed that the heading date had a highly significant negative correlation with YLD, TW, and TKW in all environments ( Table 2). Heading date showed a significant positive correlation with PH in four environments. Plant height had a significantly negative correlation with YLD in four environments including the overall mean. It had a significantly positive correlation with TW in six environments, whereas it did not show any correlation with TKW. The yield had a strong positive association with TW in five environments. Also, it showed a strong positive association with TKW in all the environments including the overall mean. Again, TW had a strong positive correlation with TKW in three environments ( Table 2). The frequency distributions of the phenotypic data for DH, PH, YLD, TW, and TKW showed a continuous distribution, a characteristic of the typical quantitative traits (Figure 1).

Marker Distribution, Population Structure, and LD
A total of 17,514 polymorphic SNPs were selected after the filtering-based criteria mentioned earlier in the material and methods section (Supplementary Table 3). An additional 2,756 SNPs were excluded for lacking map positions on the consensus hexaploid wheat maps (Wang et al., 2014). Out of 14,816 SNP markers used in the AM study, 7,848 were located on the Bgenome, 5,503 on the A-genome, and 1,465 markers on the Dgenome. The D-genome had the lowest density of markers, with an average distance of 0.87 cM between two markers. The number of markers on individual chromosomes ranged from 56 (4D) to 1,433 (2B). The average number of markers per chromosome was 705.52 ( Table 3). The number of subpopulations (k) was plotted against the delta k calculated using software STRUCTURE. The peak of the broken line graph was observed at k = 7, indicating that the natural population can be divided into seven subpopulations (Figure 2). The association was analyzed using five principal components (PC), which captured 25% of the variation. Genome-wide LD decay was estimated by plotting the R 2 value against the corresponding pairwise genetic distance (cM) and the LD heat map was created for the whole genome (Supplementary Figure 1). The LD pattern varied by chromosome even after controlling for population relatedness. Overall, the A and B genomes showed high LD compared to the D-genome. The LD dropped at an approximate genetic distance of 10 cM, and therefore, ±10 cM was used to establish confidence intervals for QTL regions. Furthermore, SNP markers with a pairwise R 2 ≥ 0.7 were considered as a single locus.

Identification of QTL and Associated Candidate Genes
In this study, we detected a total of 69 QTL involved with drought tolerance on all chromosomes except 5D using AM (Table 4; Supplementary Figure 2). Twenty QTL six were associated with DH. These QTL explained 5.6-11.33% of phenotypic variation (PV). Five of those QTL explained >10% of PV and therefore were considered major QTL. Twelve of the QTL were identified to be constitutive, and eight of the QTL were identified exclusively in drought-prone environments. Similarly, a total of 20 QTL six were associated with PH. These QTL explained 4.54-48.01% of PV, with major effects (>10% PV). Sixteen QTL were identified as constitutive, three were identified in the control environments, and one was identified in the drought environment (Table 4). Seventeen QTL were identified for YLD. These QTL explained 4.11-12.04% of PV (Table 4). Only one QTL, located on chromosome 4B, had a major effect. Sixteen QTL were identified as constitutive, and the remaining QTL was identified in the drought-prone experimental sites. Five QTL were associated with TW. All of these QTL had minor effects, explaining 3.7-7.66% of PV. All of the QTL identified were constitutive. Seven QTL were identified for TKW, all of which had minor effects, explaining from 5.2 to 9.2% of PV. One QTL among them was constitutive, and the remaining six were identified in the drought-prone environments ( Table 4).
The sequence of 69 markers associated with QTL was mined from Wang et al. (2014) to identify their physical position in the Chinese_Spring_IWGSC_RefSeq1.0 (Appels et al., 2018). Out of 69, seven markers (BobWhite_c47948_76, D_contig00840_473, Kukri_c15043_326, Excalibur_c25353_1171, Ex_c3115_2742, Excalibur_rep_c92985_510, Excalibur_c56240_176) were manually anchored on the genome. A total of 47 SNPs were anchored in the predicted gene space, and 22 were found in the intergenic region (Supplementary Table 4). For these intergenic SNPs, we identified the most proximal flanking genes representing the possible linked gene contributing to the phenotype. Thus, a total of 91 AM-QTL-associated most proximal genes along with their predicted function and genome orientation ( Table 5) were mined from the genome. Using the expVIP analysis (Ramírez-González et al., 2018, Borrill et al., 2016, we explored the in silico expression of these genes affected by drought stress (Supplementary Figure 3) which could be used to prioritize the candidate genes. However, for breeding purposes all the MTAs are valuable.

Association Analyses
Studies are conducted to dissect the genetics of drought tolerance in many crops including wheat, and it is well-known that drought tolerance is a complex quantitative trait affected by genetic and environmental factors (Gahlaut et al., 2019). In this study, the iBLUP method (Yang et al., 2014b) was used to impute missing genotypic data as it was reported to tolerate a high rate of missing data especially for rare alleles, compared to the common imputation methods. High-density singlenucleotide polymorphism (SNP) genotyping arrays explore genomic diversity and MTAs very efficiently (Wang et al., 2014). Infinium iSelect 90K assay uses more than 81,000 gene-associated SNPs to reveal polymorphism in allohexaploid wheat populations  (Wang et al., 2014;Kumar et al., 2016Kumar et al., , 2019. Higher genome coverage and resolution in the dissection of wheat's agronomic traits are possible using this genotypic tool (Kirigwi et al., 2007;Alexander et al., 2012;Ibrahim et al., 2012b). The marker density found in this study (0.49 cM/marker) was in agreement with the previous studies using the 90K Infinium iSelect assay (Wang et al., 2014;Ain et al., 2015;Kumar et al., 2016). The MLM model used in this association study has been proven to be very efficient for genome-wide association studies (GWAS) and can be used with either structure (R) or principal component (PC) analyses. This study used five PCs, which captured 25% of the variation. The MLM model, which accounts for both structure and relatedness (PC + K), was used for the marker-trait association study. Determining the threshold for the p-value is crucial. A liberal threshold will declare a false-positive association (a type I error), whereas a too stringent threshold is likely to miss a true association (a type II error). Taking this into consideration, the initial cutoff was chosen as p ≤ 0.001, which was not very stringent. Then, the threshold (p ≤ 3.4 * 10 −6 ) was determined using the Bonferroni correction (Yang et al., 2014a), which was very stringent. The MTAs identified at the initial cutoff and the threshold were reported if they were identified in at least two environments. This repetition of the MTA further minimized any false associations.

Use of Secondary Data to Assess Drought Conditions
Drought can be assessed by variable weather conditions, soil moisture, and crop conditions over a particular growing season (Lanceras et al., 2004). Therefore, rainfall data were collected, and the soil types of the experimental sites, which reflect soil moisture, were taken into consideration to assess drought conditions for this study. The total amount of rainfall was collected from planting date to plant physiological maturity. The dates for the physiological maturity of the plants were calculated by adding 30 days to DH (Simmons et al., 1914).

Days to Heading
Several major and minor QTL were revealed for DH, which indicated the quantitative nature of the trait. The eight QTL for DH, identified exclusively under drought conditions, could play a vital role in drought tolerance. Also, the constitutive QTL can be used for drought tolerance breeding in wheat. Some of these QTL (exclusively expressed under drought conditions, called constitutive QTL) identified in this study likely correspond with some already reported QTL associated with drought tolerance; however, further studies such as allelism test is warranted to determine their relationship. Malik et al. (2015) identified three adjacent QTL on chromosome 2A for drought tolerance related to the photosynthetic rate, cell membrane stability, and relative water content. The QTL QDH.ndsu.2A.1 in this study likely represent one of those QTL previously reported (Malik et al., 2015;Gahlaut et al., 2019), but further studies are required to determine their relationship. Two QTL identified in this study on the chromosome 3A which were important for drought tolerance, QDH.ndsu.3A.1 and QDH.ndsu.3A.2, could represent the QTL QHea.T84-3A which was earlier found to increase DH under both drought and non-drought conditions (Ibrahim et al., 2012b). Chromosomal arm 3AL also harbors a gene for earliness per se (Edae et al., 2014), associated with enhanced response to abscisic acid (ERA1), which provides drought tolerance (Edae et al., 2014). The gene ERA1, also located on chromosome 3B, could represent the QTL QDH.ndsu.3B identified in this study which is closely associated with TraesCS3B01G356000 (Table 5), putatively encoding for inositol-1,4,5-trisphosphate 5-phosphatase (InsP 3 ). InsP 3 is reported to be a second messenger in plants responding to many stimuli and has been shown to affect drought tolerance, carbohydrate metabolism, and phosphate-sensitive biomass increases in tomato (Khodakovskaya et al., 2010). In wheat, differential expression of the phospholipase C gene regulating the inositol-1,4,5-triphosphate (IP 3 ) signal transduction pathway possibly results in the quick sensing of drought stress . Kamran et al. (2013) identified a QTL, QFlt.dms-4A.1, for reduced DH at 61.2 cM on chromosome 4A, which may represent the constitutive QTL QDH.ndsu.4A.1 identified in this study. The constitutive QTL QDH.ndsu.2B.2 is located in the same region as the earlier reported QTL QCrs- (Ibrahim et al., 2012a), which deteriorate the number of root crossing in both water regimes. A QTL for drought tolerance on 4AL reported earlier by Alexander et al. (2012) may represent the QTL QDH.ndsu.4A.2, which was identified exclusively for drought-prone environments in this study. The constitutive QTL QDH.ndsu.6B was located in the same genomic location as the QTL QHea+, which was reported to reduce DH under both water conditions (Ibrahim et al., 2012b). Huang et al. (2006) reported a QTL for days to maturity, QDtm.crc-2D, which corresponded with the constitutive QTL in this study, QDH.ndsu.2D, representing the kinase family protein. However, the SNP markers associated with sucrose-phosphate synthase (TraesCS3A01G425500) and vacuolar protein sorting-associated protein (TraesCS5A01G259200, TraesCS3A01G317000) in this group (Table 5) may represent the important abiotic stress genes controlling the plant height. The QTL QDH.ndsu.5B.2 and QDH.ndsu.7B identified in this study seem to be novel.

Plant Height
The QTL QPH.ndsu.5B could represent the ortholog to the GA-insensitive dwarf gene, GID1L2, in rice, indicating the synergistic relationship of rice and wheat (Zanke et al., 2014). The major QTL for PH, QPH.ndsu.6B.1 and QPH.ndsu.6B.2, have been identified on wheat chromosome 6B, and several previous studies also reported QTL for PH on a similar location (Zanke et al., 2014;Gahlaut et al., 2019;Abou-Elwafa and Shehzad, 2021). The major QTL QPH.ndsu.4B could represent the reduced height gene Rht-B1 (Wilhelm, 2011), which was reported to be on the short arm of chromosome 4B. This gene encodes the DELLA protein that reduces a plant's sensitivity to gibberellin (GA), thereby reducing stalk length and making the plant semi-dwarf.    in a receptor-like kinase gene (TraesCS7A01G540200) and QPH.ndsu.3D.1 in a subtilisin-like protease (  Edae et al. (2014). Also, the QTL QYL.ndsu.5B corresponded with the QTL QYld * , which was reported to improve YLD under drought stress (Ibrahim et al., 2012b). The QTL QYL.ndsu.1B.2 had the same genomic location as the constitutive QTL for the green leaf area reported by Edae et al. (2014). Ibrahim et al. (2012a) reported a QTL, QTgw+, which improved thousandgrain weight under both water conditions and could represent the QTL QYL.ndsu.1D. The QTL QYL.ndsu.2A likely coincides with the YLD QTL QGY.caas-2A (Li et al., 2015) or a QTL identified by Mathew et al. (2019) evaluated under drought stress conditions. Huang et al. (2006) identified the QTL QTgw.crc-6A for TKW that seems to be present at the same location as the QTL QYL.ndsu.6A identified in this study. The QTL QYL.ndsu.7D corresponded with the QTL QHi+, which was reported to improve the harvest index under both water conditions (Ibrahim et al., 2012b).

CONCLUSIONS
This study revealed 69 QTL, which included 50 constitutive QTL, three QTL identified for the control water regime, and 16 QTL exclusively under the drought conditions (Table 4; Supplementary Figure 2). These 16 QTL could be used for developing lines suitable for drought conditions. We also reported the QTL-associated genes, their physical positions, and predicted functions along with in silico expression prediction in abiotic stress conditions (Supplementary Figure 3). Chromosome 5B, 6B, and 4B seem to be very important for drought tolerance by reducing PH and increasing YLD and YLD-related traits. Several identified QTL occupied genomic regions reported earlier for earliness per se, drought tolerance, and reduced height. The consistency of some QTL in the different environments indicated their validity. Overall, this study provides valuable genetic and genomic resources to the breeders to design programs to breed drought-tolerant wheat cultivars, combining traditional and genomics-based approaches.

DATA AVAILABILITY STATEMENT
The list of markers and their genotypic data is available at figshare, doi: 10.6084/m9.figshare.14195348 and also in the Supplementary Table 3.

AUTHOR CONTRIBUTIONS
SR: data collection, analysis, and major write-up of the manuscript. SM: data collection and analysis. AK: data analyses, manuscript write-up, and review. EE, SK, and SSi: data analysis and manuscript review. MA and AM: data and manuscript review. SSo and SSa: data analysis and writeup. MM: conceptualization and design of the experiment, phenotypic data collection, and manuscript write-up and review. All authors contributed to the article and approved the submitted version.