Multi-Locus GWAS for Grain Weight-Related Traits Under Rain-Fed Conditions in Common Wheat (Triticum aestivum L.)

In wheat, a multi-locus genome-wide association study (ML-GWAS) was conducted for the four grain weight-related traits (days to anthesis, grain filling duration, grain number per ear, and grain weight per ear) using data recorded under irrigated (IR) and rain-fed (RF) conditions. Seven stress-related indices were estimated for these four traits: (i) drought resistance index (DI), (ii) geometric mean productivity (GMP), (iii) mean productivity index (MPI), (iv) relative drought index (RDI), (v) stress tolerance index (STI), (vi) yield index, and (vii) yield stability index (YSI). The association panel consisted of a core collection of 320 spring wheat accessions representing 28 countries. The panel was genotyped using 9,627 single nucleotide polymorphisms (SNPs). The genome-wide association (GWA) analysis provided 30 significant marker-trait associations (MTAs), distributed as follows: (i) IR (15 MTAs), (ii) RF (14 MTAs), and (iii) IR+RF (1 MTA). In addition, 153 MTAs were available for the seven stress-related indices. Five MTAs co-localized with previously reported QTLs/MTAs. Candidate genes (CGs) associated with different MTAs were also worked out. Gene ontology (GO) analysis and expression analysis together allowed the selection of the two CGs, which may be involved in response to drought stress. These two CGs included: TraesCS1A02G331000 encoding RNA helicase and TraesCS4B02G051200 encoding microtubule-associated protein 65. The results supplemented the current knowledge on genetics for drought tolerance in wheat. The results may also be used for future wheat breeding programs to develop drought-tolerant wheat cultivars.


INTRODUCTION
Wheat (Triticum aestivum L.) is one of the major cereal crops and contributes about 30% (760 million tons) of world grain production (Food Agriculture Organization of the United Nations, 2021). However, the rate of increase in annual wheat production has recently decreased from 3% during the 1970's and 1980's to 0.9% in recent years causing concern. This rate must increase to ∼2% to meet the projected demand of 50-60% additional wheat by 2050 (Ray et al., 2013). This may be challenging owing to a variety of abiotic stresses that impact wheat yield. Among the abiotic stresses, drought is the most important causing yield losses of up to 40% (Zampieri et al., 2017). Recently, an assessment by the UN's Intergovernmental Panel on Climate Change (IPCC) predicted that the global surface temperature would increase by 1.5 • C in the next 20 years (by 2040), causing extreme drought in several wheat growing regions including South Asia (IPCC, 2021) which inhabits one-fourth of the global population and where wheat is a lifeline for millions. In most parts of India, wheat is already being grown under restricted (one or two) irrigation (Joshi et al., 2007a). Current trends indicate that such regions may further expand due to climate change (Joshi et al., 2007b;Kumar et al., 2012). Therefore, understanding the genetic systems that provide tolerance to drought stress is a priority for wheat breeding to sustain wheat production and productivity.
Only two ML-GWAS are available for wheat, where MTAs were reported for agronomic traits across drought stress environments Li et al., 2019). The present study is a continuation of one of these studies , where 19 MTAs associated with yield and related traits under drought were reported. The other study involved the use of 277 wheat accessions leading to the identification of an important QTL on chromosome 6A for grain yield under drought (Li et al., 2019). We speculate that these two studies did not exhaust the possibility of identification of all possible MTAs, thus warranting further studies to facilitate the identification of more robust MTAs for marker-assisted selection (MAS). Therefore a ML-GWAS was conducted using an association panel of 320 spring wheat accessions to detect novel MTAs and important CGs for drought stress tolerance.  2011-12 and 2012-13, respectively; thus providing following four environments: Meerut IR (E1), Meerut RF (E2), Powarkheda IR (E3), and Powarkheda RF (E4). Meerut and Powarkheda fall under mega environment (ME)-1 and (ME)-5 respectively, and the soil types of these two locations are deep clay soil and deep loam soil, respectively. At each location, evaluation of WAM panel under both IR and RF conditions minimized the effect of differences in soil types and other environmental conditions on the phenotypic variations at the two locations.

Plant Material and Field Experiment
The panel was grown in a simple lattice design with two replications. Each plot, representing an individual genotype, consisted of three 150-cm-long rows with row to row distance of 25 cm. Five irrigations were given under IR condition, while only one irrigation (for sustaining the crop) was given under RF condition (21 days after sowing) to ensure water stress. The details of the experimental locations, sowing dates, meteorological data, and rainfall data are summarized in Supplementary Table 1.

Phenotypic Evaluation and Data Analysis
Phenotypic data were recorded for the following four traits: (i) days to anthesis (DTA): recorded as the number of days from planting until anthers in 70% of the spike in a plot had extruded, (ii) grain filling duration (GFD); difference between the DTA and days to maturity (DTM) which was number of days from planting until 70% of the spikes in each plot had turned yellow), (iii) grain number per ear (GNPE); average number of grains per ear using five representative ears per plot, and (iv) grain weight per ear (GWPE); average weight (g) using five ears per plot.
Statistical analysis for all the four traits was conducted using software SPSS 17.0 (http://www.spss.com) to obtain values of range, mean, standard deviation, coefficient of variation (CV), and analysis of variance (ANOVA). To determine the normal distribution, skewness and kurtosis were also analyzed. Correlations between all pairs of traits were obtained separately for IR and RF conditions.
Genotyping, Population Structure, and LD SNP data was generated using the genotyping-by-sequencing (GBS) approach developed by DArT Pty. Ltd., Yarralumla, Australia. The details of the methodology are described by Sehgal et al. (2015). Population structure and linkage disequilibrium (LD) information were available from our earlier study . Briefly, 9627 SNPs with missing data (<30%) and minor allele frequency (MAF) >5% were used for genotyping. These SNPs were randomly distributed across all the 21 wheat chromosomes spanning 5943.1 cM with an average of 17 SNPs per 10 cM genetic distance. Model-based cluster analysis was performed for population structure using Software STRUCTURE version 2.2 (Pritchard et al., 2000) assuming number of subpopulations (K) to range from 2 to 20, and burnin and Markov Chain Monte-Carlo (MCMC) iteration were set to 50,000 and 100,000, respectively. The actual number of subpopulations was determined using the tool "STRUCTURE Harvestor" following delta K ( K) method (Evanno et al., 2005). LD was estimated using software TASSEL v. 5.0 (Bradbury et al., 2007); mean LD values were obtained for the whole genome as well as for individual chromosomes.

Multi-Locus Genome Wide Association Mapping
SNPs with no more than 30% missing data and >5% minor allele frequency were utilized for GWAS. Principle component analysis (PCA) was conducted using TASSEL v5.0, and first three components were incorporated as a covariate in association test model. Fixed and random model Circulating Probability Unification (FarmCPU), developed by Liu et al. (2016) was used for ML-GWAS. This method is believed to be the most efficient and eliminates confounding issues arising due to population structure, kinship, multiple testing, etc. The method utilizes both Fixed Effect Model (FEM) and a Random Effect Model (REM), iteratively. FEM allows identification of associated markers (MTAs) that are described as pseudo-quantitative trait nucleotides (pseudo-QTNs) and are used as covariates, in REM, which allows identification of QTNs; in this study, QTNs are described as MTAs. Bonferroni-correction was built in within FarmCPU, so that a default P-value threshold (0.01) was used to declare significant MTAs. Quantile-quantile (Q-Q) plots generated through FarmCPU were used to examine model fitting (account for population structure). Phenotypic differences between the two alleles of SNPs identified as MTA were tested using "t-test" analysis.

Joint Effect of MTAs and Identification of Contrasting Genotypes
Wherever more than two SNPs were associated with the same trait, joint effects were estimated. This was done through linear regression performed using all desirable SNP alleles for the trait (independent variable) and corresponding trait values of the genotypes that contained more than one desirable SNP alleles (dependent variable). Contrasting genotypes were identified using phenotypic values for all the four traits under IR and RF conditions. For this purpose, average sum of ranks (ASR) of seven indices was calculated for each genotype. In the case of DTA, genotypes with higher ASR were considered superior, while for GFD, GNPE, and GWPE, genotypes with lower ASR were considered desirable.

Identification of CGs and Expression Analysis
CGs for the associated SNPs were identified by aligning the associated GBS sequences to wheat genome assembly IWGSC refSeq v1.1 available in the Ensemble database (http:// www.ensembl.org/info/docs/tools/vep/index.html). The GO annotation (including molecular function and biological process) of each CG was extracted from the IWGSC website (http://www.wheatgenome.org/). Information about expression of CGs was collected using the online tool Genevestigator (Hruz et al., 2008).

Phenotypic Variation and Effect of Water Stress on Grain Weight-Related Traits
The WAM panel exhibited wide range of phenotypic variation for each of the four traits, under both IR and RF conditions at each of the two locations ( Figure 1A; Supplementary Table 2). Skewness and kurtosis were found within the range of normal distribution (i.e., ±2.0) for all the traits under all environmental conditions. The only exception was GFD in E4, where kurtosis was observed with a value of 2.2. It is also apparent that each of the four traits was adversely affected under RF condition at both the locations (Meerut and Powarkheda; Figure 1A; Supplementary Table 2).
ANOVA suggested that genotypes differed for the four traits. The performance of genotypes differed under IR and RF and at the two locations (Meerut and Powarkheda) for each of the traits. Genotype × location interactions were significant for DTA and GNPE, while genotype × environment interactions (IR vs. RF) were significant for DTA and GFD (Table 1). Correlations were largely significant under IR and RF conditions at both the locations, except those between GNPE and GFD under RF at Powarkheda ( Figure 1B).

Population Structure and LD
The details about population structure and LD are available in our earlier study . In brief, three sub-populations involving 157 genotypes were recognized; the remaining 163 genotypes were admixed. The number of genotypes was 57 in sub-population I, 85 in sub-population II and 15 in sub-population III. Genome-wide LD decay was observed at 3 cM with a range of 2-20 cM in different genomic regions.

Marker-Trait Association
A total of 30 high confidence MTAs were detected for the four traits under one or more environmental conditions. These MTAs involved 27 SNPs distributed over 15 chromosomes (excluding 1D, 4D, 5A, 5D, 7B, and 7D) (   Figure 2). Seven SNPs were associated with GFD, eight with GNPE, and six with GWPE. Effect sizes for associated SNPs were also estimated for individual traits ( Table 2). Based on the effect size, desirable alleles of associated SNPs were also selected ( Table 2). Among the four traits, positive selection appeared desirable for GFD, GNPE, and GWPE, and negative selection for DTA. Higher absolute value of SNP effect size showed higher contribution of SNP on the phenotype. Frequency of desirable alleles ranged from 0.10 to 0.94. Two SNPs exhibited pleiotropic effect. SNP_404 was associated with DTA as well as GFD in E1. Similarly, SNP_1555 was associated with GNPE and GWPE in E3.

Joint Effect of Significant SNPs on Associated Phenotypes
Joint effect of desirable alleles of multiple associated SNPs was determined using linear regression. For DTA, nine SNPs were each associated with the trait in one or more environments. An increase in the number of desirable SNP alleles from four to nine (but not less than four) led to a significant decrease in DTA (Figure 3). Interestingly, significant joint effect of nine SNPs on DTA was observed in all the four environments; however, strength of regression varied across environments and ranged from 0.33 (E4) to 0.50 (E1).
Significant joint effects of associated SNPs were also observed for GFD, GNPE, and GWPE. Trait values for each of these three traits increased with an increase in the number of desirable alleles in all the four environments (Figure 3). The regression coefficients ranged from 0.08-0.21 for GFD, 0.08-0.15 for GNPE, and 0.07-0.36 for GWPE.

Analyses of Trait Indices
Seven different stress-related indices were obtained for each of the four traits to better assess the genetics of drought tolerance at both locations (total of 28 index traits); this allowed the identification of 153 MTAs involving 85 SNPs (Supplementary Table 3). Manhattan and Q-Q plots showing appropriate model fitting for ML-GWAS tests are shown in Supplementary Figure 3. These SNPs were distributed over 18 of the 21 wheat chromosomes (except 4D, 5D, and 7D). A comparison of MTAs for 28 indices, including seven indices for each of the four main traits allowed the identification of 19 common SNPs associated with response to water stress ( Table 2;  Supplementary Table 3). As many as 19 SNPs for DTAs, 13 SNPs for GFD, 34 SNPs for GNPE and 28 SNPs for GWPE showed significant association with one or more indices (Supplementary Table 3).

Contrasting Genotypes for Molecular Breeding Programs
Using ASR values of indices, two contrasting genotypes were selected, which included the superior genotype TX181, and the inferior genotype TX67 (Table 3). TX181 performed better under RF condition for all the four traits ( Table 3). These two genotypes can be used for further studies involving crosses generating segregating populations for fine-mapping of QTLs leading to cloning.
The pattern of decline in trait values in RF condition was also examined to assess the sensitivity of these two genotypes to water stress. For this purpose, per cent decline in trait value under RF was examined. Interestingly, in case of TX181, the reduction in trait value under RF condition was relatively low [GFD, 5.11%; GNPE, 20.39%; and GWPE, 17.28%], when compared with those for TX67 [GFD, 10.62%; GNPE, 47.45%, GWPE, 56.45%]. In case of DTA, where lower value is desirable, larger decline was observed in TX181 (4.16%) relative to TX67 (1.24%) under RF condition. However, the DTA in TX67 was about one month longer than TX181. These observations revealed that TX181 is less sensitive (more tolerant) and TX67 is more sensitive under water stress conditions. We may therefore conclude that these varieties differ widely not only for absolute trait values, but more importantly, for tolerance to drought stress.
The above contrasting genotypes were also examined for the presence of desirable alleles of significant SNPs, assuming that desirable alleles for all SNPs are unlikely to be concentrated in TX181; similarly, undesirable alleles for all SNPs cannot be present in TX67. For some of the SNPs, the two genotypes may not differ. Interestingly, out of 27 significantly associated SNPs for four traits, desirable alleles of 24 SNPs were present in TX181. For one SNP, an undesirable allele was found and for two SNPs genotypic data were missing in TX181. In contrast, in TX67, undesirable alleles were present for 14 SNPs, desirables for 9 SNPs, and for the remaining four SNPs, genotypic data were missing ( Table 3).

Candidate Genes (CGs) Co-Localized With Associated SNPs
The 27 SNPs involved in 30 MTAs (as mentioned above) were used to identify candidate genes (CGs). As many as 10 of the 27 SNPs were co-located within protein-coding genes and were therefore treated as putative CGs. Details of CGs and their corresponding annotation information are provided in Table 4. Eight of these 10 CGs represent those MTAs that were detected either in IR or RF environments; two MTAs were detected in both the environments. GO annotations of the CGs showed  their involvement in protein binding, innate immune response, microtubule cytoskeleton organization, protein self-association, protein phosphorylation and protein folding ( Table 4). Gene expression analysis for the 10 CGs is shown in Supplementary Figures 4-6. This analysis provides further support to their potential involvement in the trait phenotype in different wheat developmental stages and tissues under drought/water stress conditions. The results showed a wide range of expression. For instance, following four CGs had relatively higher expression in all wheat tissues/organ: TraesCS1A02G331000, TraesCS4B02G329500, TraesCS7A02G133300, and TraesCS7A02G176600. Some CGs had tissue-specific expression; for example, TraesCS6D02G394600 expresses in leaf tissues, TraesCS4B02G051200 in root tissues, Traes3B02G596100 in rachis while, TraesCS2D02G574400 in the shoots (Supplementary Figure 4). The CGs also showed varied expression during the different wheat development stages. Interestingly, most CGs had high expression during anthesis to ripening stages except one (Traes3B02G596100), demonstrating their possible role in regulating wheat yield (Supplementary Figure 5).
Under drought/water stress condition, only five of the 10 CGs showed differential expression (≥2-fold), either down-regulated or up-regulated (Supplementary Figure 6). For instance, TraesCS7A02G133300 (up to ∼2-fold), TraesCS4B02G051200 (up to ∼6-fold), and TraesCS1A02G331000 (up to ∼2-fold) were down-regulated. Similarly, TraesCS4B02G329500 (up to ∼2-fold) and TraesCS4A02G019800 (up to ∼4-fold) were up-regulated during drought stress. Interestingly, out of these five, one CG (TraesCS4B02G051200) that encodes a microtubule-associated protein (MAP65), was associated with MTAs that were identified only in RF environment. Another CG (TraesCS1A02G331000) that encodes RNA helicase protein, belonged to MTAs that were identified in both IR and RF environments (Figure 4).

DISCUSSION
In the most major crops including wheat, drought tolerance is a complex polygenic trait involving a large number of minor quantitative trait loci (QTLs; Bernardo, 2008;Gupta et al., 2017) and only a few major QTLs (Bernardo, 2008;Gupta et al., 2012Gupta et al., , 2017. A large number of traits (>40) have been utilized to estimate drought tolerance (our unpublished results). In a recent study on meta QTL analysis for drought tolerance, as many as 340 QTLs identified through at least 14 interval mapping studies have been utilized . As many as 750 MTAs, were also identified using GWAS (Gupta et al., 2017(Gupta et al., , 2020Kumar et al., 2020;Singh et al., 2021). It also seems that the QTLs and MTAs identified so far do not represent the entire genetic variation for a multitude of traits that are involved in providing drought tolerance. It is also known that despite this enormous literature, very few QTLs have been utilized in molecular breeding and pyramiding, and that none of them cloned so far in wheat (Ray et al., 2013;Merchuk-Ovnat et al., 2016;Gautam et al., 2021). Therefore, one would expect that every new study  leads to identification of some novel QTLs and MTAs. Perhaps, metaQTLs and ortho-metaQTLs identified recently and to be identified in future (our unpublished results), may lead to a more fruitful utilization of molecular markers for MAS leading to the development of drought tolerant wheat cultivars.
In the present study, 30 MTAs were detected and compared with earlier studies. Five MTAs (all identified under RF conditions) were co-localized with QTL/MTAs identified earlier using either linkage mapping or LD-based GWAS ( Table 5). We assume that the remaining 25 MTAs are novel. The five co-localized MTAs listed in Table 5, include the following: (i) SNP for DTA on 4B co-localized with a QTL for DTH and GY (QDH.ndsu.4B; Rabbi et al., 2021). (ii) SNP for DTA on 2B co-localized with a SNP for DTH . (iii) SNP for GFD on 5B, co-localized with a QTL (QGfd.ccsu-5B; 40.6-53.4 cM) for GFD . (iv) SNP for GFD on 7A (145.43 cM) co-localized with a marker (wsnp_CAP7_c1321_664478∼IACX7848) associated with DTA and GY (Qaseem et al., 2018). (v) SNP for GWPE on 6B, colocalized with MTAs/QTL for thousand grain weight (TGW) and grain yield, identified in three earlier studies (Mathews et al., 2008;Ahmed et al., 2020;Rabbi et al., 2021). The markers identified in these three earlier studies include a SSR marker gwm132 on 6B (67.10 cM) associated with GY under DS (Mathews et al., 2008). The other two co-localized genomic region at 64.82 cM (QTKW.ndsu.6B) and at 67.24 cM (BS00063801_51) were associated with TGW under DS (Ahmed et al., 2020;Rabbi et al., 2021). These QTLs/MTAs can be utilized for MAS with higher level of confidence. The above genomic regions associated with DTA, GFD, GNPE, and GWPE under drought stress identified in the present study could also be exploited for fine mapping.
The effect size of individual associated SNPs on an individual trait and the association of same SNP with more than one trait also deserve attention. For 25 of 30 MTAs, the proportion of genotypes with desirable allele was higher relative to that of undesirable effect. This could be due to unconscious selection for these desirable alleles during wheat breeding, and those may be utilized in future breeding as well. More important are the SNPs, which have desirable alleles in just a few genotypes, which therefore deserve priority in future breeding. For instance, SNP_1555 located on chromosome 2A that was associated with GNPE having highest SNP effect size (3.39) had a frequency of desirable allele 0.13; additionally, this SNP was also associated with GWPE and showed a similar pattern. This SNP, therefore, appears promising for future wheat breeding efforts.
Interestingly, some SNPs also showed pleiotropic effect, each showing association with two correlated traits. However, no SNP was available to be associated with more than two traits. The two pleiotropic SNPs included, SNP_404 (DTA and GFD) and SNP_1555 (GNPE and GWPE) and deserve further attention in future studies.
The relative merit of MTAs for the seven indices relative to those for individual traits also deserves attention, since indices have been designed to estimate tolerance to drought. SNPs associated with more than one index traits appear to be relatively important since they provide more comprehensive information about the response of genomic regions toward drought stress. Based on seven different drought stress-related indices, we also identified contrasting genotypes for response to drought (Table 3). These contrasting genotypes may be used for fine mapping and to develop improved wheat lines via molecular breeding. Significant joint effect of multiple associated SNPs was observed where genotypes with desirable alleles for many more associated SNPs showed superior phenotype than those having desirable alleles for fewer SNPs (Figure 3); the trait value can be substantially improved through pyramiding of multiple significant SNPs. Sometimes pyramiding of a large number of SNPs becomes problematic due to the requirement of a larger population. In such cases, marker-assisted recurrent selection (MARS) may be followed.
Among the 10 CGs identified using MTAs, two CGs seem to be involved in response to drought stress and, therefore, deserve special attention. The first of these two genes, namely TraesCS4B02G051200 encodes a microtubule-associated protein (MAP65) and the second gene, namely TraesCS1A02G331000 encodes RNA helicase protein. Both these proteins respond to drought stress and therefore their expression level may be used for measuring the level of drought stress. However, TraesCS4B02G051200 that was associated with DTA and was detected using MTA only in the RF environment, while TraesCS1A02G331000 was associated with MTAs detected under both IR and RF conditions. The proteins of MAP65 family are known to be involved in the polymerization of the microtubules (Hamada, 2014) and are indirectly involved in regulating growth and response to abiotic stresses including drought in plants (Zhang et al., 2012;Bhaskara et al., 2017). In our study, the expression of TraesCS4B02G051200 (wheat MAP65) decreased up to ∼6-fold after drought stress ( Figure 4B). This means that this gene can be utilized as an indicator of drought stress, both in wheat breeding and in strategic research.
RNA helicase proteins are multifunctional and are involved in responses to both biotic and abiotic stresses in plants (Pandey et al., 2020). For instance, RNA helicase belonging to Arabidopsis RH8 DEAD-box regulates the ABA-signaling pathway by interacting with 2C protein phosphatase (PP2CA), which also plays a vital role in drought tolerance (Baek et al., 2018). Significant variation in the expression of wheat RNA helicase during drought stress was also observed in the present study ( Figure 4A), suggesting that this gene may also be used for the estimation of drought stress and for improving drought tolerance. Therefore, we hypothesized that these two CGs together may provide drought resilience in wheat. Further investigation involving functional analyses of these two genes may also help in understanding the molecular mechanism of abiotic stress tolerance in crops.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material, further inquiries can be directed to the corresponding author/s.

AUTHOR CONTRIBUTIONS
PG and HB: conceptualization. VG and VJ: methodology and formal analysis. VJ: data curation. VG, VJ, and AJ: writing-original draft preparation. PG, HB, VJ, and VG: writing-review and editing. PG, HB, and AJ: supervision. All authors have read and agreed to the published version of the manuscript.

ACKNOWLEDGMENTS
VG and VJ acknowledge the DST-INSPIRE Faculty Awards received from Department of Science and Technology, Ministry of Science and Technology, India. PG and HB were each awarded Senior Scientist positions by Indian National Science Academy (INSA), New Delhi. Partial financial support received from Indian Council of Agriculture Research (ICAR), India is also gratefully acknowledged. CIMMYT, Mexico provided the seed material and the genotypic data for the present research. We also thank the Dr. P. C. Mishra at Zonal Agricultural Research Stations, Powarkheda for the field experiments in Powarkheda, M.P, India.