The Haplotype-Based Analysis of Aegilops tauschii Introgression Into Hard Red Winter Wheat and Its Impact on Productivity Traits

The introgression from wild relatives have a great potential to broaden the availability of beneficial allelic diversity for crop improvement in breeding programs. Here, we assessed the impact of the introgression from 21 diverse accessions of Aegilops tauschii, the diploid ancestor of the wheat D genome, into 6 hard red winter wheat cultivars on yield and yield component traits. We used 5.2 million imputed D genome SNPs identified by the whole-genome sequencing of parental lines and the sequence-based genotyping of introgression population, including 351 BC1F3:5 lines. Phenotyping data collected from the irrigated and non-irrigated field trials revealed that up to 23% of the introgression lines (ILs) produce more grain than the parents and check cultivars. Based on 16 yield stability statistics, the yield of 12 ILs (3.4%) was stable across treatments, years, and locations; 5 of these lines were also high yielding lines, producing 9.8% more grain than the average yield of check cultivars. The most significant SNP- and haplotype-trait associations were identified on chromosome arms 2DS and 6DL for the spikelet number per spike (SNS), on chromosome arms 2DS, 3DS, 5DS, and 7DS for grain length (GL) and on chromosome arms 1DL, 2DS, 6DL, and 7DS for grain width (GW). The introgression of haplotypes from A. tauschii parents was associated with an increase in SNS, which was positively correlated with a heading date (HD), whereas the haplotypes from hexaploid wheat parents were associated with an increase in GW. We show that the haplotypes on 2DS associated with an increase in the spikelet number and HD are linked with multiple introgressed alleles of Ppd-D1 identified by the whole-genome sequencing of A. tauschii parents. Meanwhile, some introgressed haplotypes exhibited significant pleiotropic effects with the direction of effects on the yield component traits being largely consistent with the previously reported trade-offs, there were haplotype combinations associated with the positive trends in yield. The characterized repertoire of the introgressed haplotypes derived from A. tauschii accessions with the combined positive effects on yield and yield component traits in elite germplasm provides a valuable source of alleles for improving the productivity of winter wheat by optimizing the contribution of component traits to yield.

The introgression from wild relatives have a great potential to broaden the availability of beneficial allelic diversity for crop improvement in breeding programs. Here, we assessed the impact of the introgression from 21 diverse accessions of Aegilops tauschii, the diploid ancestor of the wheat D genome, into 6 hard red winter wheat cultivars on yield and yield component traits. We used 5.2 million imputed D genome SNPs identified by the whole-genome sequencing of parental lines and the sequence-based genotyping of introgression population, including 351 BC 1 F 3 : 5 lines. Phenotyping data collected from the irrigated and non-irrigated field trials revealed that up to 23% of the introgression lines (ILs) produce more grain than the parents and check cultivars. Based on 16 yield stability statistics, the yield of 12 ILs (3.4%) was stable across treatments, years, and locations; 5 of these lines were also high yielding lines, producing 9.8% more grain than the average yield of check cultivars. The most significant SNP-and haplotype-trait associations were identified on chromosome arms 2DS and 6DL for the spikelet number per spike (SNS), on chromosome arms 2DS, 3DS, 5DS, and 7DS for grain length (GL) and on chromosome arms 1DL, 2DS, 6DL, and 7DS for grain width (GW). The introgression of haplotypes from A. tauschii parents was associated with an increase in SNS, which was positively correlated with a heading date (HD), whereas the haplotypes from hexaploid wheat parents were associated with an increase in GW. We show that the haplotypes on 2DS associated with an increase in the spikelet number and HD are linked with multiple introgressed alleles of Ppd-D1 identified by the whole-genome sequencing of A. tauschii parents. Meanwhile, some introgressed haplotypes exhibited significant pleiotropic effects with the direction of effects on the yield component traits being largely consistent with the previously reported trade-offs, there were haplotype combinations associated with the positive trends in yield. The characterized repertoire of the introgressed haplotypes derived from A. tauschii accessions with the combined positive effects on yield and yield component traits in elite germplasm provides a valuable source of alleles for improving the productivity of winter wheat by optimizing the contribution of component traits to yield.

INTRODUCTION
The gap between population expansion and food production is increasing due to marginal improvements in crop yield, which is attributed to a decline in soil fertility, pests and diseases, and climate change (Bailey-Serres et al., 2019). Wild relatives of wheat are a rich source of novel underutilized allelic diversity with a great potential to improve cultivated wheat through introgression (Placido et al., 2013;Zhang et al., 2017;Hao et al., 2020). The introgression from wild relatives into elite wheat cultivars was reported to increase pest and disease resistance (Periyannan et al., 2013;Saintenac et al., 2013), improve resilience toward environmental stress (Peleg et al., 2005;Placido et al., 2013), and increase yield (Pasquariello et al., 2020). The success of introgression breeding, however, could be affected by the negative epistasis between multiple alleles of wild and cultivated wheat (Nyine et al., 2020), especially in the low recombination regions of chromosomes, where the linkage with the negatively selected alleles could reduce the efficiency of selection for beneficial variants (Hill and Robertson, 1966).
Introgression could exhibit pleotropic effects, affecting multiple, often unrelated traits not directly targeted by selection. For example, the introgression from Aegilops ventricosa into the chromosome 7D of wheat was associated with an increase in grain protein content and resistance to eyespot at the expense of reduced yield (Pasquariello et al., 2020). In durum wheat, the introgression of the GNI-A1 gene from wild emmer increased grain weight by suppressing the fertility of distal florets, resulting in a negative correlation between grain number (GN) and grain weight (Golan et al., 2019). The introgression from Agropyron elongatum into the 7DL chromosome arm of wheat that is known to confer leaf rust resistance (Lr19) (Wang and Zhang, 1996) also influences root development, resulting in an improved adaption to water stress (Placido et al., 2013) and salinity (Dvorák et al., 1988), and increased biomass (BM) and yield (Reynolds et al., 2001).
Crop yield is a complex trait determined by many component traits, such as 1,000 grain weight (TGW), GN per spike, spikes per unit area, grain width (GW), area, length, etc. (Del Moral et al., 2003;Zhang et al., 2018;Du et al., 2020). Previous studies have shown that the introgression from wild and close relatives improve the yield of hexaploid wheat by changing yield component traits through the pleotropic interaction between introgressed and background alleles of the hexaploid wheat (Jones et al., 2020). Significant trade-offs between yield, yield components, and yield stability have been reported in wheat. A study by Pennacchi et al. (2019) showed that yield and yield stability have a negative linear relationship in most cases. Other factors such as HD, plant height (PH), and BM influence the source-sink ratio, which in turn affects the harvest index leading to a variation in yield and yield stability (Reynolds et al., 2001(Reynolds et al., , 2020. Balancing the trade-off between yield components is therefore necessary to improve yield, maximize the yield potential, and improve the yield stability in wheat. Sequence-based genotyping generates high-density SNP marker data that could be used to accurately detect the boundaries of genomic segments introgressed from wild relatives (Kuzay et al., 2019;Nyine et al., 2020), providing a unique opportunity to investigate the effect of introgression on tradeoff between the traits affecting total yield. Even though wholegenome sequencing became less expensive, it is still not within the cost range that would allow wheat breeding programs to sequence large populations. Sequencing of founder lines at a high coverage depth and using these genotypes as a reference panel to impute missing and ungenotyped markers in the progeny characterized by low-coverage skim sequencing is a cost-effective alternative. The existing imputation algorithms (Browning and Browning, 2013;Swarts et al., 2014;Davies et al., 2016) provide a highly accurate whole-genome prediction of missing genotypes and were shown to increase the power of genome-wide association (GWAS) scans, thus enabling the identification of SNPs or haplotypes associated with the traits of interest (Li et al., 2010;Nyine et al., 2019). One of the advantages of the increased marker density provided by whole-genome sequencing is the ability to perform association mapping using haplotype information, which improves the detection of quantitative trait loci that would otherwise be missed when using single SNPs (Zhang et al., 2002;Lorenz et al., 2010).
Here, we investigated the impact of the introgression from A. tauschii into hard red winter wheat lines on yield and yield component traits and how haplotypes from A. tauschii at different genomic loci affect the component traits and total yield. For this purpose, we assessed the phenotypic variability of yield and the component traits, BM, and tenacious glume (Tg) traits in an introgression population derived from A. tauschii and hexaploid winter wheat phenotyped under irrigated and rainfed conditions. We applied the SNP diversity data generated by the whole-genome shotgun sequencing at 10× coverage level of the parental lines to impute genotypes in this population (Nyine et al., 2020). This strategy resulted in 5.2 million SNPs that enabled us to identify the introgressed A. tauschii haplotypes and assess their effects on the trait variation through the GWAS mapping and haplotype block effect analysis. This introgression population along with high-density SNP genotype data provides a valuable resource for the effective deployment of A. tauschii haplotypes in winter wheat improvement programs.

Plant Materials
The study population was described in detail by Nyine et al. (2020). In brief, the population was developed by crossing synthetic A. tauschii-wheat octoploid lines with recurrent hexaploid winter wheat lines. The octoploid lines were developed by crossing 5 hexaploid winter wheat lines with 21 diverse A. tauschii accessions. The resulting F 1 hybrid plants regenerated from the rescued embryos were treated with colchicine to produce synthetic octoploids (Dale et al., 2017). The octoploids were backcrossed once to the respective hexaploid wheat parents or to another wheat line. The BC 1 F 1 plants (hexaploid) were selfpollinated and advanced by a single seed descent to the BC 1 F 3 generation. Seeds from individual BC 1 F 3 plants were bulked and grown in single rows in the field at the Kansas State University, Ashland Research Farm near Manhattan, KS, USA in 2016-2017 growing season. A total of 351 lines were selected from the entire population based on the ability to produce sufficient seeds to allow for yield testing, general fitness, threshability, and phenology corresponding to the adapted wheat cultivars.

Field Phenotyping
The population was phenotyped in 2018 and 2019 at Colby (Manhattan, KS, USA), and in 2020 at (Manhattan, KS, USA). In both locations and years, an augmented design was used to establish the trials. Plots were planted using a New Holland sixrow wheat drill. Plot dimensions were 2.5 m long by 0.5 m wide and consisted of three rows with 18 cm row spacing. Starter fertilizer was applied with the seed at planting using granular 18-46-0 diammonium phosphate (DAP) at a rate of 168.1 kg/ha. Additional nitrogen was applied as a topdress in the spring using liquid 28-0-0 urea ammonium nitrate (UAN) at a rate of 67.3 kg/ha. A lateral irrigation system was used at Colby to ensure uniform germination and emergence as well as to provide additional water throughout the growing season in the irrigated treatment. Three hexaploid winter wheat lines welladapted to Kansas environments (checks) and the hexaploid wheat parents were used as controls with at least three biological replications per block. In 2018 and 2019, two complete blocks were established and one block was irrigated (COI18 and COI19, respectively), whereas the other was rainfed/non-irrigated (CO18 and CO19, respectively), simulating optimal and farmer-field growth conditions. In 2020, only one block was grown under rainfed conditions (AS20).
The population was phenotyped for yield and yield components traits, BM traits, and Tg. Agrobase software (Mulitze, 1990) was used to adjust the grain yield (GY) (bushels per acre (BPA)], for spatial variability. The MARVIN seed imaging system [GTA Sensorik GmbH, Neubrandenburg, Germany) was used to assess the grain morphometric traits such as GN per sample, TGW, grain area (GA), GW, and grain length (GL) from the two technical replicates in 2018, and one measurement in 2019 and 2020. In 2019 and 2020, data were collected for the number of spikes per square foot (SPSF) from two random points within a plot. The 1 × 1 ft square frame was dropped over two rows at least one foot away from the plot edges to avoid the border effect. In 2019, only one row within a frame was cut above the ground level for BM determination, whereas in 2020, both rows were sampled. Biomass samples were collected in paper bags and dried for at least 3 weeks at 32 • C (90 • F) before processing. We collected data on aboveground dry BM measured as the total weight of the dry sample without the bag, the number of spikes per sample (SPB), the average spikelet number per spike (SNS) from 10 random spikes, and grain weight after threshing [grain sample weight (GSW)]. During threshing, we scored samples for the presence and absence of the Tg trait depending on the threshability. Harvest index (HI) was calculated as the percentage of GSW relative to BM.
In 2020, data for HD were collected from each plot when approximately 50% of the spikes had emerged from the flag leaves. The number of days to heading were calculated as the difference between the heading and planting dates. After all the plots had completed heading, PH, in centimeters was measured on the same day from two random but representative main tillers per plot for the whole field. PH was measured as the distance from the ground surface to the first spikelet of the spike.

Whole-Genome Shotgun Sequencing of Parental Lines
Genomic DNA of 21 A. tauschii accessions and 6 hexaploid parents used to create the introgression population was extracted from the leaf tissues of 2-week-old seedlings grown in a greenhouse using the DNeasy Plant Mini Kit (Qiagen, Hilden, Germany) following the protocol of the manufacturer. The quality and concentration of the DNA were assessed by using the PicoGreen dsDNA Assay Kit (Life Technologies, Carlsbad, CA, USA).
Genomic libraries for Illumina sequencing were constructed from ∼2 µg of genomic DNA using the PCR-free Illumina protocol at the K-State Integrated Genomics Facility (IGF). The libraries were subjected to size selection using the Pippin Prep system (Sage Scientific, Beverly, MA, USA) to enrich for 400-600 bp fragments. The pooled barcoded libraries were sequenced using the NovaSeq instrument (2 × 150 bp run, S4 flow cell) at Kansas University Medical Center and NextSeq 500 (2 × 150 bp run) at IGF. The PCR-free whole-genome shotgun sequencing libraries generated from 27 parental lines ranged between 200 and 700 bp with an average of 450 bp (Supplementary Figure 1). Approximately 14 billion paired-end reads (150 bp long) were generated from the libraries with an average of 0.54 billion reads per sample (data are available at NCBI SRA database BioProject ID: PRJNA745927). The average number of reads per wheat line corresponded to ∼10× genome coverage of parental lines. The raw reads with the Phred quality score <15, minimum length <50 bp, and adaptor sequencies were filtered out using Trimmomatic v0.38-Java-1.8. The remaining filtered paired-end reads were mapped to the Chinese Spring (CS) RefSeq v1.0 (International Wheat Genome Sequencing Consortium, 2018) using the BWA-mem software v0.7.17. A total of 7.1 billion reads were aligned uniquely to the CS RefSeq v1.0.
The sam files generated by BWA-mem were converted to bam files using the SAMtools v1.10. The Picard Toolkit (http:// broadinstitute.github.io/picard) was used to merge bam files from different lanes and sequencers into one bam file per sample. Reads that were aligned to multiple locations within the genome were identified and removed by SAMtools v1.10. The Picard Toolkit was used to prepare the merged unique aligned read bam files for GATK (McKenna et al., 2010) analysis. The preparatory steps included sorting, adding read groups, marking, and removing duplicate reads. The output deduplicated bam files were realigned around INDELs using GATK v3.7 and recalibrated with 90K SNPs (Wang et al., 2014) mapped to CS RefSeq v1.0 as the reference coordinates. The bam files were split into chromosome parts and indexed to reduce the memory and time required to process the files. GATK v3.7 was used to generate the genome variant call format (gvcf) files for each chromosome part. The gvcf files were split into 100 Mb chromosome windows and stored as genomicsDB using the genomicsDBImport tool in GATK v4.0. Joint genotyping of variants from each database was done using GATK v4.0 HaplotypeCaller. The flag <-allow-oldrms-mapping-quality-annotation-data> was set to enable the processing of gvcf files generated by GATK v3.7. All vcf files corresponding to the A, B, and D genomes were concatenated with concat, a Perl-based utility in vcftools v0.1.13. A custom Perl script was used to convert the CS RefSeq v1.0 part coordinates in the concatenated vcf to full coordinates after which vcf-filter tools v0.1.13 were used to remove INDELs, multi-allelic loci, sites with missing data and minor allele frequency (MAF) below 0.05. The filtered SNPs were phased using the Beagle software v4.1 (Browning and Browning, 2013).
The GATK v4.0 HaplotypeCaller identified ∼99 million variants including SNPs and INDELs from reads uniquely aligned to the D genome of CS RefSeq v1.0. After excluding INDELs, multi-allelic loci, sites with missing data, and MAF <0.05, 20 million SNPs were retained. These were used to impute missing and ungenotyped SNPs in the D genome of the introgression population.

Genotype Imputation
Sequence-based genotyping of the introgression population was performed previously by Nyine et al. (2020). SNPs with MAF <0.01 were excluded from the raw vcf file using vcffilter tools v0.1.13. The program conform-gt (https://faculty. washington.edu/ browning/conform-gt.html) was used to check the concordance of D genome SNP positions between the introgression population and the SNPs from the parental lines genotyped by whole-genome shotgun sequencing. Missing and ungenotyped SNPs in the D genome of the introgression population were imputed from the parental lines using Beagle v.5.0. A custom Perl script was used to filter out the imputed SNPs with genotype probability below 0.7 and MAF <0.05, which resulted in 5.2 million SNPs. The filtered SNPs were used in the downstream analyses such as GWAS mapping and identification of the introgressed haplotype blocks.

Introgressed Haplotype Detection
Genetic divergence between the parental lines affects the probability of an accurate detection of the introgressed segments from wild relatives. We used two introgression families, one created by crossing hexaploid wheat with A. tauschii ssp. strangulata (KanMark x TA1642, aka FAM93) and another one created by crossing hexaploid wheat with A. tauschii ssp. tauschii (Danby × TA2388, aka FAM97) to identify the introgressed haplotype blocks. FAM93 had 21 introgression lines (ILs), whereas FAM97 had 23 ILs. The R package HaploBlocker (Pook et al., 2019) was used to infer haplotype blocks from 5.2 million SNP sites. Recombinant inbred lines from each family were analyzed together with 21 A. tauschii and 6 hexaploid parents. The HaploBlocker parameters used in block calculation were node_min = 2 (default is 5), overlap_remove = TRUE, bp_map, and equal_remove=TRUE. The parameter node_min was used to control the number of haplotypes per node during a clustermerging step of the block calculation function of HaploBlocker. The reduction in node_min was necessary to account for the low number of haplotype variants within these families. To maintain the SNP position in the haplotype block library, a vector of SNPs was provided via the parameter bp_map and prior to block calculation, SNPs in perfect linkage disequilibrium were removed by setting the parameter equal_remove = TRUE. Overlapping haplotypes were removed by setting parameter overlap_remove = TRUE. Custom R and Perl scripts were used to calculate the haplotype block length using the information from haplotype block start and end coordinates. All monomorphic haplotypes between the two parental lines were excluded from a haplotype matrix before calculating the frequency of the introgressed haplotypes per chromosome.

Trait Stability and Heritability
Yield stability is an important trait, which reflects the productivity of the crop under various growth conditions. No single statistic, however, is accurate enough to predict it due to a high variability of genotype and genotype by environment interaction effects. In this study, we used 16 different statistics, including parametric [such as mean variance component ( , and Kang's rank-sum (Kang or KR)] and non-parametric [such as Huhn's and Nassar and Huhn's (S (1) , S (2) , S (3) , S (4) , S (5) , and S (6) ), and the methods of Thennarasu (NP (1) , NP (2) , NP (3) , and NP (4) )] to rank the ILs for yield stability based on their performance across years, locations, and treatments. The description and properties of the statistics are documented at the website: https://manzik.com/stabilitysoft/. The analysis was implemented in R using a script from Pour-Aboughadareh et al. (2019), which is available at: https://github. com/pour-aboughadareh/stabilitysoft. The most stable and/or high yielding lines were identified by sorting them according to their rankings.
Broad sense heritability (H 2 ) and best linear unbiased predictions (BLUPs) for yield and the component traits were calculated from the 2018 and 2019 data. A mixed linear model with restricted maximum likelihood implemented in R package lme4 was used to generate the variance components (var) that were used to calculate H 2 as follows: where Trt refers to irrigated and non-irrigated treatments. BLUPs were extracted from the linear mixed model as the random effects of genotypes.
Multiple comparisons for the effect of treatment on yield and yield components traits in the introgression population relative to the controls were performed using least squares (LS) means with "Tukey" adjustment method and α = 0.05. To further assess the impact of introgressed haplotypes on the traits, lines were sorted in a descending order for each treatment and location/year. The percentage of the ILs that performed better than the best parental lines and checks [percentage of topperforming lines (PTPL)] was calculated for each trait. Similarly, ILs that produced more grains than both parents and checks were identified based on the mean spatial adjusted yield. The percentage increase in yield was calculated as follows: where ȳ ij is the percentage change in mean yield,ȳ i is the mean yield for the high yielding ILs, andȳ j is the mean yield for the controls (parents and checks).

Trade-Off Between Traits
The relationship between yield, yield components, and BM traits were assessed by calculating the Pearson's correlation coefficients.
We compared the trend of correlations from different treatments (irrigated/non-irrigated) and years to determine the extent of trade-off between traits within the introgression population and how the environment influenced them.

GWAS Mapping
The genome-wide association analysis was used to identify the genomic regions with SNPs and haplotypes that have a significant association with the traits. We tested the association of 5.2 million SNPs from the D genome with the traits phenotyped in different treatments and years. A total of 15,967 haplotype block windows were identified from 5.2 million SNPs using the R package HaploBlocker v1.5.2 (Pook et al., 2019). Default parameters for HaploBlocker were used except node_min, which were reduced to two (default is five) as most genomic intervals in our data set had <5 haplotype variants. Overlapping haplotypes were removed using the parameter overlap_remove = TRUE, and the SNP coordinates were included in the haplotype library via the parameter bp_map. Haplotype blocks were split into windows by setting the parameter return_dataset = TRUE in the block_windowdataset() function. The haplotype variants within a given chromosome interval were recorded as 0, 1, 2, or 3 depending on the total number of haplotype variants present within the interval. In both cases, a mixed linear model implemented in the R package GAPIT was used to detect the associations. To control for a population structure in SNP-based analyses, 100,000 randomly selected markers were used to calculate the principal components. In haplotype-blockbased GWAS, all haplotype blocks were used to calculate the principal components. In both cases, only the first three principal components were used to control the population structure. Two threshold options were used to identify significant associations including a more stringent Bonferroni correction and a less stringent Benjamini and Hochberg (1995) false discovery rate (FDR) at 5% significance level. To control for the effect of treatment and year, GWAS based on BLUPs was also performed, and significant associations were reported only when there were SNP-or haplotype-trait association in at least two independent trials or in the BLUP-based analysis.

SNP-and Haplotype-Trait Effects
Haplotype variation at loci with significant SNPs and their effects on traits in the introgression population were analyzed. The R package HaploBlocker v1.5.2 (Pook et al., 2019) was used to infer haplotypes at the genomic loci with significant SNP-trait associations. Heatmap.2 function provided in the R package gplots was used to visualize the variation in haplotypes from hexaploid wheat and A. tauschii. However, at the Ppd-D1 locus, a visual comparison of the SNPs from the parental lines was done, and the SNPs were annotated using the snpEff v4.3 software to resolve haplotype variants in the A. tauschii lines that could not be clearly distinguished by HaploBlocker. SNPs significant at FDR ≤ 0.05 and their estimated allelic effects were selected from the association mapping results and used to verify if the haplotype effect corresponded to the observed phenotype in the introgression population. The mean and the SD of the phenotype were calculated for each group of lines carrying a similar haplotype, and the difference between the means was compared using Tukey's honestly significant difference and the Student's t-test.

Trait Variation in the Introgression Population
Broad sense H 2 of GY was 0.7 in 2018 and 0.64 in 2019, whereas for the yield component traits such as TGW, GA, GW, and GL, H 2 values were 0.85, 0.89, 0.83, and 0.95, respectively. The agronomic performance of the ILs relative to the controls (parent/checks) was assessed by comparing their yield and yield component traits under different treatments. The effect of treatment on yield was significant in 2018 (p < 2.2e-16), but not in 2019 (p = 0.24) at 95% confidence level. The latter is partially associated with more abundant rainfall in 2019 that reduced the difference in the water availability stress levels between the irrigated and non-irrigated field trials in Colby, KS, USA. Based on the LS means, significant differences in yield between controls and ILs was observed in 2019 and 2020, but not in 2018 ( Table 1). The yield data collected from irrigated and rainfed (non-irrigated) field trials conducted between 2018 and 2020 revealed that up to 23% of the lines with introgressions produce more grain than the controls (Supplementary Figure 2). In 2018, however, 3.2 and 48% of the ILs produced more grain than the parental lines and checks, respectively, in the non-irrigated trial (CO18), suggesting that the check cultivars were more sensitive to drought stress than the parental lines. The proportion of ILs outperforming the checks and parental lines for the measured traits varied between treatments and years with a minimum of 0.8% for BM in the non-irrigated trial in 2019 (CO19) and a maximum of 73% for TGW in the 2018 irrigated trial (COI18). The percentage increase in yield for the ILs that outperformed both checks and parent varied from 11 to 57%, whereas the number of lines that produced more grains varied from 6 to 94 (Table 2). Under drought stress conditions in 2018 (CO18), the mean yield of top-performing ILs was 1.6and 1.4-folds higher than the checks and parents, respectively (Figure 1). These results suggest that some ILs carry the alleles that confer drought tolerance, thus ensuring high productivity under stressful conditions. The highest yield potential of both ILs and controls was observed in 2019. The mean of the top yielding ILs reached 134 BPA, whereas that of the parental lines and checks reached 119 and 121 BPA, respectively ( Table 2).
The yield stability analyses were performed to identify ILs that are both high yielding and stable under various environmental conditions. Ranking by mean yield showed that 6% of the lines carrying introgression produced more grains than most parental lines, except Larry (Supplementary File 1). The yield of these lines ranged from 84.4 to 92.7 BPA. The average rank (AR) of 16 stability statistics revealed 12 lines with introgressions showing high yield stability. Five of these lines were both stable and high yielding according to KR when compared to the controls. The yield of these five ILs (KS15SGDCB110-11, KS15SGDCB098-1, KS15SGDCB103-6, KS15SGDCB098-14, and KS15SGDCB111-1) varied between 82 and 93 BPA. The yield of the most stable and high yielding IL (92.7 BPA) was 9.8% higher than the average yield of the controls (84.4 BPA). These results indicate that novel alleles from A. tauschii have the potential to increase the adaptive potential of hard red winter wheat to different environmental conditions. In addition, the stability statistics could help to prioritize ILs for deployment in different agroecological zones depending on their ranking in stability and yield. Lines that are moderately high yielding but show good yield stability could be deployed in marginal environments, whereas less stable but high yielding lines could be deployed in less stressful environments to achieve high productivity. Harvest index, a measure of source-sink capacity, was also assessed for stability in irrigated and rainfed trials. About 92 ILs showed a higher average HI (47.4-52.8) than the best parental line KS061406LN-26 (47.3). The AR based on the 16 stability FIGURE 1 | Boxplots comparing the mean grain yield (GY) between the top-performing introgression lines (ILs) and the controls (parents and checks) across treatments, years, and locations. AS20 refers to Ashland rainfed trial in 2020, CO18 is Colby rainfed trial in 2018, COI18 is Colby irrigated trial in 2018, CO19 is Colby rainfed trial in 2019, and COI19 is Colby irrigated trial in 2019.
statistics placed 11 out of 92 lines in the top 20 most stable lines (Supplementary File 2). Line KS15SGDCB111-1, which is high yielding and stable, also ranked in the top five lines with a stable and high average HI.

Trade-Off Between Yield and Yield Component Traits
Pearson's correlation coefficients between the average yield and yield stability based on AR of the 16 stability statistics was −0.44 (p < 0.001; Supplementary File 1). However, the correlation between yield and KR was −0.71 (p < 0.001), indicating that the most stable ILs were not necessarily the highest grain yielders although there were some exceptions. Similarly, the correlation between average HI and AR was −0.42 (p < 2.2e-16), whereas the correlation between HI and KR was −0.73 (p < 2.2e-16; Supplementary File 2).
The trade-off between yield and yield components was influenced by the treatment, year, and location as evidenced by a variation in the levels of correlations (Supplementary File 3). Higher positive correlations were observed among grain morphometric traits such as TGW, GA, GW, and GL, ranging from 0.13 (between GW and GL) to 0.96 (between TGW and GA) (Figure 2). HI and GSW positively correlated with GY, whereas the correlation between GY and GN was positive but nonsignificant in all trials, except for the Colby irrigated trial in 2019 (COI19) (Supplementary File 3). BM correlated negatively with HI but showed a positive correlation with GSW (Supplementary File 3). In some cases, an increase in the SNS resulted in a reduction in the TGW, GA, GW, or GL, which was consistent with the previously observed trade-off between these traits (Kuzay et al., 2019). In contrast, HD positively correlated with the SNS and PH, which is in agreement with the previous findings (Shaw et al., 2013;Muqaddasi et al., 2019).
To further understand the contribution of different yield components to the final yield, we compared the phenotypes of the top yielding ILs to those of the controls across all treatments (Supplementary File 4). In the CO18 trial under nonirrigated conditions, the ILs that outperformed the controls in yield had the highest TGW, GA, and GL, whereas under irrigated conditions (COI18), all yield component traits showed the highest levels of expression in the top yielding ILs. The top yielding ILs in the CO19 trial had the highest HI, GW, SNS, and BM while TGW and GA were comparable to those of the parental lines. In the COI19 trial, the TGW, GA, and GW traits contributed more toward the final yield compared to the GL, HI, and BM traits. In the AS20 trial, high levels of heterogeneity were observed among the top yielding lines for the TGW, GA, GW, and GL traits. However, these lines showed a higher BM than the controls, resulting in a reduced HI.
Previously, it was suggested that the introgression from wild relatives might have a negative impact on agronomic traits due to a negative epistasis between the alleles of wild and cultivated wheat (Nyine et al., 2020). We investigated the relationship between the total size of the introgressed genomic segments and phenotype. We found a positive linear relationship between GA, GL, SNS, and the total size of the introgressed segments (Figure 3, Supplementary File 5). For the TGW, however, a positive linear relationship was only observed under drought stress conditions indicating that some wheat lines with large introgressions are efficient in utilizing the limited soil moisture and nutrients during grain filling. There was a negative relationship between GY, HI, GW, TGW under irrigated conditions, and the size of introgression.

Genotyping and Imputation
To identify A. tauschii haplotypes in the D genome of introgression population, we generated high-density SNP data. By the whole-genome sequencing of 6 hexaploid parental lines and 21 A. tauschii accessions used for generating octoploid parents, we identified about 20 million high-quality SNP variants (MAF ≥ 0.05) and used them for genotype imputation in the introgression population genotyped by complexity-reduced sequencing. The total number of D genome SNPs retained after filtering out SNPs with genotype probability below 0.7 and MAF < 0.05 was 5.2 million.
Haplotypic Variation Between ssp. strangulata and ssp. tauschii Families Using HaploBlocker v1.5.2, we identified 4,764 and 6,429 non-overlapping haplotype blocks in the A. tauschii ssp. strangulata (FAM93) and A. tauschii ssp. tauschii (FAM97) families, respectively. After filtering out the monomorphic haplotypes between the parental lines, 869 (18%) and 3,020 (47%) segregating haplotypes were retained in FAM93 and FAM97, respectively (Table 3, Supplementary File 6). The low proportion of segregating haplotypes between hexaploid wheat and ssp. strangulata D genomes is in agreement with the finding that A. tauschii ssp. strangulata was the donor of the D genome of hexaploid wheat . These results also suggest that the high similarity between the genome of ssp. strangulata and the D genome of hexaploid wheat could result in the underestimation of the proportion of the introgressed haplotypes. The average genome-wide haplotype block length in FAM93 was higher (2 Mb) than that in the FAM97 (1 Mb) (Supplementary File 6). There was a significant difference in the introgressed haplotype length between the lines in FAM93 and FAM97 based on the t-test (p = 3.1e-16). The longest haplotype introgressed in all lines from FAM93 was 44 Mb on chromosome arm 3DL, whereas in FAM97 only four lines had a haplotype >32 Mb on chromosome arm 7DL. The number of segregating haplotypes in FAM93 varies from 32 (3D) to 336 (2D), whereas in FAM97 the number of segregating haplotypes varies from 173 (3D) to 617 (5D) ( Table 3). In FAM93 and FAM97, the average frequency of each haplotype from A. tauschii parents in the ILs was 11 and 4, respectively (Supplementary File 6).

SNP-and Haplotype-Based GWAS Mapping
Genome-wide association study was performed in the A. tauschii introgression population to assess the effects of introgression into the D genome on the variance of traits related to BM, yield and yield components, and Tg. The marker-trait association analyses were based on both individual SNPs and haplotype blocks identified by HaploBlocker from the 5.2 million imputed variants. We report only those associations that are replicated in at least two independent field trials and show a significant association with both SNPs and haplotypes at FDR 0.05 (Supplementary Table 1). Several genomic loci with significant associations distributed on the D genome chromosomes were detected for GL, GW, and SNS. For other traits such as GY, TGW, GN, GA, HI, BM, GSW, and SPSF, no consistent associations replicated in independent trials were detected. Ashland in 2020 (AS20). Here, r is the correlation coefficient and P is the significance of the correlation between introgression size and observed trait phenotype. We identified multiple significant SNP-and haplotypetrait associations from all trials on chromosome arms 2DS and 7DS for GL (Figure 4). The most significant SNPs were located in haplotype block windows 22,262,289,017,30,582,595,115,and 80,864,398,316 bp on chromosome arm 2DS,and 11,024,374,767 bp on chromosome arm 7DS (Supplementary Table 1). Association analysis based on BLUPs confirmed the results obtained from individual trials for GL on these two chromosomes. Other significant associations detected in at least two trials were identified on chromosome arms 3DS and 5DS (Supplementary File 7).
We identified haplotypes with significant SNPs associated with GW on 1DL, 2DS, 6DL, and 7DS from at least two independent trials that were confirmed by BLUP-based analysis (Supplementary File 7). Haplotype block windows  ,964,778-66,124,103 and 66,265,325-66,266,089 bp showed the most significant association on 2DS.

65
At 95% confidence level, the most significant SNP-trait associations were identified on chromosome arms 2DS and 6DL for SNS from the three independent trials (COI19, CO19, and AS20), (Figure 4, Supplementary File 7). The most significant associations are located at 16.5 and 463.8 Mb on 2DS and 6DL, respectively. Haplotype-trait analysis confirmed the association on 2DS for SNS at the 16.5 Mb locus located within the haplotype block window 16,497,666-16,548,006 bp. At FDR < 0.05, there was no haplotype block window on 6DL locus that overlapped with a significant SNPtrait association.
Previous studies have shown that SNS is linked to HD (Shaw et al., 2013;Muqaddasi et al., 2019). In the current study, we detected significant associations with SNS on chromosome arms 2DS and 6DL. We had 1 year data for HD and PH collected from Ashland in 2020, which provided us a good opportunity to validate this link in the A. tauschii introgression population. GWAS mapping detected significant associations with HD on chromosome arms 2DS and 4DL, whereas all D genome chromosomes showed a significant association with PH but the strongest signals were observed on 1DS, 3DS, and 6DL. The haplotype block window 16,548,753-16,639,561 bp on 2DS with the most significant SNPs for HD overlapped the locus showing a significant association with SNS, which is in proximity to another haplotype block overlapping with the most significant SNPs for SNS (16,497,666-16,548,006 bp). These results suggest that the expression of these two traits could be co-regulated.
For HD, the haplotype block windows on chromosome arm 4DL 442,735,751,954 bp and 459,271,290,731 bp had the most significant SNP-trait associations. The three traits (SNS, HD, and PH) are known to be affected by the Rht8 and Ppd-D1 genes on 2DS, in addition to Rht1 on 4D, which control PH and flowering time (Borojevic and Borojevic, 2005;Chen et al., 2018). Due to the lack of SNPs located near the Ppd-D1 gene locus at ∼34 Mb (33,961,438-33,951,651 bp interval in CS RefSeq v1.0), we could not directly validate its association with these traits. However, significant associations for SNS were detected at ∼3 Mb next to the Ppd-D1 locus in the CO19 and AS20 trials on haplotype blocks 2D: 30,192,335-30,264,745 bp and 2D: 28,829,778-28,937,705 bp, respectively. In the parental lines with high-density SNPs (∼20 million), the Ppd-D1 locus had SNPs, which allowed us to precisely map the haplotypes from A. tauschii and hexaploid wheat lines. The results obtained from HaploBlocker showed that all hexaploid parents carry an identical haplotype, which is distinct from that of A. tauschii accessions. By using SNPs identified by the whole-genome sequencing of parental lines, we characterized a haplotypic diversity at the Ppd-D1 locus ( Figure 5A). All hexaploid wheat lines carried the same Ppd-D1 haplotype (Hap1) while seven haplotypes of the Ppd-D1 gene (Hap2-Hap8) were identified in A. tauschii. The wholegenome sequencing of 21 A. tauschii revealed a broader range of Ppd-D1 diversity compared to a previous study (Guo et al., 2009), which identified only three Ppd-D1 haplotypes. The A. tauschii ssp. strangulata accessions carried the haplotypes that were identical to hexaploid wheat, except for Hap2 in TA1642, which had one SNP at position 33,952,131 bp ( Figure 5A). The Ppd-D1 genic region in A. tauschii ssp. tauschii accessions has one synonymous (SN), three intronic (IN), and one missense (MS) SNPs. The MS variant at position 33,955,614 bp results in His16Asn change, which is predicted to have a moderate functional impact, and only present in the lines with haplotype Hap5 (Figure 5A). Next, we inferred the parental haplotypes of the Ppd-D1 locus in the introgression population by using SNPs within the ∼1-2 Mb region surrounding the Ppd-D1 locus. About 82% of the ILs carried haplotype Hap1.
Further, we evaluated the linkage of Ppd-D1 haplotypes with SNP alleles showing a significant association with a variation in SNS and HD. For this purpose, we used two SNP sites, 2D_33786967 and 2D_35558454, which flank the Ppd-D1 locus on both sides and have genotyping information in the introgression population. We compared them to SNP alleles that were significantly associated with SNS and HD in a haplotype block window 2D: 16,548,753-16,639,561 bp (2D_16574050 and 2D_16574159), spanning ∼17 Mb region ( Figure 5B). We found that the GWAS alleles associated with an increase in SNS and HD in the introgression population are also linked with two A. tauschii haplotypes (Hap_AeT * and Hap_AeT), whereas the GWAS alleles associated with decreasing effects were in LD with Hap_HW contributed by the hexaploid wheat parents. The Hap_AeT * group of haplotypes was contributed by the A. tauschii parents having Hap2 and Hap7 at the Ppd-D1 locus.

The Phenotypic Effects of Haplotype Block Variants
Average SNS and HD Significant haplotype-trait associations were identified on chromosome arms 2DS and 4DL that influence SNS and HD. Chromosome 2DS had multiple introgressed haplotypes that are significantly associated with a variation in SNS and HD, with the most significant haplotypes located at 16,497,666-16,548,006 and 16,548,753-16,639,561 bp for SNS and HD, respectively. The haplotype variants with the increasing effect at these loci were from A. tauschii parents, whereas those with a reducing effect were from the hexaploid wheat lines (Figures 5C,D). The verification of GWAS results for an allelic effect at 2DS locus associated with SNS and HD supports the abovementioned observation (Supplementary File 8). We observed a positive Pearson's correlation coefficient between SNS and HD; and lines having haplotypes from either parent showed significant differences in the phenotype based on a t-test (r = 0.23, p = 3.31e-07) at 95% confidence level. Haplotypes on 4DL had a smaller effect on SNS compared to HD. Among 35 and 66 ILs having SNS and HD trait values above the 90th percentile of trait distribution, respectively, 13 lines had the increasing alleles from A. tauschii at both 2DS and 4DL loci associated with SNS and HD traits ( Figure 5E). Initially, we did not detect a significant GWAS signal directly associated with SNPs within the Ppd-D1 gene due to the lack of high-quality imputed SNPs in this region. Further analysis of parental haplotypes identified SNP variants linked with both the Ppd-D1 haplotypes and haplotypes at 28 and 30 Mb region, showing a significant haplotype-trait association in two trials. These haplotypes were within ∼3 Mb from the Ppd-D1 locus and likely overlap with Ppd-D1. We performed the ANOVA to determine the effect of different haplotype variants identified in the parental accessions on the SNS in the introgression population using data from the three experimental trials (COI19, CO19, and AS20) ( Table 4). The results show that both hexaploid and A. tauschii haplotypes have a significant effect on SNS (p < 0.001; Table 4, Figure 5B). Among the A. tauschii haplotypes, Hap7 had the highest impact on SNS (p = 0.003) followed by Hap2 (p = 0041) and Hap3 (p = 0.040). In contrast, Hap5 with the His16Asn MS mutation had a negative effect on SNS and was not significantly different from Hap1 present in hexaploid wheat lines (p = 0.072). Consistent with previous studies, these results showed that the Ppd-D1 gene located at ∼34 Mb (33,961,951,651 bp interval in CS RefSeq v.1.0), which plays a role in flowering time regulation in wheat, also has a strong effect on the variation in the SNS (Beales et al., 2007;Guo et al., 2009).  Table 2). This increase was associated with a significant decrease in GL and had no significant effect on GY. At 7D: 14,722,457-14,817,138 bp, the Hap_AeT haplotype contributed by A. tauschii was also linked with a significant increase in SNS compared to Hap1_HW&AeT detected in both hexaploid wheat and A. tauschii parents. However, an increase in SNS for this haplotype was connected with a decrease in both GL and GY. At this haplotype block (7D: 14,722,457-14,817,138 bp), the most significant increase in GY was observed for lines carrying haplotype Hap0_HW&AeT, which was associated with a moderate increase in both SNS and GL (Supplementary Table 2).

DISCUSSION
Here, we performed the sequence-based analysis of haplotypes in the wild-relative introgression population developed by crossing a diverse panel of A. tauschii accessions with winter wheat cultivars. Our results demonstrate that the wholegenome sequencing of wild and cultivated wheat founder lines in combination with the complexity-reduced sequencing of a derived introgression population provides an effective framework for SNP imputation. Because most breeding populations are based on a limited number of founders, often including 10-30 lines, their whole-genome sequencing is feasible in crops even with large genomes such as wheat, and provides a comprehensive description of allelic diversity present in a breeding population. The latter makes sequenced founders an ideal reference panel for imputing genotypes in a breeding population genotyped using low-coverage or complexity-reduced sequencing. This was recently demonstrated by imputing genotypes in the wheat MAGIC population genotyped by low-coverage sequencing (Scott et al., 2021). The composition of our introgression population, including multiple biparental cross families (Nyine et al., 2020), also shifts the population allele frequency toward more common variants, which could be imputed with a higher accuracy than rare variants (Huang et al., 2015). In addition, the high levels of LD in the introgression population should increase the length of haplotype blocks and facilitate the detection of matching haplotypes in the reference panel of founders using even sparse genotyping data generated by low-coverage or complexity-reducing sequencing. Consistent with these assumptions, an imputation algorithm implemented in Beagle (Browning and Browning, 2013) allowed us to impute nearly 5.2 million SNPs in the introgression population with high genotype call probabilities above 0.7 using SNPs generated by complexity-reduced sequencing of this population and nearly 20 million variants identified in 27 founders. This high-density SNP marker data permitted a detailed characterization of the introgressed haplotypes (Pook et al., 2019) and assessing their effects on productivity traits.
Our results demonstrate that wild-relative introgressions into the D genome of wheat, the least diverse amongst the three subgenomes (Chao et al., 2010;Jordan et al., 2015;Singh et al., 2019), is associated with the increased levels of variation in yield and yield component traits. The analysis of data from several years and locations under irrigated and non-irrigated conditions revealed many superior ILs that produce more grains or show higher yield stability than the control cultivars. The yield increase in top-performing ILs was driven by a combination of yield component traits, and in many cases, it was associated with increased grain size, grain weight, and BM or improved harvest index. These results suggest that wild-relative introgression has the potential to positively affect source-sink balance, which was suggested to be one of the important factors contributing to yield potential (Reynolds et al., 2017). Many of these high yielding lines (∼23%) were also among the top lines showing the highest levels of yield stability, indicating that the introgression from A. tauschii likely improves the adaptive potential of hard red winter wheat in different environmental conditions. Consistent with this conclusion, the highest impact of introgression on yield was found in a non-irrigated trial, indicating that alleles from A. tauschii likely improve the adaption of hexaploid wheat to water-limiting conditions. The A. tasuchii accessions used to generate the introgression population represent both L1 and L2 lineages  and originate from a broad range of geographical locations with variable climatic conditions, likely capturing adaptive haplotypes from the regions prone to drought stress.
Heading date is one of the key agronomic traits linked with wheat adaptation to different geographical locations and improvement in yield (Jung and Müller, 2009). In our population, a positive correlation was observed between HD and the SNS, with some lines showing up to 2-week delay in HD. Several haplotype blocks on chromosome arms 2DS and 4DL were significantly associated with a variation in spikelet number and HD. The haplotypes with increasing effects at both loci were derived from A. tauschii, indicating their potential for modulating both traits in bread wheat. Chromosome 2DS is known to harbor the Ppd-D1 and Rht8 genes that control flowering time and PH, respectively, and also could affect the spikelet number (Shaw et al., 2013;Muqaddasi et al., 2019). The overlapping haplotype blocks associated with the spikelet number and HD were identified on 2DS, confirming that the two traits co-segregating in the population have a common genetic basis. We demonstrated that these 2DS haplotypes are associated with the different allelic variants of the Ppd-D1 gene from A. tauschii. Consistent with the earlier studies, these results demonstrated that the different alleles of the Ppd-D1 gene have distinct effects on HD and SNS (Beales et al., 2007, Guo et al., 2009. These effects were correlated with the relative expression levels of each Ppd-D1 allele (Guo et al., 2009), suggesting that functional mutations within the Ppd-D1 coding region and the modifier mutations in the regulatory region of the gene likely to account for a variation in these traits in the A. tauschii winter wheat introgression population. The developmental plasticity modulated by Ppd-D1 is mediated by the changes in the expression of flowering time genes (Gol et al., 2021). It was shown that the Ppd-H1 from wild barley is capable of integrating environmental signals to control HD and minimize the negative impact of transient drought stress on spikelet number (Gol et al., 2021). Consistent with this observation, in the current study, ILs that have a high proportion of A. tauschii segments produced more grain under drought stress in the Colby 2018 trial, raising the possibility that the A. tauschii alleles of Ppd-D1 also have the potential to protect wheat from the physiological effects of stress that lead to low yield.
Our study reveals that some haplotypes associated with the productivity trait variation in the introgression population also exhibit significant pleiotropic effects. Meanwhile, the direction of effects on various traits was largely consistent with the previously reported trade-offs among component traits (Griffiths et al., 2015;Reynolds et al., 2017;Quintero et al., 2018), the combined effects of some introgressed haplotypes were associated with the positive trends in yield. For example, a haplotype contributed by A. tauschii ssp. tauschii at the chromosome 2D haplotype block at 65,964,778-66,124,103 bp was associated with an increase in GL, size, and number with a moderate positive effect on GY. At the haplotype block on chromosome 7D located between 14,722,457-14,817,138 bp, the Hap0_HW&AeT haplotype shared between hexaploid wheat and A. tauschii parents and associated with a moderate increase in both SNS and GL was also associated with the most significant increase in GY. Analyses of the pleotropic effects of the introgressed haplotypes suggest that these haplotypes on chromosomes 2D and 7D could be utilized in breeding programs to improve yield component traits without negative effects on other productivity traits.

CONCLUSIONS
The imputation of markers from whole-genome-sequenced reference panels into skim-sequenced inference populations is increasingly becoming a common practice in plant breeding program due to its cost-effectiveness (Happ et al., 2019;Jensen et al., 2020). Our study demonstrates the utility of this strategy for detecting introgression in the wheat genome and contributes to developing genomic resources for deploying wild-relative diversity in wheat breeding programs. We show that the haplotype-based analysis of trait variation in this population has the potential to improve our knowledge on the genetic effects of the introgressed diversity on productivity traits and identify novel haplotypes for improving yield potential in wheat.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary Material.

AUTHOR CONTRIBUTIONS
AF and MC generated the A. tauschii introgression population. MN, EAd, MC, RA, BB, WW, DD, ZY, YG, FH, KJ, and AF phenotyped the introgression population. MN, BB, and AA generated the genotyping data. AA performed next-generation sequencing of parental lines and introgression population. MN and FH performed the bioinformatical analyses of data. MN performed the statistical analysis. EAk conceived the idea and interpreted results. MN and EAk wrote the manuscript. All authors contributed to the manuscript revision, read, and approved the submitted version.