Haplotype Loci Under Selection in Canadian Durum Wheat Germplasm Over 60 Years of Breeding: Association With Grain Yield, Quality Traits, Protein Loss, and Plant Height

Durum wheat was introduced in the southern prairies of western Canada in the late nineteenth century. Breeding efforts have mainly focused on improving quality traits to meet the pasta industry demands. For this study, 192 durum wheat lines were genotyped using the Illumina 90K Infinium iSelect assay, and resulted in a total of 14,324 polymorphic SNPs. Genetic diversity changed over time, declining during the first 20 years of breeding in Canada, then increased in the late 1980s and early 1990s. We scanned the genome for signatures of selection, using the total variance Fst-based outlier detection method (Lositan), the hierarchical island model (Arlequin) and the Bayesian genome scan method (BayeScan). A total of 407 outliers were identified and clustered into 84 LD-based haplotype loci, spanning all 14 chromosomes of the durum wheat genome. The association analysis detected 54 haplotype loci, of which 39% contained markers with a complete reversal of allelic state. This tendency to fixation of favorable alleles corroborates the success of the Canadian durum wheat breeding programs over time. Twenty-one haplotype loci were associated with multiple traits. In particular, hap_4B_1 explained 20.6, 17.9 and 16.6% of the phenotypic variance of pigment loss, pasta b∗ and dough extensibility, respectively. The locus hap_2B_9 explained 15.9 and 17.8% of the variation of protein content and protein loss, respectively. All these pleiotropic haplotype loci offer breeders the unique opportunity for further improving multiple traits, facilitating marker-assisted selection in durum wheat, and could help in identifying genes as functional annotations of the wheat genome become available.


INTRODUCTION
Durum wheat (Triticum turgidum L. ssp. durum Desf. Husn., 2n = 4x = 28; genome AABB) is an important crop in Canada, grown on an average of approximately 2 million hectares and comprising about 25% of total wheat area (Canada, 2018). Nearly all of Canada's wheat is produced in the western prairie provinces of Saskatchewan, Alberta and Manitoba, with a relatively small area in British Columbia and eastern Canada (McCallum and DePauw, 2008). Durum wheat was introduced into western Canada in the late nineteenth century (see Dexter, 2008 for a detailed history of Canadian durum wheat breeding) and planned hybridization and targeted selection started in 1928 (Clarke J.M. et al., 2010). However, the first variety developed in Canada, Stewart 63, was not released until 1963 (Dexter, 2008).
The improvement of quality traits, such as yellow pigment and gluten strength, was a major focus for durum breeding to satisfy the requirements of the global pasta industry. Canadian durum wheat is classified into four Canada Western Amber Durum (CWAD) wheat milling grades defined by the Canadian Grain Commission (Dexter and Edwards, 1998). Only varieties that meet a set of requirements for a grade are registered. Specifications for new varieties continue to evolve in response to the feedback of CWAD customers (Dexter, 2008). Durum breeding in Canada has made steady genetic progress to improve yield and agronomic traits. This was done concomitantly with improvements in end-use quality attributes such as grain protein concentration, yellow pigment concentration and gluten strength, while improving or maintaining resistance to disease (Clarke J.M. et al., 2010). Grain protein concentration and gluten strength are crucial factors in pasta manufacturing and cooking quality (Feillet and Dexter, 1996). These and other quality trait targets have indirectly driven durum wheat breeders to design hybridization programs within narrow limits, using a similar set of standard cultivars as donors of these quality traits. In particular, high grain protein concentration is a requirement for durum cultivar registration in Canada and this likely limited grain yield gain (Clarke J.M. et al., 2010) due to the generally negative relationship between grain yield and protein concentration .
When it is necessary to bring new diversity for particular traits into the breeding programs, it is imperative to return the elite materials to a new state of equilibrium as quickly as possible. Efficient means to identify both core essential adaptation and quality traits in addition to new traits being introgressed can facilitate this process. Starting in the late 1990s, molecular markers became an important tool for Canadian wheat breeding programs. However, the lack of tightly linked diagnostic markers, QTL × environmental interactions and prevalence of QTL background effects have limited the application of marker assisted breeding for some traits (Randhawa et al., 2013). Advances in high-throughput genotyping platforms at a low cost have now made it possible to consider using empirical LD patterns to conduct genome-wide scans to link markers with phenotypes of interest. Additionally, the recent availability of high quality genome assemblies for tetraploid wheat (Avni et al., 2017) could be a valuable source to facilitate the identification of loci and genes of interest.
Association mapping (AM) is increasingly being adopted as a complementary method to bi-parental linkage mapping to identify genotype-phenotype associations. Several AM studies have been conducted to dissect the genetic basis of durum grain yield (Maccaferri et al., 2011;Mengistu et al., 2016;Kidane et al., 2017;Sukumaran et al., 2018), grain and semolina quality traits (Fiedler et al., 2017), semolina and pasta color , agronomic and morphological traits (Hu et al., 2015) and grain cadmium concentration and yellow color loss during pasta manufacturing (Pozniak et al., 2012) in durum wheat. Recently, the haplotype-based AM approach was suggested as an efficient method for investigating the genetic basis of traits of interest in durum wheat by detecting more loci , capturing epistatic interactions and reducing type I error rate (Morris and Kaplan, 2002;Zhao et al., 2007;Hamblin and Jannink, 2011). All of these genome-wide association studies in durum wheat populations were performed using the whole set of polymorphic SNPs, but not loci under selection over the course of durum wheat breeding. However, the detection of selection signatures is gaining ground in modern population genetics (Fariello et al., 2013). The growing availability of large-scale genotyping data has facilitated the identification of regions targeted by natural and/or artificial selection in wild and domesticated populations of plants, animals and humans (Nielsen et al., 2007). The search for molecular signatures aims to uncover the evolutionary past of species, understand their functional or adaptive importance, and detect associations between these genomic regions and traits of interest (López et al., 2015). It has become an efficient approach in biomedical sciences to identify genes related to disease resistance (Tishkoff et al., 2001;Barreiro et al., 2008;Albrechtsen et al., 2010;Fumagalli et al., 2010;Cagliani et al., 2011), adaptation to climate (Lao et al., 2007;Sturm, 2009;Rees and Harding, 2012), or altitude (Bigham et al., 2010;Simonson et al., 2010). In livestock species, where artificial selection has been carried out by humans since domestication, it contributes to mapping traits of commercial interest (Guan et al., 2016;Liu et al., 2016;Taye et al., 2017). Therefore, the main objectives of this study were to scan the genome of elite Canadian durum wheat lines tested in the pre-registration trial for selection footprints and relate these genomic regions to phenotypes. The availability of such loci targeted by selection would be a valuable resource for developing markers and/or investigating gene candidates controlling the traits of interest we analyzed, in particular grain yield, protein loss and pasta quality traits.

Plant Materials and Phenotypic Data Analyses
One hundred and ninety-two durum wheat lines, including the 169 lines from our previous study  from the official Canadian durum cultivar registration trial (Durum wheat Cooperative Test) were used for this project (Supplementary Table S1). Phenotypic data and trials were described in previous reports (Clarke J.M. et al., 2010;Pozniak et al., 2012;Haile et al., 2018). Briefly, the trials were run under the auspices of science/industry groups responsible for recommending cultivars for registration by the Canadian Food Inspection Agency and grown each year at 10-12 locations in western Canada and one in the United States. Candidate lines were tested for 1-3 years, those that did not meet the merit requirements being withdrawn, with 3 years of data required for registration of cultivars. Each trial included four or five check cultivars. The checks AC Avonlea, AC Morse, AC Navigator and Strongfield were in the trials for the years since 1999, and Commander since 2001. Trials were arranged in lattice designs with four replications. For the end-use quality traits, we evaluated gluten index described by Haile et al. (2018), semolina pigment, semolina b * , pasta b * and pigment loss as previously described . We also included alveograph measures (Haile et al., 2018): dough tenacity (P), dough extensibilty (L), and deformation energy (W). Protein loss was estimated as the difference between grain protein and semolina protein concentration.
The historical and unbalanced phenotypic data were analyzed using SAS version 9.4 PROC HPMIXED with three models due to the different data structures to calculate best linear unbiased predictions (BLUPs). For grain yield concentration which was measured on composites of locations within years (175 lines), year, location, replication, and genotype, and their interactions were considered random; for grain protein and yellow pigment concentrations (186 lines), years, locations, genotypes, and interactions were random; for gluten strength and color traits measured on yearly composites (170 lines), genotypes and years were random. The analyses included all genotypes tested (up to approximately 300 depending on trait) in the registration trials, not just those used in the present study, to provide a better estimate of random variances and covariances (Clarke F.R. et al., 2010;Pozniak et al., 2012).

SNP Genotyping and Data Curation
DNA extraction and genotyping with the 90K iSelect assay chip were carried out as reported in our previous paper . A total of 14,324 polymorphic SNPs were scored and missing calls were imputed using the RF regression procedure (Breiman, 2001) as implemented in the randomForest R package (Liaw and Wiener, 2002;R Development Core Team, 2013). After removing SNPs having MAF < 0.05, a total of 11,323 SNPs were kept for analyses.
Because relatively few semi-dwarf lines in the panel were selected for very high pigment, presenting the possibility of spurious associations, the lines were also genotyped with Rht-B1b, an allele known to confer semi-dwarf growth habit in wheat (Ellis et al., 2002). In order to distinguish the association signals from Rht-B1b, pairwise LD (r2) was performed between all 4B association signals and this gene using MIDAS software (Gaunt et al., 2006).

Population Structure and Genetic Diversity Analysis
First, duplicate SNPs were removed using an in-house Ruby script, as previously described . Then, a subset of 4,235 highly polymorphic (0.32 ≤ PIC ≤ 0.45) SNPs were selected for the clustering analysis (Kabbaj et al., 2017). The population structure among the 192 breeding lines was investigated using the discriminant analysis of principal components (DAPC) as implemented in the Adegenet R package (Jombart, 2008;Jombart et al., 2010;Jombart and Ahmed, 2011). The number of clusters (K) for the DAPC was estimated from the lowest value of the Bayesian information criterion (BIC) according to Jombart et al. (2010).
The genetic differentiation between populations derived from the DAPC was assessed by pairwise Fst values, and the variance between and within populations was calculated using the analysis of molecular variance (AMOVA) as implemented in the Arlequin 3.5 software (Excoffier and Lischer, 2010). Significance levels for variance components and Fst statistics were estimated based on 10,000 permutations. To analyze the changes in diversity over time, the breeding lines were assigned to temporal groups (decades) according to the time of entry into the Durum wheat Cooperative Test and the genetic diversity (π) was computed according to Nei and Li (1979), using Arlequin 3.5 software (Excoffier and Lischer, 2010).

Selection Signatures
The subset of 4,235 highly polymorphic SNPs was used to scan the genome for signatures of selection, using three approaches: the total variance Fst-based outlier detection method implemented in Lositan (Antao et al., 2008), the hierarchical island model with 100,000 simulations using Arlequin (Excoffier and Lischer, 2010) and the Bayesian genome scan approach implemented in BayeScan (Foll and Gaggiotti, 2008). For the latter, we performed 20 pilot runs of 50,000 iterations, followed by 100,000 iterations on a sample size of 5000 and thinning interval of 10.
For Lositan, markers were considered under divergent selection if the Fst values were higher than 99% of the neutral distribution. For Arlequin, loci were considered being under selection if the observed Fst values were higher than expected on the basis of neutral variation, and showing Fst out of the 99% quantile based on coalescent simulations (Beaumont and Nichols, 1996). We identified markers under selection with BayeScan by comparing posterior probabilities and threshold values obtained from the FDR (FDR q-values < 0.05). As a result, the Fst cut-off for declaring outliers was 0.20 for Arlequin and Lositan, and 0.15 for BayeScan (Supplementary Figure S1).

Haplotype Blocks and Association Study
Haplotype blocks were built as previously described , with little modifications. Only markers that showed strong evidence of directional selection were clustered into haplotype blocks based on the durum wheat consensus map (Maccaferri et al., 2014), using an in-house python script. These haplotype loci were used to perform association studies in order to relate them to the traits of interest (e.g., gluten index, semolina pigment, grain yield, protein content, protein loss, and plant height) that have been scored over years. The haplotype-trait association analyses were carried out using a Mixed Linear Model (MLM) (Yu et al., 2006) with either the Kinship matrix alone (MLM-K) or the Q matrix from the DAPC plus Kinship (MLM-QK) as random effect, using TASSEL software version 3 (Bradbury et al., 2007). The Q-Q (quantile-quantile) plot profiles (Supplementary Figure S2) showed that MLM-K better controlled the P-value inflation while MLM-QK led to overcorrections. Therefore, all association analyses were carried out using the MLM-K model. A false discovery rate (FDR) of 5% was applied and haplotype loci with P-value ≤ 0.05 were declared significant. The allelic effect of haplotypes was estimated as the difference between the mean value of the lines carrying these haplotypes and the mean value of the entire population for each trait, as previously described .

Genetic Diversity and Population Stratification
SNP markers used for the analysis were distributed across all 14 chromosomes of the durum wheat genome. Genetic diversity in the registration trials changed over time, declining during the first 20 years of breeding in Canada when the germplasm shifted from introduced cultivars in the 1940s and 1950s to locally-bred lines in the 1970s (Figure 1). Diversity increased in the late 1980s and early 1990s, then started a decrease in the 2005s.
The genetic relationship between the 192 durum lines as revealed by the DAPC is illustrated in Figure 2. Based on the Frontiers in Plant Science | www.frontiersin.org lowest BIC, the lines were clustered into four sub-populations. However, the pairwise Fst analysis showed only moderate (0.05 < Fst ≤ 0.15) genetic differentiation between the four sub-populations ( Table 1). The AMOVA revealed that only 9.8% of the genetic variation is found between sub-populations, whereas 85.2% of the genetic variation resides between individuals ( Table 2).

Loci Under Selection
The Fst distribution and threshold (0.15 for BayeScan, 0.2 for both Arlequin and Lositan) to declare loci being under selection are shown on Supplementary Figure S1. From the total of 4,235 most informative SNPs, 407 appeared to be under selection, spanning all 14 chromosomes of the durum wheat genome (Figure 3). Lositan detected 403 outliers, including all 397 markers from Arlequin (Figure 4). BayeScan identified only 144 outliers, of which 4 SNPs were undetected by Lositan and BayeScan. Twenty-three percent (95/407) of markers under selection showed a complete reversal of allelic state in response to selection over time. However, the time at which the allelic state changed was different depending on the marker. For example, the switch of allele's frequency occurred in the early 1960s (e.g., BobWhite_c8016_301, BS00009060_51, and BS00087544_51), in the mid-1980s (BS00091561_51, BS00097263_51, and CAP7_c11156_108) or in the late 2000s (e.g., Excalibur_c29304_176, IACX6346, and Ra_c11263_2353) during the breeding program. Of the markers that showed a complete reversal of allelic state, 11% are fixed (allele frequency = 1) in the population (Supplementary Figure S3).
The distribution pattern of markers under selection was different between chromosomes (Figure 3), outliers spanned several genomic regions on the entire chromosome (e.g., 1B, 2B, and 7A) or were clustered at only few localized regions on the chromosome (e.g., 2A and 4A). The markers under selection were clustered into 84 LD-based haplotype loci, containing 1-28 SNPs (Supplementary Table S2). The number of haplotype loci varied among chromosomes, ranging from 3 (chromosomes 2A, 3B) to 11 (chromosome 7A). Thirty-six (30/84) percent of the haplotype loci under selection harbored markers with a complete reversal of allelic state in response to selection over time; the proportion of SNPs with a complete reversal of allelic ranging from 20 (hap_5A_5) to 100% (e.g., hap_1B_5).

Phenotypes Analysis
Wide phenotypic variation was observed among lines in the breeding panel for all of the traits ( Table 3 and Supplementary Figure S4). Sample means were significantly (P-value ≤ 0.05) different between sub-populations for all of the traits, except  grain yield. Many traits were significantly (P-value ≤ 0.05) correlated to each other ( Figure 5). In particular, pasta b * was strongly positively correlated with pigment loss, semolina pigment and semolina b * , r = 0.80, 0.66, and 0.68, respectively. Grain protein and protein loss showed significant (P-value < 0.05) positive correlation, r = 0.82.

Haplotype-Traits Association Analyses
The association analysis detected 54 haplotype loci, of which 39% (21/54) were associated with at least two traits. For quality traits (Table 4), 49 loci were detected, spanning all chromosomes. In particular, hap_1B_2 on chromosome 2B was associated with dough extensibility, gluten index, dough tenacity and deformation energy while hap_4B_2 (chromosome 4B) was associated with pigment loss, pasta b * , protein loss, semolina pigment and protein content. The phenotypic variations explained by the pleiotropic loci varied depending on the traits. For example, hap_4B_1 explained 20.6, 17.9, 16.6, and 12.1% of the phenotypic variance of pigment loss, pasta b * , dough extensibility and gluten index, respectively, whereas hap_4B_2 explained 6.8% (semolina pigment, protein content) to 16.8% (protein loss) of the phenotypic variation. For protein content, six loci were detected, spanning chromosomes 1B, 2B, 4B, and 7A (2 loci) and explained 3.5 (hap_7A_6) to 15.9% (hap_2B_9) of the phenotypic variation ( Table 4).
A total of five loci, hap_1A_1, hap_1B_2, hap_2B_7, hap_4B_1, and hap_7A_6 were associated with gluten index. The locus hap_4B_1 gave the highest allelic effect (21.3) and explained 12.1 % of gluten index variation.
A total of 11 haplotype loci were associated with plant height and/or grain yield ( Table 5). These loci spanned chromosomes 2A, 4A, 4B, 5A, 7A, and 7B for plant height; and 1B, 2B, 4A, and 4B for grain yield. In particular, the loci hap_4A_5 and hap_4B_1 controlled both plant height and grain yield. The locus hap_4A_5, which showed the highest allelic effect (190.03) on grain yield, explained 43.9 and 38.5% of the phenotypic variation of plant height and grain yield, respectively. On the other hand, hap_4B_1 was strongly associated (r 2 = 0.90) with the dwarfing gene Rht-B1b and explained 30.0% of the phenotypic variation of plant height.
Thirty-nine (21/54) percent of haplotype loci associated with the traits we investigated contained markers with a complete reversal of allelic state (Supplementary Table S3). In particular, hap_1B_5 and hap_2A_1 consisted of 100% (7/7) and 91% (10/11) markers with a complete reversal of allelic state, respectively. For each haplotype locus, the time at which the switch in allelic state occurred varied among SNPs, regardless of the number of traits the haplotype locus was associated with. For example, for hap_1B_5 associated with only semolina pigment, the switch in allelic state occurred in the early 1960s (Tdurum_contig85180_99), in the late 1980s (BobWhite_c17644_456) and in the 2000s (IACX6346) (Supplementary Figure S5). For hap_2A_1 that was associated with plant height and dough extensibility, the switch in allelic state occurred in the 1970s (BS00097263_51), in the 1985s (e.g., CAP7_c239_267, CAP7_c11156_108), and in the 1990s (wsnp_Ex_rep_c66615_64916512) (Supplementary Figure S6).

Population Stratification and Genetic Diversity Over Time
The presence of genetic structure within a population can lead to spurious association signals (Marchini et al., 2004;Kang et al., 2008;Myles et al., 2009;Mezmouk et al., 2011;Cappa et al., 2013). Therefore, the analysis of the actual population structure of the durum breeding panel was intended to limit the FDR in the association analysis. The discriminant analysis of principal components (Jombart et al., 2010) clustered the 192 lines into four sub-populations. Nonetheless, the genetic differentiation between the four sub-populations was moderate (0.05 < Fst ≤ 0.15), whereas 85.2% of the genetic variation resided between individuals. Therefore, the association analysis accounted only for kinship (MLM-K model). The population structure is in agreement with known differences in pedigree, breeding program sources and era of testing in the trials, as described in our previous study .  Similarly, many other studies about genetic diversity changes in durum wheat germplasm over time revealed that most (up to 91%) of the variations were found between individuals within groups compared to that between groups (Henkrar et al., 2016).
Durum breeding began in Canada in the early 1950s, with the first locally-bred cultivar registered in 1963. The Canadian wheat industry has strict processing quality requirements for registration of durum cultivars, so the majority of breeding crosses involve closely-related local materials, with relatively little use of either introduced modern cultivars or landraces except where absolutely necessary to introduce a new trait. Genetic diversity in the registration trials changed over time, declining during the first 20 years of breeding in Canada when the germplasm shifted from introduced cultivars in the 1940s and 1950s to locally-bred lines in the 1970s (Figure 1). Diversity increased in the late 1980s and early 1990s coinciding with introgression of higher yellow pigment concentration, gluten strength and low grain cadmium concentration (Clarke J.M. et al., 2010), and with major expansion of Canadian durum breeding programs due to increased funding. The sources of these traits were improved cultivars from CIMMYT, Germany, Italy, and United States. Further cycles of crossing and selection then reduced diversity observed during the past decade to a level similar to the 1970s. In contrast, a relatively stable diversity over 25 years was observed in the CIMMYT Elite Spring Wheat Yield Trial (Dreisigacker et al., 2012), probably reflecting consistent usage of diverse parents to meet requirements for adaptation to global production environments. Similarly, the impact of traits improvement on allelic changes over time has been described in Canadian bread wheat cultivars (Fu and Somers, 2011).
The impacts of modern plant breeding on crop genetic diversity has been a major concern of many scientists in search for implementing efficient conservation strategies and a better utilization of germplasms (see Duvick, 1984;Fu, 2007Fu, , 2015 for review). In particular, some temporal patterns of genetic diversity have been observed (Rauf et al., 2010;van de Wouw et al., 2010b,c). The high selection pressure in modern plant breeding has reduced crops genetic diversity over time (Duvick, 1984;Hallauer and Darrah, 1985;Bowman, 2003;Gepts, 2006;Fu and Dong, 2015). In particular, allelic reduction at specific loci and allele loss over time have been reported in the Canadian hard red spring wheat germplasm (Fu et al., 2005). However, no loss of allele's number was found in European winter wheat varieties over time (Huang et al., 2007). Similarly, a meta-analysis of 44 diversity studies did not reveal evidence of genetic erosion in released varieties (van de Wouw et al., 2010a). The discrepancy between genetic diversity trends in released cultivars and varieties was attributed to differences in breeding goals and methods that have been applied over time (Gepts and Hancock, 2006;Rauf et al., 2010). Many other sources that could affect the crops genetic diversity have been reported, including use of different genetic diversity measures, sampling bias and arbitrary grouping of cultivars to represent specific breeding periods (Reif et al., 2005;Fu, 2007;Aremu, 2011).

Loci Under Selection
Loci under directional selection are expected to have lower intra-population variability and larger inter-population variability than neutral loci. Thus, loci under directional selection can be unveiled by patterns in heterozygosity and/or differences in Fst values (Eveno et al., 2008;Perez-Figueroa et al., 2010;Kirk and Freeland, 2011;Konijnendijk et al., 2015;Lin et al., 2017). Evidence for selection was investigated using, the total variance Fst-based outlier detection method implemented in Lositan (Antao et al., 2008), the hierarchical island model implemented in Arlequin (Excoffier and Lischer, 2010) and the Bayesian genome scan approach implemented in BayeScan (Foll and Gaggiotti, 2008). The number of outliers varied according to the method. Lositan detected 403 outliers, including all 397 markers from Arlequin. BayeScan identified the less number (144) of outliers of which 4 SNPs were undetected by Lositan and BayeScan. Comparison of several outlier detection methods showed that these methods differ in their type I (false positives) and type II (false negatives) error rates (Perez-Figueroa et al., 2010;Narum and Hess, 2011). Thus, combining the properties of different outlier detection methods could reduce the percentage of false positives and strengthen the candidate status of the identified outlier loci (Vasemagi et al., 2005). A similar approach has been reported for many crops, including wheat (Gao et al., 2017;Ren et al., 2017), rice (Xia et al., 2014), and apple (Khan et al., 2014). In general, researchers feel more comfortable when loci under selection are detected by at least two different methods. However, among the four outliers detected by only BayeScan, one marker representing the haplotype locus hap_3A_3, showed strong association with semolina pigment and explained 4.2% of the phenotypic variance (Table 4). Studies where only one method was used to scan for loci under selection have been reported elsewhere (Mäkinen et al., 2008;Jiao et al., 2012;Liao et al., 2013). Therefore, it seems reasonable to take into consideration any outlier detected by a single method even if it might not be ranked as a prime candidate.
Markers under selection showed different distribution patterns, spanning several genomic regions or clustered at   a Position in centiMorgan on the chromosome. b Allelic effect: The allelic effect of haplotypes was estimated as the difference between the mean value of the lines carrying these haplotypes and the mean value of the entire population for each trait. * Should be read as star (e.g., Semolina b * is "Semolina b star"). a Position in centiMorgan on the chromosome. b Allelic effect: The allelic effect of haplotypes was estimated as the difference between the mean value of the lines carrying these haplotypes and the mean value of the entire population for each trait. Singh et al., 2017), wheat (Zhang et al., , 2015, eggplant (Portis et al., 2014), cabbage (Lv et al., 2016), and Brassica (Basnet et al., 2015). The existence of such hot regions could offer an appealing opportunity to select for multiple traits, especially when traits are positively correlated. Thirty-six (30/84) percent of the haplotype loci under selection harbored markers with a complete reversal of allelic state. This tendency towards fixation of favorable alleles in our material corroborates the success of the Canadian durum wheat breeding programs over time.

Genome-Wide Association Studies
Previous studies showed that haplotype-based analysis increases the power of QTL detection (Liu et al., 2008;Hamblin and Jannink, 2011;Gawenda et al., 2015;Contreras-Soto et al., 2017) and explains a larger proportion of the QTL variance (Grapes et al., 2006;Hayes et al., 2007;) as compared to single-marker analysis. Therefore, all of the 407 SNP markers under selection were clustered into 84 LD-based haplotype loci, as reported in our previous manuscript .
Thirty-six (30/84) percent of the haplotype loci under selection failed to be associated with any of the traits that we investigated. Because these loci showed strong evidence of directional selection, the lack of an association signal could suggest that they might be controlling traits that we do not have data for, and we didn't analyze here. Therefore, these loci could still be of interest for further investigations when more phenotypic data becomes available.
A total of 49 haplotype loci, spanning all 14 chromosomes of the durum wheat genome, were associated with quality traits. In particular, hap_4B_1 located on chromosome 4B explained 20.6, 17.9, 16.6, and 12.1% of the phenotypic variance of pigment loss, pasta b * , dough extensibility and gluten index, respectively. This result is congruent with many studies that reported major QTL for yellow pigment on chromosome 4B (Mares and Campbell, 2001;Pozniak et al., 2007;Blanco et al., 2011;Roncallo et al., 2012;. The haplotype locus hap_3A_5 explained 23.4% of the phenotypic variation of semolina pigment, contrasting with the minor QTL of pigment color reported by other studies in durum wheat (Reimer et al., 2008;Roncallo et al., 2012). However, major QTL explaining up to 20% of the variation of flour color have been reported on chromosome 3A in hexaploid wheat (Parker et al., 1998;Mares and Campbell, 2001;Crawford and Francki, 2013). The locus hap_7A_3 explained 21.3% of the phenotypic variation of semolina pigment; coincident with the existence of a major QTL of yellow pigment on chromosome 7A in durum wheat (Patil et al., 2008;Zhang and Dubcovsky, 2008;. These conflicting results clearly demonstrate the complex inheritance of yellow pigment in wheat. Protein content is a key specification for wheat because of its nutritive value and its impact on many processing properties, such as water absorption and gluten strength (Delcour et al., 2010;Kaur et al., 2015;Kaushik et al., 2015). Six haplotype loci were associated with protein content, spanning different chromosomes, of which chromosome 6B that had been reported to harbor a major QTL for grain protein (Joppa et al., 1997;Olmos et al., 2003;Uauy et al., 2006;Patil et al., 2009;Suprayogi et al., 2009;Conti et al., 2011). Nonetheless, in our study the highest proportion (15.9%) of variance explained came from the locus hap_2B_9, located on chromosome 2B. Depending on the populations, the QTL explaining the largest variance of protein content in durum wheat was found on different chromosomes, such as 2A (Blanco et al., 2006), 5B (Golabadi et al., 2012), and 5A (Marcotuli et al., 2017). Similarly, in hexaploid wheat, the number of QTL controlling protein content were reported on different chromosomes (Börner et al., 2002;Sun et al., 2008;Wang et al., 2012;Li et al., 2013;Terasawa et al., 2016;Tiwari et al., 2016;Zou et al., 2017). These results suggest the complex inheritance of grain protein and the strong influence of the underlying population and the environment.
High semolina protein content determines end-use products quality such as texture, appearance and firmness. However, because of its negative correlation with grain yield (Blanco et al., 2006;Würschum et al., 2016), selecting for high grain yield has indirectly resulted in lines with lower protein content (De Vita et al., 2007;Acreche and Slafer, 2009). To simultaneously improve grain yield and protein content in durum wheat, an index selection method has recently been developed (Rapp et al., 2018). Moreover, it has been reported that protein content substantially decreases (up to 25% loss of protein) during milling mainly with regard to the milling methods (Prabhasankar and Haridas Rao, 2001;Ramberg and McAnalley, 2002;Heshe et al., 2016;Oghbaei and Prakash, 2016). Because the concentrations of protein components follow a gradient (higher concentration in the outer layer) along the different parts of the grain (Tosi et al., 2011;He et al., 2013;Li et al., 2016;Savill et al., 2018) and that this gradient is in part determined by genetic factors (He et al., 2013), one might wonder if protein loss during milling could have some genetic basis. To the best of our knowledge, no investigation for a possible genetic basis of protein loss has been carried out. In this study, five haplotype loci were associated with protein loss, explaining 3.5 (hap_7A_1) to 17.8% (hap_2B_9) of the phenotypic variance. The haplotype locus hap_2B_9, explaining the highest proportion (17.8%) of protein loss variance, also explained 15.9% of the variation of protein content. This result is not surprising given the strong positive (r = 0.82) correlation between these two traits; the higher protein content the bigger the protein loss. Similarly, Savill et al. (2018) reported that any increase in the gradient of protein concentration may result in an even greater amount of protein being lost during milling. In fact, during milling some endosperm tissues remain adhered to the bran layers and because protein is concentrated in the outer layers of the endosperm, a proportionally greater amount of protein relative to starch is lost during the production of white flour. Therefore, there is a need to select for genotypes having high protein content but low protein loss during milling. The availability of many haplotype loci for both protein content and protein loss offers the opportunity to screen different allele combinations to work this challenge out.
A total of 11 haplotype loci were associated with plant height and/or grain yield, explaining low (2.3%) to high (43.9%) proportion of the phenotypic variance depending on the trait. For grain yield, of the six haplotype loci that were detected, hap_4A_5 located on chromosome 4A, explained 38.5% of the phenotypic variance. Yield is the most important and genetically complex trait in wheat, being controlled by a large number of small effect QTL across all chromosomes (Wu et al., 2012). Many QTL and marker-trait associations for yield and yield-related traits (e.g., spike length, spikelet per spike and 1000 kernel weight) have been reported on all chromosomes of wheat (Marza et al., 2006;Heidari et al., 2011;Ma et al., 2012;Li et al., 2015;Marcotuli et al., 2017). The major yield QTL were reported on different chromosomes depending on the study. In particular, a comprehensive QTL analysis in a durum wheat population across 16 environments detected 16 QTL of grain yield, including two major QTL on 2B and 3B, explaining 21.5 and 13.8% of the variance, respectively (Maccaferri et al., 2008). Nonetheless, of the many QTL regions affecting yield and yield-related traits, two strong QTL hotspots were identified on chromosomes 2A and 2B in a durum wheat panel of 208 lines (Sukumaran et al., 2018). In contrast, the most important QTL affecting yield in a durum wheat population evaluated in different environments were reported on chromosome 3B and 3A, explaining 20.7 and 18.0 % of the phenotypic variation, respectively (Roncallo et al., 2017). A number of QTL controlling grain yield have also been reported on different chromosomes of common wheat (Quarrie et al., 2006;Zhang et al., 2010;Wang et al., 2012;Simmonds et al., 2014;Li et al., 2015;Tahmasebi et al., 2016;Sehgal et al., 2017). In particular, one and two major yield QTL were reported on chromosomes 3A (Mengistu et al., 2012) and 3B (Bennett et al., 2012), respectively.
Among the seven haplotype loci associated with plant height, hap_4A_5 (chromosome 4A) and hap_4B_1 (chromosome 4B) explained 43.9 and 30.0% of the variance, respectively. The Rht-B1b gene was mapped on chromosome 4B. These results are consistent with those of many studies that reported major QTL for plant height on chromosomes group 4 in both durum and hexaploid wheat (Gao et al., 2015;Guo et al., 2017;Iannucci et al., 2017;Shi et al., 2017). Moreover, many QTL and marker-trait associations for plant height have been reported, spanning all chromosomes of wheat (Griffiths et al., 2012;Bentley et al., 2014;Zanke et al., 2014). Plant height is one of the most studied traits in wheat because it determines the general architecture of the plant and has strong effect on lodging stability, harvest index and ultimately grain yield. Because tall plants are more susceptible to lodging (Berry et al., 2003), plant height reduction has been a major target for wheat breeding programs for many years. As a result, major dwarfing genes such as Rht-D1b, Rht-B1b, Rht8, and Rht12 have been incorporated in new wheat varieties to reduce plant height without reducing grain yield potential (Flintham et al., 1997;Korzun et al., 1998;Ellis et al., 2002;Borojevic and Borojevic, 2005;Kowalski et al., 2016).
The haplotype locus hap_4A_5 was associated with both plant height and grain yield, explaining 43.9 and 30.0% of the phenotypic variation, respectively. The influence of plant height on grain yield is well known and pleiotropic loci controlling these two traits have been reported in durum wheat (Maccaferri et al., 2008) and common wheat (McCartney et al., 2005;Gao et al., 2015;Cabral et al., 2018).
Overall, the presence of numerous pleiotropic haplotype loci, such as hap_2B_9 (protein content, protein loss, deformation energy), hap_4A_5 (plant height, grain yield) and hap_4B_1 (pigment loss, pasta b * , dough tenacity, gluten index), offers breeders the appealing opportunity for improving multiple traits. Indeed, pinpointing the genes controlling these traits and identifying causal mutations would be greatly resourceful to the wheat community. However, tracking down causative genes is not trivial and might take several years, especially when dealing with many complex traits. For sake of information, we retrieved the genes and their annotations (Supplementary Table S4), using the Chinese spring reference genome (Appels et al., 2018).
Thirty-nine percent of haplotype loci associated with the traits we investigated contained markers with a complete reversal of allelic state, suggesting a tendency of fixation of favorable alleles in our plant material. However, according to the theory of selection limits in finite population such as plant breeding populations, artificial selection is expected to increase the frequency of favorable alleles, with the possibility of other less desirable or selectively neutral alleles being fixed simply by chance (Robertson, 1960). Long-term improvement of a trait influences genes associated with the trait and nearby chromosomal regions via linkage (Hedrick, 2000). Therefore, separating the loci directly controlling the improved trait from those derived from hitchhiking could be challenging (Fu and Somers, 2011). Thus, for each haplotype locus, we only kept the alleles series offering the largest allelic effect on the phenotype.
The time at which the switch in allelic state occurred varied among SNPs within each haplotype locus, probably due to the involvement of different genes in the genetic control of the targeted traits. The selection response of complex traits is the result of simultaneous changes in allele's frequencies across many genetic variants and large effect loci are more likely to be fixed rapidly (Johansson et al., 2010). However, because of the relatively small size of our population (192 lines) and the possible involvement of many genes with small effects, it is challenging to predict how fast the favorable alleles will be fixed. Similarly, many years (1845-2004) of improvement of different traits in Canadian hard red spring wheat introduced significant changes in allele frequencies at different loci across the whole genome (Fu and Somers, 2011).
Our results clearly demonstrate the impact of artificial selection on the dynamics of the durum wheat genome, in terms of allelic changes at selected loci over time. The long-term breeding efforts have impacted different chromosomal regions as many genes were targeted by the selection of complex traits.

CONCLUSION
Eighty-four LD-based haplotype loci showed strong evidence of selection over 60 years of breeding in Canadian durum wheat germplasm. The distribution pattern of these loci was different between chromosomes, outliers spanned several genomic regions on the entire chromosome (e.g., 1B, 2B, and 7A) or were clustered at only few localized regions on the chromosome (e.g., 2A and 4A). The association analysis detected 54 haplotype loci, of which 39% contained markers with a complete reversal of allelic state. This tendency towards fixation of favorable alleles corroborates the success of the Canadian durum wheat breeding programs over time. All of the loci under selection uncovered by this study could be helpful in the identification of genes related to many traits of interest as functional annotations of the wheat genome become available, and to facilitate marker-assisted selection in durum wheat.

AUTHOR CONTRIBUTIONS
AN designed the experiment, performed all analyses and wrote the manuscript. JKH contributed to the initial draft and edited the manuscript. KTN and SW edited the manuscript. AKS, FRC, JMC, and YR collected phenotypic data and edited the manuscript. CJP supervised the project, designed the experiment, collected phenotypic data and edited the manuscript.

FUNDING
This research was conducted as part of the Canadian Triticum Applied Genomics (CTAG 2 ) project. We are grateful for the financial support of Genome Canada, Genome Prairie, Western Grains Research Foundation, Saskatchewan Wheat Development Commission, and the Saskatchewan Ministry of Agriculture.

ACKNOWLEDGMENTS
We acknowledge the technical assistance of Shawn Yates at the Agriculture and Agri-Food Canada Research and Development Centre, Swift Current for curation of the database from which the breeding lines data were drawn. The data from these trials were conducted under the auspices of the Prairie Recommending Committee for Wheat, Rye and Triticale and we acknowledge them for permission to use the data, and the many people who conducted the field trials. We acknowledge the coordinators of these trials, including Dr. D. Leisle, Agriculture and Agri-Food Canada, Winnipeg (retired) from 1961 to 1994, Dr. J. M. Clarke, from 1995 to 2007, Dr. A.K. Singh from 2008 to 2012, and Dr. R.M. DePauw in 2013, all from Agriculture and Agri-Food Canada. We also acknowledge those responsible for end-use quality analysis of these trials from the Grain Research Laboratory of the Canadian Grain Commission. We are also thankful to Krystalee Wiebe and Justin Coulson for support in generation of the molecular data used in this project. We are grateful to Brook Byrns for his help in retrieving the genes annotation information. Special thanks go to François Roewer-Despres for his valuable advices and helpful assistance in Python programming.