Genome Wide Associations of Growth, Phenology, and Plasticity Traits in Willow [Salix viminalis (L.)]

The short rotation biomass crop willow (Salix genera) has been of interest for bioenergy but recently also for biofuel production. For a faster development of new varieties molecular markers could be used as selection tool in an early stage of the breeding cycle. To identify markers associated with growth traits, genome-wide association mapping was conducted using a population of 291 Salix viminalis accessions collected across Europe and Russia and a large set of genotyping-by-sequencing markers. The accessions were vegetatively propagated and planted in replicated field experiments, one in Southern Sweden and one in Central Sweden. Phenology data, including bud burst and leaf senescence, as well as different growth traits were collected and measured repeatedly between 2010 and 2017 at both field environments. A value of the plasticity for each accession was calculated for all traits that were measured the same year in both environments as the normalized accession value in one environment subtracted by the corresponding value in the other environment. Broad-sense accession heritabilities and narrow-sense chip heritabilities ranged from 0.68 to 0.95 and 0.45 to 0.99, respectively for phenology traits and from 0.56 to 0.85 and 0.24 to 0.97 for growth traits indicating a considerable genetic component for most traits. Population structure and kinship between accessions were taken into account in the association analyses. In total, 39 marker-trait associations were found where four were specifically connected to plasticity and interestingly one particular marker was associated to several different plasticity growth traits. Otherwise association consistency was poor, possibly due to accession by environment interactions which were demonstrated by the low structure adjusted accession correlations across environments (ranging from 0.40 to 0.58). However, one marker association with biomass fresh weight was repeatedly observed in the same environment over two harvest years. For some traits where several associations were found, the markers jointly explained over 20% of the accession variation. The result from this study using a population of unrelated accessions has given useful information about marker-trait associations especially highlighting marker-plasticity associations and genotype-by-environment interactions as important factors to take account of in future strategies of Salix breeding.


INTRODUCTION
Fast growing trees for bioenergy has been of interest since the 1980s and lately the increasing demand of non-fossil fuels has further put fast growing woody biomass production into focus. Fast growing shrubby Salix species and more recently also Populus species have been used in a short rotation coppice (SRC) system with harvests every 3-5 years from the same plants in order to continuously have a high biomass production (Weih, 2004). Breeding programs for Salix and Populus has been developed both in Europe and North America to fulfill the goals of high producing and well adapted plant material for cultivation (Karp and Shield, 2008;Kuzovkina et al., 2008;Karp et al., 2011;Stanton et al., 2010;Serapiglia et al., 2013). In Europe Salix viminalis, S. dasyclados, and S. schwerinii have been the main willow species used for SRC plantations (Karp et al., 2011;Hanley and Karp, 2014). Salix viminalis has a history of cultivation to provide raw material for basketry and for stabilization of river banks across Europe and has been domesticated to some degree due to trading of clonal material between countries resulting in successive expansion of the species geographic distribution (Lascoux et al., 1996;Kuzovkina et al., 2008;Berlin et al., 2014). Despite this the Salix species of interest for SRC are still at a low level of domestication implying substantial genetic variation within species and a high degree of individual heterozygosity (Berlin et al., 2011. As a result, recurrent selection programs have developed cultivars with increased biomass production compared to old varieties originating from wild collections (Åhman and Larsson, 1994;Larsson, 1998;Kuzovkina et al., 2008;Karp et al., 2011).
An important aspect of breeding for improved local adaptation is how environmental differences influence the performance of the different genotypes, also called genotype-by-environment interaction (G × E). The response of a genotype across environments reflect the plasticity of the genotype and its possibility to survive and grow in different environmental conditions (Bradshaw, 1965;Schlichting, 1986;Wu, 1998). Studying and identifying genetic factors that regulate plasticity of traits is highly interesting from an evolutionary and ecological point of view since it ultimately sets the limit for the distribution of a species (Stapley et al., 2010;Tétard-Jones et al., 2011;Savolainen et al., 2013). For plant breeding it is also crucial to understand how plant material can be transferred and used in different environmental conditions in order to set up breeding zones within which material can be transferred without losing in productivity (Via and Lande, 1985;Nicotra et al., 2010). To identify the genetic basis of plasticity per se would make it possible to breed for more or less plastic individuals. Many studies have been conducted to reveal the genetic base and the genetic regulation of plasticity in plants (see reviews by Nicotra et al., 2010;El Soda et al., 2014). In Populus quantitative trait loci (QTL) mapping have gained more insight into the genetic background of plasticity (Rae et al., 2008;Fabbrini et al., 2012) and in Rae et al. (2008) evidence for heritable genetic variation for plasticity traits was shown and these traits were in many cases regulated by QTL different from QTL regulating the traits themselves. Also, in Salix species, studies show clonal differences in plasticity across environments and identified QTL for plasticity demonstrates a genetic basis for the plasticity traits .
To further increase the genetic gain in a recurrent selection breeding program, selection tools using molecular markers could be included in the breeding process (Allwright and Taylors, 2016). A first step toward marker-assisted selection (MAS) has been to dissect the genetic basis for different important breeding traits and identify markers influencing the trait variation. This has been accomplished with several molecular marker techniques and various statistical approaches giving different genomic resolution. For instance, QTL mapping can be performed, where regions in the genome important for the trait variation are indicated, whereas in association mapping either variation in trait candidate genes or genome wide single nucleotide polymorphism (SNP) markers are used to attempt the identification of the actual gene that regulate the trait variation. A drawback with QTL-studies is that the variation seen in the mapping population originates from a limited set of parents and thus restricts the generality of the results. In association mapping studies, on the other hand, where preferably unrelated individuals are sampled from natural populations of the species, any detected genotype-phenotype association would have a more general applicability for the species since more alleles are studied (Khan and Korban, 2012). Because recombination at an evolutionary timescale has made the extent of linkage disequilibrium very short, many thousands of markers are needed for a genome wide association study (GWAS) to be able to find an association (Neale and Kremer, 2011).
Salix viminalis is an excellent model species for trees due to the possibility of propagating genotypes vegetatively and thus simplifying the establishment and analysis of common garden experiments across environments. Salix viminalis also has a very short generation time of one or two years which facilitates genetic studies and breeding. Furthermore, S. viminalis is dioecious (like other species of the Salicaceae family) and thus 100% outcrossing giving a high genetic diversity and heterozygosity . Linkage disequilibrium for S. viminalis has been estimated to vary from 0.5 to 0.6 with a decrease to 0.2 at around 4000 bp, a slower decrease than in S. schwerinii which could be of importance for GWAS and the possibility to identify associations (Berlin et al., 2011). With a population of S. viminalis consisting of 291 accessions earlier genetically and phenotypically described  and studied for associations using a candidate gene approach , we here perform a GWAS with markers positioned across the whole genome and biomass traits. The main objective of this study was to identify genotype-phenotype associations for different biomass related traits as well as for plasticity traits. We report on multi-year and multi-environment association analysis on phenology and growth traits as well as plasticity across environments.

Plant Material and Field Experiments
The association mapping population includes S. viminalis accessions that originates from the United Kingdom, Sweden, Belgium, Germany, and Western Russia , and from natural willow stands in the Czechia (Trybush et al., 2012 (Table 1) where the town Lund close to Svalöv, showed consistently higher mean month temperatures (difference ranging from 0.9 to 3.6 • C) than that of Uppsala close to Pustnäs. Also, the soil in the Pustnäs experiment is a sandy soil with approx. 2% clay while the soil in the Svalöv experiment is a loam with 15-25% of clay content. A total of 388 accessions were planted with 20 cm long cuttings at a density of 15,000 ha −1 at both field experiments. Each accession was represented by six clonal replicates per experiment arranged in a randomized complete block design. The spacing was 130 cm between rows and 50 cm between plants within rows. To avoid border effects, two rows of the accession 78183 were planted outside the experimental plants. Later genetic analyses revealed that several of the accessions were the same clone Hallingbäck et al., 2016) and subsequently they were collapsed and treated as one accession leading to a final number of 321 accessions. Except for the first year when no fertilization was made to the field experiments, fertilization corresponding to 70 kg N ha −1 was applied each year. The plants were cut back in winter 2009/2010, 2013/2014, 2014/2015, and 2016/2017 at both experiments and an additional cut back was made during winter 2010/2011 at Svalöv. Further details about the association mapping collection are found in Berlin et al. (2014).

Phenotypic Measurements
Leaf bud burst was assessed on each individual plant in Pustnäs during spring 2010, 2011, 2013, and 2014 using a scale between 1 and 5 where 1 equals no sign of bud growth and 5 equals the most developed bud burst stage, with one or more leaves growing perpendicular to the shoot axis (Weih, 2009). The bud burst assessments were repeated during a period between late April and middle of May in order to find the time point for each year where the most variation in bud burst was observed (May 5, 2010;April 15, 2011;May 2, 2013;April 29, 2014, respectively). These timepoints were chosen as traits for further analyses (BB10, BB11, BB13, BB14; Table 2). Leaf senescence and abscission were visually assessed in Pustnäs on November 4 and 5, 2010 (LS10), on October 31, 2011 (LS11), and in Svalöv on November 9, 2016 (LS16) according to a leaf senescence index (LSI) from 0 to 4 with 0 = no leaves left on the plant (100% abscission) and 4 = more than 80% green leaves (∼10% abscission) (Ghelardini et al., 2014). During winter 2010/2011 and 2015/2016 two growth traits closely related to total biomass (Nordh and Verwijst, 2004) were assessed at Pustnäs, while at Svalöv these measurements were taken only during winter 2015/2016. The number of shoots (Nsh11, Nsh16) was counted and the diameters on all shoots above 105 cm from the ground were measured. Mean diameter (MeanD11, MeanD16) and the summed basal area (SumBA11, SumBA16) were calculated and used for analysis. During cut back of all plants winter 2013/2014 and 2016/2017 fresh weight (FW14, FW17) and total number of shoots (Nsh14, Nsh17) of each plant were taken at both experiments.

Genotyping-by-Sequencing
From unique accessions of the study population, young leaves were sampled (approximately 200 mg) and DNA was extracted following a CTAB-protocol described in Pucholt et al. (2017), which in turn was a modification of the protocol from Brunner et al. (2001). DNA-extracts were genotyped with the genotyping-by-sequencing method (GBS) at the Genomic Diversity Facility, Cornell University, Ithaca, NY, United States. In a manner much similar to that of Elshire et al. (2011), the DNA was digested by the ApeKI restriction endonuclease, ligated to sample specific barcode adapter sequences and subsequently sequenced on an Illumina Next-generation sequencing (NGS) platform (Illumina Inc., San Diego, CA, United States). Sequence reads and polymorphisms were provided at the Genomic Diversity Facility by running the Tassel GBS analysis pipeline v. 3.0.166 (Glaubitz et al., 2014) using the available genome sequence of the close relative Salix purpurea as a mapping reference (Zhou et al., 2018; Salix purpurea v1.0, DOE-JGI 1 ). For the 291 accessions for which sampling, DNA extractions, and genotyping were successful, DNA sequence sites showing polymorphisms were identified. The resulting genotype data were merged by Tassel ver. 4.3.7, provided to us as Variant Call Format files (VCF) and is stored at a publicly available repository (Hallingbäck et al., 2019). The polymorphisms consisted mainly of biallelic SNPs (1,235,800) plus smaller numbers of alleged triallelic SNPs (231,512), biallelic indels (67,435), and triallelic indel/SNP combinations (21,047). For each putative polymorphic site, diploid genotypes were called by finding the maximum likelihood for the observed distribution of haplotype sequence reads (Hohenlohe et al., 2010). But in order to avoid biases and to ensure sufficient calling accuracy, genotypes were called only provided a read depth of at least five at any particular site and accession. Otherwise the genotype was set as missing. Assuming an Illumina genotyping error rate of 0.1% this procedure implies a diploid genotype calling accuracy above 97%. The polymorphism data produced by GBS was thereafter merged with genotype data from 1290 SNPs previously developed for this population . Mapping of the old SNPs to the S. purpurea genome was performed by Blast (e < 10 −9 ) whereupon all polymorphic sites were sorted according to increasing chromosome number and position with respect to the reference. The polymorphic sites were then filtered based on data completeness and on minor allele frequency (MAF) depending on intended downstream use. All the above-mentioned processing steps were performed with VCFtools ver.0.1.12b. Furthermore, to reduce the number of sites produced by erroneous merging of paralog or repetitive sequences, all sites for which the frequency of heterozygotes exceeded 70% were removed using a custom perl script. Assuming Hardy-Weinberg equilibrium, even triallelic sites should never exhibit heterozygosity frequencies above 66.7%.

Structure and Kinship Analysis
In order to take into account the effects of population structure and relatedness, structure and relatedness was inferred. This was done using a dataset which used all polymorphic sites with >95% called genotypes and a MAF of >1% resulting in total 19,243 markers. Using this dataset, a kinship matrix (K) for the sample population was estimated in accordance with Loiselle et al. (1995). Moreover, population structure was investigated by applying an admixture model with correlated allele frequencies within populations using the software STRUCTURE ver. 2.3.4 (Pritchard et al., 2000;Falush et al., 2003). In order to avoid potentially adverse effects of linkage disequilibrium between markers the dataset used for this was reduced by selecting every fifth marker (3,848 markers) out of those used for kinship estimation. The number of clusters assumed (k) was set to vary from one to ten. The length of the burn-in period was set to 20,000 iterations and the subsequent sample recording period was set an additional 40,000 iterations. For each k, ten replicates were run. Similarly to previous studies using smaller marker sets , the optimal value of k was found to be 4 by examination of the maximum log-likelihood at convergence and on Evanno's k statistics (Evanno et al., 2005). A graphical illustration of the ancestry of individual accessions to the population structure clusters is shown in Supplementary Figure S1.

Development of Accession Estimators
To obtain unbiased estimators (BLUEs) for each accession and trait, phenotypic data was subjected to variance decomposition applying linear mixed models (LMM) with the statistics software ASReml 3.0 (Gilmour et al., 2009). All analyses were made on untransformed data since, even though the distributions for some of the number of shoots, summed shoot basal area and fresh weight deviated from normal, the results from the analysis with transformed data did not differ from analyses with original data. The mixed linear model applied was: where y ijkl is the phenotypic trait value in the ith block for the jth accession located at row k and position l. The overall mean is denoted as µ while b signifies the fixed block effect, c the fixed accession effect, r × p the random interaction between rows and positions (spatial term) and e the random residual. Random effects were assumed to be independent except for the spatial term (r × p) which was restricted to follow a two-dimensional first order autoregressive correlation structure (Cullis and Gleeson, 1991) across plant rows and positions. Thus accession estimators (y es ) adjusted for block and spatial environmental effects were obtained from the effect estimates as y es,j =μ +ĉ j and were used in subsequent analyses.
In addition to the estimators per se, the (broad sense) accession estimator heritability (H 2 c )was also estimated for each trait and for that purpose c was instead considered to be random. H 2 c may serve as an indicator of accession estimator precision and was calculated from the estimated accession variance (σ 2 c ) and error variance (σ 2 e ) components as: where n is the harmonic mean of the number of measured plants per accession. For each clone, plasticity variables of traits were calculated as the standardized (mean = 0, standard deviation = 1) BLUE values in Pustnäs subtracted with standardized BLUE values in Svalöv when traits were measured the same year (Nsh14, Nsh16, Nsh17, MeanD16, SumBA16, FW14, FW17).

Association Mapping Marker Dataset
Polymorphic sites used as markers for the association mapping itself were required to be biallelic, have clear genotypic calls for at least 75% of the individuals and MAF > 5%. Furthermore, any adjacently located indel polymorphisms that would cancel each other out (erroneously called in the GBS pipeline) were removed. The LD (r 2 ) between closely situated markers (according to the S. purpurea reference) was estimated using a maximum likelihood method (Hill, 1974) implemented in the function "LD" in the Genetics library of R. This calculation was performed using a sliding window scheme where the LD was estimated between each marker and its 100 closest neighbor markersupstream and downstream. In this way, certain groups of markers were found to be redundant (r 2 > 0.99) and only one representative marker, having the highest genotype call percentage in that group, were retained for further analysis. Redundant markers were nonetheless kept in a separate accessory marker list to enable future reference. In all, the fully filtered association mapping dataset comprised 19,411 polymorphic markers. Missing marker genotypes were imputed using the LD-kNNi method as implemented by Money et al. (2015) in the software LinkImpute. The number of nearest relatives (k) and the number of SNP markers within a 10 Mb distance that were in closest LD with the marker to be imputed (l), were optimized by LinkImpute and were determined as k = 15 and l = 5.

Association Mapping Analysis and Model Selection
Using the imputed marker dataset and the previously calculated accession estimators (y es ), a second series of multilocus linear mixed model (MLMM) analyses were performed, again using ASReml 3.0 (Gilmour et al., 2009). The number of accessions having both the required genotype data and accession BLUE estimators were 291 and 288 for Pustnäs and Svalöv, respectively (288 accessions in common). The MLMM were applied in separate analyses for each trait at the Pustnäs experiment, the Svalöv experiment and for the plasticity traits as: where y es is the vector form of the BLUE accession estimators (y es,j ), q and g are the vectors of fixed population ancestry and SNP genotype effects, respectively, u is the vector of random chip additive genetic effects (Speed et al., 2012;Kruijer et al., 2015) linked to the marker-based kinship matrix K, and e es is the vector of the random residual effects. The design matrix S constitutes the individual genotypes for one or several analyzed SNPs as separate and independent factors, implying genetic effects of the form: g = [g AA,1 g Aa,1 g aa,1 ... g AA,n g Aa,n g aa,n ] T for markers 1 to n included in the model, each featuring the genotypes AA, Aa, and aa, respectively. F and Z link the respective individual ancestry proportion and additive chip effect to its observation. All effects were considered to be statistically independent except for the chip additive genetic effects whose variance was assumed to be structured as Var (u) = 2σ 2 A K where σ 2 A is the chip additive genetic variance. The number of markers included in the Sg model factor was determined by consecutive scans of all markers using the forward-backward stepwise model selection approach of Segura et al. (2012). Structure component effects (q) and chip additive genetic variance (σ 2 A ) were re-estimated for each analysis. In the forward model selection phase, the SNP with the largest −log(p) values (Wald-F test) was selected and during the backward phase, the included SNP with the smallest −log(p) value was eliminated. Forward selections of up to five markers were made and the model with the maximum number of markers all exhibiting Bonferroni corrected p-values of 0.2 or less (p < 1.03 × 10 −5 ) was reported and further analyzed. The percentage variance explained of any marker i added to the model (R 2 δi ), was calculated as the difference between the percentage variance explained of all fixed effects already in the model including marker i (R 2 incl ) and the corresponding percentage of variance explained of a reduced model including the same fixed effects except for marker i (R 2 excl ). Please note that R 2 incl and R 2 excl comprise fixed effects of ancestral structure (q) as well as all marker genotype effects (g) of stronger statistical significance than marker i in order to discount R 2 inflation due to collinearity (e.g., due to LD). Both R 2 incl and R 2 excl were estimated using the fixed effect estimators for their respective models as advocated by Nakagawa and Schielzeth (2013). Finally LD correlations were estimated between all significantly trait-associated markers using the "LD" function of the Genetics R-package.
For each identified marker association, all genes in the region 3,000 bp up-and downstream the marker position were checked in the S. purpurea genome (Zhou et al., 2018; Salix purpurea v1.0, DOE-JGI, see footnote 1). The size of the region to check genes were based on LD estimates of S. viminalis (Berlin et al., 2011), which show a decay of LD to a non-significant value of 0.2 at a distance of 4,000 bp from the marker.

Chip Genetic Parameter Estimation
For the association analyses the full model previously presented was used, but in order to estimate other genetic parameters adjusted for ancestral structure such as the chip narrow sense heritability (h 2 s ) and accession correlations between traits or environments (r s ), analyses were made were the model component of individual markers (Sg) was excluded. y es = Fq + Zu + e es Thus, for each trait, h 2 s was calculated from the estimated chip additive genetic and residual variance components (σ 2 e,es ) as: Furthermore, in order to structure adjusted estimate accession correlations (r s ) between traits thereby also assessing the potential occurrence of genotype-by-environment (G × E) interactions, multivariate extensions of the previously described model were made treating each trait-year-environment combination as a separate variate (e.g., Burdon, 1977). Accession correlations were based on the summed variances of the random effect model terms for any trait 1 or 2 (e.g., σ 2 A1 + σ 2 e,es1 ) and the correspondingly summed covariances between trait 1 and 2 (σ A12 + σ e,es12 ) as: Estimation errors for all genetic parameters were calculated using the delta method based on the Taylor series approximation (Casella and Berger, 2002).

Trait Means and Variation
The association mapping population showed large variation for all measured traits both at Pustnäs and at Svalöv ( Table 2 and Figure 1).

Marker Variation
The marker dataset used for association mapping analyses consisted of 19,411 polymorphic markers fairly evenly distributed across the genome (Figure 2). Of these markers 18,273 (94%) were genotyped using the genotyping-by-sequencing approach while the remainder 1,138 markers were available from previous candidate gene exploration . Mapping to a chromosome of the established S. purpurea genome was possible for 16,943 markers (87%) while the remainder 2,468 markers could only be mapped to 569 incompletely integrated sequence in smaller sized scaffolds (Figure 2). 17,853 (92%) of the markers were regular biallelic SNPs while the remainder 1,558 (8%) were single nucleotide indels.   the relationship between accessions was generally low, estimation errors of the chip heritabilities are considerable. Structure adjusted accession correlation estimates across environments were low-to-moderate for the different traits (0.39-0.58) indicating G × E-interactions (Table 3). This was also reflected in  Figure 3A).

Genetic Parameters
For structure adjusted accession correlations between traits at the same environment but measured during different years, estimates were moderate-to-high (0.45-0.94) for the number  of shoots (Nsh), estimated biomass (SumBA) and fresh weight (FW) both at Pustnäs and Svalöv (Table 4). For FW this is also illustrated in Figure 3 where, especially in Svalöv, the accessions with the highest FW in 2017 also in many cases exhibited the highest FW in 2014. Bud burst in Pustnäs did not show high correlations between measurements (0.15-0.52).
In particular the 2013 bud burst assessment correlated poorly with the assessment made in 2010 as well as in 2014. Bud burst in 2010 and 2014 was scored from the stump close to ground level which implies an environment different from the budburst scoring in 2013 which was performed on 3-yearold stems on a height level of one or several meters above the ground. This difference would be the most reasonable explanation for the low r s -estimates. The structure-adjusted accession correlations between different growth traits were above 0.50 in 50% of the possible correlations in Pustnäs and in 82% of the correlations in Svalöv while correlation magnitudes between the two different phenology traits measured in Pustnäs never exceeded 0.5 (−0.07 to 0.15) ( Table 4). Very few accession correlations between phenology and growth traits at any of the environments were above 0.5 (2% in Pustnäs and none in Svalöv).

Association Mapping
In total, across all traits and environments as well as for the plasticity of traits, 39 marker-trait associations of significance (p < 0.2 after Bonferroni correction) were found (Tables 5, 6 and Supplementary Figures S2-S8). Of these associations five were based on recessive effects linked to a homozygote group with only one accession while an additional six associations were based on recessive effects linked to a homozygote group of five accessions with very close kinship (θ ranging from 0.620 to 0.672) that originated from beyond the Ural Mountains. These particular associations should thus be treated with skepticism. However, discounting such associations ten associations between markers and phenology traits were nonetheless found in Pustnäs   x Estimated magnitude larger than twice the size of their standard errors are given in bold.
( Table 5). For growth traits we also found eight such associations in Pustnäs. In Svalöv no significant association was found for the only phenology trait (LS16), but for growth traits six markertrait associations not affected by the aforementioned issues, were identified. When considered individually, trait-associated markers explained from 1.8 to 18.1% of the accession variation but when discounting for association effects dependent on single homozygotes or to the five accessions beyond the Ural Mountains, any single trait-marker association explained up to 9.2% of the accession variation. Taking into account the variation explained by all marker associations for a trait, R 2 tot -estimates could reach 20.8% (BB13, BB14) when many markers were identified. Significant markers were rarely associated to several traits simultaneously but the marker S1_361520786 was a notable exception being connected to fresh weight in Svalöv during both harvests in 2014 and in 2017 ( Table 5). With respect to this marker the rare allele homozygotes exhibited 130 to 151% better growth in terms of biomass traits (SumBA, FW) in Svalöv ( Figure 4B) in comparison to the population mean. Indeed this marker showed effects of similar tendencies for SumBA11 and FW14 also in Pustnäs (25-80%; Figure 4A) although this trend was not statistically significant (p > 0.2 after Bonferroni correction). The interpretation of the marker S1_361520786 effects also requires some caution because only two rare allele homozygotic accessions constituted the basis for the apparent recessive effect raising the possibility that the effects may be overestimated. Nonetheless the effect cannot be dismissed as a case of cryptic structure because the two homozygotic accessions involved were almost completely unrelated according to kinship estimates (θ = 0.010).
A total of four associations were found for plasticity traits. Interestingly, one specific marker on chromosome 12 (S1_246135575) was associated both with the plasticity of number of shoots 2014, SumBA16 and FW17 explaining from 7.4 to 8.6% of the accession variation ( Table 6 and Supplementary  Figures S7, S8). In Figure 5 the effects of this indel marker is shown for different genotypes both with respect to the plasticity traits (subplots E,F) as well as for the biomass traits themselves at the two field experimental locations (subplots A-D). The addition of an insert-C allele appeared to confer a substantial negative effect in Svalöv for SumBA16 and FW17 (−12 to −17%, Figure 5D) while exerting a positive effect on the corresponding traits in Pustnäs (7-20%, Figure 5B). A similar although less obvious pattern was also observed for the number of shoots (Figures 5A,C). In terms of plasticity this implied an overall positive additive effect of an added insert-C allele ranging from 0.10 to 0.61 accession standard deviations (Figures 5E,F).
In order to know whether the multiple marker-trait associations detected in this study could reflect a smaller set of unique associations, we estimated linkage disequilibrium (LD) correlations between all markers that had significant trait associations. However, such correlation estimates were generally very low indicating no or little LD (Supplementary Figure S9). Only 14 marker pairs (out of 595) showed LD correlations with magnitudes above 0.2. Seven of those cases, including the case with the strongest  a The association is dependent on the effect of a single BB-homozygote individual and should be treated with caution. b The association dependent on a very close cluster of BB-homozygotes originating beyond the Ural mountains and should be treated with caution. c The underlying marker is located in the indicated scaffold not yet successfully incorporated into the Salix purpurea genome assembly. LD correlation (r = 0.35), were between markers whose trait association effects depended on the rare homozygotic accession group originating beyond the Ural mountain range and these were viewed with skepticism anyway. In summary, there was no substantive evidence that the other associations constituted multiple reflections of a lower set of unique associations. All genes in the region 3000 bp up-and downstream the associated markers are reported in Supplementary Table S1. Many of the genes were connected to membrane transport or to transcription factors. The marker that in Svalöv repeatedly was identified for FW14 and FW17 (S1_361520786) was situated in a gene connected to translational initiation factor 4B. Translational initiation factors are involved in a number of processes connected to growth and to interaction with different stresses and could thus be important factors in plant improvement (Dutt et al., 2015). The marker identified for plasticity of different growth traits (S1_246135575) was in a gene resulting in an integral membrane HPP family protein which is a large group of proteins that involved in many different processes connected to membranes. Both markers were in the non-coding region of the genes.

DISCUSSION
In this study, we investigated associations between a genome wide set of markers and growth and phenology related traits, measured repeatedly during several years and in two different environments. Genome wide association studies with biomass traits has never been conducted in Salix viminalis but recently within the same genera in S. purpurea (Carlson et al., 2019). Previously, Hallingbäck et al. (2016) used a candidate gene approach and identified associations between growth traits and phenology traits and markers in S. viminalis. An obvious drawback with this approach is that only genes with known functions will be analyzed. Compared with previous QTL-mapping studies, (e.g., Berlin et al., 2014Berlin et al., , 2017Ghelardini et al., 2014) the identified associations in the current GWAS and in the candidate gene study will have a better resolution and be closer to the causative genes due to the shorter LD in a population of unrelated individuals compared to biparental populations (Neale and Kremer, 2011;Khan and Korban, 2012). On the other hand, in a population with short LD, a GWAS requires a large number of markers to cover the complete genome to identify associations (Khan and Korban, 2012). In S. viminalis LD has earlier been estimated to drop to low non-significant values within 4000 bp from a marker which is longer compared to the closely related species S. schwerinii (Berlin et al., 2011) and show promise for possibilities to identify associations with a reasonable number of markers. Here we have identified 19,411 SNPs and single nucleotide indel markers from a GBS-effort, quite evenly distributed across the genome, to use in a first attempt of GWAS with biomass traits in Salix viminalis.
The population of S. viminalis used in the study contains accessions collected from the British Isles in the west throughout Europe into Russia as well as clonal material from breeding programs in England and in Sweden . It could be regarded as a population well representing Western and Central Europe while being a bit sparse in the Eastern distribution of the species. The population could also, based on GBS marker data (this study) and microsatellite data , be divided into four subpopulations where one subpopulation includes the accessions with Russian origin. One complication was that the kinship analyses revealed another substructure within this Russian population where five accessions originating from diverse places beyond the Ural Mountain range were all indicated to be in very close kinship (Hallingbäck et al., 2016 and present study). As the population size was rather limited (291) it was not considered prudent to eliminate accessions only due to this reason. Instead, throughout the remainder of the discussion such associations whose effects are solely dependent on single accessions or the accessions beyond the Ural Mountains (footnotes in Tables 5, 6) will not be seriously considered even though they may appear strongly significant or show high R 2 -estimates.
Substantial phenotypic variation could be seen for all traits at both environments with mostly high broad-sense accession estimator heritability (H 2 c ) values for phenology traits and medium-to-high for growth traits. Also, the narrow-sense chip heritabilities (h 2 s ) were considerable with values up to 0.99 for phenology while estimates were more variable for growth traits (0.24-0.97) depending of trait and environment. This general trend with higher heritability estimates for phenology traits compared with growth traits was also observed earlier (Rönnberg-Wästljung, 2001) where two large factorial crossings of S. viminalis with clones of Swedish and Polish origin were used. This also seems to be a general trend in many boreal and temperate tree species (e.g., McKown et al., 2014;Guerra et al., 2016). Genetic correlation estimates between phenology and biomass traits were low indicating different genetic background of the different types of traits and suggest that these traits could be selected for independently of each other. In an earlier biparental QTL study , phenology QTL overlapped in some cases with QTL for different biomass traits which could indicate a common genetic background but it should nonetheless be cautioned that the broad QTL regions identified in that study contains many different genes raising the possibility that the QTL overlap merely reflected linkage.
Taking into consideration the structure of the population as well as the kinship between accessions and corrections for multiple testing we could in total identify 25 associations across environments, traits and years whose genotypic effects could not be ascribed to either a single homozygotic accession or to the tightly related group of trans-Ural accessions. For bud burst (BB13, BB14) and for the number of shoots in 2011 (Nsh11) in Pustnäs, such associated markers explained over 20% of the total accession variation. GWAS in different Populus species identified markers for different wood traits (Porth et al., 2013;Du et al., 2015b) that in total explained above 20% of the variance for each trait. Du et al. (2015b) also identified markers for different growth traits as stem diameter, tree height and stem volume, that together explained from 24 to 33% of the accession variation. In any case it should be noted that even 20% explained variation is still a limited proportion in comparison to the percentage of accession variation explained by genetics overall, namely H 2 c which varied from 59% to 82% for BB13, BB14, and Nsh11. This implies that, given the current sample size (291 accessions), a substantial portion of the genotypic variation was still unaccounted for in FIGURE 5 | Genotype effects for the marker S1_246135575 on number of shoots (Nsh), mean shoot diameter (MeanD), summed basal area of shoots (SumBA), and fresh weight (FW) at Pustnäs (A,B), Svalöv (C,D) and for the corresponding differences (plasticity) between Pustnäs and Svalöv (E,F). Genotype effects at Pustnäs and Svalöv are described as percentages relative to the population mean, while plasticity effects are given in unit standard deviations of BLUE accession estimators. The values in parentheses after each genotype (class) describe the number of accessions having that genotype (class).
terms of individual marker-trait associations. In addition, we also found three associations, not connected to single homozygote or trans-Ural accessions, connected to plasticity of biomass traits.
Most of the associations individually explained a limited portion of the total accession variation ranging from 2.3 to 9.2%. Again, considering the previously mentioned large portions of genotypic variation unexplained by marker associations, and that the limited size of the study population would allow only the marker-trait associations with the greatest R 2 to be detected with any confidence, the results of this study agree with the notion that biomass and phenology traits are regulated by a multitude of small effect QTL, a concept originally suggested by Fisher (1918). This is furthermore in agreement with previously reported GWAS results from Populus studies (McKown et al., 2014;Du et al., 2015a) and further supported by meta-analyses for a number of quantitative traits in tree species in general (e.g., Hall et al., 2016).
For traits studied in Pustnäs and Svalöv there was only one case where the marker-trait association was stable from one year to another (FW14 to FW17 in Svalöv). In line with this result, aggregated biomass traits, such as summed basal area and fresh weight, showed consistently strong accession correlations between years of measurement. For other traits, accession correlations were generally lower and associations were less stable and unique for the trait measurement. Some stable associations were found in the previous candidate gene study  where, for example a SNP situated in the ELF3b-gene was consistently identified across years and environments. But also in that investigation, most of the associations were unique. Similar results were found for S. purpurea where most of the identified associations were unique with some few consistent across years and environments (Carlson et al., 2019). The general trend also in Populus is that unique as well as some stable associations are found, e.g., Muchero et al. (2013Muchero et al. ( , 2015 identified several associations for different wood traits that were stable across environments and between developmental stages. It is notable that the association of SNP S1_361520786 to both FW14 and FW17 in Svalöv was largely dependent on a considerable recessive effect linked to two unrelated accessions that were homozygous for the rare C-allele of the SNP (Figure 4). Therefore this association certainly need more examination in order to be confirmed and one should remain skeptical of the large effect which may be considerably overestimated (e.g., Beavis, 1994;Xu, 2003). Nonetheless, one should not entirely disregard associations that are dependent on few individuals or on low allele frequency markers, since the effect may still become a considerable factor if frequency of the rare allele can be increased by MAS or by conventional breeding. Indeed, similar results were detected in the previous candidate gene study using the same population, where an interesting SNP in the ELF3b-gene showed highly significant effects connected to few individuals in one of the homozygote classes . It should be emphasized that the chief reason why the observation of this association was not repeated in this study was that one of the homozygote accessions of the SNP of ELF3b was not successfully genotyped by GBS likely leading to its non-significance in the current analyses. In summary, low frequency alleles might still play an important role to explain the variation in a complex trait and for the missing heritability but are often discarded in association analysis (Brachi et al., 2011;Fahrenkrog et al., 2016).
The two field environments exhibited differences in climate, especially in spring (April) and in autumn (September, October) temperature, and in soil type but both represent typical environments used for practical cultivation of SRC Salix. We found no common marker associations between environments for the different growth or phenology traits. This finding could of course be an effect of the statistical power of the analysis being insufficient to detect one and the same association repeatedly but could also be due to G × E-interactions between the Pustnäs and Svalöv. Indeed, the moderate accession correlations between Pustnäs and Svalöv indicate substantial G × E-interactions between the two environments and Hallingbäck et al. (2016) also identified G × E-interactions between Pustnäs and a contrasting site situated in the United Kingdom (Woburn). Similar patterns with few common associations between environments was also seen for bud burst using a population of Populus balsamifera grown in common garden experiments at two locations in Canada, while bud set shared more common associations between locations (Olson et al., 2013). These circumstances together with the non-stability of markers will have important implications for the eventual conduct of MAS where different sets of markers have to be used in different environments for breeding of adapted plant material.
Given the observed G × E-interactions in the current study, we estimated the plasticity of each genotype by examining the standardized differences in trait accession BLUEs between Pustnäs and Svalöv and then mapped this difference for associations. One specific marker (S1_246135575) was identified to associate with the plasticity of three different growth and biomass traits (Nsh14, SumBA16 and FW17) thus being remarkably consistent. This marker explained around 8% of the accession variation suggesting that we have identified an important element of the genetic base for growth trait plasticity. Furthermore, the genotypic effect of this marker showed that the rare C-insert allele produced an increase in biomass traits in one environment while a decrease in the other environment. Interestingly these effects behaved in a largely additive manner (Figure 5). The main climatic difference between the two field environments are in spring and autumn where Svalöv, the more southern environment, has higher temperatures and thus giving a longer growing season. In addition, the Svalöv environment also has a soil with more clay content which most probably also have an influence of the biomass growth. The marker identified is clearly influenced by environmental differences, an effect that could be used in breeding as indicated below.

Breeding Implications
This is one of the first association mapping study in Salix using a GWAS approach with biomass traits, that successfully has identified a set of candidate markers connected to growth and phenology traits as well as to plasticity of traits. Each marker does not explain much of the total phenotypic variation but the effect of the marker may still be substantial. Adding several markers together explain a greater amount of the accession variation.
In a breeding perspective these markers and preferably a set of markers with specific allele combinations could be of value and interesting to use in the selection of parents as well for selection of individuals in early stages of the breeding cycle. An early selection of seed plants would reduce the number of plants for field testing and thus also make the breeding more cost effective. Some of the marker-trait associations identified were connected to a homozygote genotypic class with few individuals of large effect, these markers need to be validated through, for example, crossings of heterozygotes and study of the phenotypes in the segregating offspring, i.e., multiple family QTL mapping (Ukrainetz et al., 2008).
Even though the accession correlation across environments for the different traits was moderate indicating G × E-interactions, we still observed specific accessions that performed well in both environments. Such accessions could be of use for breeding toward stable performance across environments with plasticity close to zero. Another alternative to handle the G × E is to divide the area of cultivation into different breeding zones utilizing the very best performing accessions in each region and thus breed material for more specific use (Namkoong et al., 1988). As an example, the marker connected to plasticity gave a considerable effect for biomass traits and number of shoots with different effect dependent of environment and could be used to select for stability (heterozygotes) or for different alleles to increase the frequency of homozygotes for better performance in selected environments.
Nonetheless, given the general impression that the studied traits are likely regulated by a multitude of loci each exerting relatively minor effects and the considerable G × E-interactions encountered in this study, it appears difficult to efficiently utilize any specific set of few markers to perform MAS (see also Grattapaglia et al., 2018). A different alternative in this regard would be to perform MAS by genotyping breeding populations using large sets of markers that would utilize the LD with a multitude of small effect QTL, so called genomic selection (Meuwissen et al., 2001;Harfouche et al., 2012;Desta and Ortiz, 2014). The marker-trait associations observed in this or other studies could offer candidate targets for directed genotyping efforts, e.g., by developing SNP arrays. By enlisting markers linked to previously detected QTL into such arrays, further links, based on causality or LD, could be established between markers and QTL. Thereby the selection accuracy of genomic prediction models could be increased and the efficiency of genomic selection would be improved.
The results from this study using a population of unrelated accessions have given basic information about marker-trait associations that are of importance and could be used for future strategies and breeding of Salix. In general, individual associations explained a limited proportion (<10%) of the accession variation although, in concert, associations with budburst and number of shoots occasionally explained about 20%. Associations were generally not repeated across measurement years and environments, due to limited statistical power or to the substantial G × E-interactions observed. Still, one particular association for fresh weight was observed to be consistent across years in the Svalöv experiment and for the plasticity of growth traits between Pustnäs and Svalöv, another association was detected which was consistent across years and traits to a greater degree. Further examination of these associations should contribute to an improved understanding of the genetic architecture of important traits in Salix and facilitate the development of marker-assisted breeding methods for this species.