Single-step genome-wide association study for susceptibility to Teratosphaeria nubilosa and precocity of vegetative phase change in Eucalyptus globulus

Introduction Mycosphaerella leaf disease (MLD) is one of the most prevalent foliar diseases of Eucalyptus globulus plantations around the world. Since resistance management strategies have not been effective in commercial plantations, breeding to develop more resistant genotypes is the most promising strategy. Available genomic information can be used to detect genomic regions associated with resistance to MLD, which could significantly speed up the process of genetic improvement. Methods We investigated the genetic basis of MLD resistance in a breeding population of E. globulus which was genotyped with the EUChip60K SNP array. Resistance to MLD was evaluated through resistance of the juvenile foliage, as defoliation and leaf spot severity, and through precocity of change to resistant adult foliage. Genome-wide association studies (GWAS) were carried out applying four Single-SNP models, a Genomic Best Linear Unbiased Prediction (GBLUP-GWAS) approach, and a Single-step genome-wide association study (ssGWAS). Results The Single-SNP (model K) and GBLUP-GWAS models detected 13 and 16 SNP-trait associations in chromosomes 2, 3 y 11; whereas the ssGWAS detected 66 SNP-trait associations in the same chromosomes, and additional significant SNP-trait associations in chromosomes 5 to 9 for the precocity of phase change (proportion of adult foliage). For this trait, the two main regions in chromosomes 3 and 11 were identified for the three approaches. The SNPs identified in these regions were positioned near the key miRNA genes, miR156.5 and miR157.4, which have a main role in the regulation of the timing of vegetative change, and also in the response to environmental stresses in plants. Discussion Our results demonstrated that ssGWAS was more powerful in detecting regions that affect resistance than conventional GWAS approaches. Additionally, the results suggest a polygenic genetic architecture for the heteroblastic transition in E. globulus and identified useful SNP markers for the development of marker-assisted selection strategies for resistance to MLD.


Introduction
Eucalyptus globulus Labill. is widely planted in temperate countries mainly for the excellent quality of its wood for pulp production, which has a high market value in the paper industry (Tibbits et al., 1997). However, the susceptibility of this species to several pests and diseases limits its use in commercial plantations (Simeto et al., 2020). Mycosphaerella leaf disease (MLD), caused by a complex of fungal species of the genus Mycosphaerella and Teratosphaeria, is one of the most prevalent foliar diseases of E. globulus in natural forests and plantations worldwide (Tibbits et al., 1997). Teratosphaeria nubilosa is considered one of the most virulent MLD species (Mohammed et al., 2003;Hunter et al., 2009). This pathogen infects predominantly juveniles and intermediate foliage and causes severe leaf spotting, premature defoliation and shoot blight (Carnegie and Ades, 2003). Crown damage from MLD in young plantations can range from <10% to 80% which reduces the tree growth and survival, leading to concomitant loss in productivity of affected plantations (Balmelli et al., 2013;Milgate et al., 2005).
A number of silvicultural practices and management strategies have been proposed to minimize the effects of foliar diseases in Eucalyptus plantations. These include avoiding planting on highrisk endemic areas, the application of protectans and fungicides, as well using remedial fertilizer applications. However, these strategies have economic, environmental or operational constraints (Simeto et al., 2020). Thus, breeding for resistance to develop more resistant genotypes is the most promising strategy to effectively control MLD in commercial plantations (Milgate et al., 2005;Hunter et al., 2009;Balmelli et al., 2013). Genetic variation on E. globulus for susceptibility to MLD has been identified between and within families, provenances and genetics groups (Costa e Silva et al., 2013;Balmelli et al., 2014). Considering that this species is markedly heteroblastic, significant genetic variation in the timing of vegetative phase change has also been reported (Balmelli et al., 2013). As young foliage is particularly susceptible to T. nubilosa infection, at least two mechanisms has been proposed to manage MLD disease: increase the resistance of juvenile foliage and increase the precocity change to the resistant adult foliage (Milgate et al., 2005). Several studies suggest that both mechanisms involved in foliar resistance are under moderate to strong genetic control in E. globulus (Smith et al., 2017). But the underlying molecular mechanism and genetic architecture of MLD response has been relatively little investigated (Freeman et al., 2008;Hudson et al., 2014).
Genome-wide association studies (GWAS) are a leading approach for complex trait dissection and identification of genomic segments or alleles that underlie phenotypic variation and thus can be used for breeding (Tibbs Cortes et al., 2021). A common GWAS method in plants is the unified mixed linear model (MLM) proposed by Yu et al. (2006), that accounts for relatedness at two levels: population structure and kinship. Since this method was computationally very intensive, a reparameterization of the MLM likelihood function was proposed (Kang et al., 2010). Despite the statistical improvements, a typical feature of these GWAS methods is sequentially fitting each marker one at a time as fixed effects, resulting in a lack of power to map loci for quantitative traits (Wang et al., 2012). This limitation has been overcome by methods that simultaneously fit high density genome-wide molecular markers by mixed linear models as GBLUP (Genomic Best Linear Unbiased Prediction). Following this strategy, marker's effects are estimated by a linear transformation of genomic breeding values obtained from GBLUP (Gualdrón Duarte et al., 2014). One limitation of this method is that it only includes phenotypes of genotyped individuals. Generally in forest breeding populations, as in crops and livestock populations, only a fraction of individuals in a population are genotyped. Thus, the GBLUP methods were extended to include the information of non-genotyped individuals in prediction analysis, which is the so-called Single-step genomic best linear unbiased prediction (ssGBLUP) (Aguilar et al., 2010). The ssGBLUP approach is widely adopted for genomic prediction in livestock  and recently applied in forest species (Cappa et al., 2019;Quezada et al., 2022). The strength of this approach is the ability to jointly incorporate the information of all genotypes, observed phenotypes and pedigree information in one simple and single step model. This procedure was extended to estimate the marker's effects in an approach called Single-step GWAS (ssGWAS) (Wang et al., 2012). Applying this method, marker effects and the p-values associated are estimated simultaneously for all markers while accounting for population structure using the pedigree and genomic relationship for all individuals in the population .
The advances in next generation sequencing technologies allows the development of accessible high throughput genotyping systems. Particularly for Eucalyptus species, in which marker number and density have been the limiting factors for a long time, dense sets of Single Nucleotide Polymorphism (SNP) genotyping platforms are now available (Silva-Junior et al., 2015). The commercial Eucalyptus chip has been useful to understand the genetic basis of complex quantitative traits of interest, such as growth or wood properties in Eucalyptus species (Du et al., 2018). For example, in E. grandis × E. urophylla breeding populations, eight SNPs were described as main associations for growth traits, which were related to genes involved in cell wall biosynthesis (Müller et al., 2019). Also, it has become possible to detect significant marker-trait associations for growth related traits in species poorly represented in the chip. Thus, a total of 87 SNPs were associated with growth and wood quality traits in E. cladocalyx, revealing associations with genes related to primary metabolism and biosynthesis of cell wall components (Valenzuela et al., 2021). Similarly, a study of E. grandis × E. urophylla hybrids detected 22 quantitative trait loci for growth and wood traits and four for Puccinia psidii rust disease resistance (Resende et al., 2017). This resistance QTLs were positioned near the major QTLs detected in E. grandis for Myrtle rust (Ppr1 locus). Recently, for the same fungal disease, 33 highly significant SNPs were detected in E. obliqua, identifying candidate defense response genes, one of them located within the Ppr1 locus (Yong et al., 2021). With the exception of Myrtle rust, GWAS analysis to investigate the genetic control and identify candidate defense genes for fungal diseases in Eucalyptus have not been implemented. For MLD resistance, a classical QTLs mapping using biparental crosses and 165 molecular markers detected two major QTLs that explained a large proportion of the phenotypic variance for Mycosphaerella cryptica severity (Freeman et al., 2008).
This work aimed to examine the genetic architecture of resistance to MLD caused by T. nubilosa in a breeding population of E. globulus. Although the Single-step GWAS strategy has received increasing attention in livestock species , to the best of our knowledge, it has not yet been applied in forest tree species. Therefore, this study has two specific objectives: i) to compare the Single-SNP associations GWAS, the GBLUP-GWAS and the Single-step GWAS strategies in a forest tree species, ii) to elucidate the genetic base of resistance and escape to T. nubilosa in E. globulus.

Population and phenotypic data
The population and phenotypic data used in this study have been described in detail in Balmelli et al., (2013 year old, there were several consecutive days of rain and high humidity, causing a severe infection of MLD (Balmelli et al., 2016). Phenotypic responses to the natural infection were assessed at age 14 (coinciding with outbreak of MLD), 21 and 26 months. The response to disease was assessed using two parameters, the severity of leaf spots (SEV) and defoliation (DEF). Both were evaluated on the whole crown (juvenile and adult foliage) using a visual scale, recording the percentage of leaf area affected by spots (SEV) and the percentage of leaves prematurely abscised (DEF). The escape to disease was assessed as the precocity of vegetative phase change (ADFO), quantified as the proportion of the crown with adult foliage (petiolate, alternate, shiny green and pendulous leaves). The SEV was classified in percentages classes of 5%, whereas DEF and ADFO in intervals of 10%.

Genotyping
A total of 1008 trees were sampled for genotyping. DNA was extracted from leaf tissue of each tree using a standard CTAB protocol. Genotyping was performed using the Eucalyptus Illumina EUChip60K (Silva-Junior et al., 2015) by GeneSeek (Lincoln, NE, USA). Of the 194 families in the trial, 179 have genotyped trees ranging from 4 to 8 trees per family. Individual genotypes were filtered by excluding samples with more than 10% of missing data across SNPs. The SNP data were then filtered by call rate > 90% and a minor allele frequency (MAF) > 0.01. The quality control of genotyping data was performed using QCF90 software ).

Population structure and linkage disequilibrium
The genetic structure of the breeding population was assessed by the Principal Components Analysis (PCA) using PREGSF90 of BLUPF90 family of programs (Aguilar et al., 2018). Pairwiseestimates of linkage disequilibrium (LD) were calculated as the squared correlation of allele counts for each pair of two SNPs (r 2 ) within each of the 11 chromosome. The LD decay was estimated using the R script by Marroni et al. (2011). The pattern of LD decay was visualized plotting the r 2 against the physical distance with ggplot (Wickham, 2016) R package (R Core Team, 2022). The threshold of LD decay was identified at r 2 = 0.02.

Statistical analyses 2.4.1 Mixed model analysis
Linear mixed models were used to estimate the variance components of proportion of adult foliage, leaf spot severity and defoliation, considering the observed phenotypes and transformed data into normal score within each block (Cappa and Varona, 2013). The linear mixed model was: where y is the vector of phenotypes; b is the vector of fixed effects including the overall mean and block, with incidence matrix X; p ∼ N(0, Is 2 p ) is the vector of random plot effects, with design matrix W; a ∼ N(0 ; As 2 a ) is the vector of random effects of individual trees (i.e., breeding values), with incidence matrix Z; and e ∼ N( 0, Is 2 e ) is the vector of residual errors. The matrix A is the pedigree relationship matrix and s 2 a is the additive genetic variance. Genomic data were not included for variance components estimation.
Heritability of each trait was estimated as: where s 2 a is the additive genetic variance, s 2 p is the plot variance, and s 2 e is the residual variance.

Single-SNP models
The most common single-marker models were compared, with different combinations of population stratrification and genetic relatedness correction. First, a naive model, without any correction for population structure or relatedness was fitted independently for each SNP following the linear mixed model: where y is the vector of phenotypes; b is the vector of fixed effects including the overall mean and block, with incidence matrix X; p ∼ N( 0, Is 2 p ) is the vector of random plot effects, with design matrix W; and e ∼ N(0, Is 2 e ) is the vector of residual errors. The x i is a vector that contains the genotype for the i th SNP for each individual and b i is the i th SNP effect considered as fixed. A second model was fitted to account for population structure (model P), where the vector of fixed effects (b) of equation 3 include the first three principal components identified.
Alternatively, to account for family structure, a linear mixed model including a polygenic effect was tested (model K). This association model was fitted using the same naive model by adding a random effect utilizing a genomic relationship matrix as follow: where Z is a design matrix and u is the vector of polygene background effects (random), u ∼ N(0, Gs 2 u ). The rest of the components were previously defined (equation 3). The G matrix was estimated using the first method of VanRaden (2008): where Z is the matrix of gene content adjusted for observed allele frequencies and p i is the allele frequency of the i th SNP. To account for family and population structure simultaneously, a model K + P was fitted, where the vector of fixed effects (b) of equation 4 include the first three principal components. All Single-SNP models were fitted using the Sommer R package (Covarrubias-Pazaran, 2016).

GBLUP-GWAS model
For the GBLUP-GWAS model, the follow mixed model was used: where every element is defined as before (equation 1), except for the vector of random effects of the individual trees (a) that follow a var(a) ∼ N(0, Gs 2 a ). The matrix G is the genomic relationship matrix calculated using all the SNPs (equation 5).
Under this GBLUP-GWAS model, the vector of SNP effects was obtained from a linear transformation of the vector of breeding values in a. The vector of allele marker effects (û ) was calculated for all SNPs simultaneously following Gualdroń Duarte et al. (2014).
The variance of SNP effects and the p-values of each SNP effect were estimated following the formulas of Gualdrón Duarte et al. (2014).

Single-step GBLUP association model
The ssGWAS model is similar to the GBLUP-GWAS previously presented, but in this model all the phenotype and pedigree information of all assessed trees (genotyped and non-genotyped) was considered. Thus, the G matrix is replaced with the H matrix that combines both pedigree and genomic information.
The inverse of the relationship matrix that combines pedigree and genomic information (H -1 ) was derived by Aguilar et al. (2010) as: where A -1 and A −1 22 are the inverse of the pedigree relationship matrices for all individuals and only for the genotyped individuals, respectively, and G -1 is the inverse of the genomic relationship matrix.
The SNP effects were estimated aŝ whereâ 22 is a vector of genomic breeding values only for genotyped individuals. The variance explained for each marker and the p-values were obtained as suggested by Aguilar et al. (2019).

Model comparison
Correction for multiple testing was applied to determine a threshold to indicate significant associations for all the six models implemented. At a genome-wide level, the Bonferroni correction at an alpha level of 0.05 was applied as a conservative criterion. In addition, a less stringent threshold for declaring significance was applied using the false discovery rate (FDR), with an alpha level of 0.05. Quantile-quantile (Q-Q) plots of observed and expected pvalues were used to evaluate the control of population structure and genetic relationships in the different GWAS models. These plots were generated with QQMAN R package (Turner, 2018).

Candidate genes and functional annotation
The physical position of the SNPs was defined according to the reference genome of E. grandis (Phytozome Version 1.1 available on https://phytozome-next.jgi.doe.gov/) since the genome of this species is thoroughly annotated. Candidate genes were searched within 30 kb upstream and downstream of the significant SNPs. The size of this interval was chosen based on the typical LD present in the Eucalyptus populations and the SNPs density in our study (1 SNP per 28kb). The sequence of the candidate genes were extracted from the annotation file of E. grandis v1.1. Particular miRNA related to heteroblasty, and their targets, were searched based on the annotation of Hudson et al. (2014) using blastn-short (BLASTN program optimized for sequences shorter than 50 bases). Functional annotation of genes associated with significant SNPs was done by protein homology search against the NCBI non-redundant protein sequences (NR) database (Sayers et al., 2022) using BLAST (Altschul et al., 1990). When possible, the best hit with functional annotation was used.

Phenotypic variation
In this trial, the epidemic of T. nubilosa affected all the trees, which showed high variability for all the diseased-related traits. The mean and standard deviation for all the traits increased along with the increasing age of measurement, indicating a high prevalence of infection (Table 1). Though a wide range of variability was observed for the proportion of adult foliage, a high proportion of trees had only juvenile foliage, resulting in a distribution with an inflated proportion of zero counts in the three measurement ages. The disease damage traits (SEV and DEF) presented low but significant variability, following a normal distribution (Figure 1). Phenotypic correlations between measurement ages were positive and high for the proportion of adult foliage, and positive and moderate for defoliation. The disease damage traits, presented moderately positive correlation. The vegetative change to adult foliage (ADFO) was negatively correlated with response to diseased traits, confirming the mechanism of disease escape through early change to resistant adult foliage (Figure 1). The estimated pedigreebased heritabilities were moderate to high, varying from 0.33 for DEF_21 to 0.77 for ADFO_26 (Table 1).

Population structure and linkage disequilibrium
A total of 19,992 markers and 961 trees were retained after quality filtering. The SNP markers were distributed throughout the 11 chromosomes according to the E. grandis reference genome, with a range from 1,268 in chromosome 4 to 2,516 in chromosome 8 ( Figure 2). The trend of linkage disequilibrium (LD) in the E. globulus population was analyzed across each chromosome. The average genowide r 2 in this population was 0.032, with a LD decay at 19.1 kb using the significant threshold (r 2 = 0.2) ( Figure 3A). The LD decay varied across different chromosomes, with the most rapid LD decay observed for chromosome 5 (12.4 kb) compared with the slower rate on chromosome 9 (50.1 kb)(data not shown). The principal component analysis (PCA) showed the diversity and population structure present in the breeding population. The first two principal components explained 6.22% and 1.53% of the total variation, respectively. Based on these components, the majority of trees cluster into a major group, and different sub clusters can be identified ( Figure 3B). Only the three first components that accounted for genetic variances greater than 1% were included in the association analysis to correct for population stratification.

Association analysis 3.3.1 Single-SNP models
The Single-SNP model without taking into account the population structure (naive model) resulted in the detection of a large number of associations for all traits ( Table 2). Most of these were apparent false positives or spurious associations, by ignoring the multiple levels of relatedness between individuals. To control the population structure, the three first principal components were included in the model (model P). Although the number of associations slightly decreased, a high number of spurious signals were detected, due to family relatedness not being accounted for. The quantile-quantile plot (QQ-plot) obtained for the naive and model P, showed a great deviation for the observed and expected p-values, reflecting the inadequacy of these models (Supplementary Figure 1). When the random effects captured for the G matrix were included in the association models (models K and K+P), more pvalues follow the uniform diagonal line in the QQ-plots (Supplementary Figure 1). Compared with the naive and model P, these models properly control false positive associations resulting from the within population family structure. As a result, including the G matrix resulted in a drastic reduction in the number of significant SNP associations. No significant associations were detected for SEV and DEF after correcting for multiple testing (Bonferroni and FDR at 5%) ( Table 2). All the significant associations were detected for ADFO at the three measurement age. We found a total of 13 SNP-trait associations (8 unique SNPs) for the K model and 11 associations (6 unique SNPs) for the K + P model considering a FDR threshold of 5%, and 6 were also significant after the more stringent Bonferroni correction (5%), in both models. Moreover, both models did not show any difference, because all the SNPs detected in the K + P model were also significant in the K model. Thus, for this breeding population, the genetic matrix can account for genetic structure, with an negligible effect of the adjustment of principal components.
For ADFO, the significant SNPs were distributed in chromosomes 2, 3 and 11 (Supplementary Figure 2). The higher number of significant SNPs associated was detected at the first age of measurement (14 months). It is not unexpected that the four and unique SNPs detected as significantly associated at the 21 and 26 months respectively, were also significant associated in the previous age measurement. Similar results were obtained using the observed Distribution of filtered SNPs in a 1Mb windows across the Eucalyptus grandis genome. The x-axis represents the distance in Mb.

FIGURE 1
Phenotypic distribution and correlation of six disease-related traits for Eucalyptus globulus. Histograms in the diagonal show the distribution of each trait. Scatter plots (lower off-diagonal) and Pearson's correlation coefficient (upper off-diagonal) illustrate the underlying relationship between traits.
variable and the transformed variable to normal scores for all Single-SNP models tested, as well as for GBLUP-GWAS and ssGWAS models (results not shown).

GBLUP-GWAS model
The GBLUP-GWAS model was performed for each variable to evaluate the equivalence with the Single-SNP model that accounts for genetic relationships in the G matrix (model K). For disease damage traits (SEV and DEF), no significant associations were declared using the multiple test correction (Bonferroni and FDR at 5%) (Table 2). Conversely, for ADFO the GBLUP-GWAS model identified 16 SNPtrait associations (8 unique SNPs) at the three measurement ages at the most permissive level of FDR at 5%. Half of these associations were considered significant after Bonferroni correction (5%).
Of these 16 SNP-trait associations, 12 were previously identified by the Single-SNP model K and four were novel SNPs. The seven SNPs associated with the proportion of adult foliage at 14 months were also identified in the Single-SNP approach (model K) at the same measurement age (Figure 4). Similarly, all the SNPs associated at 21 and 26 months for the Single-SNP model K were detected in the GBLUP-GWAS model.
Associations for the proportion of adult foliage at 14 months were detected on chromosomes 2 (1 SNP), 3 (3 SNPs) and 11 (3 SNPs) (Supplementary Figure 3). The same regions in chromosomes 2, 3 and 11 were detected at 21 months, with six SNPs detected in both measurement ages. Two SNPs, one in chromosome 3 and the other in chromosome 11, were found associated with ADFO at 26 months. These SNPs were associated with this escape to disease trait in the three measurement ages.

ssGWAS model
The number of significant SNP markers detected in the Singlestep GBLUP association approach for response and escape to disease traits was larger than in the Single-SNP (model K and K + P) or GBLUP methods, which indicate a better performance of ssGWAS model (Table 2). Considering the FDR multiple test correction at 5%, we detected 66 SNP-trait associations and 38 unique SNPs for ADFO by the three measurement age. This approach identified 12 significant SNP associations for DEF at 21 months (Supplementary Table 1; Supplementary Figure 4). After the more conservative Bonferroni correction at 5%, 19 and one associations continue to be significant with ADFO and DEF traits, respectively. Once again, we found no significant association for SEV trait.
For ADFO, significant SNPs were associated across all chromosomes, except for chromosomes 1, 4 and 10. However, the most significant SNPs were identified in chromosomes 3 and 11 in the three measurement ages, indicating the presence of few genomic regions controlling the precocity of adult foliage change ( Figure 5).
The first peak identified on chromosome 3 spans a region of~2 Mb, defined in accordance with SNPs overlapping at the three measurements ages. The high number of significant SNPs was associated with the first age of measurement (14 months), namely  nine SNPs located in this region, the most significant with a -log10 (p-value) of 9.86 (position 47 949 673). Inside this area, the six SNPs significantly associated at 21 months, were also detected at the measurement age of 14 months. Additionally, the five SNPs found at the 26 months, were also detected in the two previous measurement ages. The SNPs detected in this region explained the 1.14, 1.64 and 0.44 of the total genetic variance for each age at measurement of 14, 21 and 26 months, respectively. A small region of~200 kb was also identified in chromosome 11 significantly associated with this trait. This region encompasses 7, 5 and 3 SNPs for measurement at 14, 21 and 26 months, respectively. Once again, the higher number of SNPs was identified at the first measurement age (14 months), and all the markers identified at 21 and 26 months were also detected in the previous measurement age. In this region, the SNP in the position 29 564 447 was the most significant, with -log10(p-values) values of 9.69, 11.07 and 9.36 for 14, 21 and 26 months, respectively. The SNPs inside this region represented 1.14, 0.94 and 0.73 of the genetic total variance at the three measurement ages, respectively.
Additionally, other significant signals were identified in chromosome 2, with one SNP (position 54 236 535) detected at the conservative threshold of Bonferroni correction at 14 and 21 months, and with the FDR threshold at 26 months. For chromosome 6, two SNPs located 100 kb apart were identified with the less stringent FDR correction for the three measurement ages ( Figure 5).
The ssGWAS model was able to detect 12 SNPs in chromosomes 3, 6, 8 and 11 associated with the defoliation measured at 21 months. However, no significant associations were detected for the previous measurement age (14 months). As expected by the correlation between defoliation and the proportion of adult foliage traits, six common SNPs between these traits at the same age (21 months) were observed. Thus, two SNPs identified in chromosome 3 and four in chromosome 11, correspond to the significant regions detected for ADFO.

Candidate genes for precocity of vegetative phase change
Of the 38 unique significant SNPs identified for heteroblasty, two of them did not have any associated genes. A total of 133 genes were associated with the rest of the significant SNPs, of which 131 were protein coding. Two key miRNA genes, miR156.5 and miR157.4, were found in chromosome 3 and 11, respectively ( Table 3). The significant SNPs associated with these genes were at 21 kb and 25 kb for miR156.5 and miR157.4, respectively. None of the miR156 and miR157 target proteins were found.
Out of the 131 protein coding genes, 20 could not be assigned a specific functional annotation. In total 95 different functions were found (Supplementary Table 2). A total of 11 proteins related to

Discussion
In a GWA study, for any population of a given genetic background, the ability to detect associations is a function of several parameters that include the phenotypic variance, the heritability and the complexity of the trait. In the current study, we show significant phenotypic variance for resistance to MLD for all the traits assessed in a breeding population of E. globulus. The heritabilities were high (h 2 > 0.65) for ADFO and SEV, whereas moderate values were observed for DEF. These estimates were similar or higher to those reported in previous studies carried out for equivalent resistance or damage traits of Teratosphaeria fungal diseases in E. globulus. For example, heritabilities for severity to T. nubilosa and T. cryptica ranging from 0.12 to 0.60 (Dungey et al., 1997;Carnegie and Ades, 2005;Milgate et al., 2005;Costa e Silva et al., 2013;Hamilton et al., 2013) whereas values between 0.43 to 0.74 have been reported for proportion of adult foliage (Balmelli et al., 2013). Regarding the genetic complexity of the MLD resistant traits in E. globulus, two major genomic regions which explained a large proportion of the phenotypic variance have been previously identified, and an oligogenic or major gene control proposed for these traits (Freeman et al., 2008). However, recent studies show that resistance mechanisms for other diseases in Eucalyptus involve multiple interacting loci of variable effect, according to a quasiinfinitesimal model (Butler et al., 2016;Resende et al., 2017). Specifically for heteroblastic transition, although major genes have been identified in various species, we still have a rather limited understanding of the number of loci involved in the regulatory pathway of this process (Hudson et al., 2014). Thus, for the diseaserelated traits analyzed here a quantitative nature is expected, which means that the phenotypic variation observed can be dependent on more than one major QTLs and many significant marker-trait associations would be identified.
In a first approximation to the genetic architecture to MLD resistance, we applied the widespread mixed model approach proposed by Yu et al. (2006). We used the PCA scores to control for population stratification and the genomic relationship matrix (G) to control for close family relatedness. As expected, when the background genetic structure is not properly accounted for in the model, a high rate of false positive marker-trait associations are observed (Supplementary Figure 1). Our results, in line with previous association studies in Eucalyptus, demonstrated that the inclusion of family relatedness correction in association models drastically reduces the number of false-positive associations (Thavamanikumar et al., 2014;Müller et al., 2019). Thus, for Circular Manhattan plots for proportion of adult foliage (ADFO) using Single-step GBLUP association model (ssGWAS). Inner, middle and outer layers represent the measurement ages at 14, 21 and 26 months, respectively. The Bonferroni (alpha = 0.05) and false discovery rate (FDR) threshold at 5% are represented by the red and blue circle in each layer, respectively. Vertical dashed lines highlight common SNPs identified among the three measurement ages. Significant SNPs identified after multiple test corrections are in black.
breeding populations consisting of families with cryptic relatedness, such as open-pollinated populations, the family structure (G matrix) efficiently captures the variations caused by genetic structure. The adjustment by using the PCA scores may be not adequate, and would be only necessary in populations with high levels of subrace differentiation or with different geographic origins (Cappa et al., 2013).
In the standard GWAS approach, each marker is tested independently for an association to the trait, included as a fixed effect in the model. For this method, two main drawbacks have been identified. First, the magnitude of the effects by the marker-trait associations identified are largely overestimated, caused mainly by the limited sample sizes and/or the use of the same data set for discovery and parameter estimation (Beavis, 1998). Second, the effect of one marker is estimated ignoring the rest of the genome, causing confusion about the number and location and also contributing to the upward bias of estimated effects of loci identified (Kemper et al., 2012). The alternative approach that fits all markers simultaneously as a random effect, GBLUP-GWAS, is becoming popular due to having much in common with genomic selection (VanRaden, 2008). In our study, the GBLUP-GWAS and the Single-SNP approach (model K) show similar results in the marker-trait associations identified, where 75% of the markers identified in the GBLUP model were previously identified in the Single-SNP model. When the SNP effects for this common SNPs were compared, our results agree with previous results that show an upwardly biased estimation under a fixed single-marker model (Kemper et al., 2012). For GWA studies, the use of more phenotypic and genotypic data improves power and resolution (Tibbs Cortes et al., 2021). Thus, those ssGWAS methods that include the phenotypes of all genotyped and non-genotyped individuals as well as the pedigree information, have been shown to provide more consistent solutions and increased accuracy than the standard GWAS approach (Wang et al., 2012). Although this approach was successfully applied in GWA studies in domestic animals , to our knowledge, this is the first report of ssGWAS applied to forest tree species. In this study, the ssGWAS approach identified 66 and 12 significant SNP-trait associations for the ADFO and DEF traits, respectively. The significant SNPs identified here are robust since the ssGWAS approach has shown, through simulations by Mancin et al. (2021), to control for false positives successfully. Additionally, our results show a general agreement between the Single-SNP and GBLUP strategies, with the same genomic regions identified with all models. However, the ssGWAS demonstrated superior performance by detecting new significant SNPs in chromosomes that had not been identified in the other models tested and resulted in clear peaks with higher -log10(p-values). Therefore, the ssGWAS is a promising method to the dissection of complex traits in populations when only a fraction of the population is genotyped, taking advantage of the available phenotypic information of individuals non-genotyped.
Most studies addressing disease resistance in forest trees have mainly assessed the disease's severity as the presence of leaf spots and/or the necrotic lesions present on the leaves. The high heritability found for severity of leaf spots (SEV) in agreement with previous reports, broadens the probability of detecting a gene of large effect. However, large heritability does not imply a direct relationship with the number, effect or proportion of the genetic variance of the genomic regions identified (Visscher et al., 2008). Our outcome can be explained by the low phenotypic variability of this trait in this population. Freeman et al. (2008) showed that the number of resistance QTLs detected in three breeding populations of E. globulus was strongly correlated with the variability within each population. In this study we also evaluated the degree of defoliation (DEF) as a descriptor of the impact of the MLD infection. For this trait, assessed at 21 months of age, we found 12 significant associations, although only one was significant after Bonferroni correction (Table 1). The low number of significant associations can be partially explained by the moderately-low heritability of this trait, reflecting the superior performance of the ssGWAS approach relative to the standard GWAS approaches.
The vegetative phase change in plants involves many morphological changes, well differentiated in many tree species as E. globulus. The mechanism to the juvenile to adult transition is under a strong genetic control with the same regulatory mechanisms in both annual herbaceous plants and perennial trees (Wang et al., 2011). Although the understanding of the genetic control of this transition remains limited, it is mediated by environmental and endogenous signals (Poethig, 2010). In E. globulus, precocious vegetative change was evidenced in response to abiotic stress (drought, salt and high winds) in a natural coastal population (Jordan et al., 2000). Also, it is proposed as a mechanism involved in disease resistance, anticipating the timing of vegetative change to the resistant adult foliage. For precocity of vegetative change, measured as the proportion of adult foliage in this study, the ssGWAS model was able to identify the highest number of significant trait-SNP associations in this breeding population. For the three measurement ages, the same SNPs were identified as the markers that explained the highest genetic variation, which allowed identify two main significant regions in chromosome 3 and 11. Within those regions, we identified the miR156.5 and miR157.4 respectively, both of which target DNA-binding transcription factors that regulates the SQUAMOSA PROMOTER BINDING PROTEIN-LIKE (SPL) gene family. The miR156 has been recognized as a master regulator of vegetative change in plants, promoting juvenility due to high expression in seedlings that decrease during development (Poethig, 2010;Manuela and Xu, 2020). Our results support similar reports from bi-parental QTLs mapping studies that identified miR156.5 as the candidate gene causing difference in the timing of phase change between precocious and normal natural populations of E. globulus (Hudson et al., 2014). Furthermore, emerging studies suggest that miR156 is probably an integral component of the miRNA response to all environmental stresses in plants (Zhao et al., 2012;Jeyakumar et al., 2020). Specifically for fungal pathogen defense response, miR156 targets the plant disease resistance proteins (NBS-LRR proteins) in Populus trichocarpa infected with stem canker pathogen (Zhao et al., 2012). Additionally, precursors of miR156 have also been reported located near the main QTL identified for Puccinia rust resistance (Ppr1), suggesting a main role in the genetic control of plant resistance (Butler et al., 2016).
We also identified various genes proposed to be involved in fungal pathogen resistance through different mechanisms. For example, the 26S proteasome non-ATPase regulatory subunit 9 is part of the ubiquitin/26S proteasome system, known to be involved in almost every step of the defense mechanisms in plants, regardless of the type of pathogen, regulating key cellular processes through protein degradation (Dielen et al., 2010). The dirigent protein 22 (DIR22) is part of a gene family involved in plant defense by their role in the dynamic reorganization of the cell wall through the formation of lignans and production of defense-related compounds (Yadav et al., 2021). Numerous studies have shown that the expression levels of DIRs genes are enhanced during the late stages of fungal infection (e.g., Colletotrichum gleosporioides, Botrytis cinerea, Fusarium oxysporum) in various plant species, increasing antifungal responses (Arasan et al., 2013;Borges et al., 2013;Reboledo et al., 2015). Glutathione transferase GST23 belong to a super-family of multifunctional proteins in plants, being one of their most important roles the detoxification of fungal toxins in the response against biotic stresses (Gardiner et al., 2010). The expression of GSTs is markedly induced during fungal infection, leading to enhanced resistance to the pathogen (Ahn et al., 2016). The (-)-germacrene D synthase is an enzyme involved in the volatile terpenoid biosynthesis. It has been reported that the downregulation of these genes alters monoterpene levels leading to resistance against biotic stresses at a global level, including defense responses to fungus infection (Rodrıǵuez et al., 2014). Lastly, the endochitinase EP3-like genes encode a protein involved in catalyzing the hydrolytic cleavage in chitin, the prominent wall component of fungus, and are induced by pathogen attack and other biotic stresses (Nagpure et al., 2014).
Finally, we identified genes involved in pathogen defenses' regulation and signaling mechanisms. MAP kinase 4 substrate 1 (MKS1) has been reported to be pivotal in signaling basal defense responses (Andreasson et al., 2005), whereas the WRKY transcription factor WRKY76 is involved in the regulation of the plant defense signaling pathway (Rushton et al., 2010). In addition, many plant disease resistance proteins (NBS-LRR proteins), which trigger signal transduction leading to the induction of defense responses (Andersen et al., 2018), were found. For instance, the disease resistance protein RPV1 is known to confer resistance to oomycete and fungal pathogens in cultivated grapevines (Vitis vinifera) (Gessler et al., 2011). LRR receptor-like serine/threonineprotein kinase GSO1 was reported to be related to the resistance of Nicotiana tabacum to the oomycete Phytophthora parasitica (Dang et al., 2019). TMV resistance protein N-like was reported to be involved in the resistance in potato to the fungus Synchytrium endobioticum (Hehl et al., 1999), and in peanut to two serious fungal foliar diseases (Dang et al., 2021). F-Box/LRR repeat protein was reported to be activated by fungal pathogens (Cladosporium fulvum, Puccinia striiformis) in different plant species in order to regulate defense responses (Yin et al., 2018).

Conclusions
In the present study, we carried out GWAS for traits related to response and escape to MLD in a breeding population of E. globulus only partially genotyped, and assessed the advantage of including phenotypes from relatives without genotypes in a single-step procedure. We compare several GWAS models that differ in the correction for population structure, the statistical approach to fit the markers and the impact of including combined pedigree, genomic and phenotypic information. Several SNP-trait associations were detected for the precocity of vegetative phase change (proportion of adult foliage). The same SNPs were identified at different measurement ages for the different strategies, providing validation of the results for these specific loci. Additionally, our results suggest that ssGWAS is a powerful model in detecting regions associated to the escape to MLD, showing a greater number of significant SNP associations than Single-SNP and GBLUP-GWAS models. Associations were detected near the miRNAs that regulate the timing of vegetative phase change, consistent with results from other studies about heteroblastic transition in trees and annual species. Thus, our research supports the hypothesis that this is a complex trait determined by a large number of genes with diverse biological functions, that can also include a mechanism of response to biotic stress, such as fungal resistance. Although the identified putative candidate genes will require future validations, they provide a better understanding of the genetic architecture of vegetative change in Eucalyptus. Overall, our results represent a foundational step in marker-assisted selection for a tree breeding program. For example, significant markers or validated genes identified by GWAS can be incorporated into prediction models, to accelerate breeding progress. However, since this is an association discovery study, validation in independent populations is necessary.

Data availability statement
The phenotypic and genotypic data analyzed in this study are not publicly available. The data are available upon reasonable request to the corresponding author and with permission of INIA. Requests to access these datasets should be directed to GB, gbalmelli@inia.org.uy.

Author contributions
MQ, IA, and GB contributed to conception and design of the study. GB designed the field experiment, collected the phenotypic data and provided the marker data. MQ performed the statistical analysis under the supervision of IA. FG and CS performed the candidate gene analysis and wrote this section of the manuscript. MQ wrote the final version of the manuscript. All authors contributed to the article and approved the submitted version.

Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher's note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.