The Genetic Architecture of Juvenile Growth Traits in the Conifer Torreya grandis as Revealed by Joint Linkage and Linkage Disequilibrium Mapping

Despite its high economical and ornamental values, Torreya grandis, a dioecious non-timber coniferous species, has long been an underrepresented species. However, the advent and application of advanced genotyping technologies have stimulated its genetic research, making it possible to gain new insight into the genetic architecture of complex traits that may not be detected for model species. We apply an open-pollination (OP) mapping strategy to conduct a QTL mapping experiment of T. grandis, in which nearly 100 unrelated trees randomly chosen from the species’ natural distribution and their half-sib progeny are simultaneously genotyped. This strategy allows us to simultaneously estimate the recombination fractions and linkage disequilibrium (LD) coefficients between each pair of markers. We reconstruct a high-density linkage map of 4,203 SNPs covering a total distance of 8,393.95 cM and plot pairwise normalized LD values against genetic distances to build up a linkage-LD map. We identify 13 QTLs for stem basal diameter growth and 4 QTLs for stem height growth in juvenile seedlings. From the linkage-LD map, we infer the evolutionary history of T. grandis and each of its QTLs. The slow decay of QTL-related LDs indicates that these QTLs and their harboring genomic regions are evolutionarily relatively young, suggesting that they can better utilized by clonal propagation rather than through seed propagation. Genetic results from the OP sampling strategy could provide useful guidance for genetic studies of other dioecious species.


INTRODUCTION
Most plant traits including those of agronomic importance are complex quantitative traits in nature, which are jointly controlled by an interacting network of genes, each with a small effect, and environmental factors (Lynch and Walsh, 1998). There is no exception with tree species. These genes underlying a complex trait are called quantitative trait loci (QTLs) or quantitative trait nucleotides (QTNs; Lynch and Walsh, 1998;Wu and Lin, 2006). Quantitative models have been established to separate the genetic and environmental effects for complex traits at the QTL level (Lander and Botstein, 1989;Yang et al., 2006;Wu et al., 2007;Sillanpää, 2013, 2015;Yin et al., 2015;Zeng et al., 2019).
Linkage analysis based on a family-based design is a very popular approach for identifying and mapping QTLs (Wu et al., 2007;Sajjad et al., 2014). This approach makes use of allelic segregation and transmission at different loci to construct a genetic linkage map from a large number of molecular markers. A number of mapping examples using linkage analysis have been reported in backcross, double haploid, recombinant inbred lines, near isogenic line, and F 2 populations, initiated with two inbred lines (Collard et al., 2005;Shar et al., 2020;Wang et al., 2020;Zhou et al., 2020). For forest trees, in which no inbred lines are available due to their long-life cycle, large genomes, and high heterozygosity (White et al., 2007), linkage analysis is conducted on the basis of a full-sib family derived from two heterozygous parents (Maliepaard et al., 1997;Wu et al., 2002b;Lu et al., 2004). The limitation of linkage analysis lies in its incapacity to high-resolution map QTLs, unless an extremely large number of samples sizes that is hardly met in forest genetics are used.
Apart from linkage mapping, there is an alternative mapping approach based on linkage disequilibria (LD). This LD-based approach can capitalize on recombinant events accumulated over evolutionary history and, therefore, provide high-resolution mapping of QTLs (Flint-Garcia et al., 2005;de Oliveira et al., 2010;Sajjad et al., 2014). Also, by sampling unrelated individuals from domesticated or natural populations, this approach can map a wide spectrum of allelic variants, including multiple alleles . There have been many examples of using LD analysis to map QTLs in forest trees; for example, QTLs detected for wood density in teak (Tectona grandis) (Vaishnav and Shamim, 2019) and the LACCASE gene (an important regulatory gene for lignin biosynthesis) characterized for Japanese larch (Larix kaempferi) (Liu et al., 2017). However, its application may be impaired by spurious LD detection and the occurrence of rare alleles increasingly recognized as important contributors of genetic variation (Cardon and Palmer, 2003).
To overcome the limitations of linkage and LD mapping, a joint linkage-LD analysis has been developed (Wu and Zeng, 2001;Wu et al., 2002a;Myles et al., 2009;Bennett, 2010;Lu et al., 2010;Pagny et al., 2012), which has been shown to increase the precision of QTL mapping and decrease its false positive rate. Unlike autogamous plants, forest trees mostly perform allogamous pollination, i.e., the pistils of a tree receive the sperms in the pollens randomly from different trees to fertilize their eggs. Taking advantages of this open pollination (OP) mating behavior, Wu and Zeng (2001) and Wu et al. (2002a) proposed an OP mapping strategy that combines the advantages of linkage mapping and LD mapping. The merit of the OP mapping strategy is augmented by its additional value to infer the genetic diversity of natural populations and the evolutionary history of QTLs (Sun et al., 2015;Zhu et al., 2015a,b). More recently, Zeng et al. (2019) have equipped this strategy with a capacity to estimate and test QTLs through non-DNA sequence-based (non-genetic) maternal inheritance.
Because of these unique advantages, also stimulated by the development of sequencing techniques that make it possible to genotype a massive amount of SNP markers even for un-sequenced forest trees (Wang et al., 2011;Huang et al., 2015), the OP strategy shows its increasing usefulness in QTL mapping. In this study, we use this strategy to map growth QTLs for Torreya grandis Fort. ex Lindl, an underrepresented dioecious gymnosperm species distributed in the southeastern China (Kang and Tang, 1994). As an economically important species producing edible nuts, T. grandis has been cultivated for more than 1000 years, but its systematic population and quantitative genetic study has not begun until very recently. Using the OP seeds from a single tree of T. grandis "Merrillii, " a commercial variety of T. grandis, Zeng et al. (2014) constructed a first low-density genetic map to which QTLs affecting juvenile growth traits were located. A second low-density linkage map of T. grandis was constructed using the OP design in which both parents and offspring were genotyped for the same set of markers (Zhu et al., 2015a). Using this linkage map, Zeng et al. (2019) further map QTLs that display transgenerational inheritance or epigenetic expression. Although these previous studies demonstrate the power of OP design to dissect complex traits in recalcitrant coniferous species, use of less expensive dominant markers, i.e., sequence-related amplified polymorphism (SRAP) and amplified fragment length polymorphism (AFLP), limits the coverage of the linkage maps reconstructed.
In this article, we report a large-scale OP-based mapping study by genotyping about 100 maternal T. grandis trees and 10 offspring of each maternal by genotyping-by-sequencing (GBS). After quality control, 70,580 SNP markers are detected to be segregating simultaneously in parental and offspring populations. By germinating OP seeds of each sampled maternal tree into seedlings, we implement the mapping algorithm of OP design (Wu and Zeng, 2001;Zhu et al., 2015a,b) to reconstruct a linkage map and identify QTLs on the map that govern stem growth of juvenile T. grandis trees. We further infer the evolutionary history of growth QTLs detected in T. grandis, providing new insight into the genome structure and organization of this less studied but important species.

Experimental Design and Sampling
Based on the Wu and Zeng's (2001) OP mapping strategy, we collected seeds and leaves from 96 unrelated trees randomly distributed in a natural population of T. grandis in Xiaorong Village, Chenkan Town, Huizhou District, Huangshan City, Anhui Province of China (29 • 57 N and 118 • 14 E) in 2017. The leaves were used to isolate DNAs used to genotype sampled parental trees. The seeds were stratified in a greenhouse on campus of Zhejiang A&F University (30 • 15 N and 119 • 43 E). In the following spring (2018), germinated seeds were transplanted into a 22 cm × 21 cm (diameter × height) plastic pot to grow seedlings. Young leaves from seedling were collected for DNA isolation used to genotype the progeny. Seedling height and basal diameter growth was measured for each progeny from each parental tree at the end of each year for 2018-2021. These growth traits were used for QTL mapping.
Leaf samples from both 96 maternal trees and their 878 progeny (4-10 seedlings from each maternal tree) were sent to LC-Bio Technologies (Hangzhou) Co., Ltd., Hangzhou, China. These trees were genotyped for genomewide SNPs by a GBS-based genotyping technique combining EcoRV and ScaI and SNP calling. An insert size of 430-460 bp was selected by gel electrophoresis for further sequencing library construction. Barcoded adaptors and common adaptors were then ligated to digested fragments, followed by every 96 random samples pooling and PCR amplification, in which only those short samples featuring a barcode + common adaptor combination were amplified. Finally, the fragments were enriched by PCR amplification and purified by magnetic beads. The use of a pair of SNPs for joint linkage and linkage disequilibrium analysis requires the information of the markers that are co-segregating in both parental and offspring populations. Excluding rare allele markers in the parental population, markers deviating from Mendelian segregation pattern in the offspring population, and missing markers (missing at >5%) in the parental population, there are 70,580 such quality SNP markers used for consequent analysis.

Linkage Map Construction and Linkage Disequilibrium Estimation
A statistical algorithm for simultaneously estimating the recombination fractions and linkage disequilibria between each pair of markers using the OP design was described in detail by Wu and Zeng (2001), Wu et al. (2002a), and Zhu et al. (2015a,b). To help the readers understand this algorithm, we outlined its basic principle. Consider a pair of biallelic SNP markers that are co-segregating in parental population. Let S p denote parental genotypes at these two markers. The pattern of marker co-segregation is determined by population genetic parameters (P); i.e., four haplotype frequencies, or equivalently, by allele frequencies at each marker and the LD between the two markers. The co-segregating marker pair is co-transmitted from parents to their half-sib progeny, forming offspring genotypes (denoted by S o ). The pattern of co-transmission is determined by the recombination fraction, denoted by r. The likelihood of population genetic parameters and the recombination fraction given observed marker data is formulated as: which includes two components, the likelihood of population genetic parameters [L p (P| S p )] given parental marker genotypes and the likelihood of the recombination fraction [L q (r| S p , S o , P)] given parental and offspring genotypes and the estimates of population genetic parameters. Maximizing the likelihood in Eq.
(1) is equivalent to maximizing L p (P| S p ) and L q (r| S p , S o , P), individually. Wu and Zeng (2001) proposed the EM algorithm to jointly solve these two likelihoods. Specifically, L p (P| S p ) allows us to estimate haplotype frequencies, from which allele frequencies and LD are estimated. The estimated LD was normalized as marker-marker correlation (r 2 ), whereas L q (r| S p , S o , P) allows us to estimate the recombination fraction. Thus, for the same pair of markers, we can estimate both their recombination fraction and LD, which allows the linkage-LD map to be built. We implemented Matlab R2019a to solve the likelihood of Eq. (1). The estimates of pairwise recombination fractions are used to reconstruct linkage maps. Optimal linkage groups were determined by changing the threshold of the recombination fraction and LOD. Markers of each group were ordered by using sum of adjacent recombination frequencies (SARF; Buetow and Chakravarti, 1987). The genetic distance between markers expressed as centiMorgan (cM) was calculated by transforming the recombination rate through the Kosambi mapping function. The R 4.0.3 package LinkageMapView (Ouellette et al., 2018) was adopted to draw the genetic linkage map and the marker density map. Wu et al. (2002a) extended Wu and Zeng's (2001) OP design to map QTLs for phenotypic traits. Consider a SNP with three genotypes AA, Aa, and aa which are segregating in the parental (maternal) population. Through OP, each maternal genotype combines with three possible paternal genotypes AA, Aa, and aa from the pollen pool, producing different proportions of offspring genotypes AA, Aa, and aa. The total number of each of these three offspring genotypes is the sum of the number of the offspring genotype produced by each maternal genotype, weighted by the frequencies of maternal genotypes. Let n 1 , n 2 , and n 3 denote the total number of offspring genotypes AA, Aa, and aa, respectively, and let y i denote the phenotypic value of a trait measured for an offspring individual. The likelihood of trait value at this SNP from the offspring population is formulated as:

Quantitative Trait Loci Identification
where f j (y i ) is the normal distribution density function with mean µ j (j = 1 for AA, 2 for Aa, and 3 for aa) and variance σ 2 . After the model parameters are estimated, a hypothesis test is performed to test whether there are significant QTLs affecting the growth trait. The null hypothesis that assumes no existence of a QTL, whereas the alternative hypothesis assumes its existence, is expressed as: H 1 : at least one of the equations H 0 above does not hold. Under each hypothesis, we calculate its likelihood value, L 0 for H 0 and L 1 for H 1 . The likelihood ratio (LR) is calculated by: Permutation tests were conducted to calculate the critical threshold for testing significant QTLs (Doerge and Churchill, 1996). We estimated the proportion of phenotypic variance explained by each QTL.

Genetic Linkage Map
The offspring (seed) genotype of a maternal parent derives from the combination of its maternal gametes and the paternal gametes from the pollen pool. Thus, based on the segregation and recombination of two SNPs from a maternal genotype to its offspring, we can estimate the recombination fraction for such a SNP pair. By estimating pairwise recombination fractions, we reconstruct a high-density linkage map that contains 11 linkage groups (LG) based on the criterion of LOD (logarithm of the odd) >5. This map, composed of 4,203 SNPs and covering an overall map length of 8,393.95 cM with an average map distance of 2.00 cM, represents one of the highest marker density and genome coverage for T. grandis (Figure 1). The LG constructed ranges in length from 84.28 cM for LG11 with only 13 SNP markers to 2,117.02 cM for LG4 with 835 markers. LG1 has the largest number of markers, which is 903, LG2 has the smallest average distance of 1.09 cM, and LG7 has the largest gap of 172.93 cM ( Table 1). The map has large gaps at several regions, such as those at the end of LG1 and both ends of LG7 (Figure 1).

Linkage-Linkage Disequilibrium Map
The information about how a pair of markers is co-segregating in natural populations is contained in maternal and paternal gametes, which can be extracted from the OP design. We estimate pairwise LD and plot their normalized values against map distances separately for each linkage group (Figure 2). As expected, LD decays with genetic distance, but with the degree of decay depending on linkage group. In general, linkage groups LG1-LG4 and LG6 have LD values that decay dramatically at a genetic distance of 10-20 cM; i.e., the decay of LD from 0.50 to 0.25 occurs at the genetic distance of 17.62 cM for LG1, 23.06 cM for LG2, 9.56 cM for LG3, 28.17 cM for LG4, and 15.89 cM for LG6. Yet, a dramatic LD decay in LG5 and LG7-LG11 occurs at a much shorter interval, i.e., 0.5-3.0 cM; i.e., the decay of LD from 0.50 to 0.25 occurs at a genetic distance of 2.11 cM for LG5, 0.52 cM for LG7, 0.51 cM for LG8, 0.54 cM for LG9, 0.66 cM for LG10, and 1.06 cM for LG11. Within the same linkage group, some marker pairs have large LD, but are separated by long genetic distances, suggesting that these markers may be subject to some recent evolutionary forces. Some markers have tight linkage and strong non-random association, implying that they have experienced a long evolutionary past.
Linkage disequilibrium between unlinked markers will disappear rapidly after several generations of random mating. Thus, the occurrence of such LD implies that relevant markers are experiencing evolutionary actions. We find that significant LD between markers from different linkage groups occurs, but they are not evenly distributed among linkage groups ( Table 2). Markers on LG1 tend to be associated with those from LG2 to LG7. This finding, plus the facts that a good proportion of markers (8%) on LG1 has significant LD and that the frequencies of LD between all other linkage groups are very low, suggests that LG1 are more likely to be subject to recent evolutionary forces than other linkage groups.

Quantitative Trait Loci Identification
The OP design can map QTLs to particular genomic regions of 11 linkage groups through a joint linkage-LD analysis. We consider two growth traits, stem height and basal diameter, measured for T. grandis seedlings from OP progeny in years 2018, 2019, 2020, and 2021. These two traits are found to vary considerably among progeny (Table 3 and Figure 3), implying the existence of their underlying QTLs. We identify 4 QTLs for stem height in 2018 and 2020 located on LG1, LG5, and LG7, and 13 QTLs for basal diameter in 2020 and 2021 on LG1, LG3, LG4, LG5, LG6, LG9, and LG10, respectively. Table 4 provides the information on the marker names of QTLs and their cross types and allele types. These QTLs mostly affect growth traits in an additive manner, but four of them display no dominant inheritance. Each QTL is found to account for a small portion of phenotypic variance, i.e., QTL heritability, suggesting that growth traits in T. grandis are polygenic.
By locating each of these QTLs on the linkage-LD map, we can infer their evolutionary history (Figure 2). The LD between two loci decays with generation at a rate proportional to their genetic distance. In general, if there is no linkage, disequilibrium will disappear at 5-6 generations of random mating. Yet, if there is strong linkage, LD will need a large number of generations to disappear. Thus, by comparing the size of both the linkage and LD, we can infer the number of generations that have passed since the production of disequilibrium. All QTLs, except for Q93443_149, Q169393_182 and Q32220_12, are located in the left bottom corner of the map, i.e., the markers associated with these QTLs have low linkage and disequilibrium values. Q93443_149, Q169393_182, and Q32220_12 reside in the region where there is little recombination but a remarkable disequilibrium.

DISCUSSION
The genetic mapping of complex traits is one of the most important topics in quantitative genetic research and plant breeding. Complex traits can be mapped by two approaches, linkage mapping and linkage disequilibrium (LD) mapping. Each of the two approaches has its own advantages and disadvantages in the accuracy, precision and power of QTL mapping and, thus, a simultaneous application of the two approaches has been considered in many genetic studies (Myles et al., 2009;Bennett, 2010;Lu et al., 2010;Pagny et al., 2012). Many of these studies simultaneously used linkage mapping and LD mapping for the same complex traits, but this simultaneous use was based on different mapping populations, i.e., controlled crosses for linkage mapping and founder-unknown populations for LD mapping. Although this can mutually validate mapping results from different approaches, such a simultaneous use does not resolve the accuracy and power issues characteristic of each Frontiers in Plant Science | www.frontiersin.org approach. For example, the spurious detection of disequilibria may still occur for LD mapping.
To strengthen the advantages of each mapping approach and overcome disadvantages of them, they should be integrated into a unified framework for the same mapping population. Wu and Zeng (2001) and Wu et al. (2002a) are the first who have developed a joint model of linkage-LD mapping, validated the statistical properties of the model, and justified its practical application (Hou et al., 2009;Yin et al., 2015). In particular, based on the allogamous behavior of forest trees, they proposed an open-pollination (OP) sampling strategy for joint linkage-LD mapping, followed by elegant statistical algorithms for parameter estimation and testing. This OP mapping strategy not only preserves the merits of linkage mapping and LD mapping, but also generates a new value, i.e., it allows the cohesive integration of population genetics, evolutionary genetics, and  quantitative genetics to understand the genetic diversity of populations and the evolution of QTLs. This strategy has found its biologically meaningful application for studying the genetics of Euphrates poplar (Zhu et al., 2015b) and T. grandis (Zhu et al., 2015a) and has been regarded as a generic tool for genetic mapping in forest trees and other outcrossing species (Sun et al., 2015). In this study, we employ Wu and Zeng's (2001) OP design to map growth traits in T. grandis, an important but underrepresented tree species. We simultaneously estimate the recombination fractions and LD coefficients between each pair of SNP markers genotyped for paternal trees and their half-sib offspring. By plotting normalized LD values against the recombination fractions, we chart a linkage-LD map, from which evolutionary events acting on the genome of T. grandis can be inferred. The slope of LD decay curve over genetic distances implies the evolutionary history of the species (Bennett and Binet, 1956;Sun et al., 2015). Previous studies suggest that LD values decays rapidly with genetic distance in coniferous trees (González-Martínez et al., 2007). Although this phenomenon is confirmed in this study, we gain additional insight into the genome structure of T. grandis. Our result shows that LD values in some narrow genomic regions decay dramatically, suggesting that these segments of the T. grandis genome may have experienced a long evolutionary history. We TABLE 2 | The number (upper diagonal) and proportion (lower diagonal) of marker pairs with significant LD within and between linkage groups (LG). LG1 LG2 LG3 LG4 LG5 LG6 LG7 LG8 LG9 LG10 LG11 LG1    also find that some regions of the genome have been subjected to certain recent evolutionary forces, because large LD values are detected between genetically distant markers. A further investigation into the detailed distribution of genes located in evolutionarily old and young genomic regions is needed. Such information can help tree breeders choose an optimal breeding scheme for this coniferous tree species. Ye et al. (2020) detected considerable differences in growth performance among T. grandis families, which is confirmed by this study. We implement Wu and Zeng's (2001) algorithm to map QTLs for juvenile growth traits measured at different years in seedlings of half-sib offspring derived from paternal trees. We identified several stem height and basal diameter growth QTLs at different years, but did not find nay QTLs that are shared between different years. This finding, consistent with that detected from a linkage mapping experiment by Zeng et al. (2014), suggests that T. grandis activates diverse genetic systems in response to environmental change during its early establishment. This may also imply that this species preserves a rich warehouse of genetic variants to buffer against environmental perturbations in its growth and development.
Beyond traditional linkage mapping or LD mapping alone, the OP strategy allows us to infer the evolution of QTLs. We find that all QTLs, except for Q93443_149, Q169393_182, and Q32220_12, have weak associations with the markers that are highly linked with them, whereas Q93443_149, Q169393_182, and Q32220_12 are highly linked with the markers that are also strongly associated with them. Taken together, it is suggested that Q93443_149, Q169393_182, and Q32220_12 may still be relatively young in evolution, i.e., no adequate generations that have passed to lead LD to disappear, but the other QTLs may have experienced a long evolutionary history, because only an extremely large number of generations can make the LD of highly linked loci drop to a low level.
Our study can be methodologically improved at least in two aspects. First, growth is a dynamic process, but our mapping was based on growth traits measured at single time points. A dynamic functional mapping (FunMap) approach that integrates the mathematical principle of growth into a mapping context has been developed (Ma et al., 2002;Wu and Lin, 2006). Because of its biological relevance and statistical robustness, FunMap has been applied to many plant mapping projects (Yang et al., 2006;Wu et al., 2011;Li and Sillanpää, 2013;Camargo et al., 2018;Lyra et al., 2020), widely recognized as a powerful tool for QTL mapping (Li and Sillanpää, 2015). Second, our mapping study finds that each QTL explains a relatively small portion of genetic variance (<10%) in growth traits. Similar phenomena were also detected in other studies (González-Martínez et al., 2011). However, it is possible that a smalleffect QTL may not be necessarily small, rather its independent effect is masked by the inhibition of negative regulators (Sun et al., 2021;Wang et al., 2021;Wu and Jiang, 2021). The exact effect of each QTL or gene can better be characterized through modeling gene-gene interactions (Wang et al., 2022). A new mapping approach, called functional graph theory (FunGraph) derived from FunMap, has been proposed to chart the genetic interactome network of complex traits from genetic association data Feng et al., 2021). A detailed analysis of our T. grandis data from the OP mapping strategy deserves by FunMap and FunGraph.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/supplementary material.