Postglacial phylogeography, admixture, and evolution of red spruce (Picea rubens Sarg.) in Eastern North America

Climate change is a major evolutionary force that can affect the structure of forest ecosystems worldwide. Red spruce (Picea rubens Sarg.) has recently faced a considerable decline in the Southern Appalachians due to rapid environmental change, which includes historical land use, and atmospheric pollution. In the northern part of its range, red spruce is sympatric with closely related black spruce (Picea mariana (Mill.) B.S.P.), where introgressive hybridization commonly occurs. We investigated range-wide population genetic diversity and structure and inferred postglacial migration patterns and evolution of red spruce using nuclear microsatellites. Moderate genetic diversity and differentiation were observed in red spruce. Genetic distance, maximum likelihood and Bayesian analyses identified two distinct population clusters: southern glacial populations, and the evolutionarily younger northern populations. Approximate Bayesian computation suggests that patterns of admixture are the result of divergence of red spruce and black spruce from a common ancestor and then introgressive hybridization during post-glacial migration. Genetic diversity, effective population size (Ne ) and genetic differentiation were higher in the northern than in the southern populations. Our results along with previously available fossil data suggest that Picea rubens and Picea mariana occupied separate southern refugia during the last glaciation. After initial expansion in the early Holocene, these two species faced a period of recession and formed a secondary coastal refugium, where introgressive hybridization occurred, and then both species migrated northward. As a result, various levels of black spruce alleles are present in the sympatric red spruce populations. Allopatric populations of P. rubens and P. mariana have many species-specific alleles and much fewer alleles from common ancestry. The pure southern red spruce populations may become critically endangered under projected climate change conditions as their ecological niche may disappear.


Introduction
The current structure of forest ecosystems in the Northern Hemisphere formed several thousand years ago because of major transcontinental migrations caused by climate fluctuations after the Last Glacial Maximum (LGM).This is a relatively short time on the evolutionary scale, especially for long-lived tree species, whose frontier populations may have occupied their current habitats for only a small number of generations.Repeated founder events associated with rapid postglacial migrations, mixture of lineages originating from different glacial refugia, and possible interspecific gene flow make the genetic structure of these frontier populations highly dynamic.At the same time, the trailing populations at the southern margin of the species' range often face decline when they are outcompeted by species better adapted to the new warmer climates (Hampe and Petit, 2005;Beckage et al., 2008) or cannot survive for other reasons regardless of competition.The projected rates of future climate change exceed those observed after the LGM (IPCC, 2022).For many Eastern North American tree species, the ecological optima will move northward at least by 100 km, and for several species the expected range shifts will likely exceed 250 km by the end of this century (Iverson and Prasad, 1998).Knowledge of the evolutionary history of a species is important to better understand its current state, making predictions of its fate in the future (Johnson et al., 2016;Johnson et al., 2018), and development of conservation programs if needed (Hoban et al., 2022) particularly in the light of today's rates of global climate change and human caused habitat fragmentation.This requires comprehensive analysis of genetic processes in tree populations at various evolutionary stages (Johnson et al., 2018) and spatial scales (LaRue et al., 2021) from the expanding front to the declining trailing edge.
Red spruce (Picea rubens Sarg.) is an excellent model system to consider postglacial migration and evolution of northern temperate forest trees in relation to climate change and interspecific hybridization.because: (1) its northern populations are only a few generations old, whereas its southern populations are fragmented and currently restricted to high elevations, including potential refugia, and northern populations are genetically differentiated from southern populations; (2) it is sensitive to climate and environmental changes and has experienced massive diebacks from environmental change caused by the combined effects of climate warming, historical land use, and industrial pollution; and (3) it hybridizes naturally with black spruce (Picea mariana (Mill.)B.S.P.) in the northern parts of its range, but its evolutionary relationships with black spruce are not clear; and (4) very little is known about its postglacial migration and evolution.
Red spruce is an important and characteristic component of late-successional forests of eastern Canada and the northeastern United States.The natural range of red spruce covers territories from North Carolina in the United States to eastern Ontario and Nova Scotia in Canada (Burns and Honkala, 1990).In the northern part of its range, red spruce is one of the main components of cold temperate forests, whereas the southern populations are largely scattered and discontinuous, and occur either at mountainous sites, or in cool and moist wetlands (Burns and Honkala, 1990).Red spruce is characterized by relatively low overall genetic diversity (Eckert, 1989;Hawley and DeHayes, 1994;Perron et al., 2000;Rajora et al., 2000a) and a narrow ecological niche that make it sensitive to climate and environmental changes (DeHayes et al., 1991).A major decrease in the size of red spruce populations from 1800 onward has been associated, in part, with climate warming (Hamburg and Cogbill, 1988), though it should be noted that there have been some recent observations of red spruce re-expansion (Rollins et al., 2010).Since the 1960's, a significant decline of red spruce populations has also been observed at many high elevation sites along the Appalachian Mountain chain (DeHayes and Hawley, 1992).This decline has been associated with the complex damaging effects of industrial emissions -reduced growth, defoliation and poor cold tolerance commonly observed in the affected stands (Friedland et al., 1984;McLaughlin et al., 1987;DeHayes et al., 1991;McLaughlin et al., 1993).Within the southern part of its range, red spruce is restricted to colder high elevation locations.These isolated populations of red spruce represent a valuable portion of the species' gene pool, which are likely to be highly sensitive to climate warming.
In the northern parts of its range, red spruce is sympatric with closely related black spruce, where introgressive hybridization occurs between these two species (Morgenstern and Farrar, 1964;Manley, 1972;Gordon, 1976).Since the criteria for identification of interspecific hybrids between P. rubens and P. mariana are not rigid, estimates of the degree of introgression vary from extensive to minor among studies (Morgenstern and Farrar, 1964;Manley, 1972;Gordon, 1976;Fowler et al., 1988;Bobola et al., 1996;Perron and Bousquet, 1997).Additional analysis is required to understand the evolutionary relationships between red spruce and black spruce, as well as ongoing processes in the hybrid zones.
The history of the postglacial migrations in red spruce probably had significant influence on the population structure in this species, and it should be considered when dissecting the more recent effects of air pollution, logging, and climate change and developing a conservation genetic strategy for its forest genetic resources in North America.However, very little is known on postglacial migration and evolution of red spruce.Based on higher allozyme allelic richness and heterozygosity observed in northern than southern red spruce populations, Hawley and DeHayes (1994) speculated that these two population groups originated from two separate glacial refugia: one in the southern Appalachian Mountains, and another situated in the coastal areas, which later became continental shelf to the east of the mid-Atlantic states.On the other hand, the genetic distance analysis in their study did not show separation between the northern and southern populations as would be expected if the populations originated from two glacial refugia.The higher genetic diversity in the northern red spruce populations may be a result of natural hybridization with black spruce instead of their origins from separate glacial refugia.Spruce was widespread in the continental United States during the glaciation: various Picea macrofossils and pollen grains occur from the ice margin to east-central Louisiana but are absent in the Atlantic coastal areas (Jackson and Overpeck, 2000;Jackson et al., 2000).Fossil records indicate that red spruce survived in the southern Appalachians during glaciation (Jackson et al., 2000) and do not support the existence of the second red spruce refugium during LGM.Red spruce existed in central Appalachians around 15,000 years BP (Watts, 1979;Lindbladh et al., 2003).Coastal refugia for spruce may have existed during the warmer mid-Holocene period to play a significant role at the later stages of recolonization process, but not during the LGM (Schauffler and Jacobson, 2002).Pollen stratigraphies indicated that P. rubens was not widely represented in the northern part of its current range until 1000-1500 years BP (Lindbladh et al., 2003).
In the present study, we investigated range-wide genetic diversity and population genetic structure of red spruce and reconstructed the history of its postglacial migrations using highly variable microsatellite markers of the nuclear genome.We evaluated two concurrent hypotheses: 1.) whether the existing red spruce populations have originated from a single LGM refugium, or 2.) there have been multiple refugia.Under the 1.) hypothesis of recolonization from a single LGM refugium, isolation by distance and reduced genetic diversity levels in the northern populations associated with the repeated founder events may be expected.2.) Multiple refugia can be revealed by clear genetic differentiation among lineages based on multilocus genetic structure.The migration time frames can be established from the recently available fossil pollen stratigraphies.We also assessed the evolutionary relationships between red spruce and black spruce and addressed the question whether red spruce is genetically depauperate using Bayesian analysis, highly variable molecular markers, and robust sample sizes.

Red spruce populations, field sampling and DNA isolation
Eight populations of red spruce were sampled across the range of this species (Figure 1).Six populations formed a transect along the Appalachian Mountain chain, and two populations were from New Brunswick (NB) and Nova Scotia (NS), completing a latitudinal cross-section of the northern part of the species' range.Five populations (WV, TN, NY, NB, and NS) were old-growth red spruce stands without significant human intervention documented in the past, and three (ME, QC, and NH) were represented by bulked seeds collected between 1997 and 2001 from about 50 mature trees (Table 1).Seeds were obtained from the Canadian Forest Service seed bank.Sixty mature trees, spaced at least 50 m apart to eliminate possible family structure effects, were sampled in the five old-growth red spruce populations, and sixty seeds were randomly selected from each of the three bulked-seed populations.Minimum spacing between trees was calculated based on the average seed dispersal distance (Burns and Honkala, 1990).To make sure that the bulked seed lots were not subject to a possible bottleneck effect, we compared their genetic diversity parameters with populations represented by the foliage samples.We found no significant difference in allelic richness and heterozygosity between the two sample groups.Possible heterozygosity fluctuations among seeds and adult trees due to the selection against highly inbred  individuals should be negligible as the latter are mostly eliminated at the seed formation stage (O'Connell et al., 2006a).Using seed collections along with other sample types is a common practice when the natural populations cannot be accessed due to logistical reasons or harvesting (Hawley and DeHayes, 1994;Jaramillo-Correa et al., 2004).
The sample size of 60 individuals per population allowed us to capture most allelic diversity in red spruce populations, both in the evolutionary young northern and older southern populations (Bashalkhanov et al., 2009).A minimum sample size of 57 individuals was recommended to detect all alleles with the threshold frequency of 0.09 with a 95% probability (Gillet, 1999).
To estimate the possible effects of introgressive hybridization of P. rubens with P. mariana in the sympatric zone and to understand evolutionary relationships between these species, two pure black spruce populations were included in the analysis: Pine Falls population from Manitoba (MB) (Rajora and Pluhar, 2003), and Goose Bay population from Labrador (NL) (Figure 1; Table 1).The Pine Falls population sample consisted of needles from mature trees, whereas the Goose Bay population was represented by a bulked seed collection.These two populations of P. mariana possibly originated from separate glacial refugia, according to the mitochondrial haplotype distribution analysis (Jaramillo-Correa et al., 2004).
Foliage samples were collected in plastic bags with 10 g silica gel pouches as desiccant and kept at ambient temperatures in the field.Upon arrival to the lab, the samples were stored at -20˚C.Genomic DNA was isolated using a high-throughput magnetic fishing protocol (Bashalkhanov and Rajora, 2008).

Microsatellite genotyping
Microsatellite markers targeting the biparentally-inherited nuclear (nu) genome were used to genotype the sampled individuals for various genetic diversity, population genetic structure, and phylogeography analyses.
Expressed Sequence Tag (EST)-based and genomic sequencebased microsatellite markers, previously developed in the Rajora lab for P. glauca and P. mariana (Shi et al., 2014;Rajora and Mann, 2021), were tested for amplification and polymorphism detection in red spruce.Genomic microsatellites had a higher proportion of null alleles.Only one showed the absence of null alleles during preliminary screening and was selected for future analysis.Most EST-based markers yielded good amplification results in P. rubens, P. mariana, and P. glauca.Eight EST-based microsatellites with 2bp and 3-bp core repeats were selected for subsequent genotyping (Table 2).Mutation rates for 2-bp repeats are normally higher than that for 3-bp and 4-bp repeats (Chakraborty et al., 1997), and they may provide information about the evolutionary processes occurring at different time scales.One of the primers in each pair carried a standard M13 tail sequence to facilitate fluorescent labeling and detection.In total, genotypes of 600 trees were determined for nine microsatellite loci (Table 2; Supplementary Data File 1).
Amplification reactions were performed in 10 ml reaction volume with 10 ng template DNA, 0.2 mM dNTP, 1.5-1.8mM MgCl 2 , 0.25-0.83pmol of each primer, 0.5 pmol of fluorescent labeled M13 primer (5'-IRDye700/800), 1x GoTaq Flexi Clear reaction buffer and 0.25 units of GoTaq Flexi DNA Polymerase (Promega, Madison, WI; Cat # M8295).Thermal cycling profiles were as follows: 2 min at 94°C, then 5 cycle touchdown step: 30 s at 94°C, 45 s initially at 65°C, then touchdown -2°C/cycle, 45 s at 72°C, followed by 33 cycles each of 30 s at 94°C, 45 s at the T A (Table 2), 45 s at 72°C, and the final extension at 72°C for 5 min.The heating and cooling rates in the Eppendorf EP-S thermal cyclers were set to 6°C/s and 4.5°C/s, respectively.Amplification products with incorporated fluorescent labels were separated on LiCor 4300 Genetic Analyzers (LiCor, Lincoln, NE).Up to 4 primer pairs were used in multiplex PCR with two loci running in each of the 700 nm and 800 nm IRDye detection channels.A minimum gap of 40 bp was maintained between loci to avoid overlapping of alleles.The gels were scored with SAGA GT/ MX 3.3.software, followed by manual data verification.Alleles were designated based on their amplicon sizes.

Genetic diversity analysis
Raw data were exported to a Microsoft Excel file.Data format conversions were performed with the Microsatellite Toolkit for Microsoft Excel (Park, 2001).Data quality for microsatellites was verified by the MICROCHECKER program (Van Oosterhout et al., 2004).MICROCHECKER can indicate possible presence of null alleles if there is an overall significant excess of homozygotes, and if it is evenly distributed across the homozygote classes.Then basic  (Rajora and Mann, 2021).
genetic diversity parameters, such as the mean number of alleles per locus, allele frequencies, expected and observed heterozygosity values, and the inbreeding coefficient (F IS )were calculated using the Microsatellite Toolkit and R (R Core Team, 2022).Alleles specific to P. rubens and P. mariana were counted.The effective number of alleles per locus (A E ) was determined as an inverse of expected homozygosity as described in (O'Connell et al., 2006b).Latent genetic potential (LGP) and genotypic additivity (richness) (G A ) were calculated as described in Rajora et al. (2000b).Latent genetic potential (Bergmann et al., 1990) provides estimates for the content of rare and low frequency alleles in a population that might contribute to its adaptive potential under the changing environmental conditions.Genotypic additivity (Rajora et al., 2000b) describes the genotypic diversity in a population.Observed G A is the total number of genotypes observed in a population summed over all the loci.The expected G A is a sum of the theoretical number of single-locus genotypes in a diploid population.The Shannon's information index (I) was calculated for each population.It has the advantage that it does not assume a population to be in the Hardy-Weinberg equilibrium, which may be unlikely in evolutionary young, or highly fragmented populations.One-way ANOVA was used for each genetic diversity measure to test the differences between the allopatric P. rubens and P. mariana populations, and the red spruce populations from the sympatric zone.Potential deviations from the Hardy-Weinberg equilibrium were assessed by Fisher's exact test (Guo and Thompson, 1992) implemented in ARLEQUIN 3.5 (Excoffier and Lischer, 2010).The Ewens-Watterson neutrality test (Ewens, 1972;Watterson, 1978) was also carried out using ARLEQUIN 3.5 to examine the possible deviations in the observed genetic variation from the neutral expectations.Tests for possible bottleneck events and isolation by distance were carried out using BOTTLENECK 1.2.02 (Piry et al., 1999), and IBD 1.52 (Bohonak, 2002) programs, respectively.Effective population sizes were estimated through q using the maximum likelihood coalescent approach implemented in the MIGRATE 3.0 program (Beerli, 2008), assuming the infinite allele mutation model for all markers, and the average mutation rate of 1x10 -3 , which is typical for microsatellite loci (Schlötterer and Wiehe, 1999;Marriage et al., 2009) and has been used in conifers (Pandey and Rajora, 2012;Rajora and Zinck, 2021).General statistical tests were carried out using the R statistical environment (R Core Team, 2022).

Population genetic structure analysis
Genetic differentiation between populations was assessed by G ST (Nei, 1973), F ST (Weir and Cockerham, 1984), and R ST (Slatkin, 1995), calculated using the FSTAT 2.9.3 program (Goudet, 2001).Several different approaches were used to determine the gene pool subdivision within red spruce and between red and black spruce.Groupwise F ST and Nei's genetic distance (Nei, 1972) estimates were determined using the R statistical program (R Core Team, 2022) and the ADEgenet package (Jombart, 2008).Population group comparisons were made based on various pairwise distance measures, including Nei's genetic distance (Nei, 1972), F ST (Weir and Cockerham, 1984), and Nm (Wright, 1931).The subdivision between the population groups was further tested by AMOVA using the ARLEQUIN 3.5 program.Bayesian clustering analysis implemented in STRUCTURE 2.3.4 (Pritchard et al., 2000) was used to infer the population structure with (location prior) and without referring to predefined geographical populations.Tests were run for K=1-14, in 10 iterations of 500,000 sweeps, plus 100,000 burn-in sweeps using the admixture and correlated allele frequency models.The number of populations/genetic groups in the data set was estimated using the DK parameter suggested by Evanno et al. (2005).Plots of population membership were constructed using CLUMPAK (Kopelman et al., 2015).

Phylogenetic and phylogeography analyses
To infer the patterns of the postglacial expansions of red spruce, pairwise Nei's genetic distances (Nei, 1972), F ST , and dm 2 (Goldstein  et al., 1995) were calculated using the POPULATIONS 1.2.30program (Langella, 1999) with 1,000 bootstrap replications.F ST and standard genetic distances assume the infinite allele mutation model (IAM), and dm 2 is based on the stepwise mutation model (SMM).Unweighted pair group method with arithmetic mean (UPGMA) trees were constructed, and a majority rule consensus tree was produced.UPGMA is not based on any particular evolutionary model, which may be beneficial when the population differentiation is driven by a complex interplay of several factors.Maximum likelihood trees were also constructed with CONTML p r o g r a m f r o m t h e P H Y L I P 3 .6 7 s o f t w a r e p a c k a g e (Felsenstein, 2004).

Population evolutionary history and admixture analysis
To assess historical patterns and dynamics of admixture and introgression between red and black spruce approximate Bayesian computation was run using DIYABC 2.1.0(Cornuet et al., 2014).First, a model of divergence from a common ancestor between red spruce and black spruce was compared to the alternative scenarios where either red spruce or black spruce split from one another.Following the finding that the most likely scenario was that the two species split from a common ancestor three possible divergence scenarios were tested: 1.) an ancestral model where all three populations diverged from a common ancestor (Figure 2B), 2.) a hierarchical model where black and red spruce diverged from a common ancestor and the admixed population diverged from black spruce (Figure 2C), and 3.) an admixture model where red and black spruce diverged from a common ancestor and the admixture population resulted from geneflow between the red and black spruce populations following divergence (Figure 2D).Red and black spruce populations were grouped based on their geographic location and Structure analysis, representing southern red spruce (TN, WV, NY), black spruce (NL and MB), and sympatric northern red spruce (NH, ME, QC, NB, NS).Relative posterior probabilities were computed to provide statistical support to select the most likely scenario.1,000,000 simulations were performed, and the most likely scenario was evaluated by comparing the posterior probabilities using logistic regression on 1% of simulated datasets closest to the observed data.Following selection of the most likely scenario, parameters were estimated including effective population size and number of generations since divergence and admixture.We used 29 years as the selected generation time based on Capblancq et al. (2020) to extrapolate to number of years since divergence.Scenarios were assessed using principal components analysis comparing deviations of the summary statistics of the posterior predictive distribution to the observed data.In addition to the pooled population samples, we selected six groupings of individual populations to test individually.The groupings included all three southern individual red spruce populations (NY, WV, TN), one admixed population (NB), and MB or NL black spruce populations (Supplementary Figure S1).We estimated relative posterior probabilities and assessed their fit as outlined above.Following model selection, we estimated the scenario parameters.

Allele composition and genetic diversity
Nuclear microsatellites with different core repeat sizes demonstrated contrasting patterns of molecular variation.Microsatellite markers with dinucleotide repeats had higher genetic variability (A=22.5)than the markers with trinucleotide repeats (A=7.8).This is consistent with the previously documented lower mutation rates in trinucleotide repeats lower than  (Evanno et al., 2005) suggested K=2 as the optimal clustering level (Figure S2).At K = 2 the split is between red and black spruce.K = 3 was selected as the sub-structure grouping level based on the Evano method (Figure S3).Red spruce populations: Tennessee, West Virginia, New York, New Hampshire, Maine, Quebec, New Brunswick, Nova Scotia.Black spruce: Labrador, Manitoba.Allopatric southern red spruce populations from Tennessee, West Virginia, and New York are well defined, while the northern populations show high degree of admixture.Note the presence of genotypes similar to red spruce lineages in "pure" black spruce population from Labrador.Historical patterns of admixture and introgression between red and black spruce were assessed using approximate Bayesian computation.Tested divergence scenarios included: (B) an admixture model where red and black spruce diverged from a common ancestor and the admixture population resulted from geneflow between the red and black spruce populations following divergence, (C) an ancestral model where all three populations diverged from a common ancestor, and (D) a hierarchical model where black and red spruce diverged from a common ancestor.The three populations tested include Southern Red Spruce (SRS)blue line, Northern Red Spruce (NRS)red line, and Black Spruce (BS)green line.(D) NA = starting effective population size, T2 = divergence time (in generations) from most recent common ancestor, T1 = time (in generations) of admixture event, ra = gene flow rate from red spruce, and 1-ra is gene flow rate from black spruce.Relative posterior probabilities were computed to provide statistical support to select the most likely scenario.1,000,000 simulations were performed, and the most likely scenario was evaluated by comparing the posterior probabilities using logistic regression on 1% of simulated datasets closest to the observed data.
Overall, 139 alleles were detected at the nine nuclear SSR loci (Table S1).Out of those, 36 alleles were specific to red spruce, and 10 were specific to black spruce (Table S1).Allopatric populations of P. rubens and P. mariana had different distribution of allele frequencies (Table S1).For example, at the locus RPGSE03, allele 229 was the most common (frequency 0.66-1.00) in red spruce, whereas in black spruce it showed low frequency (0.09-0.11).On the other hand, alleles 232 and 238 at RPGSE03 were common in black spruce but were absent in the southern allopatric red spruce populations and occurred at lower frequencies in the sympatric red spruce populations.The allele 198 at RPGSE10 was common in black spruce (frequency 0.37-0.48),but rare in southern red spruce.Overall, frequencies of 16 alleles demonstrated significant correlation with latitude in red spruce populations (Table S1): 7 out of 39 alleles in microsatellite loci with 3-bp core repeat units, and 9 out of 90 alleles in microsatellites with 2-bp repeats.A highly significant positive correlation between the pairwise F ST values and geographic distances among the 8 red spruce populations was observed for trinucleotide nuclear repeats (Mantel test r=0.74,p=0.001).This may be related to the increasing proportion of the black spruce alleles in the northern populations of P. rubens, rather than true isolation by distance.Once the loci RPGSE03 and RPGSE10 were removed from the analysis, the correlation between F ST and geographic distance was no longer significant.Dinucleotide microsatellites had high numbers of low frequency alleles having limited effect on the observed F ST values, thus they did not show significant isolation by distance in red spruce.
Observed (H O ) and expected (H E ) heterozygosity in red spruce populations varied between 0.36-0.44 and 0.49-0.62,respectively (Table 3).Moderate to significant heterozygote deficiency was detected in all populations, with average F IS =0.256, which is significantly higher than that previously reported for allozyme markers in red spruce (Hawley and DeHayes, 1994;Rajora et al., 2000a).Populations from Maine, Quebec, and New Brunswick had F IS of 0.318, 0.340, and 0.299, respectively (Table 3).Wilcoxon's rank test under the infinite allele mutation model (IAM) indicated an excess of heterozygotes in populations from New Hampshire and Quebec.However, with the limited number of loci employed, heterozygosity was a poor estimator of genome-wide inbreeding (Balloux et al., 2004).Significant deviations from the Hardy-Weinberg equilibrium were observed in all populations of red spruce, and they were more pronounced in the northern part of the species' range (Table S2).Ewens-Watterson neutrality test did not reveal any significant evidence for selection (lowest pvalue ~0.78).
Observed and expected heterozygosity and inbreeding coefficients were similar among P. rubens and P. mariana, but allelic richness, latent genetic potential, genotypic additivity, and Shannon diversity indices were significantly higher in black spruce.The northern red spruce populations had higher allelic and genotypic genetic diversity and N e (Table 3).However, the  differences in genetic diversity levels and N e were not statistically significant between the northern and southern red spruce (Table 3).The two populations from NY and NH had the lowest latent genetic potential values.

Population genetic structure and phylogeography
Microsatellite markers indicated moderate levels of genetic differentiation between the red spruce populations: total multilocus F ST 8.5%, R ST 6.6% for all microsatellite loci, and F ST 10.0%, R ST 8.5% for dinucleotide repeats, and F ST 5.8%, R ST 4.9% for trinucleotide repeats loci.Earlier studies based on allozyme markers reported G ST of 7.5% (Hawley and DeHayes, 1994), and F ST of 4.7% (Rajora et al., 2000a).
Bayesian clustering carried out using STRUCTURE 2.3.4.(Pritchard et al., 2000) and assessed using the Evano method (Evanno et al., 2005) suggested 2 major clusters based on the admixture model using both location prior (Figure 2A, Figure S2) and no location prior (Figure S3; Figure S4) models.Two minor peaks were also suggested at K=3 and K=4 (Figures S2, S3) indicating genetic substructure within the sampled populations.At K=2, the split is between red and black spruce, as expected (Figure 2A, Figure S4), and K=3 agrees with partitioning of populations between SRS, NRS, and BS (Figure 2A).The pure red spruce populations TN, WV and NY were consistently closely related, with both WV and NY sharing closer ancestry with TN.Five populations showed some level of increased admixture from BS including NH, ME, QC, NB, and NS.Pure P. mariana from NL and MB also were in good agreement.The "pure" black spruce from NL had some admixed red spruce lineages.
The UPGMA tree based on the Nei's standard genetic distances (IAM model implied) was generally consistent with the geographic distribution of the sampled populations and formed four clusters: southern populations from TN and WV, along with NH; northern NB and NS, plus NY; ME and QC; and the two pure black spruce populations of NL and MB (Figure 3).SMM-based methods demonstrated poor resolution and absence of concordance with spatial distances.Northern red spruce populations share a significant proportion of black spruce alleles (Table S1) as a possible result of previously documented introgressive hybridization (Perron and Bousquet, 1997), which leads to biased R ST and dm 2 estimates.Maximum likelihood trees generally confirmed the UPGMA clustering pattern, although bootstrap support among red spruce populations was weak (Figure S5).The Maine and Quebec populations were found to be genetically closer to P. mariana.Both populations are located in the middle of the introgression zone (Perron and Bousquet, 1997), and a higher proportion of black spruce alleles may be expected.Heterozygote deficiency in those two populations is probably related to the distribution of allele frequencies in hybrid populations leading to elevated H E , rather than actual inbreeding or sampling bias as the differences in H O are not statistically significant from other populations.An old-growth red spruce population from New Brunswick has similar excess of H E .Groupwise AMOVA indicated that 3.40% of the genetic variation was between the northern and southern red spruce population groups, 3.47% among populations within groups and 93.13% within populations.This partition of the genetic variation was statistically significant (P=0.000 to 0. An unweighted pair group method with arithmetic mean (UPGMA) cluster plot of 8 red spruce and 2 black spruce populations based on Nei (1972)   noticeable differentiation among the northern and southern populations of red spruce.At the same time, genetic differentiation between red and black spruce is significantly higher than the within-species subdivision (Table 4).Approximate Bayesian computation identified the admixture model as the most likely scenario based on its higher posterior probability (0.8863) compared to the hierarchical split (0.0353) and divergence from a common ancestor scenario (0.0784) (Figure S6).Model checking indicated that the admixture scenario fits well with the data as indicated by the observed data centering on the cluster of posterior predictive distribution in the PCA (Figure S7).Values of effective population size based on the estimates of posterior distributions of parameters suggest that red spruce and black spruce expanded from an ancestral population to their current populations 4290 generations ago.An admixture event occurred around 343 generations ago (Table 5).At an assumed generation time of 29 years (Capblancq et al., 2020) the admixture event occurred following the LGM approximately 9,947 ybp.The admixture rates of both red and black spruce with the admixed population are 0.66 and 0.34 respectively.DIYABC does not consider geneflow following divergence, rather only admixture.Because there is pronounced substructure (based on Structure analysis) between SRS, NRS, and BS and genetic differentiation we are reasonably confident in our ability to assess our scenarios using approximate Bayesian computation.Assessment of individual populations resulted in similar results, with the admixture scenario identified as the most likely model among the six population groupings and the admixture event occurring between 218-1890 generations ago depending on the population combinations compared (Table S5; Figure S1).

Genetic diversity of red spruce
Our results suggest that red spruce has moderate levels of genetic variation.It may be difficult to compare genetic diversity estimates obtained from microsatellite with that obtained from allozyme markers due to the inherently different mutation rates (Avise, 2004).Nevertheless, microsatellite genetic diversity levels were about 4-5 times higher than that of allozyme genetic diversity levels reported in red spruce (Table 6).Microsatellite allelic diversity observed in red spruce populations (A=6.3-9.0,average 7.4) was like that documented in Sitka spruce-Picea sitchensis (A=5.7-10.0,average 7.7 (Mimura and Aitken, 2007), and sympatric white spruce, Picea glauca (average A=6.83) based on EST microsatellites (Fageria and Rajora, 2013) but lower than observed in this species based on genomic (A=16.38;Rajora et al., 2005) or genomic and EST microsatellites (average A=11.38;Fageria and Rajora, 2013) (Table 6) and P. mariana (A=9.22-9.44;this study -Table 3).There is variation in mutation rates among different microsatellite loci resulting in different levels of genetic variation revealed by these loci, which makes study-to-study comparisons difficult.In our study, genetic diversity estimates for P. rubens and P. mariana populations were obtained using the same set of microsatellite markers.Our study suggests that red spruce has lower allelic diversity, latent genetic potential, expected and observed genotypic richness than black spruce for the same microsatellite markers (Table 3).However, the genetic diversity observed in red spruce in our study was similar to that reported for transcontinental white spruce based on six EST microsatellites (Fageria and Rajora, 2013), three of which were the same as we used in our study.Because eight of the nine microsatellite markers we used in our study were EST microsatellites, a comparison with the results obtained from EST microsatellites is more valid than from genomic microsatellites as genetic diversity at EST microsatellites is much lower than at genomic microsatellites (Fageria and Rajora, 2013;Fageria and Rajora, 2014;Rajora et al., 2023).
Species with narrower or regional distribution ranges typically have lower genetic diversity than the species with broad or transcontinental ranges (Hamrick et al., 1992).This is well reflected in Picea rubens and Picea sitchensis, which have similar geographic distribution patterns following the dominant mountain systems in their respective regions.Measured with the same marker system (allozymes or microsatellites), allelic diversity and heterozygosity in red spruce were generally lower than in transcontinental or wide-spread spruce species (e.g., P. glauca, P. mariana, P. abies), but similar to other spruce species that have regional or narrow distribution ranges or to transcontinental white spruce based on EST microsatellites (Table 6).Previously reported allelic diversity for allozyme markers in red spruce varied from A=1.47 (Hawley and DeHayes, 1994) to A=1.60 (Rajora et al., 2000a) and 2.40 (Eckert, 1989).This is similar to the mean allozyme allelic diversity of A=1.83 alleles per locus averaged across 102 studies in gymnosperms (Hamrick et al., 1992).Although P. rubens has lower overall genetic diversity levels than sympatric black spruce and white spruce and some other spruce species with very wide or transcontinental distributions, we cannot conclude that this species is genetically depauperate as previously reported (Perron et al., 2000).Southern populations of red spruce demonstrated slightly lower nuclear microsatellite allelic (11.6%) and genotypic (1.6% or 16.5%) diversity, N e (9.6%) and expected heterozygosity (7.7%) compared to the northern populations, although the differences were not statistically significant (Table 3).Hawley and DeHayes (1994) also reported 4.5% higher allelic richness in the northern red spruce populations.Introgressive hybridization with P. mariana may have enriched the allelic diversity in the northern populations of P. rubens in the sympatric zone.On balance, southern red spruce populations showed slightly higher (3.3%) observed heterozygosity than the northern populations.This is in contrast with the report of higher observed heterozygosity (0.0885) in northern versus southern (0.0747) populations by Hawley and DeHayes (1994).
Our results demonstrated that red spruce and black spruce have a substantial number of species-specific alleles for the nuclear microsatellites, and that the northern red spruce populations have significant influx of black spruce alleles (Table S1).Deviations from the Hardy-Weinberg equilibrium were observed in all populations of red spruce (Table S2).Repeated colonization events and introgressive hybridization with P. mariana in the northern part of the species range, and migration to uplands and air pollution effects in the southern populations may be the likely cause of these deviations (Bashalkhanov et al., 2013).
Our study suggests that most of the nuclear genetic diversity resides among individuals within populations in red spruce with inter-population genetic differentiation of about 8% (Tables 4).
These results are consistent with those obtained previously for red spruce with allozymes (Hawley and DeHayes, 1994;Rajora et al., 2000a) and are typical for conifers.The results indicate that the southern red spruce populations are more genetically differentiated from black spruce than the northern populations (Tables 4).This may be a result of more recent introgressive hybridization with black spruce in the sympatric northern populations.Our data also demonstrate that the southern populations of red spruce are less genetically differentiated (F ST =0.048) and have higher gene flow (Nm=4.96)among themselves than the northern populations (F ST =0.089; Nm=2.56).Indeed, the lowest F ST and highest gene flow was observed between TN and WV populations (F ST =0.033; Nm=7.33) (Table S4).In contrast, Hawley and DeHayes (1994) reported higher allozyme divergence among southern versus northern red spruce populations.In our study, the two southernmost red spruce populations from Tennessee and West Virginia show a high degree of genetic similarity despite having been in isolation since mid-Holocene, which indicates that genetic drift has not played a significant role so far in these populations.Microsatellite data suggest significant gene flow occurred between these populations in the past: Nm=7.33, which contradicts previously reported high divergence between the southern red spruce populations (Nm=1.6)and the subsequent assumption of manifestation of genetic drift in southern populations (Hawley and DeHayes, 1994).The southern TN and WV populations may be remnants of a pure red spruce glacial refugium in Appalachians documented earlier by fossil data (Jackson et al., 2000).
Postglacial migrations and evolution of Picea rubens, and Picea rubens -Picea mariana complex Lack of pronounced isolation by distance among the red spruce populations across the current distribution range, found in our study, may be an indicator of a somewhat complex post-glacial migration history of this species.Late Quaternary pollen and macrofossil data are available for many sites worldwide.In conjunction with other records of climate and environmental change they provide valuable information about the distribution of extant and extinct taxa at various spatial and temporal scales.Spruce macrofossils during the Last Glacial Maximum (ca 21000 years BP) were found at most sites to the south of the ice margin, and they are most abundant in Mississippi and Louisiana (Jackson et al., 2000).Molecular phylogeographic reconstruction indicated that the Mississippi Valley may have been one of the several glacial refugia for P. mariana (Jaramillo-Correa et al., 2004).With the retreat of the ice shield, spruce was among the first species to colonize the newly exposed areasa rapid migration northeast was recorded approximately 16000-12000 years BP, followed by a recession during warmer mid-Holocene periods (Watts, 1979;Jackson et al., 2000;Lindbladh et al., 2003).Cooler areas on the East Coast may have served as a secondary refugium during that period (Jackson and Overpeck, 2000;Schauffler and Jacobson, 2002;Lindbladh et al., 2003).Fossil pollen records indicate that red spruce was rare in New England during late Glacial and early Holocene (14000-8000 years BP), except for two smaller sites in the coastal area (Lindbladh et al., 2003), and it probably coexisted there with black spruce during the Late Holocene (last 1400 years), at least in Maine.It did not exist in the northeastern part of its current range until 1000-500 years BP (Lindbladh et al., 2003).It is often challenging to separate P. rubens and P. mariana from their pollen alone, as they have a great deal of similarity in morphology (Lindbladh et al., 2003).This means that P. rubens may be underestimated in the palaeoecological record where it was rare and there are some challenges in assessing the pollen record of P. rubes due to this scarcity.
As the red spruce populations from TN and WV are genetically close, and given the previously published fossil data, we suggest that during the Last Glacial Maximum red spruce probably occupied a single refugium in Southern Appalachians, and then migrated northward along the East Coast.Being a late successional, shadetolerant species, it was much less abundant than the early successional black spruce, which perhaps migrated to the New England states from its Mississippi refugium.During the recession in warmer mid-Holocene, the cooler and moist Atlantic coastal areas may have sheltered both P. rubens and P. mariana.Since the two species are closely related, extensive introgressive hybridization may have occurred.Cooler climates in the region observed during the last 1500 years facilitated another rapid range expansion of red spruce and black spruce.The contemporary sympatric populations of red and black spruce may be descendants from the secondary mid-Holocene refugium, whereas southern allopatric red spruce populations probably represent the remnants of its glacial refugium.
Results from approximate Bayesian computation suggest a pattern of admixture with introgression occurring between red spruce and black spruce beginning approximately 343 generations ago.Beginning 4290 generations ago with an ancestral N e of 1800, red spruce and black spruce diverged from their common ancestor.Then, around 343 generations ago, following the LGM, recontact and introgression occurred in the sympatric zone.This scenario explains the current genetic subdivision of red spruce into northern and southern variants indicated by the maximum likelihood and IAM-based trees and Bayesian structure clustering, as well as the influx of black sprucespecific alleles in the northern red spruce populations, which is detectable even in the old-growth stands.Earlier the presence of red spruce-specific mitotype in the eastern populations of black spruce was reported (Jaramillo-Correa et al., 2004).Occurrence of red spruce mitotype in the pure populations of P. mariana in the sympatric zone is an important indication of the two-way nature of the hybridization process.Furthermore, significant interspecific gene flow was documented in morphologically "pure" sympatric populations of P. rubens and P. mariana with species-specific RAPD markers (Nkongolo et al., 2003).Although RAPD markers have certain inherent limitations (dominance, questionable band homology, inconsistency), their results are in good agreement with the microsatellite data in the present study.We found a significant proportion of black spruce alleles in the presumably pure old-growth red spruce stands from Abraham Lake, Nova Scotia, and Loch Alva Lake, New Brunswick.Inversely, Bayesian approach suggested existence of admixed red spruce lineages in the pure black spruce population from Goose Bay, Labrador.Genetic distances based on allozyme data published by Hawley and DeHayes (1994), also provide evidence of hybridization between red spruce and black spruce in New Brunswick and Nova Scotia, although the authors did not consider it significant.Since the reproductive barriers between P. rubens and P. mariana are weak, but still significant (Major et al., 2005), these findings can be interpreted as a result of long-term introgression, rather than local interspecific gene flow, which is consistent with the post-glacial migration and hybridization scenario outlined above.Previous studies of the introgressive hybridization between P. rubens and P. mariana may have underestimated the scale of the process.

Red spruceblack spruce evolutionary relationships
Based on high genetic similarity between red spruce and black spruce and the observation that genetic diversity in red spruce was a subset of the genetic diversity in black spruce, it has been hypothesized that P. rubens is a derivative species from P. mariana (Perron et al., 2000;Jaramillo-Correa and Bousquet, 2003).However, this conclusion is questionable because speciesspecific polymorphisms for red spruce and black spruce have been identified in both nuclear and organellar genomes (Bobola et al., 1992;Perron et al., 1995;Bobola et al., 1996;Germano and Klein, 1999;Nkongolo et al., 2003), and fossil evidence suggests that these species have co-existed in various locations since the LGM (Watts, 1979;Jackson et al., 2000;Lindbladh et al., 2003).
Although black spruce shares a significant proportion of its allelic diversity with red spruce, both species were found to have very distinct patterns of distribution of microsatellite alleles.Thirtysix microsatellite alleles were specific to red spruce and ten microsatellite alleles were specific to black spruce (Table S1).In a previous study, a number of species-specific impenetrable SNP loci were identified in P. rubens and P. mariana (de Lafontaine et al., 2015).This makes it difficult to conclude that the overall genetic diversity of P. rubens is a subset of P. mariana's gene pool suggesting that the previously reported progenitor-derivative relationship between these two species (Perron et al., 2000;Jaramillo-Correa and Bousquet, 2003) is unlikely.This conclusion is further supported by the co-existence of these species in the fossil records (Watts, 1979;Jackson et al., 2000;Lindbladh et al., 2003), higher karyotype similarities of black spruce to white spruce than to red spruce (Nkongolo, 1996), and presence of many species-specific genetic variants in both the nuclear and organellar genomes (Bobola et al., 1992;Perron et al., 1995;Bobola et al., 1996;Germano and Klein, 1999;Nkongolo et al., 2003).A broader phylogenetic reconstruction of the Picea genus identified P. rubens as a sister species, rather than a derivative of P. mariana (Lockwood et al., 2013;Parmar et al., 2022), which further corroborates our viewpoint.Recent isolation and genetic drift were hypothesized to be the main driving forces for red spruce speciation (Perron et al., 2000;Jaramillo-Correa and Bousquet, 2003).However, we did not find any significant evidence for bottleneck events and genetic drift in the southern pure red spruce populations.Our results, based on sufficiently large population sample sizes (60), are in contrast from results using genomic markers that have shown a reduction in population size and long-term population decline in N e (potentially indicating drift) following the LGM based on whole exome sequencing (Capblancq et al., 2020).However, the Capblancq et al. (2020) study was based on a very small sample size of 2 to 11 individuals per population.While the two species are closely related and obviously share a common ancestor, the existence of shared alleles may be a result of common ancestry, relatively recent gene flow, and introgression occurring after the LGM as indicated by our approximate Bayesian analysis.A similar picture was reported in the interspecific hybridization zones for Quercus in Europe (Lexer et al., 2006) and Larix in Asia (Semerikov and Lascoux, 2003).
We, therefore, conclude that red spruce is not likely a derivative species from black spruce and the postglacial migration history and introgressive hybridization should be considered when describing evolutionary relationships between these two closely related species.

Conservation implications
Previous large-scale climatic fluctuations have played a major role in shaping the current population structure of red spruce, causing long distance migrations and gene pool rearrangement.Given the projected climate warming rates, we expect that this species might face a significant evolutionary impact in the near future.The isolated pure red spruce populations in Southern Appalachians have maintained their genetic diversity levels and they represent a significant proportion of the species' gene pool.Although these populations have little commercial value, they form an essential habitat for several endangered plant and animal species (Blum, 1990).Warming climates may eliminate the ecological niche for the high elevation southern red spruce populations within the next several decades.The central populations of red spruce in New Hampshire and Vermont may become more fragmented because of the rapid elevation shift in their ecological optima (Beckage et al., 2008), which places them under the risk of extinction in the future.Introgressive hybridization with P. mariana enriches the genetic diversity in P. rubens, while these two species remain ecologically different.Conservation efforts to preserve the ecologically important pure red spruce in Southern Appalachians should be strengthened.Considerable advances have been achieved in developing genomic resources and molecular markers for Picea, and future population genomics studies should take advantage of this newly developed resources, while taking into account the minimum sample sizes required for making reliable inference based on observed genetic variation.

FIGURE 1
FIGURE 1Geographic distribution of Picea rubens and Picea mariana in eastern North America, adapted from Little(1971).Species ranges are indicated by the red (southern red spruce), and dark grey (black spruce) shading.Purple shading indicates the northern red spruce sympatric zone with black spruce.Red box on the inset map indicates the extent of the study area.Sampled populations (back triangles) are indicated and labeled as per Table1.Red spruce populations sampled: TN, Tennessee; WV, West Virginia; NH, New Hampshire; NY, New York; ME, Maine; QC, Quebec; NB, New Brunswick; NS, Nova Scotia; Black spruce: NL, Labrador; MB, Manitoba.
FIGURE 2(A) Bar plot estimation of the membership coefficient (Q) for each spruce individual grouped on the geographic location.The figure is shown for K = 2 through 5, under the admixture model.The Evanno method(Evanno et al., 2005) suggested K=2 as the optimal clustering level (FigureS2).At K = 2 the split is between red and black spruce.K = 3 was selected as the sub-structure grouping level based on the Evano method (FigureS3).Red spruce populations: Tennessee, West Virginia, New York, New Hampshire, Maine, Quebec, New Brunswick, Nova Scotia.Black spruce: Labrador, Manitoba.Allopatric southern red spruce populations from Tennessee, West Virginia, and New York are well defined, while the northern populations show high degree of admixture.Note the presence of genotypes similar to red spruce lineages in "pure" black spruce population from Labrador.Historical patterns of admixture and introgression between red and black spruce were assessed using approximate Bayesian computation.Tested divergence scenarios included: (B) an admixture model where red and black spruce diverged from a common ancestor and the admixture population resulted from geneflow between the red and black spruce populations following divergence, (C) an ancestral model where all three populations diverged from a common ancestor, and (D) a hierarchical model where black and red spruce diverged from a common ancestor.The three populations tested include Southern Red Spruce (SRS)blue line, Northern Red Spruce (NRS)red line, and Black Spruce (BS)green line.(D) NA = starting effective population size, T2 = divergence time (in generations) from most recent common ancestor, T1 = time (in generations) of admixture event, ra = gene flow rate from red spruce, and 1-ra is gene flow rate from black spruce.Relative posterior probabilities were computed to provide statistical support to select the most likely scenario.1,000,000 simulations were performed, and the most likely scenario was evaluated by comparing the posterior probabilities using logistic regression on 1% of simulated datasets closest to the observed data.

TABLE 1
Red spruce and black spruce populations sampled and their geographic locations.

TABLE 2
Microsatellite primers sequences, amplification conditions, amplicon size range, and number of alleles observed in Picea rubens.

TABLE 3
Population genetic diversity parameters in 8 red and 2 black spruce populations.
A T , total number of alleles for all loci; A, mean number of alleles; A E , effective number of alleles per locus; LGP, latent genetic potential; G A (O), observed number of genotypes (genotypic additivity); G A (E), expected genotypic additivity; I, Shannon's information index; Ne, effective population size inferred from coalescent q estimates.Population groups: SRS, Southern populations of Picea rubens (according to the Structure Analysis): TN, WV, and NY; NRS, Northern populations of Picea rubens: NH, ME, QC, NB, and NS; RS, All populations of red spruce; BS, All populations of Picea mariana: NL, and MB.Individual population names are listed in Table1.p-values are given for one-way ANOVA among the corresponding population groups.Bold p values are significant at 95% confidence.Duncan means separation test at a=0.05 was done on the same groups.
genetic distance for microsatellite markers.Population names and locations are provided in Table1.Bootstrap support values are given for 1000 replications.Red spruce populations: TN -

TABLE 4
Pairwise genetic subdivision estimates between and within population groups.

TABLE 5
Parameter estimates from approximate Bayesian computation admixture scenario.e 1, effective population size of SRS; N e 2, effective population size of BS; N e 3, effective population size of NRS.NA, starting effective population size; t2, divergence time (in generations) from most recent common ancestor; t1, time (in generations) of admixture event; ra, gene flow rate from red spruce; and 1-ra is the gene flow rate from black spruce.Generation time is assumed to be 29 years.MRCA, most recent common ancestor. N

TABLE 6
Genetic diversity and fixation index (F) estimates in representative Picea species with endemic or regional (R) and widespread or transcontinental (T) distribution.