A Reevaluation of Superior Tree Performance After 48 Years for a Loblolly Pine Progeny Test in Southern Arkansas

A plus-tree progeny test of full- and half-sib “superior” loblolly pine (Pinus taeda) was installed in 1969 on the Crossett Experimental Forest (CEF) to compare the performance of 28 improved families with that of unimproved planting stock from the CEF (family W29). Performance was evaluated using data from young (3-year-old; early 1970s), maturing (25-year-old; 1994), and mature (48-year-old; 2017) trees. With the exception of a single improved family, early survival was high (>80%), with most families exceeding 90%. Three years post-planting, fusiform rust infection rates were also low, with most families having less than 1% of seedlings infected. At this early stage, the unimproved CEF family W29 only slightly underperformed the best full- and half-sib superior families. By 1994, W29 had slightly higher than average merchantable volume. This trend continued for W29 when remeasured in 2017, with the average merchantable volume yield for W29 statistically similar to that of the most productive families. This study found only limited volume performance gains from crossing plus-trees. However, it is important to note that several of the best height growth-performing families in 1972 were not the highest merchantable volume producers at 25 or 48 years, and some of the worst early performers moved into the upper tiers by the later remeasurements. These outcomes suggest that depending solely on early height performance to select families for long-term (>50 year) volume (especially if adjusted for wood density) or biomass yields may not be the best approach for forest managers seeking to increase carbon sequestration.


INTRODUCTION
Forest genetics and tree improvement programs have greatly benefited forestry in the southern United States (Borders and Bailey, 2001; Allen et al., 2005; White et al., 2014; Wheeler et al., 2015). Driven by the desire to increase productivity, disease resistance, and seedling survival and to shorten harvest rotations, decades of increasingly sophisticated efforts have resulted in extensive plantations of improved loblolly pine (Pinus taeda), helping the South to become the most productive timber region in the world (Allen et al., 2005). This work progressed rapidly as researchers and managers overcame specific challenges. For example, by identifying rust-resistant loblolly pine families and selectively breeding them, tree improvement efforts greatly decreased the occurrence and economic impact of fusiform rust (Cronartium quercuum f. sp. fusiforme) in just a couple of decades (Randolph et al., 2015; Walker and McKeand, 2018). Further successes in southern pines (including hybrids) have likewise improved wood volume production, bole straightness, branching patterns, and other targeted attributes (e.g., Dorman, 1976; McKeand et al., 2003; Belaber et al., 2018; Lauer et al., 2021).
The most improved pines are generally deployed under the most intensive management, which focuses on short (less than 25 year) rotations to effectively recoup investments in seedlings, site preparation, competition control, fertilization, etc. Further study of loblolly pine has led to new approaches to the propagation of certain preferred traits using somatic embryogenesis (Gupta and Durzan, 1987), genomic selection (e.g., Resende et al., 2012; Isik, 2014), and new analysis approaches based on mate selection algorithms derived from other breeding programs (Isik and McKeand, 2019). The recent sequencing of the loblolly genome (Zimin et al., 2014) also offers promise for additional gains. However, these successes do not mean that there have not been failures, or at least undesired outcomes. As examples, there are often negative correlations between families chosen for fast volume growth and other desired wood quality properties such as high stiffness and strength (e.g., Martin et al., 2001; Apiolaza, 2008; Santos et al., 2021). Even the successful deployment of improved, short-rotation southern pine plantations has come with significant and often negative social and environmental consequences following the widespread conversion of natural-origin pine, pine-hardwood, and hardwood forests (Greis, 2002, 2013; McGrath et al., 2004).
Traditionally, provenance and progeny tests are the mechanism of choice for determining the performance of different families, but there are only a few multidecadal tests in loblolly and shortleaf (Pinus echinata) pines and these rarely exceed 35 years (e.g., Wakeley and Bercaw, 1965;Wells and Rink, 1984;Rink and Wells, 1988;Buford, 1989;Schmidtling and Froehlich, 1993). Unlike efforts related to short-term wood volume production, the long-term performance of genetically improved loblolly pine is less certain (Allen et al., 2005) for ecosystem goods and services such as biomass production and carbon sequestration. However, sustaining progeny tests is particularly challenging, given their oft-limited scope (a few families tested with small sample sizes at a few locations), vulnerability to loss, and the resources needed to maintain such efforts. Major projects such as the large Southwide Pine Seed Source Study (SPSSS) (Wells and Wakeley, 1966) have demonstrated these challenges. For example, after the initial establishment challenges and drought losses of the SPSSS failed to derail this study, those pioneering southern pine progeny tests were followed without major incident to 25 years (Wells, 1983). However, in the next decade a series of disturbance events occurred that reduced this study to a handful of viable locations (Buford, 1989). Furthermore, with many land managers seeking increasingly shorter pine plantation rotations, long-term progeny tests have not been considered useful or necessary, especially given the likelihood of new and even more improved families becoming more available. 
Conventional wisdom, supported by carefully targeted research, holds that the volume-based performance of loblolly pine families can usually be determined after a short amount of time (e.g., McKeand, 1988; Lowe and van Buijtenen, 1989; Isik and McKeand, 2019; Maynor et al., 2021), with early success being maintained through at least midrotation (Bridgwater and McKeand, 1997; McKeand, 1988; Raley et al., 2003; Farjat et al., 2017). A recent example of rapid evaluation of the progeny of cloned and half-sib loblolly pine genotypes is found in Shalizi et al. (2020), who made family choice recommendations after seedlings were only 4 years in the ground. If consistent and reliable, such early determinations can considerably speed up family deployment, especially if particularly poor performing genotypes (e.g., disease-susceptible families or low bole quality) are quickly recognized.
DNA marker-based technologies and analysis offer the promise of even faster genomic selections and more efficient tree breeding programs (Isik, 2014;Isik and McKeand, 2019). However, there are potential issues that may only be identified following extended observations of progeny tests. For instance, Bridgwater and McKeand (1997) suggested a "Type A" family, which grows poorly in early stages of a trial, only to be a good performer later in the rotation. One such example could be found in the grass stage of longleaf pine (Pinus palustris): Ford (2017) reported that early height measurements (at ages 3 and 7 years) were poor predictors of this species' volume at 17 and 40 years. Martin et al. (2001) suggested the implications of families selected for high early performance measures that failed to meet expectations under longer rotations, especially if test conditions did not match field circumstances. Furthermore, some outputs (e.g., specialty products, such as poles or pilings, or maximizing carbon to be sequestered under contracts) may not be optimized by families selected for maximal early volume growth and short rotations. As an example, a growing body of evidence suggests significant carbon accumulation in older trees and stands that could alter strategies (much longer rotations) for using forests to sequester atmospheric carbon dioxide (e.g., Carey et al., 2001;Luyssaert et al., 2008;Stephenson et al., 2014;Sillett et al., 2019;Leverett et al., 2021).
Hence, when feasible, old progeny tests should be maintained and periodically remeasured to search for any interesting or unexpected results. This paper evaluates the data from a recently remeasured loblolly pine progeny test 48 years after it was planted on the USDA Forest Service's (hereafter, USFS) Crossett Experimental Forest (CEF) in southern Arkansas. This progeny test was originally designed to consider the survival, fusiform resistance, and productivity differences between full- and half-sib superior pines identified across the holdings of an industrial landowner in southern Arkansas and northern Louisiana, using seedlings from unimproved loblolly pine on the CEF as the standard for comparison. Although the original study has long been closed, this progeny test offers the opportunity to determine if the top performing families when young (age 3 years) remained the best choice decades later (at 25 and 48 years). Ultimately, results from such long-term progeny test observations may be adapted to improve management decisions for the extended (50+ year) rotations of loblolly pine required to support carbon markets and similar ecosystem services.

Site Description
While best known for silviculture of naturally regenerated southern pine, tree improvement and forest genetics studies were also conducted on the CEF from 1951 until 1975 (Bragg et al., 2016). In 1969, USFS Plant Geneticist Hoy Grigsby installed the final outplanting of an 8.1-ha superior pine progeny test in the eastern half of Compartment 3 on the CEF (Figure 1). This site, selected because of its relatively level ground and soils (Bude and Providence silt loams, with a nominal loblolly pine site index of 26.0-27.5 m at 50 years; Gill et al., 1979), was considered representative of this part of the Upper West Gulf Coastal Plain. The existing pine-dominated forest was cleared and the site prepared prior to the first outplanting in February of 1966; competing vegetation was controlled prior to planting of seedlings each year until the final outplanting was installed in 1969.

Progeny Test Design, Implementation, and Measurement
Details on the 1969 outplanting are limited; most of the information on this superior pine progeny test came from a series of unpublished establishment and progress reports between 1967 and 1969 (Grigsby, 1969). Additional information on these progeny tests can also be gleaned from an unpublished closing report written by a later investigator, USFS Plant Geneticist Warren Nance (Nance, 1978). This outplanting (colored areas on Figure 1) included 29 families, 28 of which were produced by crossing superior pines (22 full-sibs and 6 half-sibs) and one of which was an open-pollinated (using unimproved trees in natural-origin stands; hereafter, "woods-run") CEF-origin loblolly pine family (Table 1). The families were bred from superior pines identified by trained foresters on Georgia-Pacific (GP) lands in southern Arkansas and northern Louisiana; the parent trees in the 1969 test all came from Ashley County, Arkansas (Grigsby, 1969). These superior pines were selected using multiple traits (see Table 1), not all of which focused on increasing volume. For instance, families were also chosen for bole straightness and minimal forking to help GP supply the world's first southern pine plywood mills in Crossett and nearby Fordyce, Arkansas (Love, 1996).
The original intent of this progeny test was to determine the influence of superior pine family on various performance measures, including survival, vigor, growth, form, wood specific gravity, and fusiform rust susceptibility. Full- and half-sib families were pollinated in seed orchards established on the CEF and nearby GP lands, and the resulting seeds were placed in cold storage at the CEF. All families were germinated and raised to 1-year-old at the CEF before being hand-planted as bare-root seedlings in Compartment 3 on 2.43-m by 2.43-m spacing in a series of replicated blocks (Grigsby, 1969). Installed in February 1969, this superior pine progeny test used five blocks, each of which contained 29 14.63-m by 14.63-m plots with 36 planting points, for a total of 5,220 loblolly pine seedlings (Grigsby, 1969). An additional five slightly smaller plots (called "Arboretum" plots; see Figure 1) were planted in 1969 to fill in the last open space in this part of Compartment 3; four of these plots and their 124 trees are included in this analysis.
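The planting design figures above can be cross-checked with simple arithmetic. The following is a minimal sketch in Python; the constant and function names are illustrative assumptions, only the numeric values come from the text, and the per-family count assumes each family occupied one plot in every block.

```python
# Planting design values taken from the text; all names are illustrative only.
BLOCKS = 5
PLOTS_PER_BLOCK = 29        # 29 families, assumed one plot per family per block
POINTS_PER_PLOT = 36        # planting points per plot
SPACING_M = 2.43            # spacing between planting points (m)
PLOT_SIDE_M = 14.63         # side length of a square plot (m)


def design_totals(blocks, plots_per_block, points_per_plot):
    """Return (total plots, total planting points) for the design."""
    plots = blocks * plots_per_block
    return plots, plots * points_per_plot


total_plots, total_seedlings = design_totals(
    BLOCKS, PLOTS_PER_BLOCK, POINTS_PER_PLOT
)

# Number of planting rows that fit along one plot side at the stated spacing.
rows_per_side = round(PLOT_SIDE_M / SPACING_M)

# Seedlings per family under the one-plot-per-block assumption.
seedlings_per_family = BLOCKS * POINTS_PER_PLOT

print(total_plots, total_seedlings, rows_per_side, seedlings_per_family)
```

Under these assumptions, six rows of six planting points account for the 36 points per plot, and the total agrees with the 5,220 seedlings reported by Grigsby (1969).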
According to his study plans, Grigsby intended to assess the 1969 progeny tests at the end of their first, third, fifth, and tenth years, with a final decision on the disposition of this study at the end of their fifteenth growing season (Grigsby, 1969). However, the retirement of Project Leader Russ Reynolds shortly after the installation of this superior pine progeny test soon led to the closing of the CEF, and Grigsby was reassigned to the genetics unit in Saucier, Mississippi (Bragg et al., 2016). The CEF archives do not have any year one measurements; Grigsby measured the study at the end of the third year, but then retired. Nance assumed responsibility for the open CEF genetics studies and completed the fifth-year measurement for the 1969 outplanting. Based on these results, Nance (1978) decided there was insufficient value to continue and formally closed the study.
When the CEF reopened in the late 1970s, its mission had shifted to developing low-cost, natural regeneration-based management options for small, non-industrial landowners, and none of the studies related to the forest improvement program were restarted. However, because of their investment in the effort and the utility of the information for their management purposes, GP staff continued to measure the progeny tests in Compartment 3 in the 1980s and 1990s (for this paper, only the 25-year remeasurement from February-March 1994 is used). The 1994 assessment included measurement of DBH (to the nearest 0.25 cm) and total tree height (to the nearest 3 cm) using unspecified tools and techniques (probably diameter tapes and clinometers for heights). During their long history, the progeny tests in Compartment 3 have been operationally thinned and salvaged several times; with a few exceptions, all remaining plots currently contain between 3 and 6 trees.
Recently, the development of genetic markers and other more sophisticated DNA-based analysis has rekindled agency interest in these old progeny tests. To determine what information may remain, an effort to relocate and document the progeny tests remaining on the CEF started in late 2016. The 1969 outplanting was chosen to be the basis for the initial effort because of its clear planting patterns and relative intactness. Plot corners were reestablished in the 1969 outplanting and all surviving loblolly pines were tagged and had their diameter at breast height (DBH, to the nearest 0.25 cm with a diameter tape) and total tree height (to the nearest 15 cm using a TruPulse® 200X laser hypsometer and the sine method, Bragg et al., 2011) measured between January and April 2017. Any obvious signs of damage (e.g., from ice storms) or disease (e.g., fusiform cankers), competing hardwoods, or potentially influential site factors (e.g., tree growing on a prairie mound) were also noted.

Data Analysis
In an unpublished table in the CEF study files, Grigsby reported family-level summaries by block, from which Bragg (2018) produced means and standard deviations. Rather than using these same summaries for this paper, the original data sheets from the late February-early March 1972 field measurements were relocated and used in a new analysis of these data on survivorship and survivor height. In 1969, there were 5,344 loblolly pines planted (5,220 seedlings in the official design and 124 from four Arboretum plots). Overall survivorship (in percent) was determined in 1972 from those planted pines still recorded as alive (versus dead or missing). Total tree height was directly measured with a height pole to the nearest 3 cm for the 1972 survivors. Survivorship data (percentages) were arcsine transformed; both survivorship and total height data were compared for statistical significance (α = 0.05) using one-way analysis of variance (ANOVA) and Tukey's honestly significant difference (HSD) test for unequal n for the means separation (Zar, 2010). Fusiform infection was not included in the 1972 data sheets for the 1969 outplanting (or the Arboretum plots), so it was not recalculated. Hence, to determine fusiform rust occurrence (in terms of percent of seedlings with signs of the disease), Grigsby's original analysis was used, with block-level plot means treated as replicates. Since there were five blocks, n = 5 for each family; these represent an assessment of 4,814 loblolly pines that survived the first 3 years of this outplanting. Fusiform rates were also compared using one-way ANOVA and Tukey's HSD for unequal n on arcsine transformed percentages. While CEF management records are incomplete, Compartment 3 (including all or part of the progeny tests) was thinned in at least 1985, 1996, and 2002 using varying standards and objectives (Bragg, 2018). In addition to thinnings, these progeny tests were also periodically salvaged following other mortality events (e.g., lightning, wind, ice, insects) over the last four decades.
[Figure caption fragment: Table 1) indicated; for example, the Crossett woods-run pines (W29) in Block I were located in Plot 405.]
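The transformation and ANOVA steps described in this section can be illustrated with a short, self-contained sketch. It assumes the common angular form of the arcsine transformation, asin(sqrt(p)), which the text does not spell out; the family labels and plot-level survival percentages are hypothetical, and Tukey's HSD for unequal n is not reproduced here.

```python
import math


def arcsine_sqrt(percent):
    """Angular transform of a percentage: asin(sqrt(p)), p as a proportion.

    Assumed form of the 'arcsine transformation' named in the text.
    """
    return math.asin(math.sqrt(percent / 100.0))


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of floats.

    Groups may have unequal sizes; within-group variance must be nonzero.
    """
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical plot-level survival (%) for three families in five blocks.
survival_pct = {
    "family_a": [97.2, 94.4, 100.0, 91.7, 97.2],
    "family_b": [88.9, 83.3, 94.4, 86.1, 91.7],
    "family_c": [58.3, 47.2, 61.1, 52.8, 55.6],
}

transformed = [[arcsine_sqrt(p) for p in plots] for plots in survival_pct.values()]
f_stat, df_between, df_within = one_way_anova_f(transformed)
print(f"F = {f_stat:.2f} on {df_between} and {df_within} df")
```

The F statistic would then be compared against the critical value at α = 0.05 before any means separation is attempted.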
Because they targeted damaged or diseased individuals, these thinnings and salvage removals prevent further assessments of survival and fusiform occurrence; hence, they were not compared for the 1994 and 2017 remeasurements. However, it was possible at these later dates to compare DBH and height, as well as merchantable inside-bark (wood) volume. Merchantable volume (V, converted from ft³ to m³) was calculated using a regional loblolly pine volume equation (Equation [1]; Van Deusen et al., 1981) that takes DBH and HT (total tree height), originally in inches and feet, respectively, and a top-diameter conversion ratio R (for trees of this size, assumed to equal 1.0). A coefficient of variation (CV) was also derived for all families (for height only in the 1972 measurements; DBH, total tree height, and merchantable volume for the 1994 and 2017 measurements). DBH, total tree height, and merchantable volume were compared for the 1994 and 2017 data sets using one-way ANOVA, with mean separation done using Tukey's HSD test for unequal sample sizes (α = 0.05).
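As an illustration of two of the bookkeeping steps above, the sketch below converts a volume from cubic feet to cubic meters and computes a coefficient of variation. The function names and the sample DBH values are hypothetical; the volume equation itself is not reproduced, and the CV is assumed to be the sample standard deviation divided by the mean, expressed as a percentage.

```python
import statistics

# 1 ft = 0.3048 m exactly, so 1 ft^3 = 0.3048^3 m^3.
FT3_TO_M3 = 0.3048 ** 3


def cubic_feet_to_cubic_meters(volume_ft3):
    """Convert a volume in cubic feet to cubic meters."""
    return volume_ft3 * FT3_TO_M3


def coefficient_of_variation(values):
    """CV (%) from raw values: sample SD / mean * 100 (assumed definition)."""
    return statistics.stdev(values) / statistics.mean(values) * 100.0


def cv_from_summary(sd, mean):
    """CV (%) from a reported standard deviation and mean."""
    return sd / mean * 100.0


# Hypothetical DBH values (cm) for one family.
dbh_cm = [27.9, 30.5, 33.0, 29.2, 31.8]
print(round(coefficient_of_variation(dbh_cm), 1))

# Summary statistics reported later in the text for family F8 DBH in 1994.
print(round(cv_from_summary(3.3, 31.0), 1))
```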
Because of the lack of long-term control over this single surviving outplanting and a desire to consider family performance as realized following 3, 25, and 48 years postplanting, more conventional heritability analyses were not attempted. Rather, changes in top performers over time were evaluated, particularly in contrast with the local woods-run family, with an emphasis on the implications of such decisions over time for stands to be managed for long-term carbon storage.

Biomass-Focused Comparisons
As the Crossett woods-run family represented a collection of locally sourced seed of unknown parentage, it provided the best baseline to compare against the top-performing families in this progeny test identified in the early (3 years post-planting), midrotation (25 years), and late-rotation (48 years) analyses. At 3 years, the presumed top performer (F15, at 2.1 m on average) was the family with the tallest average height; at 25 and 48 years, the top performers were identified as those with the highest merchantable volumes at those respective years (F8 and H23, at 0.59 and 2.00 m³, respectively). The merchantable volumes of four families (F8, F15, H23, and W29) at ages 25 and 48 years were then adjusted from green to oven-dry volume by reducing them by 12.3%, the volumetric shrinkage of loblolly pine to 0% moisture content (FPL, 2010, their Table 4-3).
Since there are relationships between loblolly pine wood specific gravity, genetics, tree age, and silvicultural practices (e.g., Zobel et al., 1969; Koch, 1972; Megraw, 1985; Schimleck et al., 2018), I compared a range of specific gravities for these best performing families against the Crossett woods-run family. This was done because whole tree estimates of wood specific gravity were not made on this progeny test. Hence, oven-dry merchantable volume was multiplied by wood density at 0% moisture content for families F8, F15, and H23, assuming three levels of specific gravity (0.45, 0.50, and 0.55). This range of specific gravities is consistent with that possible in loblolly pine, which reflects both a degree of genetic control and its response to silvicultural practices (e.g., Saucier and Taras, 1969; Wahlgren and Schumann, 1975). For W29, two age-based specific gravities (0.45 at 25 years and 0.48 at 48 years) were derived from surveys of this variable for local-origin loblolly pine in the Arkansas area (e.g., Wahlgren and Schumann, 1975; Tauer and Loo-Dinkins, 1990). The product of oven-dry merchantable volume and wood density yields a per-tree oven-dry biomass estimate; multiplying that by 350, 500, and 650 trees per hectare at 25 years and 150, 250, and 350 trees per hectare at age 48 years provides a per-hectare oven-dry biomass quantity to compare different families across a range of specific gravities.
[Table note: Data are percentages, but the statistical test was applied to arcsine transformed fractions. Averages with the same letters are not significantly different at α = 0.05 (arcsine transformation of percentage data, followed by ANOVA and Tukey's HSD test for unequal n).]
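The per-hectare biomass calculation described in this section can be sketched as follows. This is an illustration rather than the original computation: the function names are hypothetical, wood density is assumed to equal specific gravity times 1,000 kg per cubic meter (the density of water), and the inputs are the mean green merchantable volumes, specific gravities, and stand densities given in the text.

```python
SHRINKAGE = 0.123            # green to oven-dry volumetric shrinkage (FPL, 2010)
WATER_DENSITY_KG_M3 = 1000.0 # assumed reference density for specific gravity


def oven_dry_volume_m3(green_volume_m3):
    """Reduce green merchantable volume by the volumetric shrinkage."""
    return green_volume_m3 * (1.0 - SHRINKAGE)


def tree_biomass_kg(green_volume_m3, specific_gravity):
    """Per-tree oven-dry biomass: oven-dry volume times wood density."""
    density = specific_gravity * WATER_DENSITY_KG_M3
    return oven_dry_volume_m3(green_volume_m3) * density


def stand_biomass_mg_ha(green_volume_m3, specific_gravity, trees_per_ha):
    """Per-hectare oven-dry biomass in megagrams (metric tons)."""
    return tree_biomass_kg(green_volume_m3, specific_gravity) * trees_per_ha / 1000.0


# Age 25: W29 (0.51 m^3, SG 0.45) versus F8 (0.59 m^3) at three assumed SGs.
for trees_per_ha in (350, 500, 650):
    w29 = stand_biomass_mg_ha(0.51, 0.45, trees_per_ha)
    f8 = [stand_biomass_mg_ha(0.59, sg, trees_per_ha) for sg in (0.45, 0.50, 0.55)]
    print(trees_per_ha, round(w29, 1), [round(x, 1) for x in f8])
```

Because biomass scales linearly with both volume and specific gravity in this formulation, a family with a lower mean volume can match or exceed a more voluminous one if its specific gravity is proportionally higher.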

1972 Measurement Reanalysis
When installed, the 1969 outplanting consisted of 4,054 loblolly pines from full-sib families, 1,110 from half-sib families, and 180 CEF woods-run seedlings (Table 2). When measured in 1972, mortality had been almost universally low: 19 of the 22 full-sib families and the Crossett woods-run (W29) family averaged between 92.8 and 98.3% survivorship. Two of the remaining full-sib families and all of the half-sib families had at least 83% survivorship, but due to higher levels of variation in plot-level survival, these were not significantly less than the best; only F8, at 54.4%, experienced significantly (p < 0.05) lower survivorship. Most families tested had at least one plot with 100% survival, and only two families (F7 and F8) had plots with <75% survival. Out of the dozens of plots evaluated, only two had a fusiform rust infection rate of approximately 6%, and only a single family (F8) averaged more than 2% infected (Table 3). It is likely that the higher fusiform infection rate in F8 contributed to this family having the lowest survival rate (54.4%) after three growing seasons. Along with 16 other families, W29 showed no evidence of fusiform infection when checked in February of 1972.
Substantial variation in height within families resulted in many of them being statistically indistinguishable from each other after 3 years (Table 4). On average, the tallest family was F15 at 2.1 m; families F2 and F6 also averaged at least 1.8 m tall.
Most families had good (1.5-1.8 m) to fair (1.2-1.5 m) height performance and only two (F11 and F12) proved to be poor (<1.2 m) at this age (Table 4). At 1.6 m, W29 was on the lower end of good height performance. Nance (1978, his Table 27) reported similar findings; his analysis of height performance at 5 years noted only five families exceeded W29 by 5% or more and some families averaged more than 25% shorter. Both families F2 and F15 had maximum heights > 3 m, while all families had minimum heights of <1 m. A reanalysis of the 1972 plot-level data found that most tested families generally had good to excellent survivorship, fair to good height growth, and low to very low fusiform infection rates at 3 years, with the W29 seedlings performing well in terms of survivorship, total height, and fusiform occurrence. Using a series of ad hoc relative performance thresholds (Table 5), only one family (F6) fell into the excellent category in all three measures of success (survivorship, total height, and fusiform rate). Families F2, F10, F14, F15, F21, and F22 each had the highest rating in two of the three categories, with most families (including W29) scoring at least one "excellent" rating. Most families also had two or more "good" ratings, and in 1972, only F8 seemed to be poorly suited for the CEF, having received only one good and two poor ratings.
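The height classes quoted above lend themselves to a simple threshold function. The sketch below is illustrative only: the good, fair, and poor limits come from the text, whereas the "excellent" cutoff (1.8 m or taller) and the handling of values that fall exactly on a class boundary are assumptions, since Table 5 is not reproduced here.

```python
def height_class(mean_height_m):
    """Classify a family's mean 3-year height (m) into a performance class.

    The 1.8 m 'excellent' cutoff and the boundary handling are assumed.
    """
    if mean_height_m >= 1.8:
        return "excellent"
    if mean_height_m >= 1.5:
        return "good"
    if mean_height_m >= 1.2:
        return "fair"
    return "poor"


# Mean heights reported in the text for two families at age 3 years.
for family, height_m in (("F15", 2.1), ("W29", 1.6)):
    print(family, height_class(height_m))
```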

1994 Measurement Analysis
After 25 years in the ground, three families reached significantly different heights than the Crossett woods-run family (Figure 2 and Supplementary Table 1). Family F2, which recorded one of the tallest saplings (3.2 m) and had one of the higher average total heights (1.8 m) in 1972, averaged 22.1 m in 1994. This compares to the 21.0 m average height of Family W29, which was intermediate amongst the families tested (Figure 2). Two families (F10 and F12) proved to be significantly shorter on average than W29 at the time of this remeasurement; F10 also had the shortest tree in 1994 at 12.8 m. It is important to note that all families produced at least one specimen that exceeded 22.3 m in height after 25 years in the ground, which is only modestly less than the tallest (24.4 m) at this stage. Family W29 remained in the upper third of the largest diameter families in 1994, with no other families having average DBHs that were significantly greater than W29's 29.5 cm mean (Figure 2 and Supplementary Table 2). The family with the greatest DBH in 1994, F8, averaged 31.0 cm (SD = 3.3 cm) but did not produce the individual tree with the largest diameter after 25 years in the ground; that specimen came from F7 and had a DBH of 42.7 cm. Nine families were significantly less in DBH. This was not a surprising result, given that there was considerably greater variability in this attribute amongst families than in height (CVs of 7-15% for DBH compared to 3-8% for height; Supplementary Tables 1, 2). Only two families (F13 and F11) averaged less than 25.4 cm DBH after 25 years, and the smallest individual loblolly pine registered only 11.9 cm DBH.
[Figure caption note: Asterisks (stacked vertically) above data reflect families significantly lower or higher than W29 at the following significance levels (* = 0.01 < p ≤ 0.05; ** = 0.001 < p ≤ 0.01; *** = p ≤ 0.001).]
Calculated merchantable tree volume in 1994 yielded several statistically significant differences by family after a quarter-century of growth. Because Equation [1] is most strongly influenced by DBH, those families with the highest average diameters also tended to yield the highest merchantable volumes (Figure 2 and Supplementary Table 3). Families F8 and F2 were both tall and had large DBHs, ensuring that under this formulation they yielded the largest average individual tree volumes in 1994 (0.59 and 0.56 m³, respectively). Note that although these two families averaged the highest tree-level volumes, the largest individual specimens in this progeny test came from other full-sib families (approaching 1 m³ in merchantable volume; Supplementary Table 3). After 25 years of growth, the Crossett woods-run family W29 was still a strong performer, with an average individual tree merchantable volume of 0.51 m³ and some specimens of this family exceeding 0.8 m³ (Figure 2). The worst performers were significantly lower, yielding 0.4 m³ or less per average tree.

2017 Measurement Analysis
In early 2017, crews measured 615 live loblolly pines in the 1969 outplanting of the progeny tests in CEF's Compartment 3 (Supplementary Table 4). All of the original 29 families had at least 13 trees; the family with the highest number remaining in 2017 (H26) had 28 trees. Because of the thinnings and salvage done over the years, it is not possible to compare survival trends through 2017. Very little evidence of fusiform could be found in the 2017 sample: only 2 of the 615 pines were identified with cankers. This limited fusiform presence can be attributed to the inherently low infection rate of local-origin families, mortality of infected pines early in the study from the disease or other causes, and likely removals of infected trees later during the various harvest removals. Crown damage (evidenced by distorted branches and bole crooks) from a significant ice storm in the late 1990s produced some of the shorter trees, but most of the severely damaged pines were salvaged in the immediate aftermath and hence are not represented in this assessment. While the extent and nature of storm damage can be related to family-based vulnerabilities (e.g., Xiong et al., 2010; Pile et al., 2016), insufficient evidence and control in this progeny test (coupled with post-event salvage) prevented further evaluation of a genetic link. While under some conditions genetics could prove a confounding analytical factor, such a storm two decades in the past is not likely to meaningfully influence the interpretation of these data, especially given that southern pines usually quickly recover lost height after top damage (e.g., Wiley and Zeide, 1991; Dipesh et al., 2015).
Significant family-based differences in tree height were apparent 48 years after planting in this progeny test, although with a considerable amount of variability within families (Figure 3 and Supplementary Table 4). Individual specimens from most (23 of 29) […]
Several families had individual specimens that exceeded 55 cm in DBH at 48 years, including one that exceeded 59 cm, and all families reached at least 31 cm DBH at this age (Supplementary Table 5). By 2017, Family H23 had emerged as having the largest average tree DBH at 49.2 cm (Figure 3). Although now significantly shorter than the tallest families, W29 remained in the upper half of average tree DBH (average of 46.4 cm), and proved to be significantly greater in DBH than two of the full-sib families (F13 and F11, at about 40 cm DBH).
Again, volume trends at 48 years after planting were most sensitive to DBH. While all but two tested families had individual specimens that reached or exceeded 2.0 m³ of green merchantable volume, only one family (H23) averaged 2.0 m³ (Supplementary Table 6). The smallest individual tree merchantable volumes found in most families tended to range from 0.75 to 1.25 m³, or about half as large as the biggest specimens. Because of the considerable degree of variation in merchantable volume for all families (CV from 15 to 30%; Supplementary Table 6), the Crossett woods-run family was not significantly different in size than any of the other tested families (Figure 3). Although not significant, W29's average merchantable volume (1.71 m³) was approximately 0.3 m³ per tree less than the most productive family and 0.4 m³ more than the least productive family. However, the largest average volume families (e.g., H23, F2, F8, F7) were significantly bigger than the smallest families (F13, F11, F14); these differences were on the order of 0.5 m³ per tree.

Family Performance and Biomass Differences Over Time
The best performing family suggested by evaluating the height growth performance of 3-year-old loblolly pine in this progeny test, F15, did not remain the best performing family over time (Table 6). By 25 years post-planting, F15 had fallen to 11th best performer of the 29 compared and was only 80% as large in terms of merchantable volume as the best performer at age 25; by age 48, F15 had fallen to 20th (still about 80% of peak). The best-performing family at age 25, F8, had only been the 9th best performer at age 3 years (and had survived the worst at that age); by age 48 years, F8 had slipped slightly to third-best overall, almost 7% less than the best-performing family at that time. According to the results of this particular progeny test on the CEF, at 48 years of age, half-sib family H23 had outperformed the next best performer by over 5% (in terms of merchantable volume; Supplementary Table 6), largely due to its greater girth (Supplementary Table 5). Open-pollinated Family W29 ranged from 12th best (age 3 years) to 9th best (age 25) to 11th best (age 48). Table 7 demonstrates that volume alone is an insufficient arbiter in determining biomass production performance for loblolly pine (so long as the number of trees per unit area is held constant). Regardless of family, the advantage of higher individual tree volume yield at a given age can be more than offset by a less voluminous family with noticeably higher wood specific gravity. For instance, W29 was predicted to have greater oven-dry biomass than either F15 or H23 at age 25 years when their specific gravities were all 0.45 because of the greater individual tree volume of W29 at this age (Table 7). Regrettably, this study lacked the family-averaged specific gravities needed to determine if this relationship influenced the CEF progeny test biomass production results.
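The rank changes and relative volumes discussed in this section can be tabulated with a short sketch. The dictionary layout and function names are hypothetical; the ranks and mean volumes are those stated in the text, and the percent-of-best calculation relies on the best family at each age (F8 at 25 years, H23 at 48 years) being included in the lookup table.

```python
# Ranks (1 = best of 29 families) at ages 3, 25, and 48 years, from the text.
RANKS = {
    "F15": {3: 1, 25: 11, 48: 20},
    "F8": {3: 9, 25: 1, 48: 3},
    "W29": {3: 12, 25: 9, 48: 11},
}

# Mean merchantable volume (m^3 per tree) reported in the text; the best
# family at each age must be present for percent_of_best to be meaningful.
MEAN_VOLUME_M3 = {
    25: {"F8": 0.59, "W29": 0.51},
    48: {"H23": 2.00, "W29": 1.71},
}


def rank_shift(family, start_age, end_age):
    """Places gained (positive) or lost (negative) between two ages."""
    ranks = RANKS[family]
    return ranks[start_age] - ranks[end_age]


def percent_of_best(family, age):
    """Family mean volume as a percentage of the best family at that age."""
    volumes = MEAN_VOLUME_M3[age]
    return 100.0 * volumes[family] / max(volumes.values())


for family in RANKS:
    print(family, rank_shift(family, 3, 25), rank_shift(family, 3, 48))

for age in (25, 48):
    print(age, round(percent_of_best("W29", age), 1))
```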

Early Performance Results
Early survival of this progeny test was excellent for most families. While a few Arkansas-based studies have had comparable survivorship (e.g., Grigsby, 1973), especially for local-origin loblolly pine, this high survivorship (Table 2) contrasted sharply with several seed source and progeny tests of loblolly pine in this area. For instance, Schmidtling (1987) noted 3-year post-planting survival rates of a variety of loblolly pine families of 59% at Horatio, Arkansas, 75% at Crossett, Arkansas, and 85% at Stewart, Mississippi. Though reasons behind these variations in planted loblolly pine survival are unclear, it seems likely that this study's high survivorship can be attributed to excellent planting practices, effective site preparation, and perhaps most importantly, the good fortune of having adequate precipitation immediately prior to and after planting. Weather records from the CEF showed a wet year in 1968 (just over 1930 mm of precipitation; the CEF averages about 1410 mm annually) and slightly drier than average years in 1969 and 1970 (1245 and 1346 mm, respectively), followed by a major drought in 1971 (988 mm) and a return to average (1417 mm) in 1972. High early survivorship may also be partially attributed to low fusiform rust infection at 3-5 years post-planting. Because of this low rate, Nance (1978) paid little attention to family-based differences; however, his Figure 26 indicated most families had somewhat more fusiform rust than W29 at 5 years. This is not a surprising result: loblolly pines in the Upper West Gulf Coastal Plain (especially southeastern Arkansas) often have relatively low fusiform rust rates (less than 10% infected), although this depends on family (Grigsby, 1973, 1975a,b, 1977; Randolph et al., 2015).
For example, when planted in other regions and exposed to a wider range of environmental conditions, woods-run loblolly pine from the vicinity of the CEF generally have good survivorship and low (less than 20%) fusiform infection after their first decade [although see Grigsby (1975b) for some higher rates]. Other studies of fusiform infection rates indicated that the progeny of superior (improved) pines may fare somewhat better than those of conventional woods-run sources (e.g., Grigsby, 1975a;Walker and McKeand, 2018). Given that the families in this CEF progeny test had been pre-screened for fusiform resistance, low levels of this disease are not surprising.

Consistency of Growth Performance Over Time
Although this limited assessment (one outplanting of a progeny test of a small number of loblolly pine families) constrains the applicability of the results, a number of interesting patterns can be seen from a comparison of relative family success at 3, 25, and 48 years post-planting. First, while the Crossett woods-run family W29 performed well, it was not able to consistently match the best performing full-sib families in terms of height over time. However, it was only modestly shorter than the tallest seedlings at the early stage of evaluation in this progeny test (Table 4), which helped prompt Nance (1978) to dismiss the potential of this superior pine trial. Grigsby, in his unpublished correspondence related to his review of Nance's report, believed this dismissal to be too hasty. The fact that later measurements did find significant differences between families (even if not necessarily with the Crossett woods-run source) could have favored further investigations. Second, analysis of this progeny test at later dates (Figures 2, 3) indicated that whatever measurement of growth performance is used (height, DBH, or volume), family-level success varied, sometimes dramatically, over time. For example, of the 10 tallest families at 3 years after planting (Table 6), only two remained in the same top 10 (compared in terms of average merchantable volume) at 25 and 48 years. The families that produced the greatest merchantable volumes at 25 and 48 years often had modest performance at the earliest observation stage.
Rank orders also changed considerably from ages 25 to 48 years, as seen in the tallest family at 3 years (F15), which proved to be only 11th largest in merchantable volume at 25 years and averaged only 20th largest at 48 years; Family H23 behaved in the opposite fashion, being only 22nd tallest (on average) at 3 years, increasing to 10th largest in average merchantable volume at 25 years, and then largest at 48 years.
TABLE 7 | Oven-dry merchantable biomass predictions (using different levels of wood specific gravity) of loblolly pine plantations at 25 and 48 years after planting using the top-performing family identified at 3, 25, and 48 years of age (F15, F8, and H23, respectively), compared with the Crossett woods-run family (W29) as the standard.
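The rank shifts described in this section can be summarized with a rank correlation. The following is a minimal sketch that uses only the four families whose rank positions are quoted in the text (F15, F8, H23, and W29); the helper names are hypothetical. Because these four were singled out as top performers or as the standard, and because age-3 ranks are based on height while later ranks are based on merchantable volume, the resulting coefficient is illustrative only and is not an estimate for all 29 families.

```python
def spearman_rho(ranks_a, ranks_b):
    """Spearman rank correlation for two untied rankings of the same items."""
    n = len(ranks_a)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


def rerank(values):
    """Convert rank positions among 29 families into 1..n ranks within the subset."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


# Rank positions (out of 29 families) quoted in the text, keyed by age in years.
ranks = {
    "F15": {3: 1, 25: 11, 48: 20},
    "F8": {3: 9, 25: 1, 48: 3},
    "H23": {3: 22, 25: 10, 48: 1},
    "W29": {3: 12, 25: 9, 48: 11},
}

families = list(ranks)
age3 = rerank([ranks[f][3] for f in families])
age48 = rerank([ranks[f][48] for f in families])

rho_3_vs_48 = spearman_rho(age3, age48)
# Positive shift = the family moved up the ranking between ages 3 and 48.
shifts = {f: ranks[f][3] - ranks[f][48] for f in families}

print(f"Spearman rho (age 3 vs 48, four families): {rho_3_vs_48:.2f}")
print(shifts)
```

For this hand-picked subset the age-3 and age-48 orderings are negatively correlated, driven entirely by F15 and H23 trading places while F8 and W29 hold their relative positions.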
These findings are not in agreement with the consensus of the tree improvement community, which has found good evidence of much greater fidelity in performance rank over time than suggested by this CEF data set (e.g., Rehfeldt, 1984; Sluder, 1984; Bridgwater and McKeand, 1997; Raley et al., 2003). Such provisional findings in this limited progeny test study do not imply that the other studies are wrong or their recommendations misplaced; indeed, their documentation of performance gains is impressive at the stand, landscape, and regional levels (e.g., Aspinwall et al., 2012; Restrepo et al., 2019; McKeand et al., 2021). There are several possible reasons for the apparent discrepancies of this study, including a limited sample size and lack of replications on different sites (the original 1969 study was also installed at two other locations, but these were lost after the study was closed in the 1970s), confounding impacts of random environmental effects that overwhelm inherent genetic performance differences between families, a lack of control over the random effects that may have unduly influenced the results, the young age (3 years) at which the first observations were made, which might have been too early to properly control for dominance patterns, or differences in response to competition (self-thinning) as trees aged and the canopy closed (Rehfeldt, 1984; Talbert and Strub, 1987). Furthermore, there are definite breeding program advantages and cost effectiveness opportunities if best performing families can be identified in a few years versus decades, which may be more meaningful to landowners seeking to increase timber yields.
Perhaps the most compelling contribution of this study is the value in considering family performance differences to a much later stage of development, if possible. After all, while Rehfeldt (1984) argued that early selections for specific genetic traits (such as height) were a better basis for comparison than a measure of performance (such as growth, which is an integration of multiple traits), forest management decisions are primarily driven by how a family will perform to meet the desired objective(s). Considerations of desired management outcomes are often based on different attributes than just early volume production (e.g., Maynor et al., 2021) because tree responses to environmental uncertainty and influences accrue over time. For instance, Restrepo et al. (2019) applied a meta-analysis of loblolly pine growth and yield drivers across the southeastern US and found that physiographic region had a differential impact on tree diameter, height, stand basal area, and volume, with lower coastal plain stands performing notably better at early ages, but flattening out quicker compared to upper coastal plain plantations. This performance "switch" based on physiographic region suggests similar changes for other attributes related to genetics may be possible. Hence, those looking to maximize long-term woody biomass production or optimize carbon sequestration, particularly those realized only many years later, should base their family-selection decisions on long-term performance rather than idealized controls implemented for different short-term goals. Unlike researchers, managers cannot simply ignore or control for the environmental factors such as ice damage, long-term droughts, or changes to carbon allocation patterns their plantations will likely experience under ordinary circumstances.

Other Outcomes From This Progeny Test
Regardless of family, tree form and quality (including branch attributes and self-pruning) were also important elements of superior pine determination in Grigsby's work, as sawtimber production had long dominated this region of the US South. Additionally, the development of a southern pine-based plywood industry in the early 1960s had also greatly increased demand for clear, straight logs. The emphasis of tree form over sheer wood fiber production may have also influenced the early evaluations of the success/failure of these superior pine progeny tests. After all, the priorities of a landowner interested in supplying a southern pine plywood mill are not the same as one producing fiber for a pulp mill (Van Buijtenen et al., 1971). After nearly 50 years post-planting, it is very apparent throughout this progeny test that bole quality is almost universally superb, with long, straight stems and only infrequent defects (Figure 4). Although bole grading was not attempted for this (or any previous) analysis, it seems almost certain that the average quality of this progeny test is substantially greater than would have been expected even in a well-tended natural-origin stand.
Perhaps this was a major reason why GP remained interested in this outplanting for years after the temporary closure of the CEF in the 1970s. Although Grigsby's successor Nance dismissed the early results of these superior pine crosses and had ended the USFS's role in this set of progeny tests, GP's tree improvement staff continued to follow these trees into the 1990s. When it became a member of the Western Gulf Forest Tree Improvement Program (WGFTIP) in 1974, GP sent its periodic remeasurements (typically done every 5 years) to the WGFTIP for evaluation. The 1969 CEF progeny test was successful enough during this period to get assigned a test identifier (GP-258), and a number of the GP-sourced superior pine families proved productive enough to be added to the WGFTIP's catalog of preferred family lines and used to help complete GP's second generation seed orchard (Texas Forest Service, 1974, 1976, 1977). Additional work (not presented here) was also done on the general combining ability and specific gravity heritabilities for GP-258 (Texas Forest Service, 1979; Byram and Lowe, 1994).
The results shown in Figures 2, 3 support another conclusion. Local volume equations that do not incorporate differences in tree height (and most do not) are inappropriate for comparing volumes between local families and those chosen from more distant seed sources. Although the families in this CEF progeny test were from nearby locations in southern Arkansas and northern Louisiana, the differences in height in these superior pine offspring are clear when compared to the Crossett woods-run family. Local volume equations based solely on DBH (or height, or some other single variable) may not capture significant differences in allometry and can lead to meaningful inaccuracies in volumetric predictions (Avery and Burkhart, 1983).

Implications for Southern Pine Silviculture
While the early performance of either the full- or half-sib families tested against the Crossett woods-run saplings was not initially impressive, a number of families did perform at a statistically higher level over the long run, particularly in terms of height growth. A reevaluation of the potential gains of some of the tested families may be in order, particularly if forest managers desire to retain local loblolly pine genetics rather than importing those of more distant sources or look to avoid simplifying their stands' genetic composition for the sake of maximizing short-term stand-level yields. After all, using new tools to focus on the "best" selections based on a limited suite of contemporary timber priorities to improve economic gains (e.g., Isik and McKeand, 2019; McKeand et al., 2021) comes with inherent risks of selecting families that may not be well-adapted to future conditions, including a rapidly changing climate, new diseases or insect pests, or even socio-political challenges. Again, the ability of long-term progeny tests to reevaluate family performance under these circumstances over time is a distinct opportunity that should be pursued when possible.
Other research has likewise reported trends that did not appear in loblolly pine progeny tests evaluated early in the rotation. For example, Walker et al. (2020) documented that differences in loblolly pine survivorship and stand density between contrasting provenances (the more productive Atlantic Coastal Plain families and the more drought-tolerant Lost Pines of Texas) did not become apparent until at least 10 and often 20 years after establishment. While Walker et al.'s results did not demonstrate any dramatic changes in performance by individual families, some of these switches may have occurred. Restrepo et al.'s (2019) similar findings for physiographic region further support reconsideration of some assumptions about the stability of these early predictions. Though the influence of attributes such as bole quality, disease susceptibility, drought tolerance, or even wood specific gravity may support certain planting stock choices, early growth performance currently still dominates decision-making.

Implications for Carbon Management
The results of this long-term progeny test study also have significant implications for carbon-driven management. Forests, particularly those of the southern US, are well-recognized for their ability to sequester atmospheric carbon dioxide (Birdsey and Heath, 1997;Johnsen et al., 2001). The overwhelmingly privately owned southern forests are also appealing for carbon markets because of the general willingness of many of these landowners to actively manage their forests to achieve the additionality required (e.g., Johnsen et al., 2001;Nepal et al., 2012;Clay et al., 2019). However, decisions made to manage loblolly pine for carbon sequestration purposes are not inherently the same as those for more conventional products such as lumber, veneer, or pulpwood, and suggest the role of silvicultural practices or planting stock must be evaluated differently (e.g., Johnsen et al., 2001;Bragg and Guldin, 2010;Aspinwall et al., 2012;Zhao et al., 2016;Clay et al., 2019). For instance, both the additionality and permanence requirements of forest-related carbon offsets can be much longer and more stringent than business-as-usual management (e.g., Ruseva et al., 2017), thus altering motivations and treatment considerations.
As the observations from this extended progeny test suggest, selecting the best family for long-term carbon sequestration purposes may require a different set of considerations. Assuming that any family-based differences in long-term survival or disease resistance are negligible, choosing the fastest growing family (F15) based on early height performance would have resulted in 4-13% less oven-dry biomass per hectare after 48 years when tree specific gravity was 0.5 or less compared to the Crossett woods-run trees (Table 7). Choosing the best performers at ages 25 (F8) and 48 years (H23) greatly narrowed or even reversed that trend. For instance, at the lowest evaluated specific gravity (0.45), H23 produced almost 10% more oven-dry biomass at 48 years than W29, an advantage that increased to over 34% if H23 had a specific gravity of 0.55 (Table 7). In terms of absolute production differences, H23 could generate from 24 to 86 tons more oven-dry biomass per hectare than W29 under this range of specific gravities if the stand had been managed for 350 stems/ha. Such an increase would not only sequester more carbon over the life of a contract but could generate additional revenue for the landowner without requiring any additional management actions (e.g., fertilization) that could reduce net income or diminish carbon storage benefits.
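The way a modest volume advantage compounds with wood density can be checked with simple scaling. The sketch below assumes equal stem counts per hectare and that biomass is proportional to volume times specific gravity; the function name is hypothetical, and the 1.10 volume ratio is a rounded stand-in for the "almost 10%" advantage of H23 over W29 at equal specific gravity, not a value read from Table 7.

```python
def relative_gain(volume_ratio, sg_family, sg_reference):
    """Fractional biomass advantage of a family over a reference family
    when both are grown at the same number of stems per hectare."""
    return volume_ratio * sg_family / sg_reference - 1.0


# Assumed: H23 carries roughly 10% more volume per tree than W29 at age 48.
volume_ratio = 1.10

gain_equal_sg = relative_gain(volume_ratio, 0.45, 0.45)
gain_denser = relative_gain(volume_ratio, 0.55, 0.45)

print(f"H23 vs W29, both at SG 0.45: {gain_equal_sg:.1%}")
print(f"H23 at SG 0.55 vs W29 at SG 0.45: {gain_denser:.1%}")
```

Scaling a roughly 10% advantage by 0.55/0.45 gives an advantage of about 34%, in line with the figure cited from Table 7.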
Further, the measures of success in tree improvement programs optimized for the identification and selection of early growth performance may miss other opportunities to bolster the contributions of loblolly pine plantations established for long-term carbon storage. Certainly, one of the most desirable outcomes of current tree improvement programs (shortening stand rotation lengths) is antithetical to in situ (not product-driven) carbon markets, which seek to add a degree of permanence to the sequestered carbon in living trees. For instance, Johnsen et al. (2001) projected a range of different loblolly pine plantation harvest rotations over a 100-year evaluation period, and predicted contrasting trends based on different considerations: a steady decline in total stemwood carbon gained with increasingly long rotation lengths versus a steady increase in mean standing stemwood carbon over that same range of rotations. This difference arose because Johnsen et al. (2001) included harvested products in their total stemwood carbon gains (although not specified, Johnsen et al. presumably incorporated decay through the application of a "storage factor"; Bates et al., 2017), whereas mean standing stemwood carbon included only the onsite live trees. Other changes to wood composition (particularly in terms of carbon content), specific gravity (e.g., Zobel et al., 1969), and tree allometry over time are also critical to determining the actual quantities of carbon stored, and these factors are not inherently featured in current tree improvement programs.
Longer rotations of loblolly pines planted to increase biomass for carbon offsets or credits do come with additional considerations. For instance, Maynor et al. (2021) acknowledged that biomass plantations established for short-term purposes (e.g., pellet production) could utilize higher-risk genotypes to take advantage of higher yields but that sawtimber-size products would favor more conservative genotypes with better stem form. It is logical to extend Maynor et al.'s (2021) reasoning that optimizing biomass production for long-term (50-75 years or longer) periods could be refined to feature families that sequester more carbon (a product of both wood volume and wood density) rather than maximizing early volume production. Short-term observations of key total tree biomass determinants such as bole taper, specific gravity, and non-bole (e.g., branch or foliage) biomass contributions may vary considerably by families and over time (as the tree ages), further distorting the patterns observed in young trees, especially if extrapolated.

CONCLUSION
This study suggests the value of retaining loblolly pine progeny tests well past the age required to select families for early growth performance. The utility of such long-term observations is particularly evident if planted pines are to be retained much longer than conventional silvicultural rotations (currently, between 20 and 30 years in the Upper West Gulf Coastal Plain). While further analysis and a larger data set are required, the change in rank order of the most and least "successful" families after 48 years could mean that certain objectives (such as carbon sequestered under long-term contracts) may be better served by a more measured evaluation of growth performance. After all, the top early performers may not prove to be the best option for loblolly pine plantations managed to store carbon 50, 75, or even 100 years into the future. Without further observations of performance in later decades, determining the best options may not be feasible.
There are also other possibilities suggested by this work. The desire to intensify southern pine silviculture to improve volume gain, shorten harvest rotations, improve bole quality, and decrease disease susceptibility has largely driven tree improvement efforts for conventional timber products from loblolly pine, particularly in recent decades (McKeand, 2019; McKeand et al., 2021). This interest, coupled with economic incentives to control competition, manage planting density, and ameliorate site deficiencies to shorten rotation lengths, has been the focus of much of this work to date and has been highly successful in supporting the forest products industry in the southern US. More recently, the benefits of loblolly pine tree improvement programs for supporting alternative management priorities, such as bolstering the southern US contribution toward carbon sequestration and increasing forest resilience to climate change, have received growing attention (e.g., Aspinwall et al., 2012; Matallana-Ramirez et al., 2021; Maynor et al., 2021). Long-running progeny tests of loblolly pine offer promise for addressing not only these more conventional concerns, but also new questions regarding other ecosystem goods and services or the response of known families to a changing environment.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

AUTHOR CONTRIBUTIONS
DB reviewed the historical data, participated in and supervised the field data collection in 2017, reanalyzed parts of these data, and wrote the manuscript.

FUNDING
Most of this work was funded by the USFS; additional support over the years (especially in the original study establishment and maintenance into the early 1970s) was provided by Georgia-Pacific.

ACKNOWLEDGMENTS
I would like to thank the following for their contributions to this effort: Hoy Grigsby, Warren Nance, Russ Reynolds, and other CEF staff (now all retired or deceased USFS employees) for their work in establishing, maintaining, and originally measuring the 1969 superior pine outplanting. More recently, Kirby Sneed, Rick Stagg, and Dr. Jim Guldin (all of the USFS); Jess Riddle (now with Georgia ForestWatch); and Dr. John Dennis (now of Nicholls State University) helped to relocate and remeasure these trees. Nancy Koerth and Gina Franke (both of the USFS), Dr. Matt Olson (now of Stockton University), and Dr. Joshua Adams (Louisiana Tech) also aided in the development of this manuscript. Conner Fristoe and Weyerhaeuser graciously provided copies of their data on this progeny test. This manuscript was written and prepared by a U.S. Government employee on official time, and therefore it is in the public domain and not subject to copyright. The findings and conclusions in this publication are those of the author(s) and should not be construed to represent an official USDA, Forest Service, or U.S. Government determination or policy. Original (pre-2017) information on CEF progeny tests was gathered and analyzed by a number of persons, including Hoy Grigsby and Warren Nance.