Fitness Estimation for Ecological Studies: An Evaluation in Columbian Ground Squirrels

The topics of evolution by natural selection and ecological interactions are closely intertwined. Thus, measurements of evolutionary fitness are ubiquitous in the ecological literature. As an empirical problem, the components of fitness, reproduction and survival, may be analyzed to produce fitness estimates in several ways. These can be divided into annual estimates that are most appropriate for short-term (e.g., annual) and experimental studies, and lifetime fitness estimates that are most appropriate for evaluating functional organismal traits. The latter are appropriate for comparative studies of natural selection in species with lifetimes that extend over several years. These estimates may also be particularly useful for estimating the direct and indirect components of inclusive fitness, an important topic for the evolution of cooperation. We reviewed examples of some of these alternatives from our research on Columbian ground squirrels (Urocitellus columbianus). Using empirical data, we also test the degree of correspondence of annual fitness, lifetime reproductive success, and individual fitness measures that are based on matrix methods. We conclude that correspondence of different methods is not strong, though each method appears most appropriate for different types of traits and research questions about fitness differences among trait forms.


INTRODUCTION
In his seminal book on evolutionary ecology, G. Evelyn Hutchinson described the interaction between ecology and evolution as "the ecological theater and the evolutionary play" (Hutchinson, 1965). The idea is that ecological conditions provide the selective influences on phenotypic traits that evolve through the process of natural selection (Darwin, 1859). In a population, phenotypic traits may change in frequency over time, and this change can be inferred from fitness differences, that is, patterns of reproduction and survival that are associated with different trait forms. This evolutionary principle, that trait adaptation by natural selection occurs in ecological environments, has been applied to many empirical studies, and is assumed by many more. Studies of natural selection on traits commonly measure changes in trait frequencies that occur over time, either from year to year or from generation to generation [e.g., studies reviewed by Endler (1986) and Charmantier et al. (2014)]. Such changes are produced by variations in reproduction and survival, and are used as surrogate measures of evolutionary fitness for the trait forms that individuals carry.
Empirical studies often group individuals that express alternative forms of a trait into "trait groups" or use individuals to represent a continuum of trait values (termed "trait forms" for continuous traits). Although trait fitness is typically estimated through the reproduction and survival of individuals that carry specific trait forms, it is the change in frequency of traits, not individuals, that is the result of natural selection. In this sense, trait groups can be considered alternative adaptations that respond to environmental conditions. Good examples are studies of the influence of global changes in climate on the timing of reproductive events or other elements of a species' lifecycle (e.g., Visser and Both, 2005; Chamaillé-Jammes et al., 2006; Parmesan, 2006; Lane et al., 2012; Tafani et al., 2013; Dobson et al., 2016; Radchuk et al., 2019). An important caveat is that individuals have many traits, some of which may be genetically correlated through linkage disequilibrium or pleiotropy (e.g., Duckworth and Kruuk, 2009; Bize et al., 2017; Mullon et al., 2018). Thus, single-trait studies may not identify the traits that are the targets of selection, including important evolutionary tradeoffs, and the actual interactions with agents of selection in the environment (Falconer, 1952; Lande and Arnold, 1983; Roff and Fairbairn, 2012). In addition, the fitness of a trait form is relative to the success of alternative trait forms, as populations dynamically change. Hence, the fitness of traits is relative to what other individuals are doing at a given moment in time for a population that may be increasing or decreasing. Measuring fitness for alternative trait forms is thus complicated by the need to account for population demographics (Metcalf and Pavard, 2007).
Nonetheless, studies of single or a few traits provide a good starting place for more in-depth examinations of suites of traits and suites of environmental factors. Early studies examined the influence of environmental events on morphological traits and differential survival (e.g., Bumpus, 1899; Kettlewell, 1961; Lande and Arnold, 1983), but later studies examined both reproduction and survival (e.g., Pemberton et al., 1999). Measures that combine reproduction and survival (e.g., annual fitness; Qvarnström et al., 2006) have been applied to examine traits expressed on an annual basis such as reproductive phenology (e.g., Lane et al., 2012; Dobson et al., 2017). Other measures examine traits expressed for a lifetime, like foraging patterns (using lifetime reproductive success, LRS; Clutton-Brock, 1988; Altmann, 1991) or age at first breeding (via individual fitness from matrix methods; McGraw and Caswell, 1996). Lifetime reproductive success is not sensitive to the timing of successful reproductive events (that might occur at any point in a lifecycle), but individual fitness measures are sensitive to when such events occur (Brommer et al., 2002). The impact of this difference on studies of natural selection is currently unknown (Brommer et al., 2004; Reid et al., 2019).
Fitness is a population parameter, though it is often estimated for individuals. A variety of fitness estimates have been developed for individuals, using demographic data from populations (e.g., Clutton-Brock, 1988; McGraw and Caswell, 1996; Qvarnström et al., 2006; Viblanc et al., 2010; Rubach et al., in press). However, natural selection operates on trait groups in populations, or rather on suites of correlated traits. Thus, individuals are divided into groups, effectively sub-populations, that exhibit alternative expressions of the trait(s) of interest. When we compare these alternative trait groups (or forms, for continuous traits), a rough idea of the association of the trait form and fitness (reproduction or survival, or ideally a combination of the two) can be obtained. This is a rough estimate of selection on the trait forms. One problem is that the trait of interest may be genetically correlated with other traits that constrain or magnify the change in trait forms from one generation to the next (e.g., Lande and Arnold, 1983; Price and Langen, 1992). Thus, identifying which trait forms are the actual "targets" of selection can be difficult. Another problem for inference about natural selection is that we may have little idea about the heritability of trait forms. However, given caution, it is instructive to understand which trait forms appear associated with fitness differences and which do not, remembering that selection and response to selection are two different things.
Our purpose is to compare and contrast different methods of measuring fitness, applied during the course of our long-term research program on Columbian ground squirrels (Urocitellus columbianus). We provide examples from past studies using different fitness measures, all based on empirical studies of reproduction and survival during 28 years of research on ground squirrels. In these examples, we used information about reproduction, survival, and traits of individuals to evaluate fitness of alternative trait groups or trait forms. We used annual fitness estimates in some studies, and lifetime fitness in others. In some cases, experimental treatments provided the alternative trait groups, but we also applied continuous variables to estimate trait forms. In one study, we examined direct and indirect components of inclusive fitness of cooperative behavior from helpful close kin. Herein, we had two goals. The first was to review our long-term studies of reproduction and survival, comparing annual with lifetime estimates of fitness and, among lifetime estimates, lifetime reproductive success with individual fitness estimates from matrix models. We expected that annual and lifetime estimates of fitness might not correspond closely, since environmental conditions may vary among years (Lane et al., 2012; Dobson et al., 2016). The second was to test this expectation by comparing annual and alternative lifetime fitness measures for correspondence. Different methods should correspond closely, if they give an accurate estimate of the fitness of individuals and their traits.

Field Site and Ground Squirrels
Columbian ground squirrels (Urocitellus columbianus) were studied from 1992 to 2019 in the Sheep River Provincial Park in Alberta, Canada [50°38′N, 114°39′W; elevation 1,550 m; see population #3 of Figure 1 in Dobson (1994)]. The study population occupied about 1.8 ha of meadow surrounded by pine-spruce forest on 3 sides and the gorge of the Sheep River on the final side. The meadowland contained grasses and forbs that provided forage for the ground squirrels, and was honeycombed with burrows that were dug out each year by these semi-fossorial rodents. Natural predators included coyotes (Canis latrans), red foxes (Vulpes vulpes), long-tailed weasels (Mustela frenata), golden eagles (Aquila chrysaetos), and northern goshawks (Accipiter gentilis). Columbian ground squirrels hibernate for 8-9 months each year and have a single breeding season during their short period of annual activity (Dobson and Murie, 1987; Dobson et al., 1992). Our studies have focused primarily on female ground squirrels, because they are matrilocal and many were monitored over their complete lifespans (only 3 of 338 adult females were immigrants, over 28 years). Female ground squirrels can live up to 9-10 years (an exceptional female lived to 14 years), and usually begin reproduction at 2-3 years of age (Broussard et al., 2008), though yearling females occasionally reproduce in our population (Rubach et al., in press).
Each spring, we trapped all ground squirrels within 3 days of emergence from hibernation in live traps (13 × 13 × 40 cm; Tomahawk Live Trap, Hazelhurst, WI, United States) baited with a small amount of peanut butter. Each individual was induced to enter a cloth bag for initial handling. We weighed each ground squirrel to the nearest 5 g in the handling bag with a Pesola spring-slide balance (1000 g; Schindellegi, Switzerland), fitted untagged individuals with a pair of numbered metal ear tags (Monel #1005-1; National Band and Tag, Newport, KY, United States), and measured their head width (zygomatic arch breadth) to the closest 0.1 mm with a pair of dial calipers. Each ground squirrel was examined for sexual condition (males for abdominal or scrotal testes; females for appearance of the vulva: closed, open, or open and swollen, and with or without a copulatory plug), presence and abundance of fleas, and wounds. We gave each ground squirrel a unique mark on the pelage with black dye (human hair dye; Lady Clairol Hydrience, Procter & Gamble Co., Cincinnati, OH, United States) for later visual identification. Because unmarked individuals could be quickly identified and trapped, we were able to capture, mark, and examine every ground squirrel in the population that emerged from hibernation in every year of the study.
In the spring, female ground squirrels were watched daily for characteristic reproductive behavior (and regularly trapped to check the appearance of their vulva), to determine their single annual day of estrus, during which they are typically receptive to mating males for a period of 5-6 h. Subsequently, females typically went through 24 days of gestation and gave birth underground in single-entrance "natal burrows" where the young are nursed and protected by territorial mothers (Murie and Harris, 1988) during about 27 days of lactation (Murie and Harris, 1982). Juvenile ground squirrels first emerge above ground from their natal burrows near the time of weaning. Because mothers kept juveniles in single-entrance natal burrows, we were able to both identify the mother and capture emerging juveniles, usually on their first day above ground. Juveniles were trapped (Tomahawk or other cage traps), ear-tagged, examined to identify sex, examined for presence and number of fleas, and dye-marked with unique symbols. Males typically dispersed to other populations as yearlings, or remained residents on the study site but resided away from the area of their natal nest [and thus only occasionally interacted with female kin; dispersal pattern reviewed in Neuhaus (2006)]. In contrast, females were highly philopatric and interactive, very rarely dispersing to other populations. Thus, we were able to build lifetime records of reproduction and survival for females that could be used to estimate a variety of fitness measures.

Annual Fitness
Our fitness estimates were applied to females, because females were matrilocal and paternity was not measured in all years of our study (Raveh et al., 2010; Balmer et al., 2019). Using the long-term data, we calculated annual fitness as the mean number of gene copies that each female had represented in the following year (Qvarnström et al., 2006). A female had one copy if she survived to the next year, plus a half copy for each of her new offspring that also survived to the following year. For the estimate, these values were added together.
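The tally described above can be sketched in a few lines of Python. This is a minimal illustration under the stated rule (one copy for a surviving mother, half a copy per surviving offspring), not the analysis code used in the studies; the function name `annual_fitness` and its arguments are hypothetical.

```python
def annual_fitness(mother_survived: bool, offspring_surviving: int) -> float:
    """Gene copies a female has represented in the following year.

    One copy if she survives to the next year, plus half a copy for each
    of her offspring from this year that also survives to the next year.
    """
    if offspring_surviving < 0:
        raise ValueError("offspring_surviving must be non-negative")
    own_copy = 1.0 if mother_survived else 0.0
    return own_copy + 0.5 * offspring_surviving


# Hypothetical example values, for illustration only.
print(annual_fitness(True, 3))   # 2.5
print(annual_fitness(False, 1))  # 0.5
```

A mother that dies but leaves surviving offspring still receives a non-zero value, which is why the measure combines survival and reproduction rather than using either alone.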

Lifetime Reproductive Success
Lifetime reproductive success (LRS) was measured by simply adding up the number of offspring that each female produced over her lifetime. We did this for offspring produced near the time of weaning, when young first emerged from natal burrows. We also constructed a relative index of lifetime reproductive success of each mother compared to their peers, by regressing lifetime reproductive success for each female onto the lifetime reproductive success of her cohort. The residuals of this regression were used as a relative index (LRS rel ).

Individual (Lifetime) Fitness
We calculated individual fitness estimates (λ ind ) for each female of the population that had lived out her entire lifespan (thus, excluding those alive at the end of the study), following the matrix approach of McGraw and Caswell (1996). For each female, we constructed a matrix that had half her reproductive output (at either weaning of offspring, or offspring that survived their first hibernation season) specified on the top row of the matrix. This represented her annual contribution to reproductive production, the other half being from the male. The matrix had ones on the off diagonal, and zeros in all other unfilled elements (for an example, see Viblanc et al., 2010). The dominant eigenvalue of this matrix (of value λ ind ) was a growth parameter for the matrix, and was taken as the lifetime individual fitness of the female and thus also her phenotype. The method was explained in detail by McGraw and Caswell (1996).
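A minimal sketch of this calculation follows, assuming the standard Leslie layout in which the "off diagonal" is the subdiagonal and age classes start at 1 year. For a matrix with fertilities F_1..F_n on the top row and ones on the subdiagonal, the dominant eigenvalue is the positive root of sum_i F_i λ^(-i) = 1, so the sketch solves that equation by bisection instead of calling a linear-algebra library. The function name `lambda_ind` and the example litters are hypothetical.

```python
from typing import Sequence


def lambda_ind(offspring_by_age: Sequence[float], tol: float = 1e-12) -> float:
    """Dominant eigenvalue of one female's individual projection matrix.

    offspring_by_age[i] is the number of offspring credited to the female
    at age i + 1 years. Half of each value is placed on the top row of the
    matrix (the other half is attributed to the male); the subdiagonal is
    all ones. The dominant eigenvalue is the positive root of
    sum_i F_i * lam**(-i) = 1, found here by bisection.
    """
    fert = [0.5 * x for x in offspring_by_age]
    if any(f < 0 for f in fert):
        raise ValueError("offspring counts must be non-negative")
    if sum(fert) == 0:
        return 0.0  # a female with no offspring has zero individual fitness

    def excess(lam: float) -> float:
        # Strictly decreasing in lam for lam > 0; zero at the eigenvalue.
        return sum(f * lam ** (-age) for age, f in enumerate(fert, start=1)) - 1.0

    lo, hi = 0.0, 1.0 + sum(fert)  # excess(hi) < 0, so the root lies below hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Two hypothetical females, each weaning four offspring in a 2-year life.
print(round(lambda_ind([4, 0]), 3))  # litter at age 1
print(round(lambda_ind([0, 4]), 3))  # litter at age 2
```

Both hypothetical females have the same lifetime reproductive success (four offspring), but the female that breeds at age 1 obtains the larger λ ind (2 versus the square root of 2), which is the sensitivity to timing that distinguishes this measure from a simple offspring count.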
The individual fitness measure of McGraw and Caswell (1996) required additional attention. Any estimate of fitness of an individual is relative to others in the population. Over a lifetime that may extend several years, rodent populations are well known to fluctuate (e.g., Boonstra and Krebs, 2012; Fauteux et al., 2016; Brommer et al., 2017; Bonnet and Postma, 2018). For example, our population initially grew for about 10 years, then declined by close to 50% over 2 years, and then had a relatively stable and gently increasing period for about 15 years (Figure 1). During times of population increase, a female with an estimated λ ind of 1.0 does poorly compared to others in the population, since population growth (and the estimated fitness of an average female in the population) would have a λ value greater than 1.0. A female would be doing quite well, however, if the population was decreasing (population λ < 1.0). Thus, it may be necessary to adjust λ ind for changes in population size or population growth, if one wishes to compare the traits among females that experience different population changes over their lifetimes.
We adjusted our lifetime individual fitness measure for changes in the population in two ways. First, we regressed individual fitness (λ ind ) on a measure of population change in which comparative population values were produced by specifying proportional year-to-year population changes into a matrix in lieu of reproductive fertility values, but still with values of one in the off diagonal (to create λ N , after Viblanc et al., 2010; Dobson et al., 2012). These matrices were constructed for the same years as each female's lifetime, and gave an indication of how the population, on average and over all females, was changing over time. Then λ ind values were regressed on λ N values. The residual values of this analysis adjust for changes in population size during a female's lifetime, and we added 1.0 to these to make interpretation easier (producing λ rel N ). Second, we calculated actual population growth using a Leslie matrix (Leslie, 1945) for the cohort of each female, during that female's lifetime, and then similarly regressed λ ind on λ Leslie and added 1.0 to the residuals of the regression (producing λ relL ; after Rubach et al., in press). λ rel N and λ relL were used as estimates of a female's fitness relative to her competitors, compared to the population at large and compared to her cohort, during her lifetime.
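The residual adjustment common to both approaches can be sketched as follows. This is an illustration that assumes a simple ordinary least-squares regression with one predictor; the function name `relative_fitness` and the example values are hypothetical and are not data from the study.

```python
from typing import List, Sequence


def relative_fitness(lam_ind: Sequence[float],
                     lam_pop: Sequence[float]) -> List[float]:
    """Residuals of an OLS regression of lam_ind on lam_pop, plus 1.0.

    lam_ind holds one individual fitness value per female; lam_pop holds
    the matching population measure for her lifetime (lambda_N or
    lambda_Leslie in the text).
    """
    n = len(lam_ind)
    if n != len(lam_pop) or n < 2:
        raise ValueError("need two equal-length series with at least 2 values")
    mean_x = sum(lam_pop) / n
    mean_y = sum(lam_ind) / n
    sxx = sum((x - mean_x) ** 2 for x in lam_pop)
    if sxx == 0:
        raise ValueError("lam_pop has no variance; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lam_pop, lam_ind))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual = observed minus predicted; adding 1.0 recentres the values.
    return [y - (intercept + slope * x) + 1.0 for x, y in zip(lam_pop, lam_ind)]


# Hypothetical values for four females.
rel = relative_fitness([1.0, 1.2, 0.9, 1.1], [0.95, 1.05, 1.0, 1.1])
print([round(v, 3) for v in rel])
```

Because least-squares residuals sum to zero, the adjusted values average exactly 1.0, so a female above 1.0 did better than expected given the population change she experienced and a female below 1.0 did worse.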

Direct and Indirect Components of Inclusive Fitness
We estimated the direct and indirect components of inclusive fitness in two related ways. Inclusive fitness was estimated relative to a particular trait, the presence of close kin ("genial neighbors") that demonstrably improve reproductive success and individual fitness. Individual fitness (λ ind ) was expressed relative to changes in population size (λ N ), to produce λ rel (after Viblanc et al., 2010; Dobson et al., 2012). First, we computed λ rel N for mothers with and without co-breeding close kin and compared these values. Next, we estimated inclusive fitness from a direct component (the mean relative individual fitness of mothers without kin present) and an indirect component (averaged: relative individual fitness for relatives present minus the mean fitness of mothers without kin present, times the degree of relatedness). These estimates were calculated based on the number of weaned offspring and included as kin only those relatives that appear to be recognized as close kin in the field (King, 1989). "Uterine kin" (viz., mothers, daughters, and littermate sisters) recognize one another via social familiarization in the natal nest-burrow (Hare and Murie, 1996). Other females were classed as distant and non-kin, and used for comparison to close kin. Further details of inclusive fitness calculations can be found in Dobson et al. (2012).
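One literal reading of the two components can be sketched as follows. This is an assumption-laden illustration, not the procedure of Dobson et al. (2012), which should be consulted for the actual calculation: the function name `inclusive_fitness`, the relatedness value of 0.5, and all fitness values are hypothetical.

```python
from statistics import mean
from typing import Sequence, Tuple


def inclusive_fitness(
    no_kin_fitness: Sequence[float],
    kin_present: Sequence[Tuple[float, float]],
) -> Tuple[float, float, float]:
    """Return (direct, indirect, inclusive) fitness components.

    no_kin_fitness: relative individual fitness of mothers without kin.
    kin_present: (relative fitness of a co-breeding relative, relatedness)
        pairs for the focal mother's close kin.
    """
    # Direct component: baseline fitness in the absence of close kin.
    direct = mean(no_kin_fitness)
    # Indirect component: each relative's gain over that baseline,
    # weighted by relatedness, averaged across relatives present.
    gains = [(fit - direct) * r for fit, r in kin_present]
    indirect = mean(gains) if gains else 0.0
    return direct, indirect, direct + indirect


# Hypothetical: two mothers without kin; a focal mother with two close kin.
d, i, total = inclusive_fitness([1.0, 0.9], [(1.15, 0.5), (1.05, 0.5)])
print(round(d, 3), round(i, 3), round(total, 3))
```

In this reading, the indirect component is positive only when relatives with kin present outperform the no-kin baseline, which matches the idea that the benefit is attributable to the presence of cooperative kin.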

Fitness Estimates in Columbian Ground Squirrels: Review From a Long-Term Study
Annual Fitness Estimates
Raveh et al. (2015) conducted an experiment on reproductive Columbian ground squirrel females that involved the removal of fleas (Oropsylla spp.) using a spot-on pet insecticide. The experiment artificially created the trait groups "with natural flea loads" and "with no fleas," and thus tested for a fitness cost to parasitism. The results indicated no significant difference between treated and untreated mothers using the Qvarnström et al. (2006) annual fitness measure (respectively; 1.13 ± 0.13, n = 28; and 1.06 ± 0.14, n = 26; mixed model, χ² = 0.06, df = 1, P = 0.80). In addition, they found no significant difference in the annual fitness measure for mothers and young that occupied nests that were infested with fleas during lactation from nests that were not infested (respectively; 1.25 ± 0.16, n = 20; and 1.08 ± 0.14, n = 26; mixed model, χ² = 1.29, df = 1, P = 0.26). The conclusion of the study was that the removal of ectoparasites was not a significant influence on annual fitness, nor on several other measures related to maternal fitness. The Qvarnström et al. (2006) method was an appropriate measure for the experimental contrast and natural comparison of the ectoparasite "treatments," and two trait groups (viz., with and without parasites) were used in both comparisons. This method tracks the number of gene copies that were passed on in the population from 1 year to the next: one complete copy if the mother survives, and a half copy of her genes passes on through each of her male and female offspring.
Viblanc et al. (2016) applied annual fitness to a comparative network study of aggression by adult female Columbian ground squirrels. There were no significant differences in fitness among reproductive females according to the aggression that they received from other mothers. But aggression directed toward non-close kin was 2.3 times greater than aggression directed toward reproductive close kin [randomized network analysis, P < 0.001; Figure 3 in Viblanc et al. (2016)]. Mothers that most commonly committed aggression toward other females had significantly greater annual fitness [randomized network analysis, P = 0.004; see Figure 4 in Viblanc et al. (2016)]. The trait forms were different levels of committed aggression (chases and fights). Here, the Qvarnström et al. (2006) method was an appropriate measure for network analyses from a single year of behavioral data, but in this case the trait form was a continuous axis.
Annual fitness can also be applied to longer-term studies, such as those examining influences of changing climate. In Columbian ground squirrels, variations in spring and summer climates have significant influences on annual fitness (Lane et al., 2012; Dobson et al., 2016). Over a 20-year period, when adult females emerged from hibernation earlier, their annual fitness was greater (Figure 2). For this study, the trait forms were different dates of emergence from hibernation, a variable that was repeated for most individuals among years. Emergence from hibernation was influenced by snowmelt, and lower fitness ensued when spring melt-off of snowpack was later (Figure 3). Additionally, dry hot conditions during summer also had a strong negative influence on fitness (Figure 4). These seasonal climatic influences on the fitness of adult females produced strong influences on population size as well, so that a year of especially late spring melt-off of snow and hot and dry conditions in summer was associated with a nearly 50% decline in the population. In these studies, annual fitness was an appropriate index for comparisons, in part because annual events were studied. The Qvarnström et al. (2006) method has been used in a similar manner in several comparative and experimental field studies (e.g., Arnaud et al., 2013; Lane et al., 2015; Hoogland and Brown, 2016; Lane et al., 2019).

Lifetime Reproductive Success and Individual Fitness Estimates
Interest in the individuals that carry traits often leads to comparisons of the lifetime reproductive success of individuals with different phenotypic values of traits. But natural selection applies to traits and combinations of traits, rather than individuals per se. The frequencies of trait groups change over time when natural selection occurs. Thus, rather than the number of offspring or even grand-offspring for measuring changes due to natural selection (Brommer et al., 2004; Reid et al., 2019), the most useful measure is the growth rate of trait forms or associations of trait forms among generations. The most commonly used measure of increase in trait forms is lifetime reproductive success (e.g., Clutton-Brock, 1988; Grafen, 1988; Merilä and Sheldon, 2000; Jensen et al., 2004; Descamps et al., 2006; McLoughlin et al., 2007).
The growth of trait groups can also be estimated by calculation of individual fitness over the lifetime of individuals in the population (McGraw and Caswell, 1996). This method uses matrix algebra (classically used for estimating population growth) to estimate the increase in trait forms from changes in the frequency of phenotypic traits. The logic of doing this follows a long history of measuring fitness from the intrinsic growth rate of individuals that carry different trait forms (Stearns, 1992;Roff, 2002). A major advantage of the individual fitness approach is that it is sensitive to the timing of reproduction, so that offspring produced early in a mother's life contribute more to fitness than those produced later (Brommer et al., 2002).

Age at first reproduction
Demographic theory suggests that offspring produced early in life, if they carry a particular trait form, may themselves begin to reproduce earlier, thus contributing to further spread of the trait form over time (Stearns, 1992). Individual fitness was devised for an examination of age at reproductive maturity in sparrowhawks (Accipiter nisus) and blue tits (Parus caeruleus) (McGraw and Caswell, 1996). For Columbian ground squirrels, a similar exploration of individual fitness revealed greater individual fitness values for individuals that begin to reproduce successfully at an earlier age (Rubach et al., in press). This was done with a calculation similar to that of McGraw and Caswell (1996). The number of weaned offspring was used to estimate reproductive success. Regression of λ ind [as in McGraw and Caswell (1996)] on λ Leslie (cohort growth) showed that an adjustment for changes in population growth was needed (R² = 0.52, F = 52.6, P < 0.0001, n = 148). Thus, Rubach et al. (in press) used the residuals of this regression (λ relL ) to estimate relative individual fitness. λ relL differed significantly among females that first reproduced at ages 1, 2, and 3-5 years old (Figure 5). Lifetime reproductive success, however, showed no significant difference among the trait groups (i.e., females that first reproduce at ages of 1, 2, and 3 and above). Statistically comparing lifetime reproductive success for mothers to that of their peers (i.e., their cohort lifetime reproductive success, to produce LRS rel ) produced little difference among females that first reproduced at different ages. Additionally, an earlier study that used lifetime reproductive success as a fitness measure found no significant difference between females that first reproduced at ages 2 and 3 (Neuhaus et al., 2004).
Examples that examine age at maturity have presented the strongest case for using individual fitness estimates, since the age at which reproduction begins has a strong influence on population growth (Cole, 1954; Oli and Dobson, 2003). Examples of fitness differences that appear to favor early breeding include blue tits and sparrowhawks (McGraw and Caswell, 1996), Ural owls (Brommer et al., 1998), wood ducks (Aix sponsa) (Oli et al., 2002), and yellow-bellied marmots (Marmota flaviventris) (Oli and Armitage, 2003). Oli and Armitage (2008) found that female marmots that delayed breeding suffered a loss of inclusive fitness, even when direct fitness was augmented by indirect fitness benefits from reproduction of close kin. The attempt of Rubach et al. (in press) to compare the lifetime reproductive success of individual females to that of their cohorts (LRS rel ) yielded no additional insight beyond the simple use of lifetime reproductive success. Thus, based on present evidence, no measure of fitness, whether annual or over the lifespan, is certain to provide an accurate description of natural selection or evolutionary response to natural selection. For long-term studies that examine complete lifespans, a cautious approach might be to apply a relative measure based on individual fitness and either on changes in population size or on population growth rate.
Relative individual fitness (e.g., λ relL or λ rel N ) takes changes in population size into account, and gives an estimate of the growth rate of the different trait groups. By asking how fitness differed among the trait groups, Rubach et al. (in press) assumed that females that reproduce at different ages have a trait that can be passed on to future generations: this is the assumption of genetic variation and heritability. While this idea might be challenged, the whole point of looking at a fitness measure is the search for an evolutionary advantage. If a trait undergoes selection, but exhibits no response to selection (viz., due to limited heritability, antagonistic pleiotropy, or genetic correlations with other traits) the results are perhaps less interesting. Additionally, when the number of offspring that survived until their first possible reproductive season was used to estimate reproductive success, the advantage for earlier reproduction by mothers was not quite significant, though it still had a small to medium effect size. Production of the next generation is meaningful for natural selection in terms of offspring that themselves survive to reproduce in the next generation (e.g., Boyce and Perrins, 1987). Naturally, the different trait groups might vary due to environmental factors or random variations in resource acquisition over time, so it is important to remember that offspring may not express the trait forms of their parents, particularly for phenotypically plastic traits, where expression might be influenced by the environment.

Kin Selection and Inclusive Fitness
Individual fitness measures are most appropriate for traits that are expressed once during an individual's lifetime, as is the case with many developmental traits. Age at maturity is one such trait, but many traits of temperate species are expressed on an annual basis (e.g., litter size, phenology of reproduction, seasonal cycles in activity or body mass, etc.). Nonetheless, some evolutionary characteristics may vary during an individual's lifetime, but the primary interest is in the cumulative effects of the social or ecological environment on fitness. Inclusive fitness is an example of an advantageous phenomenon (the presence of cooperative and reproductive close kin that augment maternal fitness) that can accumulate over a lifespan. Here, the trait is usually some sort of behavioral cooperation with close genetic relatives, so that kin selection is a possible influence on the behaviors (Hamilton, 1964). In this case, inclusive fitness (an individual's total fitness) has two components: fitness accrued by an individual in the absence of help from genetic relatives (the direct component) and fitness accrued from the help that the individual gives to genetic relatives (the indirect component).
FIGURE 6 | Estimated inclusive fitness and number of co-surviving close kin (mother-daughter and littermate sister dyads) that were both of reproductive age (2 years old and older) and actively reproducing at the same time (r = 0.425, n = 35, P = 0.005; data from 1992 to 2008) [used with permission, from Dobson et al. (2012)].
Our examination of possible kinship advantages in fitness terms began with an examination of whether there was a difference in the direct fitness component between female Columbian ground squirrels that reproduced in the presence of close kin and those that had no co-reproductive close kin with which to cooperate (Viblanc et al., 2010). The form that cooperation took was lowered aggression (viz. greater tolerance) during co-reproduction of adult females and their mother, littermate sisters, and daughters (King, 1989; Viblanc et al., 2016). The numbers of co-breeding close kin females per year (during the reproductive lifespan) were the trait forms. We found a significant association of the number of co-breeding close female kin and number of offspring at weaning, though at a medium effect size (estimated from path analysis of λ rel N ; ρ = 0.29, P = 0.01, n = 70). In turn, litter size was highly significantly associated with relative individual fitness (ρ = 0.79, P < 0.0001, n = 70), resulting in a small-to-medium indirect effect of number of close kin on fitness (indirect path coefficient, ρ = 0.23, significant when both direct coefficients are significant, Cohen, 1988). Viblanc et al.'s (2010) use of relative individual fitness (λ rel N ) was necessary because individual fitness (λ ind ) was significantly associated with changes in population density (estimated from λ N ; R² = 0.33, F = 33.387, n = 70, P = 0.001).
FIGURE 7 | Comparison of annual fitness values for each reproducing female with her lifetime adjusted individual fitness (measured as λ_rel) and lifetime reproductive success (both lifetime estimates calculated from number of offspring at weaning). For annual fitness, all mothers 3 years old (n = 101), 4 years old (n = 80), 5 years old (n = 59), and 6 years old (n = 32) were included.

To extend our study of kin effects, we estimated inclusive fitness, adding the indirect component to the direct component previously calculated. Our goal was to evaluate the possible importance of an indirect component to inclusive fitness in Columbian ground squirrels. Mothers were classified as having co-breeding close kin versus not having close kin during their reproductive lifespans, so the alternative trait groups were the presence or absence of potentially helpful close kin. The number of weaned offspring was used to estimate female reproductive success, and the analyses used an adjustment for changes in the population during a female's lifetime (in this case, λ_ind was regressed on λ_N and 1.0 was added to the residuals, to produce λ_relN for each female). Relative indirect fitness accounted for over 40% of a mother's inclusive fitness, a substantial and significant amount (0.43 ± 0.08 SE, t = 5.71, d.f. = 28, P < 0.0001). In addition, as the number of co-breeding close relatives increased, a mother's inclusive fitness increased significantly (Figure 6). The analyses were greatly facilitated by use of the relative individual fitness approach.
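The population adjustment described above (regress individual fitness on the population-change measure, then add 1.0 to the residuals) can be sketched as follows. This is a minimal illustration assuming a simple least-squares regression with an intercept; the function name and the input values are hypothetical and are not data from the study.

```python
from statistics import fmean


def relative_individual_fitness(lambda_ind, lambda_pop):
    """Return 1.0 plus the residuals of a least-squares regression
    of individual fitness (lambda_ind) on population change (lambda_pop)."""
    mean_x = fmean(lambda_pop)
    mean_y = fmean(lambda_ind)
    sxx = sum((x - mean_x) ** 2 for x in lambda_pop)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lambda_pop, lambda_ind))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual = observed value minus the value predicted from population change.
    return [1.0 + (y - (intercept + slope * x)) for x, y in zip(lambda_pop, lambda_ind)]


# Hypothetical values for five females (illustration only).
lambda_ind = [0.95, 1.10, 1.30, 0.80, 1.20]  # individual fitness per female
lambda_pop = [0.90, 1.00, 1.15, 0.85, 1.05]  # population change over each lifetime

lambda_rel = relative_individual_fitness(lambda_ind, lambda_pop)
print([round(value, 3) for value in lambda_rel])
```

Because least-squares residuals average zero, the adjusted values average 1.0, so a female above 1.0 did better than expected given the population changes she experienced.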

Empirical Comparison of Annual Fitness, Lifetime Reproductive Success, and Individual Fitness
The use of annual and lifetime fitness estimates made us question how closely these estimates correspond. To compare annual and lifetime measures, we used samples of females when they were of different prime breeding ages, namely 3, 4, 5, and 6 years old (101, 80, 59, and 32 mothers, respectively). Of course, these values for individual females were not independent and were from different years over the 28-year study period. The number of offspring at weaning was used to estimate reproductive success for these subsequent analyses. Using correlations, we compared the annual fitness values for the females in each age group separately to the relative lifetime individual fitness estimates (λ_relN) and to the lifetime reproductive success of these same females (Figure 7; e.g., a single datum would be an annual value for a 3-year-old female and her lifetime fitness, the latter estimated by relative individual fitness λ_relN or by lifetime reproductive success). The values of the correlations were used as indications of effect size (Cohen, 1988) for the similarity of annual and lifetime values, with r = 0.10, 0.30, and 0.50 indicating small, medium, and large effects, respectively. Correlations between annual and individual fitness (unadjusted) were consistently between medium and large, averaging about 0.4 (r = 0.380). Correlations between annual fitness and lifetime reproductive success also averaged about 0.4 (r = 0.423), but were much more variable. The annual estimates of fitness were meant to reveal the influence of an experiment or annual comparisons.
Since year-to-year variations in the environment occurred and these variations might well average out during an extended lifetime (9-12 years for 6 of the females in our sample of 101 mothers), a modest effect size might have been expected.
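The comparison above treats the correlation coefficient itself as the effect size. The sketch below computes a Pearson correlation and labels it using Cohen's (1988) benchmarks of 0.10, 0.30, and 0.50. Treating the benchmarks as lower bounds of bins, and the "negligible" label for values below 0.10, are our assumptions; the example data are hypothetical.

```python
from math import sqrt
from statistics import fmean


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mean_x, mean_y = fmean(x), fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def cohen_label(r):
    """Label |r| with Cohen's benchmarks, used here as lower bounds (an assumption)."""
    magnitude = abs(r)
    if magnitude >= 0.50:
        return "large"
    if magnitude >= 0.30:
        return "medium"
    if magnitude >= 0.10:
        return "small"
    return "negligible"  # label for |r| < 0.10 is ours, not Cohen's


# Hypothetical annual and lifetime fitness values for six females.
annual_fitness = [1.0, 2.5, 0.5, 3.0, 1.5, 2.0]
lifetime_fitness = [0.90, 1.10, 1.00, 1.20, 0.80, 1.05]

r = pearson_r(annual_fitness, lifetime_fitness)
print(f"r = {r:.3f} ({cohen_label(r)})")
```

Under this labeling, the average correlations reported above (r = 0.380 and r = 0.423) both fall in the medium bin.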
When relative individual fitness (λ_relN) was compared to lifetime reproductive success, the correlation was significant but only moderate (Figure 8; r = 0.345, n = 132 mothers, t = 4.186, P < 0.0001). When the cohort Leslie matrix was used to estimate relative individual fitness (λ_relL), however, the correlation with lifetime reproductive success was very low and not significant (r = 0.016, t = 0.184, P = 0.85). The two estimates of relative individual fitness (λ_relN and λ_relL) were strongly associated (r = 0.817, t = 16.15, P < 0.0001). The lack of a strong association between the relative indices of individual fitness and lifetime reproductive success indicates that the widespread use of lifetime reproductive success as a fitness measure might have to be reconsidered. On the other hand, both individual fitness and lifetime reproductive success were equally moderate predictors of the number of maternal gene copies passed on to future generations in collared flycatchers and Ural owls (respectively, Ficedula albicollis, Strix uralensis; Brommer et al., 2004), and in a study that included both male and female gene copies in song sparrows
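The t values reported with the three correlations above are consistent with the standard test of a correlation coefficient, t = r√(n − 2)/√(1 − r²). The sketch below recomputes them, assuming n = 132 applies to all three correlations (the text gives n only for the first); the function name is ours.

```python
from math import sqrt


def t_from_r(r, n):
    """t statistic for testing a Pearson correlation against zero (n - 2 d.f.)."""
    return r * sqrt(n - 2) / sqrt(1.0 - r * r)


n_mothers = 132  # assumed to apply to all three correlations
for r in (0.345, 0.016, 0.817):
    print(f"r = {r:.3f}, t = {t_from_r(r, n_mothers):.2f}")
```

The recomputed values come out close to the reported ones (4.186, 0.184, and 16.15); small differences are expected from rounding of r.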

CONCLUSION
In conclusion, when conducting experiments or looking at annual events, Qvarnström et al.'s (2006) annual fitness method seems to be a good tool that takes both reproduction and survival into account. Annual events were illustrated by the experimental ectoparasite (flea) removal treatment study (Raveh et al., 2015) and response to annual climatic conditions (Lane et al., 2012; Dobson et al., 2016). However, this method may be less appropriate for traits that occur as part of the ontogenetic sequence of events during the lifespan, or for judging longer-term success for conditions that have a cumulative influence on fitness. These latter cases were illustrated by the study of age at maturity (Rubach et al., in press), and the studies of kin selection (Viblanc et al., 2010) and inclusive fitness. We summarize when we think these different estimates might be most appropriate in Table 1. Between the two ways that individual fitness (λ_ind) might be adjusted for changes in population dynamics (λ_relN and λ_relL), we prefer λ_relN. This measure is based on changes in population size over time, and includes all competitors (in the present case, all females) in the population. By contrast, λ_relL compares a reproductive female only to others in her cohort, a subset of the population. Yet over their lifetimes, individuals in numerous species are exposed not only to individuals from the same cohort, but to overlapping generations of multiple cohorts. Thus, a fair comparison should contrast a particular individual against all individuals of the population over her lifetime. In any case, the choice of an empirical fitness measure should be carefully considered: it should be the one most appropriate to the research question and suitable for the species under study (e.g., long- vs. short-lived).
Other methods for estimating fitness are not directly applicable to the problem of comparing trait forms in empirical research, but may hold promise for future improvements. The most attractive may be the use of offspring from a pedigree, possibly including both male and female relatives (Brommer et al., 2004; Reid et al., 2019). An alternative to our use of changes in population size to adjust for environmental variation might involve fitness measures that take demographic and environmental stochasticity into account (e.g., Benton and Grant, 2000; Engen et al., 2009; Saether and Engen, 2015). These measures have not yet been applied to alternative trait forms. Finally, methods that examine the comparative sensitivity of population growth to reproduction and survival (e.g., Dobson and Oli, 2001; Oli and Dobson, 2003; Coulson et al., 2006) might be used to answer similar questions about individual fitness.

DATA AVAILABILITY STATEMENT
The datasets generated for this study are available on request to the corresponding author.

ETHICS STATEMENT
The animal study was reviewed and approved by the Institutional Animal Care and Use Committee, Auburn University.

AUTHOR CONTRIBUTIONS
FD and VV designed the study. FD wrote the manuscript. FD, VV, and JM collected the data and revised drafts of the manuscript. All authors contributed to the article and approved the submitted version.

IMPACT STATEMENT
Evolutionary biology studies how natural selection operates on traits and combinations of traits by comparing differences in fitness for individuals that exhibit different trait forms. Thus, how fitness is measured is a key issue for every evolutionary study. We used examples from our past research on Columbian ground squirrels to compare and contrast methods of measuring fitness. These include short-term, usually annual measurements, and alternative measures of lifetime reproduction. In particular, lifetime reproductive success and "individual fitness" measures (based on matrix methods) have provided conflicting results in past studies. In direct comparisons using our long-term, 28-year data set, alternative methods exhibited moderate but disappointing associations with one another. Methods to estimate fitness must be carefully chosen and considered with caution.

FUNDING
The long-term research was funded through successive collaborative grants, including a Natural Sciences and Engineering Research Council of Canada grant to JM, a National Science Foundation grant (DEB-0089473) to FD, a post-doctoral research grant from the AXA Research Fund to VV, a Fyssen Research grant to VV, and a CNRS Projet International de Coopération Scientifique grant (PICS-07143) to VV. We thank the Institute of Advanced Studies of the University of Strasbourg for their financial support through a USIAS fellowship for FD, and the Région Grand Est and the Eurométropole de Strasbourg for the award of a Gutenberg Excellence Chair to FD during the writing of this project.

ACKNOWLEDGMENTS
Our field research on Columbian ground squirrels has been assisted by many volunteers, undergraduate and graduate students, and technicians. We express our warmest thanks to all of them. We thank the Biogeosciences Institute of the University of Calgary for providing housing and laboratory space in the field, especially Director E. Johnson, Station Managers J. Mappin-Buchannan and A. Cunnings, and K. Ruckstuhl (faculty organizer and researcher at the R.B. Miller Research Station). Alberta Ministry of Environment and Parks, Division of Parks provided permits for research in Sheep River Provincial Park. Fish and Wildlife Division provided permits for research, as well as capture, marking, and release of wild Columbian ground squirrels. All field work occurred under several approved protocols by the Institutional Animal Care and Use Committee of Auburn University, and co-acceptance of these permits by the University of Calgary. Discussions with colleagues J. E. Lane, P. Neuhaus, and S. Raveh have greatly furthered our understanding of many aspects of ground squirrel biology. Two reviewers provided helpful suggestions on the manuscript.