Elevational Gradients as a Model for Understanding Associations Among Temperature, Breeding Phenology and Success

Climate change is associated with advancing phenology of seasonal traits in many taxa, but shifts by higher trophic levels are generally reduced compared with those of lower trophic levels. For example, the eclosion date of caterpillars and the lay date of insectivorous passerine birds have both advanced recently, but the former has done so more than the latter. While the ensuring phenological mismatch between predator and prey is well-documented, our understanding of the origins of this mismatch is more limited. Here we shed light on the interplay between ambient temperature, breeding phenology and reproductive success in a single population of blue tits (Cyanistes caeruleus) nesting over a 1,000 m (∼5°C) elevational gradient in the French Pyrenees. During the 6 years of this study, we found that average breeding phenology varied by 2–9 days among years, but was on average 11 days earlier at low versus high elevation. Despite the delay, breeding at high elevation was associated with lower and more variable temperatures during breeding. Early breeders within a given year generally had larger clutch sizes than late breeders, which led to more offspring fledged as typically found in other studies. However, in three of the 6 years, the probability of producing fledglings was actually lower among early layers. Additionally, birds breeding at high elevations who experience conditions typical of early breeders in other populations had reduced hatching success and were significantly less likely to fledge any young compared with those breeding at lower elevation. Reduced success at high elevation was not obviously driven by higher nest predation, which was exceptionally low, or reduced food availability because high elevation birds laid clutches of comparable size and fledged the same number of offspring of comparable mass as those breeding at low elevation. Our study reveals the capacity for substantial variation in breeding phenology within a population, but that the success of early breeders varies across years and temperature gradients. We suggest that the evolution of phenological advancements by small endotherms might be curtailed by increased probability of experiencing, and failure under, challenging meteorological conditions in late winter or very early spring.


INTRODUCTION
Recent meta-analyses show that organisms of diverse taxonomic groups are responding to earlier springs by advancing the timing of key life events (Thackeray et al., 2010(Thackeray et al., , 2016. However, there appears to be variation in the extent of advances across trophic levels, with lower trophic levels advancing their phenology more than higher levels (Both et al., 2009b;Thackeray et al., 2010). A classic example is seen in invertebrates and insectivorous birds breeding in northern temperate latitudes, with invertebrate prey advancing their date of emergence more than predatory birds are advancing their lay dates (Visser et al., 1998;Both et al., 2009a,b). This differential in phenological change leads to the well-documented concept of phenological mismatch, with predators increasingly rearing their offspring after the peak of prey availability (e.g., Durant et al., 2007;Visser et al., 2012). However, why this mismatch should arise is not clear. One possibility is that, with their faster generation times, prey are able to evolve adaptive responses to advancing springs more rapidly than predators with longer generation times (Cushing, 1969;Visser and Both, 2005). Another possibility, however, is that endothermic predators, such as birds, are constrained from advancing breeding phenology to the same extent as their invertebrate prey because they suffer more directly and/or indirectly from cold weather (Visser et al., 2015). While climate is warming and springs are advancing, weather can be prohibitively cold early in the year because day lengths are shorter, resulting in weakened selection for ever-advancing breeding phenology in endothermic predators. Testing this 'environmental constraints' hypothesis requires investigating the interplay between temperature, breeding phenology and success which can be challenging in natural settings.
For example, the obvious way of exploring this interplay is to investigate the relationship between breeding phenology and success throughout a breeding season, but such an approach is not valid. This is because although early breeding should be associated with colder weather, it will typically be associated with a closer match to peak prey availability (Verhulst and Tinbergen, 1991;Winkler and Allen, 1996;Verhulst and Nilsson, 2008;Emmenegger et al., 2014), which will confound the expected positive relationship between temperature and breeding success under the environmental constraints hypothesis. Indeed, early breeding is typically associated with increased, not reduced, success (Kluyver, 1951;Verhulst and Tinbergen, 1991;Barba et al., 1995;McCleery and Perrins, 1998). An alternative approach is to compare the relationship between phenology and success among years that vary in average temperature, although the magnitude of any effect measured is contingent upon the magnitude of inter-annual variation in temperature, which is often modest. Nevertheless, long-term longitudinal studies capturing sufficient inter-annual temperature variation provide some evidence for the environmental constraints hypothesis. For example, a 24-yearlong study in pied flycatchers (Ficedula hypoleuca) showed that low temperatures during early breeding are associated with reductions in fledging success (Moreno et al., 2015). Similarly, in blue tits (Cyanistes caeruleus) low temperatures during egg-laying was linked to hatching delays and reduced breeding success (Kluen et al., 2011). Whilst such longitudinal studies clearly provide important insights into the associations among ambient temperature, breeding phenology and success, their duration also means that the results are likely to be driven by a combination of plasticity and adaptation (Charmantier et al., 2008;Ramakers et al., 2019). Further, the inevitable time taken to establish such studies coupled with the pressing need to understand such relationships in a time of rapid climate change, provides an incentive for alternative approaches.
One complementary approach might be to use elevational gradients within a single population of a given species as a means of investigating temperature effects on breeding phenology and its impacts on metrics of breeding success. Such an approach can work in principal because temperature declines linearly with elevation: ∼0.65 • C for every increase in elevation of 100 m, but day length stays constant across the gradient. In accordance, recent meta-analytical (Boyle et al., 2016) and survey-based (Saracco et al., 2019) approaches demonstrate that avian breeding phenology is delayed at higher elevations. However, a potential problem is that habitat structure and invertebrate prey abundance might also vary across extensive elevational gradients and do so non-linearly (Körner, 2007;Nice et al., 2019) confounding the ability to surrogate temperature through elevation. To reduce the impact of these potential confounds we need a targeted elevational approach that provides representative variation in temperatures expected under climate change, but minimizes systematic variation in other significant ecological parameters, principally habitat type, cover from predators and food types or abundance. However, few previous studies have used such a targeted elevational approach, meaning effects might often be influenced by significant habitat differences, day length or local adaptation (Boyle et al., 2016).
Here we used a targeted elevation approach in the French Pyrenees to investigate the associations among elevation, breeding phenology and success in a nest box population of blue tits (C. caeruleus) across 6 years. Our approach is targeted in two ways. First, the ∼600 nest boxes are located within 5 main areas of contiguous habitat and were within 16 km of each other (median = 5 km). This distance is well within the known dispersal distance of blue tits (Paradis et al., 1998) and indeed we have observed several instances of among-site dispersal. Thus, our nest boxes encapsulate a single breeding population of blue tits. Second, the elevational gradient is a relatively modest 1,000 m and critically stops at 1,530 m a.s.l, ∼300 m below the upper end of the deciduous tree line in the French Pyrenees. While this range is sufficient to generate a ∼5 • C difference in mean daily (24 h) between low and high elevations, it is insufficient to have a major impact on habitat. For example, the habitat is characterized by mixed deciduous woodland across the elevation gradient with no obvious differences in tree height, and all species are represented at all elevations, although there is a shift from oak (Quercus robur) domination to beech (Fagus sylvatica) domination with increasing elevation (Bründl, 2018). Finally, observations of nestling provisioning show that feeding rates marginally decrease but that prey type delivered is comparable across the elevational gradient (Bründl, 2018). Thus, our available evidence suggests that our elevation gradient can be used as a natural experiment to investigate the influence of temperature variation on breeding phenology and success, without significant confounds of local adaptation or ecology.
The blue tit is a short-lived, small passerine bird with high fecundity (Perrins, 1979). Previous longitudinal studies have shown that they adjust lay date in response to spring temperatures and that clutch size and fledging success generally decline with delayed phenology (e.g., Haywood, 1993;Lambrechts et al., 1996;Källander et al., 2017;Shutt et al., 2019). However, cold temperatures during egg-laying have been shown to reduce the success of early breeders through reduced hatching success and lower nestling body mass in a Finnish population of blue tits, providing some evidence of the environmental constraints hypothesis (Kluen et al., 2011). First, we describe elevational (1,000 m) and annual (2012-2017) variation in breeding phenology, and its effects on average temperatures experienced during incubation and nestling provisioning. Second, we investigate the associations among year, elevation and lay date, on clutch size and hatching success. Finally, we test the effects of each on fledging success and nestling mass. The environmental constraints hypothesis predicts that advanced breeding phenology is associated with reduced temperatures during breeding, and that as a consequence metrics of breeding success will be reduced among early breeders in some years particularly at high elevations.

Study Population and Habitat
Climate and reproductive data were collected near the research Station for Theoretical and Experimental Ecology of Moulis (SETE, UMR 5321; 42 • 57 29 N, 1 • 05 12 E), in the French Pyrenees during the breeding seasons 2012-2017. Overall, our 14 woodlots within 5 main sites contained a total of 626-641 Woodcrete Schwegler TM 2 M nest boxes (32 mm entrance hole diameter) per year spaced at ∼50 m intervals from 430 to 1,530 m elevation (Figure 1). The median pair-wise distance between woodlots was 5 km (range = 0.6-16 km). All woodlots are connected by a contiguous mosaic of mixed deciduous woodland, primarily oak (Quercus robur), ash (Fraxinus excelsior), hazel (Corylus avellana), and beech (Fagus sylvatica), with the former three species being more common at lower elevations and beech at higher elevations. Temperature data was recorded from the 2015 breeding season at three locations across the elevational gradient (565, 847, and 1,335 m a.s.l.) using Tinytag TM loggers (TGP-4500 and TGP-4505) positioned on tree trunks 2 m from the ground. This temperature data, which was recorded every 30 min, allows us to clarify temperature differences during incubation and nestling as a function of lay date across the elevational gradient. We, therefore, use the temperature data to validate the utility of using elevation gradients as a means of examining potential associations between temperature and breeding parameters. But we did not analyze detailed impacts of temperature per se as it is limited to just three sites in three years and is highly correlated with elevation. Precipitation was not included as it is not expected to decrease in a linear fashion with elevation (see Körner, 2007). Overall, temperature decreased by an average 5 • C along the elevational cline throughout the breeding season (see section "Results" for specific details).

Breeding Phenology, Investment and Success
We recorded lay date, clutch size, hatching failure and fledging success in all years (2012)(2013)(2014)(2015)(2016)(2017). Each of these parameters was known with precision owing to nest checks every 3-5 days, which increased to daily during critical periods. These critical periods were before the onset of laying for recording lay date, from the sixth egg to clutch completion to determine clutch size and the start of incubation, from day 11 of incubation to determine hatching and from day 18 after nestling hatching to determine fledging success. Our blue tit population is single brooded, although pairs are known to make a second nesting attempt if the initial brood is abandoned early in the season (personal observations). No differentiation between first and any second attempts was possible, since blue tits are known to also use natural cavities in our population. Nevertheless, abandonment is uncommon overall (10% of 535 nesting attempts) and is uninfluenced by elevation (Supplementary Figure 1). In this study, we removed the 16 nesting attempts that abandoned before the onset of egg-laying from all analyses, and removed the 41 that abandoned during incubation from analyses of hatching and post-hatching analyses. As these latter cases were late in the season, they presumably occurred in response to declining food supplies. Thus the total number of hatchlings was determined as the number of eggs that hatched successfully from clutches wherein at least one hatched. The total number of fledglings was estimated as the number of nestlings at ringing (ca. day 15) minus those found dead after the rest of the brood fledged. Starting in 2013, all nestlings were weighed to the nearest 0.1 g (days 11-18 after hatching) using electronic scales. Our full data set comprised 519 blue tit nests for which lay date was known with precision and a full clutch of eggs was laid. However, this sample was reduced in subsequent analyses owing to rare cases of missing observations, the use of some nests in experiments for other purposes and nest abandonment. In 2013-2014, 58 experimental nests were excluded from the clutch size analysis, as we modified egg-laying in these nests (N = 461 remaining). However, this manipulation did not affect subsequent breeding parameters, since variation in the number of eggs incubated and hatchling numbers were returned to natural levels at incubation onset ). The precise sample size for each analysis is provided below.

Statistical Analyses
Statistical analyses were performed in the R environment, version 3.5.1 (R Core Team, 2018). Distributions of dependent variables and model residuals were visually inspected for normality. Normal response terms were analyzed using linear models (LMs) in the standard 'stats' package (R Core Team, 2018). If the data were non-normal, generalized linear models (GLMs, package = MASS; Venables and Ripley, 2002) were used adjusting residual variance structure accordingly, i.e., the error distribution family and link function (see SOM tables of each analysis;  Thomas et al., 2013). Fitting nest box identity as random terms in the models explained none of the variance and had no qualitative impact on the results (see Supplementary Table 1), presumably because the number of nest boxes far exceeded those occupied by blue tits and the inter-annual survival and philopatry of breeders were low [21% for banded females returned to breed in a nest box in a subsequent year (mean number of breeding attempts = 1.2 per banded female, maximum = 4)]. To test the effect of the random term we used corrected Akaike's Information Criterion (AICc -for finite sample sizes) set to a Delta of two (Zuur et al., 2009). Nest box identity was thus removed as a random term from all the subsequent, non-mixed models. All models underwent checks for overdispersion and heteroscedasticity of residuals (Zuur et al., 2009). Collinearity among explanatory terms was tested using a variance inflation factor (VIF) analysis (package = car; Fox et al., 2018) which if above 3-7 degrees indicates biased high contribution of a variable to the standard error of a regression, i.e., multicollinearity (Zuur et al., 2010;Dormann et al., 2013). However, the VIF between the main potential collinear terms of lay date and elevation was low (1.22) and thus both were included as continuous variables in the same models. Non-mixed model selection was based on changes in deviance between full models and models excluding each factor using the ANOVA function in R (significance set at α < 0.05) (Zuur et al., 2009).
Overall, we conducted six basic models pertaining to: breeding phenology (lay date); clutch size; hatching success; the probability that at least one nestling fledged; the number of nestlings fledged from successful nests and mean nestling mass per brood. In all models, we fitted lay date (except in the lay date analysis), elevation, and year as the primary fixed terms of interest, as well as two-way interactions including lay date and/or elevation, year and clutch size (see S2-S7 for more details). Although elevation was fitted as a linear predictor in all statistical models, we sometimes split the elevational gradient into three elevational ranges in figures to facilitate visualization and interpretation only (see Figures). The three categories -low (430-633 m), mid (702-904 m), and high (923-1,530 m) elevations -were determined where the greatest gaps in elevation between occupied nest boxes were observed (see also Schöll et al., 2016), and corresponded to the location of the temperature data loggers (central in each elevational range). We included the possibility of a nonlinear (2nd order polynomial) main effect of lay date since the success of very early and late nests might be expected to be compromised, but it was never significant (Supplementary  Tables 3-7). However, elevation was included only as a linear term as we have no clear predictions about non-linear effects and visualization of raw data suggested that no non-linear patterns between elevation and y parameters were likely to be present.
First, we investigated how breeding phenology (lay date) varied with elevation and year (N = 519). Second, we analyzed how clutch size was affected by lay date, elevation and year (N = 461), including both the separate effects of elevation and lay date on clutch size and the interaction between the two variables. LMs with normal error structure were applied for both lay date and clutch size analyses. To investigate the probability of hatch failure, i.e., whether or not nests failed to hatch any eggs, we applied a GLM with binomial error structure (N = 476). In this model, the number of eggs incubated was fitted as a covariate since the area of large clutches exceeds the area of the brood patch, making them more challenging to incubate (Haftorn, 1983;Engstrand and Bryant, 2002;Niizuma et al., 2005). Fledging success was investigated as a two-step process: first by investigating the factors associated with the probability of fledging at least one nestling (excluding nests with no hatchlings; N = 438), and second, for those that did fledge at least one offspring, the factors influencing the number of nestlings that fledged (N = 369; 16% of the 438 nests failed to fledge young). This two-step process was performed because alternative zeroinflated methods failed to converge when the interactions central to the question were included. Finally, we also investigated factors affecting mean nestling mass per brood in a LM (N = 345 broods with 2,230 nestlings). In addition to the primary predictors of interest (lay date, elevation and year), linear and quadratic effects of brood age and brood size were added as covariates. We fitted brood age rather than linear predictors, such as tarsus, since age was known with precision, and tarsus length is itself a partly condition-dependent trait (Merilä and Fry, 1998).

Elevation and Year Effects on Phenology and Consequences for Temperatures During Incubation and Nestling Provisioning
Over the 6 years of study, clutches were initiated between 27 March and 11 June, with a mean of 16 April [±10 days (SD), N = 519 total breeding attempts; Table 1]. Some late nesting attempts are likely to be explained by re-clutching following rare early abandonment or failure, but blue tits are not doublebrooded in our population. Both elevation and year had a significant impact on average breeding phenology (elevation: F 1,512 = 184.96, P < 0.001; year: F 5,512 = 32.31, P < 0.001; Supplementary Table 2). More specifically, the mean lay date was 13 April at low elevations (430-633 m) (±7 SD), but averaged 5 days later at mid elevations (702-904 m) (18 April ±12 SD), and 11 days later at high elevations (923-1,438 m) (24 April ±15 SD) (Figure 2A). Similarly, for example, lay dates were an average of 7 days earlier in 2017 and 5 days later in 2013 than the overall mean of the population across all years (Table 1). Finally, there was a significant interaction between year and elevation on lay dates, with lay date being delayed to a greater extent at high elevation in some years (e.g., 2013) than others (e.g., 2017) (F 5,507 = 8.46, P < 0.001; Figure 2B).
Early and high elevation breeding were associated with reduced and more variable temperatures during incubation and nestling provisioning. For example, at low elevation, early breeders, as opposed to late breeders, experienced average day-time temperatures (7 am -7 pm) that were ∼2 • C lower during both incubation (∼12 vs. ∼14 • C; Figure 2C) and nestling rearing (∼13 vs. ∼15 • C; Figure 2D). At high elevation, early breeders experienced temperatures that were ∼4 • C lower than late breeders during incubation (∼8 vs. ∼12 • C), although temperatures during nestling rearing averaged ∼11 • C irrespective of phenology. In addition, early phenology, particularly at high elevation, was associated with high coefficients of variation in temperatures during breeding. In this case, early breeding at low elevation was associated with twofold greater variation in day time temperatures during incubation and fourfold greater variation at high elevation ( Figure 2E). During nestling rearing, early breeders experienced double the variation in day-time temperatures at low elevation and three times the variation at high elevation, compared with late breeders (Figure 2F). These results support the assumption of the environmental constraints hypothesis that early breeding is associated with lower and more variable temperatures. The question is, are temperatures early in the season and at high elevation sufficiently low (on average or through greater variability) to compromise metrics of success, as predicted by the environmental constraints hypothesis?

Elevational and Annual Variation in Clutch Size and Hatching Success
Average clutch size in our population was 8.2 eggs (±1.4 SD, range: 4-12; Supplementary Table 3). The greatest contributor to variation in clutch size was lay date, with clutch size declining by one egg for every 2-week delay in the onset of laying over the ∼2 months laying period (F 1,453 = 99.11, P < 0.001; Supplementary Table 3). After controlling for effects of lay date, we found that clutch size increased with elevation (F 1,453 = 14.54, P < 0.001; Figure 3A) and varied among years (F 5,453 = 4.41, P < 0.001; Figure 3B). For a given lay date, clutches were on average 0.6 eggs (8%) larger at high elevation compared with low elevation and differed by up to 0.8 eggs (9%) between years (e.g., 2013 versus 2015). Elevation failed to predict clutch size in the absence of lay date in the model, and the trend was reversed (estimate: −0.00046, F 1,454 = 1.82, P = 0.18; Supplementary  Figure 2). In other words, clutch sizes were only larger at higher elevations relative to their later lay dates, but in absolute terms were of comparable size to those at low elevation despite their later phenology. We found no evidence to suggest that clutch size The difference in sample size arose because 58 experimental nests were excluded from the clutch size analysis, as we modified egg-laying in these nests, though this did not affect hatching (see Supplementary Tables 3, 4 for further details). 95% confidence intervals are presented around lines. Table 3).

was influenced by interactions between lay date and elevation or lay date and year (Supplementary
After excluding nests with complete hatch failure (see section "Materials and Methods"), we found that in 62% of nests at least one egg remained unhatched (mode = 0, range = 0-8 unhatched eggs), leading to an average of 6.9 hatchlings per nest (±1.8 SD, range: 1-11). Hatching success was not affected by lay date (χ 2 1,465 = −0.41, P = 0.52), but was influenced by clutch size, elevation and year (Supplementary Table 4). Larger clutches were more likely to be associated with at least one egg failing to hatch (χ 2 1,466 = −8.16, P = 0.0043). The probability of partial hatching success also increased with elevation (χ 2 1,466 = −6.48, P = 0.011), with an average of 7% more clutches failing to hatch all eggs at high versus low elevations ( Figure 3C). The probability that all eggs hatched in clutches varied significantly among years, with almost all clutches in the mid-early year of 2012 and mid-late year of 2015 having at least one egg remaining unhatched, while significantly fewer nests (50-75% overall) had unhatched eggs in the other years (χ 2 5,466 = −71.49, P < 0.001, Figure 3D). An apparently significant interaction between lay date and year was found to be driven by two late nests in 2015, and no other twoway interactions involving lay date, elevation, year and number of eggs incubated were significant ( Figure 3D, Supplementary  Figure 3, and Supplementary Table 4).
An average of 6.0 nestlings fledged from nests that did not experience complete brood failure (±1.9 SD, range: 1-11; Supplementary Table 6). Later-breeding nests fledged fewer young than early nests, with 0.08 fewer nestlings fledged per day delay in laying of the first egg (F 1,362 = 34.95, P < 0.001). There was no effect of elevation (F 1,361 = 0.004, P = 0.95; Figure 4C) on the number of fledglings produced, although there was significant inter-annual variation in fledging numbers (F 5,362 = 6.71, P < 0.001), ranging from an average of five fledglings in the mid-late year of 2015 to almost seven in the late year of 2013 ( Figure 4D). There were no significant twoway interactions including lay date, number of eggs incubated, elevation or year (Supplementary Table 6).

Nestling Mass
Overall, mean nestling mass in broods between the age of 11-18 days was 10.4 g (±1.0 SD), ranging from 5.9-12.8 g (Supplementary Table 7). Older broods were heavier than younger broods (linear effect: F 1,337 = 11.63, P < 0.001), although age effects tended to asymptote for old broods (quadratic effect: F 1,336 = 3.28, P = 0.071). There were no main effects of lay date (F 1,335 = 0.37, P = 0.54) or elevation (F 1,334 = 0.095, P = 0.76) on nestling mass. There was significant inter-annual variation in nestling mass (F 4,337 = 3.60, P = 0.0068), ranging from an average of 10.2 g in the early year of 2017 to 10.7 g in the mid-year of 2014 ( Figure 5A). Any tendencies for lay date effects on nestling mass to vary among years were driven by outlying late nests (Supplementary Table 7 and Supplementary  Figure 4). However, there was a more robust year * elevation interaction (F 4,332 = 4.083, P = 0.0030; Figure 5B). This interaction was driven primarily by a strong negative association between elevation and nestling mass in 2014, whereas in other years this association was weak or even slightly positive (2015). The interactions between lay date and elevation and between lay date and brood size were not significant, although there was a slight (non-significant) trend for a more positive relationship between nestling mass and lay date with increasing elevation (Supplementary Table 7).

DISCUSSION
By combining a multi-year study with an elevational gradient, we were able to investigate a population's capacity for altering breeding phenology across a broad temperature range and the downstream reproductive consequences. Breeding phenology varied markedly among years and especially across the elevational gradient ( Table 2). In accordance with an assumption of the environmental constraints hypothesis, early phenology, especially at high elevation, was associated with lower and more variable temperatures during breeding. Although early breeders laid larger clutches and fledged more young from successful nests than later breeders on average, we found some evidence to suggest that breeding at low temperatures is associated with reduced success. First, in three of the 6 years the probability of fledging any young was reduced among early breeders, while the number of fledglings produced from success nests was highest in the latest year (2013) and amongst the lowest in the earliest year of our study (2017). Second, both the probability of fledging young and hatching success was reduced at high elevation where temperatures are colder. We have little evidence to suggest that brood failure arose as a result of nest predation nor through reduced food availability. For example, years with high breeding failure did not necessarily have a reduced number of fledglings per successful nest (e.g., 2012, 2013) nor did nestlings have reduced mass (e.g., 2015). Similarly, successful nests fledged the same number of young and at comparable masses across the elevational gradient. Together, our evidence lends support to the hypothesis that the strength of directional selection on advancing phenology can be weakened in small endotherms by an increased probability of experiencing challenging environmental conditions early in the season. This effect could have implications for explaining evolutionary lags between endothermic predators and ectothermic prey.
There is considerable cross-taxonomic support for the suggestion that the phenology of key life events is changing in response to increasing temperatures (Parmesan and Yohe, 2003;Root et al., 2003;Thackeray et al., 2010). However, what is less clear is the degree to which such changes are caused by plastic versus evolved responses, and the limits to advancing phenology (Thackeray et al., 2010;Visser et al., 2015). Longterm studies of tit species breeding in the United Kingdom [1961-2007(Charmantier et al., 2008) and Sweden (1969Källander et al., 2017)] have shown advancements of lay date of ca. 14 and 11 days, respectively, in response to 2-3 • C increases in maximum spring temperatures. While such changes are doubtlessly caused, in part, by plastic responses to changing temperatures (e.g., Gienapp et al., 2008;Merilä and Hendry, 2014;Phillimore et al., 2016), studies of 40 years on short-lived species, where individuals breed in their first year of life, will also provide sufficient time for evolutionary responses to selection (Sheldon et al., 2003;Charmantier et al., 2008). As a consequence, at least part of the changes in phenology documented in these  Supplementary Table 7 for further details). Points show raw values, while predicted lines with 95% confidence intervals control for brood age (B). Boxplots generated from raw data (showing median, interquartile range, minimum and maximum range, and outliers). studies is likely to be a result of evolution (Merilä et al., 2001;Charmantier and Gienapp, 2014;Ramakers et al., 2019). Despite considerable changes in lay date observed over time in such shortlived, temperate, insectivorous passerines, a significant mismatch between the phenology of birds and their prey remains, and it is unclear why birds do not advance lay date more to overcome the detrimental fitness consequences of mismatch (Both et al., 2009a,b;Visser et al., 2012;Radchuk et al., 2019). A better understanding of when and why birds do not (or cannot) breed earlier might be obtained from observations, as presented in our study, in a population experiencing considerable variation in temperature over shorter time periods to avoid 'confounds' of evolutionary responses.
By combining observations across 6 years and a 1,000 m elevational gradient we were able to document variation in breeding phenology over a short time period that complements what we have learned from long term studies. For example, in 2017, laying occurred an average of 9, 19, and 24 days earlier Normal response variables (lay date, clutch size, total number fledging, mean nesting mass per brood) were analyzed using LMs with normal error structure and non-normal response variables (probability of hatch failure, fledging success) using GLMs with binomial error structure and logit link function. Significance was set at α < 0.05. Estimates and standard errors are provided for continuous terms and ranges of estimates are provided for categorical terms. Directionality of continuous response variables in relation to continuous predictors is provided. than in 2013 at low, mid and high elevations, respectively, and females at high elevation began laying 11 days later than those at low elevation, on average. This variation is dramatic, and on par with long-term studies spanning decades described above (e.g., Charmantier et al., 2008;Källander et al., 2017). To put this variation in perspective, at the onset of egg-laying in the late year of 2013, pairs in 2017 were already beginning to incubate their ∼9-egg clutches at low elevation, while, at high elevation, they were in the first week of nestling-rearing (because breeding was proportionally earlier at high elevation in that year). That this variation was observed over just a handful of years suggests that changes in breeding phenology over this study are not a consequence of evolution. However, it is conceivable that later breeding across the elevational gradient is a consequence of local adaptation or genetic drift. While evidence for local adaptation has been observed across short-distances in blue tits across contrasting habitat types (evergreen versus deciduous woodland; Porlier et al., 2012), we think genetic differences are unlikely to offer a valid explanation for the marked phenological variation observed in our study. First, all our nest boxes were located in deciduous woodland, with overlap in tree species composition and prey (Lejeune et al., 2019). Second, our low, medium and high elevation woodlots were located within 0.6-16 km of each other in contiguous woodland habitat; well within 1 SD of average dispersal distances estimated for this species [mean = 5 km ± 15 (SD); Paradis et al., 1998]. Indeed, we have recorded several instances of dispersal between our sites. Finally, although lay date was delayed by an average of 14-19 days at high versus low elevations in five of the years, in 2017 lay date was delayed by just 4 days at high elevation and was sufficiently early in that year to be as early as the second earliest year in low elevation sites (Table 1). Thus, pairs in our population, particularly those breeding at higher elevations, would appear to have the capacity to breed considerably earlier than they typically do in most years. The obvious question is why do they not start breeding earlier, particularly given the demonstrated mismatched phenology of such species with peak invertebrate prey during nestling rearing (e.g., Van Noordwijk et al., 1995;Visser et al., 1998Visser et al., , 2003Visser et al., , 2012?1 The answers to this question are integral to understanding phenological mismatch and are of general importance. While many populations are advancing breeding phenology in response to warming springs (Thackeray et al., 2016), responses are not universal. For example, no systematic change in lay date was observed in a Dutch great tit population studied over more than 20 years , despite spring temperature increasing by up to 2 • C over the same time period (Visser et al., 1998;Husby et al., 2010). Indeed, data from 24 European great tit and blue tit populations suggests significant variation in the phenological responses to increasing spring temperatures, even among neighboring populations (Visser et al., 2003). Further, even for those populations that are responding, higher trophic levels are typically responding with reduced magnitude compared with lower trophic levels. The common explanation is that mismatching is due to evolutionary lags of higher trophic levels with longer generation times (Cushing, 1969;Visser and Both, 2005). However, the results of this study (and others, e.g., Visser et al., 2003;Both et al., 2006;Gienapp et al., 2008;Merilä and Hendry, 2014) highlight that the answer is likely to be nuanced, and influenced in significant part by within-and among-year variation in meteorological patterns (Visser et al., 2015). Understanding why our high elevation populations do not advance breeding despite the ability to do so will provide new insights to phenological mismatch in this and other populations.
If an increasing probability of experiencing more challenging environmental conditions acts as a significant constraint on advancing phenology we would expect early breeders to be sometimes disadvantaged (Zajac, 1995;Visser et al., 2015). It is well known that unfavorable meteorological conditions at critical times can have significant impacts on organisms, with cascading effects on interacting species (Parmesan, 2006;Marrot et al., 2017). For example, the phenology of budding in many plants is highly sensitive to spring temperatures, with plants being killed by cold snaps (Weiser, 1970). Even though the main prey of blue tits, caterpillars, are relatively cold tolerant (Nadolski and Bañbura, 2010), insectivorous prey are less active during colder conditions, and thus harder to find (Taylor, 1963). However, the degree to which early breeding endotherms are disadvantaged by a return of wintery conditions during breeding is less clear. Nevertheless, in house sparrows (Passer domesticus), hatching success was negatively affected by extremely cold days during incubation (Pipoly et al., 2013). Likewise, wintery conditions are known to cause delays to the onset of incubation and hatching and to be associated with reduced reproductive success in Polish great tit and blue tit populations (Kluen et al., 2011;Gladalski et al., 2018Gladalski et al., , 2020. That blue tits in our population can suffer from challenging meteorological conditions, including early in the season, comes from at least two sources. First, hatching failure was significantly higher in 2012 and 2015, and the former at least had unusually cold weather during egg-laying and incubation of early breeders. Furthermore, early breeders tended to have increased brood failure in 2012, 2013 and 2015 compared with the other years. In these 3 years, early breeders experienced the lowest daily maximum temperatures, while 2012 and 2013 were also the coldest 2 years on average during the month from 20 March. As mentioned above, brood failure is difficult to explain by differences in predation or in prey availability, since clutch sizes, fledgling numbers at successful nests and nestling mass were not reduced in these years compared with the others. Second, hatching failure and complete loss of broods was more common at high elevation where temperatures were significantly colder and more variable during both incubation and nestling periods. Further, if territory quality were inferior at higher elevations per se, we would expect reduced clutch sizes, fledgling numbers in successful nests, and/or nestling mass at high elevation nests compared with those at lower elevation, but none was the case. Together these results are consistent with the environmental constraints hypothesis, that challenging weather conditions more often experienced early in the breeding season, weakens the strength of selection on phenological advancement (Visser et al., 2015).
While environmental constraints on early breeding as outlined above should dilute the strength of directional selection on advancing phenology, they were insufficiently strong to alter the shape of the linear seasonal declines in fecundity and fledgling production. Seasonal declines likely exist because early breeders are often better-quality individuals on better quality territories (Verhulst and Nilsson, 2008), but they might also in part be explained by well-documented reductions in prey availability later in the season (Verhulst and Tinbergen, 1991;Winkler and Allen, 1996;Emmenegger et al., 2014). Clarifying the strength of selection on advancing phenology, therefore, requires a better understanding of the associations among temperature, the cues used to time breeding and the timing of prey availability (Visser and Both, 2005). Nevertheless, our results suggest that such associations might be more complicated than is typically assumed. Most notably, whilst we found no significant variation in the strength of seasonal declines (i.e., slopes) in breeding success among years, there were significant amongyear differences in average breeding success (i.e., intercepts) which are not obviously driven by phenology. Indeed, many patterns we detected were not consistent with typical patterns where early breeders are more successful. Breeding phenology was advanced at high elevation relative to temperature compared to low elevations, but these earlier breeders did not yield higher fitness than low elevation birds. In addition, we found no evidence for a relationship between breeding phenology and clutch size across years and clutch size was comparable across the elevation gradient, despite later lay dates at higher versus lower elevations. Furthermore, while the probability of fledging young showed seasonal declines in early-mid phenology years, the pattern was reversed in mid-late phenology years. Finally, fledgling numbers at successful nests were comparable across the elevational gradient, despite marked variation in phenology, and were not influenced by the average phenology of a given year [e.g., numbers were highest in mid-early (2014) and late (2013) years, lowest in mid-late 2015 and intermediate in the early year of 2017]. One explanation for these patterns is that the associations among temperature, timing cues and prey availability co-vary non-linearly, leading to inter-annual variation in the association between phenology and prey availability. For example, because the developmental rates of ectothermic invertebrates can be more than halved in favorable temperatures (Buckley et al., 2012) yet are more temperature-invariant in endotherms (Buse et al., 1999), it is likely that phenological mismatches are exacerbated in early compared to later phenology years. Thus, we might expect breeding success in species such as blue tits to be maximal in years where conditions are suitably cold early to slow the development of their prey, but not so challenging to compromise their own success. In other words, the fitness impacts of phenological mismatch could paradoxically be more severe in early years whereas the impact of mismatch might be more limited in later phenology years. Either way, the finding that early phenology years do not associate with increased breeding success will likely act as a further impediment to the evolution of advancing phenology in endothermic predators. Thus, even in the absence of challenging conditions, inter-annual variation in the timing and magnitude of environmental conditions might generate a fluctuating selection pressure on absolute timing; further diluting the strength of selection for advancing phenology in iteroparous organisms and compounding the evolutionary lag across trophic levels.
In conclusion, we propose that short-term studies using elevational temperature gradients within populations provide a valuable complement to long-term studies for understanding population responses to climate variation and change. Most importantly, our approach provides a clearer insight into the capacity for populations of a current genotype to respond to meteorological variation, since we are able to introduce such variation to the same population using an elevational gradient. We found that despite a clear capacity for earlier breeding (based on lay dates in 2017), breeding was typically delayed, particularly at high elevation. It is not known whether earlier breeding would have been more beneficial in any of the sites or years, and so the degree of any phenological mismatch is unknown. However, it is noteworthy that breeding at high elevation and early breeding in some years was associated with increased probabilities of brood failure and there was no obvious association between average phenology in a given location or year and breeding success. Together, these results suggest that challenging environmental conditions during breeding can act as an evolutionary brake on advancing phenology and that environmental variation among years dilutes the strength of any directional selection on advancing phenology across evolutionary timescales. The obvious next step is to elucidate the association between the breeding phenology of tits across years and sites and the patterns of prey availability, as well as to identify the environmental cue that underpins phenology in our population.
Multiple cues likely instigate breeding (Gienapp et al., 2010), and identifying such cues are beyond the scope of this study. Suffice to say that if the cues involve day length and temperature (Lack, 1954;Lambrechts et al., 1996;Dawson et al., 2001;Gienapp et al., 2010;Bonamour et al., 2019), it will need to be an interaction between the two to explain why delayed breeding at higher elevations occurs at reduced temperatures than at lower elevations since our population has the same day length on a given date. In order to advance phenology significantly, it might be that it is selection on and evolution of the cues used to time breeding that need to change (Lyon et al., 2008). This is especially important in light of the increased likelihood of extreme weather events under future climate prognosis (Intergovernmental Panel on Climate Change, 2014), and these extreme weather events should particularly impact early breeders (e.g., Gladalski et al., 2014;Moreno et al., 2015). Further studies from a combination of longitudinal, experimental and environmental cline settings are required to unpack the relative contributions of selection for and against advancing breeding phenology under current climate change, with due consideration of constraints and cues.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation, to any qualified researcher.

ETHICS STATEMENT
The animal study was reviewed and approved by the state of Ariége animal experimentation review (Préfecture de l'Ariège, Protection des Populations, no. A09-4) and the Région Midi-Pyrenées (DIREN, no. 2012-07). Bird capture was carried out under permits to ASC from the French bird ringing office (CRBPO; no. 13619; PP576).

AUTHOR CONTRIBUTIONS
ACB, ASC, and AFR conceived the study and wrote the manuscript. All authors collected data. ACB compiled the data and conducted the statistical analyses. All authors gave final approval for publication and agreed to be held accountable for the work performed therein.

FUNDING
This work was funded by grants from the Région Midi-Pyrénées (ACB, ASC, and AFR), the Centre National de la Recherche Scientifique (ACB and ASC), a Natural Environment Research Council studentship to the University of Exeter, as well as research grants from the Agence National pour la Recherche (ANR-JCJC "NetSelect") and the Human Frontiers Science Partnership (RGP0006/2015 "WildCog") to ASC. This work is part of the Laboratoire d'Excellence (LABEX) entitled TULIP (ANR-10-LABX-41).