Process models and model-data fusion in dendroecology

Dendrochronology (i.e. the study of annually dated tree-ring time series) has proved to be a powerful technique to understand tree-growth. This paper intends to show the interest of using ecophysiological modeling not only to understand and predict tree-growth (dendroecology) but also to reconstruct past climates (dendroclimatology). Process models have been used for several decades in dendroclimatology, but it is only recently that methods of model-data fusion have led to significant progress in modeling tree-growth as a function of climate and in reconstructing past climates. These model-data fusion (MDF) methods, mainly based on the Bayesian paradigm, have been shown to be powerful for both model calibration and model inversion. After a rapid survey of tree-growth modeling, we illustrate MDF with examples based on series of Southern France Aleppo pines and Central France oaks. These examples show that if plants experienced CO2 fertilization, this would have a significant effect on tree-growth which in turn would bias the climate reconstructions. This bias could be extended to other environmental non-climatic factors directly or indirectly affecting annual ring formation and not taken into account in classical empirical models, which supports the use of more complex process-based models. Finally, we conclude by showing the interest of the data assimilation methods applied in climatology to produce climate re-analyses.


INTRODUCTION
Relationships between tree-ring proxies (width, density, isotopes) and climate have usually been estimated with statistical approaches (Fritts, 1976;Cook and Kairiukstis, 1990). More or less complex statistical relationships have been calibrated to large datasets either to understand the response of the proxy to climatic changes or to reconstruct past climates. In both cases, these statistical approaches often reduce the problem to linear relationships; they assume that there were no major changes in the tree environment other than the climatic factors considered; finally they consider the underlying processes as a black box. Process-based models developed for tree-growth (Fritts et al., 1991;Berninger et al., 2004;Misson, 2004;Rathgeber et al., 2005;Anchukaitis et al., 2006) and isotope fractionation (Danis et al., 2012) studies give an opportunity to go beyond these limitations. Used in a forward mode, process-based models are driven by climate and other environmental factors (such as CO 2 concentration, nutrient availability and soil moisture) (Gaucherel et al., 2008b), they can take into account competition between individual, and they enable one to better understand the functioning of the trees and the effect of the interaction of different environmental forcings. In an inverse mode, climate information can be extracted by relaxing the above-mentioned constraints (Boucher et al., 2014).
After a rapid overview of the models used or potentially useable in dendroecology, we succinctly present the approaches of Model-Data Fusion (MDF). We then present different ways to merge data with these models depending on the objective, basically calibration of parameters or inversion to estimate input data such as climate. We conclude with the possibilities offered by data assimilation.

TREE-GROWTH PROCESS MODELS
Growth models usually assume that the growth of forests is ultimately limited by the capacity of trees to photosynthesize and acquire nutrients to maintain growth. These models are here roughly classified in four categories related to the processes taken into account (Table 1). (Rathgeber et al., 2000a) used dendroecological data with a biogeochemistry model (BIOME3, Haxeltine and Prentice, 1996) to provide a realistic evaluation of the impact of global change on the productivity of forest stands. By using BIOME3, they applied a global model at a local scale. In such an approach, the spatial and temporal scales at which some vegetative processes were mathematically described did not correspond to the scales at which the data were acquired. This scale issue is overcome by other carbon-based models that take into account tree-growth processes in varying detail and degrees of complexity: (Federer et al., 1989;Fritts et al., 1991;LeBlanc and Foster, 1992;Foster and Leblanc, 1993;Medlyn et al., 1999;Roux et al., 2001;Berninger et al., 2004;Girardin et al., 2008). In their review, (Roux et al., 2001) raised two critical issues for tree-growth models that call for the development and testing of original modeling schemes: the representation of carbon allocation, storage and remobilization. These linked processes possibly explain why the stem growth of trees is often linked from 1 year to the next,  (Rathgeber et al., 2000a) x x VS-light (Tolwinski-Ward et al., 2011) x - (LeBlanc and Terrell, 2001) x StandLEAP (Girardin et al., 2008) x x SIMFORG+SICA (Berninger and Nikinmaa, 1997) x x x MAIDEN (Misson, 2004) x x x CASTANEA  x x x VS (Fritts et al., 1991) x x x x ECOPHYS (Rauscher et al., 1990) x x - (Ogee et al., 2009) x x x x - (Hölttä et al., 2010) x x CAMBIUM (Drew et al., 2010) x x x x MAIDENiso (Danis et al., 2012) x x x x ISOCASTANEA (Eglin et al., 2010) x x x x producing autocorrelation in tree-ring series (Desplanque et al., 1998;Berninger et al., 2004). The model of , i.e., MAIDEN, is particularly well adapted to reproduce these temporal correlations. A simplified approach is provided by models representing the water cycle where tree-growth is correlated with soil moisture (LeBlanc and Terrell, 2001) or where soil moisture is combined with temperature and light through the most limiting factor principle (Tolwinski-Ward et al., 2011). Bioclimatic models (Rathgeber et al., 2005), in which tree-growth is correlated with several climatic variables assumed to drive tree-growth (e.g., soil moisture, chilly temperature, growing season temperature) are of the same type. They are not really ecophysiological models, as important ecophysiological processes such as photosynthesis, respiration and cell multiplication are not implemented. These models can be useful when applied on a large scale where the inclusion of a large number of species and biotopes makes it difficult to apply higher order models.
True ecophysiological models try to represent the interaction between the carbon and the water cycles in the tree (see columns 2 and 3 in Table 1). Several phases in the tree growth are distinguished according to the phenology, from no activity in winter to leaf and root expansion in spring, biomass production in summer, carbohydrate reserve accumulation at the beginning of autumn and leaf and root senescence during late autumn. Photosynthesis and stomatal conductance are often based on the Farquhar biochemical model (Farquhar et al., 1980). Daily temperatures and precipitation drive the estimates of photosynthetic production, respiration and carbon allocation in four compartments (leaf, bole, root and storage) (see column 4 in Table 1). The water cycle includes transpiration, interception, soil evaporation, drainage and runoff. All these processes are explicitly represented in MAIDEN (Misson, 2004), but also in various models such as VS (Fritts et al., 1991), CASTANEA Dufrene et al., 2005), ECOPHYS (Rauscher et al., 1990) or SIMFORG (Berninger and Nikinmaa, 1997). The final output of these models is generally tree productivity but other physiological proxies such as those reflected by cellulose isotopic composition are also simulated by some of these models (column 6 in Table 1, Panek and Waring, 1997;Ogee et al., 2009;Eglin et al., 2010;Danis et al., 2012).
These models are used to project the impact of climatic changes on forests, but they are far from being perfect, in particular when management practices and non-climatic factors must be taken into account. Several processes are not yet included, e.g., the mortality of trees and associated stand dynamics, as in Pedersen (1998) or Bigler and Bugmann (2004). They often assume monospecific forests. Disturbances by insects (Hogg, 1999) or fires (Covington et al., 2001) are not explicitly included. Nutrient availability, which may limit tree productivity, or the influence of hydraulics on the tree performance, are other gaps in several models. Most of them (e.g., Pietsch and Hasenauer, 2002) predict an increase in biomass production proportional to photosynthesis production and then to CO 2 assimilation (Rathgeber et al., 2003;Gaucherel et al., 2008b). The results of Berninger et al. (2004) show that, although photosynthesis increases in response to increasing CO 2 concentrations, tree growth rate cannot parallel the increase in photosynthesis because the potential growth rate is limited directly by temperature through nitrogen mineralization. The apparent fertilization effect of the models, related to water use efficiency (WUE) enhancement, itself due to a decrease in stomatal area, has not been observed in experimental chambers or in tree-ring series (Gedalof and Berg, 2010;Silva and Horwath, 2013). There is therefore much controversy as to whether the increase in CO 2 since the beginning of the industrial period has enhanced growth, with much recent evidence reporting changes in plant physiological processes but not in growth (Huang et al., 2007;Andreu-Hayles et al., 2011;Peñuelas et al., 2011;Girardin et al., 2012;Lévesque et al., 2014). Experiments are often done on young trees while the CO 2 effect is a long term process which cannot be easily investigated by short-term experiments (Medlyn et al., 1999). Even when these experiments are done on old trees (30 years old for Alpine conifers, Handa et al., 2005), the duration of the experiments is necessarily limited in time (3 years in the FACE experiment (Handa et al., 2005). Even when an increased WUE is observed, it is not necessarily related to a tree-growth enhancement (Andreu-Hayles et al., 2011;Peñuelas et al., 2011). Nevertheless some studies have acknowledged that WUE enables a better resistance to drought (Linares and Camarero, 2012;Girardin et al., 2012) and thus a strong fertilization effect (Keenan et al., 2013).
Most models presented in the previous paragraphs simulate biomass production but not cell and wood composition. These processes cannot be ignored when we are interested in wood quality (Houllier et al., 1995). In dendrochronology, the Vaganov-Shashkin (VS) model (Fritts et al., 1991) was a pioneer in this domain. It simulates cambial activity as observed by the number of cells, their size, wall thickness and wood density. In this model, these processes at the level of the cell are described as a function of environmental conditions. (Hölttä et al., 2010) proposed a model where the processes of cell enlargement and cambium activity are related to the short-term carbon dynamics. The CAMBIUM model links productivity to cambium activity and tries to mimic the spatial distribution of the xylem cell structures (Downes et al., 2009;Drew et al., 2010). The first block of the standard VS model is the estimation of the growth rate related to photosynthesis production. The second block is the dynamics of cell growth. The first block has been recently simplified and called VS-light (Tolwinski-Ward et al., 2011): it has been reduced to a product of three limiting climatic factors, i.e., temperature, soil water balance and solar radiation, and is therefore no longer a true process model.

MODEL-DATA FUSION METHODS
Basically, model-data fusion (MDF) does not concern the huge field of standard statistical modeling which is based on the use of data to calibrate the parameters of a probabilistic model. MDF, in its present meaning, concerns complex process models and encompasses calibration, model inversion and data assimilation techniques (Evensen, 2004;Raupach et al., 2005;Tarantola, 2005).
The methods have been reviewed for ecological applications by Peng et al. (2011). They are used to improve the performance of a model by either optimizing/refining the values of unknown parameters and initial conditions or by improving the predictive capacity of a model by constraining the model by data. All the MDF methods reported in the literature have the following features: (i) a model is used to describe the temporal evolution of output variables; (ii) observations correlated with model outputs are available; (iii) an objective function is defined that combines model outputs and observations with any associated prior information and error structure; and (iv) optimization techniques are used to adjust forward model parameters or output variables to minimize the discrepancy between model estimates and observations. The general principle of MDF is a selection of model properties that minimizes the gap between model system representation and real systems based on observations and prior characteristics. An objective or cost function is then constructed to quantify the mismatch between model predictions and observations by also taking the model-data errors into account. Discovering optimal parameters can help to improve predictions or test alternative hypotheses embedded in models. A key feature of MDF schemes is how they incorporate information that pertains to uncertainty for both the model and the observations, providing a best estimate of the true state of a system. As this is also at the core of the Bayesian paradigm, recent approaches are increasingly using Bayesian statistics.
Any optimization technique can operate either on data batches (i.e., all the data available are processed in one batch) or sequential data (i.e., data are used sequentially to update the optimization). Variational data assimilation operates in a batch processing manner over a given time window that contains observational time points (Kalnay, 2003). The four-dimension variational method uses historical observations and requires computation of the adjoint model, i.e., a transformation of the model which makes it possible to go backwards in time (a forward model is used to go forwards in time).
One of the most popular examples of a sequential method is the Kalman filter (KF) (Kalman, 1960). KF is a recursive algorithm that estimates the state of a system at each iteration using a state-space model in combination with (noisy) measurements. The objective of KF is to reduce the influence of noise that occurs in measurements. It provides a convenient representation of model, data and parameter errors (Williams et al., 2005;Wang et al., 2009). KF has two distinct phases: the prediction phase and the updating phase. The prediction phase uses the state estimate from the previous time step to produce an estimate of the state at the current time step. For the updating phase, the current a priori prediction is combined with current observational data to refine the state estimate. The Ensemble Kalman filter (EnKF) is an extension of the traditional KF. EnKF is now used in ecology (Williams et al., 2005;Fox et al., 2009) to optimize model parameters (Quaife et al., 2008;Mo et al., 2008).
Bayesian statistics were conceptually introduced in paleoclimatology by Korhola et al. (2002) and Haslett et al. (2006), but without any reference to a mechanistic model and by Guiot et al. (2000) with a mechanistic model. The idea of the approach is to define an a priori distribution of the parameters (calibration) or the input variables (inversion) based on accumulated expert knowledge or on results obtained with independent data and to calculate their a posteriori distribution based on available data and model simulations. The objective is not to obtain the "best fit" estimations but to integrate the parameter or input variable space to obtain a robust estimation of their probability distribution. These approaches are based on intensive calculation algorithms. A popular type of such algorithms is known as the Monte Carlo Markov Chain (MCMC) algorithm. Let us consider a multi-dimensional mathematical space where each dimension represents a parameter or an input variable (depending on whether the objective is calibration or inversion). We define a vector of parameters as an element of this multi-dimensional climate space. Then the Metropolis-Hastings iterative algorithm is used to browse the climate space according to an acceptance-rejection rule (Metropolis et al., 1953;Hastings, 1970). The output of this algorithm is a path or chain of climate parameters describing the posterior distribution of climate parameters. In the domain of climate reconstruction, the approach is still not widely used (Wu et al., 2007(Wu et al., , 2009).

CALIBRATION AND PREDICTION
All models, from the simplest to the most complex, contain parameters that are often set by knowledge of the underlying processes and/or from information contained in the literature.
When this information is not available, it is necessary to directly estimate the parameters by fitting the model simulations to data. This is the calibration step of modeling. The choice of an applicable calibration technique is necessary for successful model parameter estimation. Techniques range in complexity from manual calibration to algorithms such as global optimization, MCMC and data assimilation techniques. Manual calibration consists in a user inspecting the performance from a model and comparing it to the observations in order to determine the best parameters (Misson, 2004). This method is often time-consuming and sometimes difficult to implement. Global optimization techniques, which attempt to find the most accurate set of parameters with respect to data, are more automatic. Optimization is achieved by maximum likelihood approaches where the model parameters are adjusted by maximizing a likelihood function depending on the data and model parameters. See for example the PEST method in hydrology (Doherty, 2003). This provides accurate, reproducible parameters for a model, but lacks the ability to take into account the uncertainties of the model and data. Bayesian methods are useful to circumvent this problem. MCMC offer the opportunity to estimate the (a posteriori) parameter distribution once the calibration has been done, starting from their predefined (a priori) statistical distributions (Gelman et al., 1995;Andrieu et al., 2001). Such approaches can nevertheless lead to unrealistic results, in the event of inappropriate a priori information or non-relevant parameter distribution extractions. To bypass this difficulty, other kinds of procedures such as Particle Filtering (PF) have been recently proposed on the basis of genetic algorithms Pelletier et al., 2006). Unlike MCMC approaches, the procedure still consists in going though the parameter-space, but with a high number of initial choices, called particles, that are defined by their parameter values (their coordinates).
We present here the comparison of three methods, namely the PEST, MCMC and PF algorithms (Gaucherel et al., 2008a,b), to estimate 11 parameters of the MAIDEN model (Misson, 2004). The study is applied to Pinus halepensis ring width and xylem density series and climatic series from the Aix-en-Provence and Istres weather stations, all from Southern France (Nicault, 1999;Rathgeber et al., 2000b). The process-based model MAIDEN uses daily precipitation, minimum and maximum temperatures as input and was implemented for the 1950-1994 period. The model output of interest is the biomass allocated to the tree bole which is assumed to be comparable to the annual increment index based on wood density and ring width. The 11 parameters were calibrated on the 1962-1994 period using the Aix-en-Provence data and verified on the independent 1950-1961 period using the Istres station. In fact, several parameters were calibrated with daily physiological measurements, but we focus here on the dendrochronological calibration. Figure 1 shows the interest of having a method able to provide uncertainties on the parameters. Some parameters have a Gaussian shape, which indicates that the parameter has an optimum value with a reasonable uncertainty, but the others are not really sensitive as they have a uniform distribution over a large interval: this means that these parameters can have relatively different values without really changing the simulation of the bole biomass increment. The PEST optimum values, the MCMC FIGURE 1 | The eleven MAIDEN parameter probability distributions using the particle filter technique for Pinus halepensis. The heavy black dashed lines indicate the PEST optimum parameter value, the blue dashed lines indicate the 5 and 95% percentiles, while the blue continuous lines show the MCMC parameter modes for comparison. Meaning of the abbreviations: G dd , Growing degree-days in C; a 1 , Leuning factor, linking stomatal conductance and photosynthesis; D o , coefficient linking the stomatal conductance and the humidity deficit at the leaf surface, in P a ; θ c , Soil water stress parameter, in mm; c 2 , Temperature function  Gaucherel et al. (2008a). and PF probability modes of the eleven parameters are used in MAIDEN to estimate the bole biomass increments which are compared to the measured values. Figure 2 compares the three estimated curves with the observed curve for both the calibration period and the verification period. The agreement between the three methods and the observations is good, except for the 1956-1958 interval. In fact February 1956 was extremely cold and this seriously injured the trees which took 3 years to recover. This process is not included in the model so the 1956-1958 interval has been removed from the verification. The PEST optimization gives a r 2 = 0.46. This optimum fit is in good agreement with the MCMC final fit distribution, as the maximum MCMC fit value reached is r 2 = 0.52, while its modal value is 0.37. PEST needed only 16 iterations to reach this value. For the PF algorithm, the final fit gives a r 2 = 0.25 ± 0.02 with 1000 particles, a fit lower than that for the other calibration techniques. When applied to independent meteorological data, the three calibration parameter sets lead to rather similar bole biomass increment variations. Residuals between the standardized final simulations and the observations over the 11 years of validation show that the MAIDEN model has difficulty in simulating some years: namely the freezing years 1956 and 1985 and following, the rainy year 1963 and years 1993-1994 (Figure 2). When considering the residuals between the three final bole increment series and the corresponding observations, it appears that MCMC and PEST present rather similar behavior, distinct from that of the PF algorithm.
PEST is an extremely fast method, but its main disadvantage is that it does not provide any information on the uncertainty of the parameters. The main disadvantage of PF is that the particle number has to be high (from 10 2 to 10 6 ) for the algorithm to be efficient, which is computationally costly. However PF has several advantages over MCMC and PEST (Gaucherel et al., 2008a): (i) it is easy to develop and to implement; (ii) it does not need any burning period (i.e., period necessary to reach equilibrium between inputs and outputs) nor parameter distribution filtering to calculate the a posteriori distributions; (iii) it can be parallelized; (iv) almost no tuning algorithm is necessary. As a sequential approach, the PF algorithm has the ability to indicate the highly constraining years, i.e., those which poorly fit simulations and observations. The final advantage of PF is that as it is based on the law of large numbers, it obtains smoother and more regular (Gaussian) parameter distributions, making it easier to analyze and interpret.
In a second step, using the parameter set obtained from MCMC, the model was applied to climate simulations provided by the ARPEGE climate model driven by the IPCC-B2 scenario (Gibelin and Deque, 2003) after statistical downscaling (Gaucherel et al., 2008b). The CO 2 atmospheric concentration has an influence on the climate and is taken into account by the climate model simulation, but it also has a direct effect through fertilization (already mentioned in Section 2). Figure 3 shows the projection of Pinus halepensis growth for the next century with and without taking into account this fertilization effect (Gaucherel et al., 2008b). It appears clearly that ecophysiological models such as MAIDEN can be very sensitive to this effect. The main reason is that the model does not include any nutrient limiting factor, particularly to link the influence of nitrogen and photosynthetic capacity. Without the fertilization effect, tree growth will decrease because of lack of water (the precipitation deficit is amplified by more evapotranspiration), but if the latter effect is included, tree-growth will increase. As mentioned in the literature review in the introduction section, this fertilization effect is not always observed on living trees and is subject to controversy (Gedalof and Berg, 2010;Silva and Horwath, 2013).

INVERSION AND RECONSTRUCTION
Inverse modeling is a statistical method that can be used to estimate variables that are directly or indirectly related to the measured quantity. An inverse model differs from forward modeling in that it uses dependent variables (e.g., bole increment) to constrain input climate variables rather than using climate to predict dependent variable distributions. It is not an analytical inversion, but an iterative procedure which converges progressively toward the climate probability distribution compatible with the observations. The climatic space is randomly sampled to produce a large variety of climatic scenarios which are introduced in the vegetation model to simulate the corresponding vegetation composition and productivity. In dendroecological applications, the simulated bole increments from the process-based inversion are compared to the tree-ring data and those matching reasonably well are retained. These scenarios are used to build histograms, which are estimates of probability distribution functions of a climate able to generate such bole increments. This implies a measure of coherence between outputs of the model and tree-ring proxies.
Inversion of the model is done under the paradigm of the Bayesian theory (Robert and Casella, 1999). The main concepts used are prior and posterior probability distributions. The prior is the information, summarized in the form of a distribution, which is available before data analysis. The posterior is the information that will be deduced from the combination of data and the hierarchical model. In that respect, the hierarchical model is not restricted to the tree-growth model, but it is the function which relates the prior to the posterior. In statistical terms, it is the probability of climate C conditional to tree-ring data (Y). It is noted, according to Bayes' law: where p(C) is the prior distribution of climate C and f (Y | C) is the likelihood function of tree-ring variables given the climate (as provided by the model from the climate). The prior is an initial guess of the probability distribution of the climate. It can be given by the knowledge we have from modern climate variability or, if we work on periods that are very different from the present one, from the knowledge which has been accumulated in palaeclimatology. The distribution law is then a uniform law defined on the range so defined. The likelihood function is provided by comparing tree-growth model simulations given a large set of climate scenarios to observed tree-growth data. The above-mentioned inversion approach has been developed in the context of a multi-proxy reconstruction in Fontainebleau forest, near Paris, France (Boucher et al., 2014). At this location, different tree-ring proxies (ring widths, carbon and oxygen stable isotopes) have been measured and their sensitivity to climate has been demonstrated (Etien et al., 2008). Furthermore, from the modeling perspective, MAIDENiso was successfully parametrized and developed (in the forward mode) for this specific site (Danis et al., 2012). Thus, running MAIDENiso in the inverse mode allowed (Boucher et al., 2014) to retrieve, for each year since 1850 AD, a probability distribution of climatic inputs. Additionally, since MAIDENiso equations are constrained not only by meteorological conditions, but also by atmospheric CO 2 concentrations (e.g., via stomatal conductance modeling), it was possible to isolate the contribution of CO 2 to the overall reconstruction, and to point out a possible divergence effect associated with a CO 2 enriched atmosphere since the beginning of the industrial era. The idea behind the inversion performed at Fontainebleau was quite simple. For each year in which proxies were measured (but where climate was not available), we modified a reference year to fit the tree ring proxies as well as possible. The reference year (here 1984) was chosen to be as similar as possible to an "average year." The modification of the reference year was performed using the delta method. Deltas are additive (temperature) or multiplicative (precipitation) values that modify the reference year, producing alternative meteorological scenarios that are iteratively injected into MAIDENiso. It is precisely the posterior distribution for these deltas that needs to be found by the inversion algorithm, for each year of the reconstruction.
As a first step in the inversion, a uniform prior is set on both temperature and precipitation delta values (Figure 4). A uniform prior means that only the bounds of paleo-temperature [−5,5] and precipitation [0.25,2] variability are fixed and all the values have equal probability. In a second step, using the MCMC procedure, we iteratively modify the delta values (called proposals) within the bounds of the uniform distribution. Proposals are accepted if two main conditions are met: (1) the proposal must be within the bounds of the prior distribution, and (2) the proposal must result in some gain in simulation accuracy (here, measured by a simple Euclidian distance between simulations and observations). A negative gain is accepted if it is above a small negative threshold. Otherwise, the proposal is rejected. The MCMC algorithm stops when the chain reaches a stable state, meaning that the successive accepted proposals are not correlated with one another (stationarity). All the accepted proposals are used to build a posterior distribution that represents the range of possible delta values required to simulate tree rings that resemble real-world proxies.
For each year of the reconstruction, delta posterior distributions differ because proxies, themselves, are different (Figures 4B,C). For example, the years 1960 and 1990 yield quite different summer temperature and precipitation estimates (black curves). Summer 1960 was relatively cold (median temperature delta = −3.4C) and wetter than normal (median precipitation delta = 1.35, i.e., +35%). In contrast, summer 1990 was warmer than normal (median temperature delta = 1.9C), while precipitation was close to average (median precipitation delta = 1.15, i.e., +15%). However, it is interesting to note that posterior distribution also changes depending on the CO 2 scenario used to constrain the inversion. In the preceding simulations, a realistic CO 2 scenario was used (Robertson et al., 2001), meaning that CO 2 variations are coherent with the increasing trend observed on the global scale [280 to 400 ppmv]. However, if CO 2 concentrations are fixed at preindustrial levels for the whole reconstruction period , then overall temperature reconstructions differ considerably in the low-frequency domain (red curves).

FIGURE 4 | Prior and posterior distributions (N = 10 000 accepted proposals) of delta values for temperature (left) and precipitation (right).
Deltas are additive (temperature) or multiplicative (precipitation) constants that are applied to the reference year (here 1984) to produce alternative meteorological scenarios that are iteratively injected in MAIDENiso, through the inversion algorithm. Prior distributions for temperature and precipitation are shown in the upper panels (A,D). Posterior distributions for temperature (B,C) and precipitation (E,F) are shown in the middle and lower panels, respectively. Black curves are posterior distributions obtained when the inversion is forced by realistic CO 2 data, while red curves are posterior distributions obtained when constrained by CO 2 levels fixed at pre-industrial levels (280 ppmv). Figure 4C, where it is clear that the summer temperature reconstructed for year 1990 is colder by about 2C if pre-industrial CO 2 concentration (280 ppmv) is used instead of actual 1990 concentrations (370 ppmv). This difference becomes even more apparent when the full reconstructions are compared Figure 5. When pre-industrial CO 2 levels are used for the whole time period (red), a colder climate is reconstructed, and most importantly, the warming trend clearly observed in climatic records at Fontainebleau cannot be reproduced.

This effect is partly visible on
Although, as we have previously discussed, isolating the net effect of CO 2 on the forest response is likely to be more complex than the effect we modeled, our results suggest that CO 2 is an important driver for tree growth in Fontainebleau forest, and that the constraint of inversion with a realistic CO 2 scenario makes the reconstruction better converge toward observed climate variability. Abnormally low pre-industrial CO 2 concentrations force the model to compensate by too cold a climate. In the case of Fontainebleau forest, this compensation effect by a colder climate can be explained by links and feedbacks between climate, stomatal conductance and photosynthesis. With lower atmospheric CO 2 concentrations, stomata have to maximize carbon gains by increasing their area to match real-world growth rates. However, by doing so, water losses by evapotranspiration and evaporation at the leaf surface also increase. So, in order to compensate for water losses, reconstructed temperatures need to be lower, a situation that ultimately minimizes evapotranspiration and evaporation fluxes.

TOWARD BETTER ECOPHYSIOLOGICAL MODELS
The inversion of mechanistic dendroecological models offers a great framework for dendroclimatologists to incorporate and deal with non-stationarity (i.e., the so-called divergence D' Arrigo et al., 2008) and with interactions between climate, tree growth and CO 2 concentrations, among other things. This well-known "divergence" problem which is related to the fact that the climatetree-growth relationship was modified after the 1960s or later, is possibly an artifact of inadequate detrending (D'Arrigo et al., 2008;Esper and Frank, 2009) or an effect of CO 2 fertilization or change in water availability (Daux et al., 2011). The problem of tree-growth trend is often discussed in dendrochronology (Melvin and Briffa, 2008;Nicault et al., 2010). Trends in treegrowth time-series may be caused by long-term variations in climate, long-term CO 2 or nitrogen increases, and various factors related to the tree physiology and wood structure. Numerous techniques have been used to try to separate the climate trend from other factors, with more or less success. (Melvin, 2004) showed that the extensive use of concepts taken from tree-growth Black curves (and the associated 90% uncertainty range) are median reconstructed values obtained when the inversion is constrained by realistic CO 2 concentrations. The red curve (and the associated 90% confidence intervals) are median reconstructed values obtained when the inversion is forced by CO 2 concentrations fixed at pre-industrial levels (280 ppmv). The dashed green line represents observed climate series . Linear lines with corresponding colors were added to each series to evaluate linear trends in reconstructed and observed climates. Adapted from Boucher et al. (2014). models and based on the assumption that common external growth forcing operates through its influence on photosynthesis may solve the problem. The example of model inversion shown in the previous section confirms the relevance of this approach.
Increasing atmospheric CO 2 concentration enhances biomass production in MAIDEN as in most ecophysiological models (Berninger et al., 2004). Under elevated CO 2 concentrations, stomatal area may be reduced without reducing internal carbon concentrations (Keenan et al., 2013). Partial stomatal closure increases water retention and increases biomass production per unit of water absorbed by the plant (better WUE). So, less precipitation is required to sustain growth rates. This rate of increase in WUE might be downgraded by various acclimation processes to long term CO 2 increases (Drake et al., 1997): a decrease in Rubisco, a rise in carbohydrate solubility, increases in light use efficiency, or lower rates of dark respiration. These processes are not really integrated in most of the models discussed. Clearly, additional research is required to better understand these processes so that they can be included and accounted for in simulations in both forward and inverse mode. On the one hand, the meta-analysis of Medlyn et al. (1999) showed that current models were adequate to model photosynthesis in elevated CO 2 . On the other hand, (Berninger et al., 2004) concluded that additional processes such as those that control acclimation to CO 2 in the long term as well as other limiting factors (e.g., nitrogen availability) need to be integrated in upcoming model generations to improve the realism of climate reconstructions. A recent study (Li et al., 2014) that CO 2 fertilization over the past 50 years is too small to be distinguished in the ring-width data given ontogenetic trends and interannual variability in climate.
Finally, what is clear is that ecophysiological models of treegrowth still need more development, but that their use instead of oversimplified statistical methods (response functions or transfer functions) has enabled significant progress. It is clear that our understanding of mechanistic dendroecological processes and their integration in deterministic models is a continually evolving and rapidly changing area of research. For that reason, it would be unwise to apply the approach to a wide range of environments and species, without first going through a thorough examination of underlying dendroecological processes and achieving a clear assessment of the model's ability to simulate at least the most fundamental ones.

TOWARD A MORE GENERAL DATA ASSIMILATION SCHEME
Data assimilation sensu stricto has been used extensively in various disciplines (meteorology, oceanography . . . ), where many observational datasets, both in-situ and remotely sensed, are available. A spectacular application can be found in climatology with the so-called climate reanalysis (Kalnay et al., 1996). As ecology cannot call on such massive datasets, more appropriate techniques are those based on Bayesian statistics. Dubinkina et al. (2011) introduced particle filters in paleoclimatological modeling. We propose here to use a similar approach in dendroecology with MAIDENiso.
Models need to state the initial conditions to start the runs. MAIDENiso (Danis et al., 2012) needs several years of daily climate data (minimum and maximum temperature, precipitation). For simplicity, we consider the current and previous years. The scheme of the method is presented in Figure 6. We start the simulations at the year t 0 + 1, using climate at time t 0 + 1 and t 0 . By introducing small perturbations in this 2-year climate vector, we generate an ensemble of N simulations of tree proxies (tree-growth and isotopes) for year t 0 + 1. These simulated proxies are the particles which need to be compared to the observed proxies. A weight w i , proportional to the likelihood of the model state (simulation of the proxies), conditional to the observations, is attributed to them according to their similarities to the observations (in Figure 6 the closest particle has the largest bullet). A posterior probability distribution is thus generated (2): where p(c | d) is the posterior distribution of the climate c given the proxies d, i.e., the probability distribution of the reconstructed climate; k is a normalization constant; H is the measurement model (MAIDENiso simulation); is the observation error; c is the input climate. H(c i ) is the modeled proxies which depend on the climate c i ; the weight w i measures the distance between the i th particle and the observed value. The posterior distribution is resampled, providing M new equal weight particles by propagation using the model. These particles are compared to the observed proxies at year t 0 +2. The procedure is repeated until the end of the observed proxies series. This is the theory, but some difficulties remain to be solved before the method is operational. First the input climate is a large dimension vector. The model needs daily values of three variables, so that the number of possible combinations of the parameters is much larger than the number of particles. In reality, daily values can be derived from a relatively small number of parameters, namely the parameters of the probability distribution of the seasonal temperatures and precipitations. These large dimension vectors can also be reduced by a few principal components. Another problem is the comparison of the simulation H(c i ) to the proxies d. In reality we have here three proxies (bore increment, carbon and oxygen isotopes). They must be combined in Equation 2 and they must be simulated in the same units by the model. For the isotopes, this is the case, but the bore increment-related to the area and the density of the ring in a section, i.e., which is not a mass-must be compared to the annual biomass increment simulated by the model. In a previous section, this was solved by standardizing simulated and observed values. This is not the only solution and maybe not the best one. This scheme of data assimilation must be considered as an avenue of research. To be useful, this approach must be applicable to long series, to provide climate reconstructions over several centuries of the past and projections to the future. As at each step we produce climate reconstructions and these reconstructions are used in the following step, there is a danger of drift. It is not certain that the link between observed and simulated proxies is sufficiently strong to foil the possible drifts. Studies on the stationarity of the process must then be conducted. A more complex but promising approach is given by coupling a climate model like LOVECLIM (Goosse et al., 2010) to a tree-growth model like MAIDENiso (Danis et al., 2012) and by assimilating data to this coupled system. In this case, the observed proxies are used to constrain the climate simulations to make them more realistic. As the climate model is global, this needs a network of proxies and not only a single site, and also a downscaling method (Vrac et al., 2007) to adapt the climate simulations to the dimension of the studied forests.

SOME PERSPECTIVES
In contrast to empirical-statistical models, forward models of tree-growth can explicitly account for non-climatic factors, which reduces the risk of attributing to climate some tree-growth changes that are in fact related to internal processes. This extends their applicability and reduces the risk of violating the uniformitarian principle (which assumes stationarity of the relationships between tree-rings and abiotic factors). These forward models have another advantage: they are excellent tools to simulate proxies in various abiotic conditions and hence to better understand their sensitivity to various factors and evaluate biases and errors associated with reconstructions of past climate. These approaches do not imply that present empirical-statistical methods should be adandoned, but rather that our toolkitsshould be extended to analyze the proxies. The approaches used in dendroclimatology were developed 30 or 40 years ago, and were creative and productive responses to the situation we faced then. We now have a much richer set of records, a more diverse and powerful set of tools for analyzing and using them, and benefit from great advances in understanding tree functioning and climate variations on the long term.

AUTHOR CONTRIBUTIONS
All authors contributed equally to the work with ideas, interpretations, and discussion. Joël Guiot supervised and wrote most of the paper, Etienne Boucher wrote the inversion section and Guillermo Gea-Izquierdo thoroughly revised the initial draft.

ACKNOWLEDGMENTS
Support for Joël Guiot and Guillermo Gea-Izquierdo was provided by the Labex OT-Med (ANR-11-LABEX-0061) funded by the "Investissements dAvenir," French Government program of the French National Research Agency (ANR) through the A * Midex project (ANR-11-IDEX-0001-02). Support for Etienne Boucher was provided by the Fonds de Recherche Quebecois -Nature et Technologies (FRQNT) and the ARCHIVES project through the funding of the National Sciences and Engineering Council (NSERC) and the participation of the Ouranos consortium. This paper is a contribution to the MISTRALS/PALEOMEX program. Fontainebleau data were kindly provided by V. Daux (LSCE, Saclay) and Pinus halepensis data by A. Nicault (ECCOREV, Aix-en-Provence). The english of the manuscript has been much improved by Elizabeth Rowley.