From Ecosystem Observation to Environmental Decision-Making: Model-Data Fusion as an Operational Tool

Supporting a transition to net-zero carbon (C) emissions is a key component of international action to avoid dangerous climate change. Science has outlined potential routes to net-zero, which include using nature-based solutions to grow C sinks and diminish sources linked to land use and land use change. However, decision-makers are challenged by ongoing climate change and the complexity of the biosphere, interacting with socio-economic constraints. Decision-makers need science-based but easy-to-use tools to understand the current and potential future states of the terrestrial C-cycle, and its sensitivity to their decisions. These tools must provide clear uncertainty estimates to help take account of risks, must be flexible enough to be updated as new data become available, and must be simple enough to be deployed broadly. We argue that model-data fusion approaches, combining the systemic ecological theory embedded in intermediate complexity models with an ever-expanding collection of ecosystem observations from field and remote sensing campaigns, provide the scientific means to address each of these challenges and therefore facilitate management decisions as we face an uncertain future.


INTRODUCTION
Human interference with the global carbon (C) cycle has already led to dangerous climate change.
There are now international efforts to limit the accumulation of atmospheric CO2, focussing primarily on the reduction of C emissions from fossil fuels. However, reducing emissions from the land surface and increasing biological stores of C will also play critical roles on our path to net-zero C emissions. Determining effective policy and management interventions therefore requires an understanding of the terrestrial biosphere's C balance. In this perspective piece we argue that management of terrestrial ecosystems for net-zero requires tools that address four key challenges: (i) resolving complex behaviours of ecosystems; (ii) maximising information from observations; (iii) characterising model error and uncertainty; and (iv) effective communication of scientific knowledge to decision-makers.
The terrestrial C-cycle is multi-scale, systemic and displays non-linear behaviours. Dynamics arise from ecosystem processes connecting producers (i.e., plants) and consumers (i.e., everything else) in complex food webs. These processes have interactive controls and feedbacks that generate nonlinearities. For instance, the interactions between meteorology, photosynthesis, leaf respiration and leaf traits determine the seasonal cycle of production and its geographic variation. C-cycling operates across time scales ranging from seconds (e.g., molecular processes) to millennia (e.g., C accumulation in peat). This complexity is compounded by the heterogeneity of the terrestrial biosphere, linked to topography, hydrology, geology, disturbance history, and land management.
Effective decision-making relies on being grounded in observations and supported by evidence. Earth observation (EO) provides the means to monitor the entire land surface, with increasingly frequent updates. EO can provide insights into the biological and physical states of the land surface, but field-based monitoring is vital for interpreting EO information, quantifying its biases and generating ecological knowledge. All observations contain errors and uncertainties which are often poorly known, particularly in global products. Furthermore, observations provide an incomplete picture of the terrestrial C-cycle as some key states and processes are not currently observable from space.
Process modelling provides a systemic view of the terrestrial C-cycle, with the means to explore scenarios, to compare the outcomes of alternative decisions. Modelling can generate counterfactuals, the "what-ifs" that are vital for decision-making around land management for net-zero. For example, there is active debate about the potential for afforestation, reforestation, and restoration to act as nature-based solutions. Modelling can help test whether decisions are robust against a backdrop of climate change, and project how poorly observable components such as soil C respond to tree planting. However, process models are highly complicated, often with weakly constrained parameters and lacking uncertainty estimates.
Our fourth and final challenge is the effective communication of scientific understanding to support decision-making. Our knowledge of ecosystem function and future trajectories is uncertain. These uncertainties need to be determined, explained and exchanged so that decision-makers can balance risks and opportunities and take account of low-probability but extreme events. This information needs to be accessible to non-specialists and targeted to user needs. Solutions need to take account of local biodiversity, soils, disturbance history, and people, who must find ways to coexist with ecosystems in a sustainable way. Co-creation between scientists and end-users is vital to meet this challenge.
Recent developments in observing and modelling have enhanced understanding of the terrestrial C-cycle; however, further efforts are required. Both observational and modelling approaches have deficiencies regarding the challenges laid out above, which are explored in the following sections. We then argue that robust coupling of models and observations can begin to solve these challenges, and track progress toward better solutions for decision-making. We discuss how engagement with decision-makers can make science accessible and useful, and we conclude with key recommendations for linking science to decisions.

SCALING AND REPRESENTATION CHALLENGES IN OBSERVATIONS
Over recent decades, concurrent threads of research have transformed our understanding of the terrestrial carbon cycle. The advent of large international field-based observing networks (e.g., FLUXNET, ICOS, GEM, and RAINFOR; Gwynne, 1982; Pastorello et al., 2020; Malhi et al., 2021) has expanded our understanding of the stocks and fluxes of C through terrestrial ecosystems. Additionally, the creation of collaborative repositories has enabled researchers to pool tens of thousands of field observations, revolutionising our understanding of ecosystem traits and their interrelationships (e.g., TRY and GlobAllomeTree database; Henry et al., 2013; Kattge et al., 2020). At the same time, our capacity to monitor ecosystems at global scales has been revolutionised by the rapid expansion of satellite-based remotely sensed EO. EO can uniquely provide observationally informed estimates of ecosystem status and dynamics with global coverage and, increasingly, repeated estimates (Table 1). These data present an opportunity to advance our knowledge of terrestrial ecosystems. However, there remains substantial progress to be made regarding data interpretation and reducing uncertainty and bias in observations. Understanding how best to use these data is critical to providing effective support to decision-making.
TABLE 1 | [...] These include the biosphere property being estimated, its spatial and temporal resolutions, and the C-cycle process the observation constrains or forces in model-data fusion. AG, above ground.
Effective use of observations is restricted by measurement error and uncertainty. While field-based observations typically provide the most robust confidence estimates for ecosystem status and function, they tend to represent a small spatial area and/or short time-period. This localisation introduces error and uncertainty when scaling [e.g., via machine learning (ML)] to inform decision-making. Moreover, field-based observing networks are concentrated across temperate latitudes with little spatial and temporal coverage elsewhere, especially in tropical ecosystems. For example, the FLUXNET network has just 13 sites across Africa and South America. EO estimates reduce scaling error and uncertainty by providing spatially consistent coverage but require either statistical, ML or process-models to transform the signals observed by satellites to relevant, interpretable ecological metrics. This process introduces a new source of error and uncertainty into estimates. Neither field nor EO data provide a complete picture of ecosystem C dynamics: some processes are not observed (e.g., soil C stocks); some fluxes are net estimates restricting the process knowledge we can infer (e.g., Wang et al., 2020). Constraining the terrestrial C balance therefore requires an integrated view of how stocks and fluxes are connected and correlated to maximise process learning (e.g., Bloom et al., 2020). Furthermore, error and uncertainty associated with observations make it unclear whether estimates of different components of the C-cycle are consistent. For example, are EO-derived estimates of disturbance (e.g., deforestation) consistent with estimates of aboveground biomass (AGB) change generated from a different satellite sensor?
Robust error quantification is critical for understanding the utility of observations for decision-making. However, errors are difficult to quantify, frequently underestimated and vary in space and time (Mitchard et al., 2014; Santoro et al., 2021). Zhao et al. (2020) evaluated three EO leaf area index (LAI) products using field estimates of LAI, demonstrating that the EO uncertainty estimates were a 3-5 fold underestimate of their true error. Similarly, the disagreement between independent approaches has been shown to be substantial, often larger than the uncertainty estimated for individual products, e.g., global GPP (80-170 PgC year⁻¹; Shao et al., 2013; Joiner et al., 2018; Jung et al., 2020), EO AGB maps (Mitchard et al., 2014; Avitabile et al., 2016) and LAI (Garrigues et al., 2008; Zhao et al., 2020). Disagreement between LAI products tends to be greatest across the tropics (Garrigues et al., 2008) but there is also a persistent overestimation of seasonality of Boreal needleleaf forests in EO LAI (e.g., Heiskanen et al., 2012). In the case of LAI and AGB, strategies for addressing these challenges are being developed, such as transparent methodologies for validation, determination of uncertainty, and the creation of a robust network of validation sites (Duncanson et al., 2019; Fang et al., 2019).
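One simple way to check whether a product's reported uncertainty is consistent with its actual error is to compare the root-mean-square error against field reference data with the mean reported standard deviation. The sketch below illustrates that ratio in Python. All numbers are made-up LAI values for illustration only and do not reproduce any published evaluation; the function name `uncertainty_ratio` is hypothetical, and the field measurements are assumed (unrealistically) to be error-free.

```python
import math


def uncertainty_ratio(product, reference, reported_sd):
    """Ratio of observed RMSE (product vs. reference) to the mean reported SD.

    A ratio well above 1 suggests the reported uncertainty underestimates
    the actual error. Assumes the reference values carry no error.
    """
    n = len(product)
    rmse = math.sqrt(sum((p - r) ** 2 for p, r in zip(product, reference)) / n)
    mean_sd = sum(reported_sd) / len(reported_sd)
    return rmse / mean_sd


# Hypothetical EO LAI estimates, matching field LAI, and reported SDs (m2/m2)
eo_lai = [3.1, 4.5, 2.0, 5.2]
field_lai = [2.5, 3.6, 2.9, 4.0]
eo_reported_sd = [0.2, 0.2, 0.2, 0.2]

ratio = uncertainty_ratio(eo_lai, field_lai, eo_reported_sd)
print(f"RMSE is {ratio:.1f} times the reported uncertainty")
```

In a real evaluation the reference error and the mismatch in spatial footprint between field plots and pixels would also need to be accounted for before interpreting the ratio.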

PARAMETER AND MECHANISTIC UNCERTAINTY IN PROCESS MODELLING
Terrestrial Biosphere Models (TBMs) simulate the whole ecosystem, bridging the gap between observations, using numerical expressions of hypotheses underpinning ecosystem function. TBMs have varied levels of process-representation complexity, often connecting modules for specific processes such as photosynthesis and respiration. While no TBM matches the complexity of real-world ecosystems, their interconnected representation of ecosystem processes provides a means of addressing complex non-linear responses of terrestrial ecosystems. The hypotheses implemented in TBMs are typically derived from analyses of field observations (e.g., Huntingford et al., 2017) and then calibrated at individual sites (e.g., Blyth et al., 2011) or as the average from trait databases (e.g., Harper et al., 2016). From these calibrations a small number of plant functional types (PFTs) are typically defined and then applied globally.
TBMs have been the focus of substantial research and development over decades, providing useful insights into ecosystem function and response to changes in their environment, but have tended to become increasingly complicated to include ever more processes and complex process representations. TBMs are used to assess the response to land use and land cover change, and the associated climate change as defined by the shared socio-economic pathways (SSPs, van Vuuren et al., 2017), forming a key component of information feeding into international frameworks such as the IPCC (2021). However, there remains large divergence between Earth System Models (ESMs, within which TBMs represent the biosphere) and observation-orientated estimates of current terrestrial C stocks and between ESM predictions of the trajectory of the terrestrial C-cycle (Eyring et al., 2016; Exbrayat et al., 2018b; Arora et al., 2020).
The current TBM paradigm faces several obstacles to effectively supporting decision-making toward net-zero. The process representations within TBMs have become progressively more complex (or complicated), with an increasing number of parameters that are often weakly calibrated at a limited number of sites (e.g., Blyth et al., 2011). Furthermore, model uncertainty and error are not well characterised. For example, PFTs neglect the fact that traits vary in space within a given PFT (Butler et al., 2017; Exbrayat et al., 2018a), resulting in parameters which do not represent real-world variability (Reich et al., 2014; Harper et al., 2016). Relatedly, TBMs typically implement land use/cover change by varying the relative amount of co-existing "tiles" of different PFTs simulated in the same location. Additionally, TBM calibrations widely assume that the terrestrial ecosystems are initially in steady state and are only varying in response to climate change. Therefore, these analyses do not typically include direct management impacts such as residue creation from forest felling, or subsequent regrowth dynamics. A coalition of large forest plot databases has started providing uncertainty-bounded observation constraints of post-disturbance C accumulation rates (Cook-Patton et al., 2020). However, simulating the C dynamics of regrowing forests under varied policy-relevant management regimes and future climates, with biologically plausible parameters and model structures, remains an outstanding target (Braakhekke et al., 2019; Pugh et al., 2019; Shiklomanov et al., 2020).

SUPPORTING DECISION-MAKERS EFFECTIVELY
By 2100 global climate is likely to have changed in ways not seen by life on Earth in hundreds of thousands of years (IPCC, 2021). Furthermore, with the Glasgow Climate Pact (GCP) requiring nations to return with new nationally determined contributions by the end of 2022 and the upcoming global stocktake (UNFCCC, 2021), we expect that there will only be an increasing focus on effective change detection capacity for ecosystem C stocks. The GCP necessitates tools that provide system-level syntheses of C data that can be updated as and when new observations are available (e.g., Quegan et al., 2019). Furthermore, to support adaptation, policy makers and land managers need to understand the impact of management interventions on the resilience of existing and regenerating forests under environmental conditions with no real-world analogues (Hurlbert et al., 2019). TBMs have been and continue to be vital to addressing these challenges, through simulations of the full C-cycle. However, alongside the global and system-level insight provided by TBMs, system-level uncertainties will be essential for managing risk (e.g., low probability, high impact scenarios), and identifying key gaps in observations that can reduce model uncertainty and improve future decision-making (Hurlbert et al., 2019). Support for decision-makers would be advanced by giving them the models to explore, allowing direct access to scenario investigation. Supporting decision-makers to be model users gives them agency over exploring outcomes. But existing models are largely too complicated for non-specialists to use. For effective engagement, models and their user interfaces need to be simple enough to use and allow rapid scenario investigation, exploring for example management decisions and climate interactions under model uncertainty. The degree of translation from model output to user-relevant information will also vary (e.g., policy makers versus foresters).
Decision-makers will also question models whose initial conditions (e.g., woody C stocks) do not match their experience of reality. Models calibrated and initialised to local conditions have greater relevance and validity for regional decision-makers. Scientists can support decision-makers by co-developing model tools. Collaboration is key to ensuring that the information provided to decision-makers is the best available but also delivered in a format which is understandable and relevant to their specific circumstances.

MODEL-DATA FUSION AS A SOLUTION
We argue that Model-Data Fusion (MDF), or data assimilation, approaches provide an opportunity to address the challenges we have outlined by exploiting the information available in a diverse range of observations to inform and improve model parameterisations and process representations, while accounting for uncertainty in observational constraints (Figure 1). MDF can be viewed as theoretically informed meta-analyses, capable of synthesising multiple data-streams that constrain disparate aspects of ecosystems through a Bayesian framework. MDF provides uncertainty-bounded calibrated model analyses, which can be used to improve our understanding of ecosystem functioning and make predictions of their likely response to changes in climate, management and disturbance, offering invaluable information to policy makers, individual farmers and foresters alike.
MDF approaches which retrieve parameter information have gained increasing traction over recent years, spanning TBMs of radically different process complexities and exploiting different algorithms (e.g., Williams et al., 2005; Fox et al., 2009; Williams et al., 2009; Kuppel et al., 2012; Keenan et al., 2013; Caen et al., 2021; Famiglietti et al., 2021). These and other studies have identified strategies for maximising the value of MDF. For example, quantifying the information content of different types of observations and how they impact TBM predictive capacity has been widely assessed (Kuppel et al., 2012; Keenan et al., 2013; Smallman et al., 2017; Famiglietti et al., 2021). Predictive skill has been shown to increase with model complexity only when sufficiently informed by observations (Smallman et al., 2017; Famiglietti et al., 2021), i.e., we must match model structure and complexity to the data available for constraint. These studies support using simpler C models supported by available observations. Simpler models are also more accessible and thus meet the challenge of expanding the model-user community and enhancing co-creation with decision-makers. The carbon data model framework (CARDAMOM) is an example of an MDF framework which aims to explore and address the challenges we have outlined. CARDAMOM uses a Bayesian approach within an Adaptive Proposal-Markov Chain Monte Carlo (AP-MCMC, Haario et al., 2001) to estimate the likelihood of parameters as a function of observations, observation uncertainty, ecological theory, local conditions (e.g., meteorology) and model structure (Bloom et al., 2016). The resultant CARDAMOM analysis provides pixel-level, i.e., local, ensembles of parameters for the DALEC suite of intermediate complexity terrestrial ecosystem models (Famiglietti et al., 2021) that can be combined to inform national C balance and climate sensitivity.
Ecological and dynamical constraints ensure that accepted parameter combinations and their resultant C stock ratios and dynamics are consistent with ecological theory (for details see Bloom and Williams, 2015).
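As a minimal sketch of the kind of Bayesian parameter retrieval described above, the following Python example uses a plain random-walk Metropolis sampler to estimate the turnover rate of a toy one-pool C model from noisy synthetic stock observations. This is not CARDAMOM's AP-MCMC and not DALEC: the model, the prior bounds, the observation error, and the steady-state constraint (a stand-in for an ecological constraint) are all illustrative assumptions.

```python
import math
import random


def simulate_stock(turnover, n_years, c0=100.0, inputs=5.0):
    """Hypothetical one-pool model: annual steps of dC = inputs - turnover * C."""
    stocks, c = [], c0
    for _ in range(n_years):
        c = c + inputs - turnover * c
        stocks.append(c)
    return stocks


def log_prior(turnover, inputs=5.0, max_steady_state=500.0):
    """Uniform prior plus an assumed constraint on the implied steady state."""
    if not 0.001 < turnover < 0.5:
        return -math.inf
    if inputs / turnover > max_steady_state:  # illustrative ecological constraint
        return -math.inf
    return 0.0


def log_likelihood(turnover, obs, obs_sd):
    """Gaussian log-likelihood (up to a constant) weighted by observation error."""
    model = simulate_stock(turnover, len(obs))
    return -0.5 * sum(((o - m) / obs_sd) ** 2 for o, m in zip(obs, model))


def metropolis(obs, obs_sd, n_iter=5000, step=0.002, start=0.1, seed=1):
    """Random-walk Metropolis; returns the second half of the chain."""
    rng = random.Random(seed)
    current = start
    current_lp = log_prior(current) + log_likelihood(current, obs, obs_sd)
    chain = []
    for _ in range(n_iter):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_prior(proposal)
        if proposal_lp > -math.inf:
            proposal_lp += log_likelihood(proposal, obs, obs_sd)
            # 1 - random() lies in (0, 1], so the log is always defined
            if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
                current, current_lp = proposal, proposal_lp
        chain.append(current)
    return chain[n_iter // 2:]  # discard the first half as burn-in


# Synthetic "observations" generated from a known turnover rate plus noise
data_rng = random.Random(0)
true_turnover, obs_sd = 0.03, 5.0
obs = [s + data_rng.gauss(0.0, obs_sd) for s in simulate_stock(true_turnover, 20)]

posterior = metropolis(obs, obs_sd)
post_mean = sum(posterior) / len(posterior)
print(f"posterior mean turnover: {post_mean:.4f} (true value {true_turnover})")
```

The retained chain is an ensemble of parameter values, each consistent with the observations, their stated error and the imposed constraint; it is this ensemble, rather than a single best-fit value, that carries the uncertainty forward.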
FIGURE 1 | Schematic highlighting the strengths (+) and weaknesses (−) of observations and models and how these can be reduced or accounted for to maximise information content through model-data fusion approaches.
By retrieving ensembles of location-specific parameters, CARDAMOM can quantify the spatial variability of model parameters (i.e., ecosystem traits). Modelling can then directly estimate how the magnitude and spatial variability of uncertainty associated with model (DALEC) parameters affects predictions of C stocks and fluxes. Using this information, we can determine which parts of the terrestrial ecosystem are the least constrained, identify opportunities for new observations to improve constraint, and trace how uncertainty is propagated into predictions. The DALEC model has an appropriate level of complexity relative to typically available calibration data but is simple enough for non-specialists to explore its behaviours and use operationally.
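To illustrate how a parameter ensemble carries uncertainty into predictions, the Python sketch below runs a hypothetical one-pool C model forward for every ensemble member and summarises the spread of projected stocks as a 95% interval. The ensemble is drawn from an assumed normal distribution purely for illustration (in practice it would be the MDF posterior), and the model, function names and values are not taken from DALEC or CARDAMOM.

```python
import random


def project_stock(turnover, n_years, c0=100.0, inputs=5.0):
    """Hypothetical one-pool model stepped annually; returns the final stock."""
    c = c0
    for _ in range(n_years):
        c = c + inputs - turnover * c
    return c


def percentile(values, q):
    """Nearest-rank percentile for q in [0, 100]."""
    ordered = sorted(values)
    return ordered[round(q / 100 * (len(ordered) - 1))]


def prediction_interval(ensemble, n_years):
    """Run every ensemble member forward; return (2.5%, median, 97.5%)."""
    stocks = [project_stock(k, n_years) for k in ensemble]
    return tuple(percentile(stocks, q) for q in (2.5, 50, 97.5))


# Stand-in for a retrieved parameter ensemble (assumed, not a real posterior)
rng = random.Random(42)
ensemble = [max(0.001, rng.gauss(0.03, 0.005)) for _ in range(1000)]

for horizon in (10, 50):
    low, median, high = prediction_interval(ensemble, horizon)
    print(f"{horizon:>2} years: median {median:.0f}, 95% interval {low:.0f}-{high:.0f}")
```

In this toy setting the interval widens with the projection horizon because the stock becomes increasingly sensitive to the turnover rate as it approaches steady state, which is the kind of information that helps identify where new observations would most reduce prediction uncertainty.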
CARDAMOM is not alone in adopting approaches which aim to maximise the benefits of combining observations with TBMs (e.g., Pinnington et al., 2020; Huang et al., 2021). Each approach has its own advantages and disadvantages which should be considered to ensure that the right analysis framework is used to address a given ecological question. CARDAMOM and similar frameworks are well placed to accelerate progress toward addressing the challenges outlined in the previous sections by building on our existing capacity to provide uncertainty-bounded C-cycle analyses using observations of forest regrowth (Smallman et al., 2017), to impose fire (Exbrayat et al., 2018a; Yin et al., 2020) and deforestation/degradation, and simulate ecosystem responses to SSP scenarios within multiple model structures and embedded ecological realism.

RECOMMENDATIONS FOR RESEARCHERS TO SUPPORT DECISION-MAKING
We have identified key challenges in both observations and models of the terrestrial C-cycle which must be overcome to effectively support climate mitigation policies and land management. We have argued that by fusing models and observations within numerical frameworks we can maximise the benefits of both approaches while minimising the limitations. From our discussion above we propose four key guidelines for effectively combining observations and models to support decision-makers, and to accelerate the process of model-data integration:
(1) Use simpler models. Increased model complexity only improves predictive capacity when supported by sufficient calibration data. For much of the C-cycle these data are missing. More complex/complicated models are also harder to explain and understand. Simpler models are quicker to build, evaluate, use or discard.
(2) Improve observational error estimates. Observations are expanding in their spatial and temporal coverages but contain bias and precision error; approaches must be developed to evaluate and then ameliorate these errors.
(3) Calibrate simpler models (i.e., 1) using observations and their errors (i.e., 2) and give decision-makers access to these models. MDF combines models with a diverse array of observations weighted by their uncertainties. Bayesian approaches allow analyses to be updated when new observations become available.
(4) Provide uncertainty estimates on all predictions. Uncertainties allow decision-makers to balance unlikely but high impact outcomes of a changing climate and human interventions. Prediction errors allow decision-makers to weigh the strength of the scientific advice against other factors.
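Guideline (3) rests on two properties: observations are weighted by their uncertainties, and an analysis can be updated as new observations arrive. A minimal Python sketch of both, assuming Gaussian errors and a single scalar quantity, is the conjugate normal update below, in which each estimate is weighted by its precision (inverse variance). The stock values and standard deviations are hypothetical, and `gaussian_update` is an illustrative helper rather than part of any MDF framework.

```python
def gaussian_update(prior_mean, prior_sd, obs, obs_sd):
    """Combine a Gaussian prior with one Gaussian observation.

    Each source is weighted by its precision (1 / variance), so a more
    certain observation pulls the estimate harder.
    """
    prior_prec = 1.0 / prior_sd ** 2
    obs_prec = 1.0 / obs_sd ** 2
    post_var = 1.0 / (prior_prec + obs_prec)
    post_mean = post_var * (prior_prec * prior_mean + obs_prec * obs)
    return post_mean, post_var ** 0.5


# Hypothetical woody C stock (MgC/ha): model prior, then two observations
estimate = (120.0, 40.0)                 # prior mean and SD from a model
observations = [(90.0, 30.0),            # coarse, uncertain estimate
                (100.0, 10.0)]           # more precise estimate arriving later

for value, sd in observations:
    estimate = gaussian_update(*estimate, value, sd)
    print(f"updated estimate: {estimate[0]:.1f} +/- {estimate[1]:.1f}")
```

Under these assumptions the final estimate does not depend on the order in which the observations arrive, and its uncertainty is smaller than that of any single input.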

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding author.

AUTHOR CONTRIBUTIONS
All authors contributed equally to the conception, wrote the manuscript, and approved the submitted version.