Ocean Biogeochemical Predictions—Initialization and Limits of Predictability

Predictions of ocean biogeochemistry, such as primary productivity and CO2 uptake, would help to understand the changing marine environment and the global climate. A growing number of studies show that initialization of ocean physics can lead to successful predictions of ocean biogeochemistry. It is, however, unclear how much these predictions could be improved by also assimilating biogeochemical data to reduce uncertainties of the initial conditions. Further, the mechanisms that lead to biogeochemical predictability are poorly understood. Here we perform a suite of idealized twin experiments with an Earth System Model (ESM) with the aim to (i) investigate the role of biogeochemical tracers' initial conditions on their predictability, and (ii) understand the physical processes that give rise to, or limit, predictability of ocean carbon uptake and export production. Our results suggest that initialization of the biogeochemical state does not significantly improve interannual-to-decadal predictions, which we relate to the strong control ocean physics exerts on the biogeochemical variability on these time scales. The predictability of ocean carbon uptake generally agrees well with the predictability of the mixed layer depth (MLD), suggesting that the predictable signal comes from the exchange of dissolved inorganic carbon (DIC) with deep waters. The longest predictability is found in winter at high latitudes, as for sea surface temperature and salinity, but the predictability of the MLD and carbon exchange is lower as it is more directly influenced by the atmospheric variability, e.g., the wind. The predictability of the annual mean export production is, on the contrary, nearly non-existent at high latitudes, despite the strong predictive skill for annual mean nutrient concentrations in these regions. This is related to the low predictability of the physical state of the summer surface ocean. Because of the shallow mixed layer, the summer surface ocean is decoupled from the ocean below and is therefore strongly influenced by the chaotic atmosphere. Our results show that future studies need to target the predictability of the mixed layer to get a better understanding of the real-world predictability of ocean biogeochemistry.

Keywords: biogeochemical, predictions, interannual, decadal, initial conditions, predictability, export production, air-sea CO2 exchange

1. INTRODUCTION
The climate system, and the ocean physical state in particular, have been shown to be predictable several years in advance (Boer, 2004; Pohlmann et al., 2004; Smith et al., 2007; Keenlyside et al., 2008; Langehaug et al., 2017), especially at high latitudes where deep mixed layers connect the deep ocean with surface layers and the atmosphere (Boer, 2004). In the wake of these advances in near-term climate prediction, the predictability of ocean biogeochemistry is now being explored. Predictions of interannual-to-decadal variations of the ocean biogeochemical state would be useful for understanding ongoing changes in marine ecology (Séférian et al., 2014; Payne et al., 2017; Park et al., 2019), ocean pH and CO2 uptake (Lovenduski et al., 2019), and in particular for delineating natural variations from anthropogenically forced change related to CO2 emissions, pollution, eutrophication etc. Experiments where models' capabilities to predict reanalyses have been assessed suggest that there is potential for predicting air-sea CO2 exchange up to several years in advance (Li et al., 2016; Séférian et al., 2018; Lovenduski et al., 2019), in particular at high latitudes where the variability of air-sea CO2 exchange exhibits lower frequency fluctuations (Li et al., 2016; Séférian et al., 2018).
Successful attempts have also been made to retrospectively predict observations of ocean carbon uptake and primary productivity on interannual-to-decadal time scales by only initializing physical fields (Séférian et al., 2014; Li et al., 2016, 2019; Park et al., 2019). For example, Li et al. (2016) were able to predict the observed pCO2 in the North Atlantic Subpolar Gyre (SPG NA ) up to 3 years in advance by nudging the ocean and atmospheric models to observations. Séférian et al. (2014) found that tropical Pacific primary production could be predicted up to 3 years in advance when starting from physical fields nudged to the observed sea surface temperature (SST). Finally, Park et al. (2019) found a predictability of chlorophyll concentrations of up to 1 year in the tropics, and 2 years in many parts of the subtropical gyres, when starting from ocean reanalyses. In all these examples, no biogeochemical observations were used to initialize the predictions. Both Li et al. (2016) and Séférian et al. (2014) suggested that the predictability might be improved by also assimilating biogeochemical data to optimize the initial conditions, in line with e.g., Brasseur et al. (2008) and Park et al. (2018).
Due to the many difficulties associated with integrating biogeochemistry in ocean reanalyses (Park et al., 2019 and references therein), initialization of forecasts and predictions from biogeochemical observations is still under development. It has been tested on shorter time scales (days to months) and has been implemented in some operational forecast systems (Popova et al., 2002; Ciavatta et al., 2011; Fontana et al., 2013; Teruzzi et al., 2014, 2018; Gehlen et al., 2015; Simon et al., 2015; Rousseaux and Gregg, 2017; Skákala et al., 2018). On interannual-to-decadal time scales, however, the benefits that could be gained from initialized biogeochemistry have yet to be explored.
The aim of the present study is to understand whether initialization of the biogeochemical state is needed for skillful biogeochemical predictions, and to identify the mechanisms behind the predictability of ocean biogeochemistry on interannual-to-decadal time scales, with focus on export production and air-sea CO2 exchange. To this end, we perform a suite of experiments in a perfect model framework, using the Norwegian Earth System Model (NorESM, Bentsen et al., 2013). First, we investigate the influence of errors in the biogeochemical initial state on predictions, to understand to what extent such predictions can be improved by initializing the biogeochemistry from observations. Second, we analyze differences in perfect predictability of the ocean physics vs. biogeochemistry to understand the factors that underlie, and factors that limit, the predictability of air-sea CO2 exchange and export production.

Model and Simulations
The NorESM model is a fully coupled system representing ocean dynamics (MICOM), ocean sea ice (CICE), ocean biogeochemistry (HAMOCC), atmospheric dynamics (CAM4-OSLO) including atmospheric chemistry, and land processes (CLM4) including land ice sheets and river runoff. It has been shown to reproduce the major modes of climatic variability, and has an especially good representation of ENSO variability and its teleconnections (Sperber et al., 2013; Bellenger et al., 2014; Wang et al., 2019). The ocean biogeochemical compartment in NorESM, HAMOCC, is an NPZD model with one generic class of phytoplankton, one generic class of zooplankton, dissolved and particulate organic matter, the nutrients phosphate, nitrate, silicate, and dissolved iron, as well as dissolved inorganic carbon, alkalinity and oxygen. The primary production is prognostically computed as a function of phytoplankton growth rate, which is limited by incoming shortwave radiation, temperature, and availability of nutrients (phosphate, nitrate, and dissolved iron). A fixed fraction of phytoplankton and zooplankton biomass is converted into particulate organic matter, through mortality and grazing, and exported below the euphotic layer (100 m). The particulate organic matter has a constant sinking speed and remineralization rate in the ocean interior. The air-sea CO2 fluxes are determined as the product of the air-sea CO2 partial pressure difference, the CO2 solubility, and the gas transfer rate, which is determined from wind speed using the parameterization by Wanninkhof (1992). The performance of the mean state of HAMOCC within NorESM was evaluated by Tjiputra et al. (2013). The variability, which is perhaps more important for the present application, is more difficult to evaluate owing to the lack of long-term observations of ocean carbon. Nevertheless, the model has been shown to perform reasonably well for ENSO-driven variability of air-sea CO2 fluxes in the Tropical Pacific (Jin et al., 2019). The representation of the interannual variability in primary production in the Tropics has been demonstrated to be in the top three among 18 ESMs (Anav et al., 2013). Away from the Tropics, the variability of the CO2 fluxes simulated by HAMOCC in a forced ocean configuration has been shown to compare reasonably well with observations from the Surface Ocean CO2 Atlas in the North Atlantic (Tjiputra et al., 2012). Further, an earlier version of NorESM has been shown to be comparable to other ESMs in the representation of the effect of the North Atlantic Oscillation on carbon fluxes (Keller et al., 2012).
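The flux formulation described above (partial pressure difference times solubility times a wind-dependent gas transfer rate) can be sketched as follows. This is a minimal illustration, not the HAMOCC code: the function names are hypothetical, the quadratic coefficient 0.31 is the value Wanninkhof (1992) gives for short-term winds and may differ from what the model uses, the solubility is passed in rather than computed, and the sign convention (positive means outgassing) is an assumption.

```python
import numpy as np

def schmidt_number_co2(sst_c):
    """Schmidt number of CO2 in seawater from the Wanninkhof (1992)
    cubic fit; sst_c is the sea surface temperature in deg C."""
    return 2073.1 - 125.62 * sst_c + 3.6276 * sst_c**2 - 0.043219 * sst_c**3

def gas_transfer_velocity(wind_speed, sst_c, coeff=0.31):
    """Gas transfer velocity in m/s with a quadratic wind dependence.
    coeff is in cm/h per (m/s)^2; 0.31 is an assumed choice."""
    k_cm_per_h = coeff * wind_speed**2 * (schmidt_number_co2(sst_c) / 660.0) ** -0.5
    return k_cm_per_h / 100.0 / 3600.0  # cm/h -> m/s

def air_sea_co2_flux(pco2_sea, pco2_air, solubility, wind_speed, sst_c):
    """Flux in mol m-2 s-1 for pCO2 in uatm and solubility in
    mol m-3 uatm-1. Positive values mean outgassing (assumed convention)."""
    k = gas_transfer_velocity(wind_speed, sst_c)
    return k * solubility * (pco2_sea - pco2_air)

# Illustrative numbers only: undersaturated surface water takes up CO2.
flux_weak_wind = air_sea_co2_flux(350.0, 400.0, 3.0e-5, 5.0, 10.0)
flux_strong_wind = air_sea_co2_flux(350.0, 400.0, 3.0e-5, 10.0, 10.0)
```

Because the transfer velocity scales with the square of the wind speed, doubling the wind quadruples the flux for the same partial pressure difference. This is one way to see why the air-sea exchange inherits much of the atmosphere's short memory.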
The present work is a first step to use HAMOCC within the framework of the Norwegian Climate Prediction Model (NorCPM; Counillon et al., 2014), which merges NorESM with an Ensemble Kalman Filter data-assimilation scheme, to study ocean biogeochemical predictability. The aim is to understand the benefits that could be gained from initialization of the biogeochemical state, and the mechanisms underlying biogeochemical predictability by the use of perfect model experiments. To this end, we have performed two sets of prediction experiments with ten members each, where member one is the Truth and members 2-10 are the predictions (Table 1). These experiments are initialized from a 10-member ensemble (Hist-REF), which has been integrated from 1850 to 2005 with historical forcing following the Coupled Model Intercomparison Project Phase 5 (CMIP5) protocol (Taylor et al., 2012). Each member of Hist-REF was started from different initial conditions sampled from a stable pre-industrial control run, and was run with interactive atmospheric CO2. In our first prediction experiment, referred to as BGC pert , we investigate the influence of perturbations in the ocean biogeochemical initial state on interannual-to-decadal predictions. The purpose is to understand the extent to which assimilation of biogeochemical observations can improve the interannual-to-decadal predictability of ocean biogeochemistry. The experiment consists of 22 10-year long simulations, starting on the first of January every fifth year between 1890 and 1995 (Figure 1). The ocean biogeochemical initial conditions for each member 1-10 in BGC pert are taken from restarts of the corresponding member in Hist-REF, while the physical initial conditions, including also the land carbon compartment, of all members are taken from the restarts of member 1 in Hist-REF for each starting year in question.
Thus, all members are starting with identical physical initial conditions of the ocean, atmosphere, land, and sea ice, while the ocean biogeochemical initial conditions differ. By using restarts from different members of the historical run for perturbing the initial conditions, we ensure that the perturbation is of the same order of magnitude as the internal (natural) variability of the model (see Figures S1, S2). Note that MICOM is an isopycnal coordinate model that uses potential density as vertical coordinates. For each isopycnal layer the mass of a biogeochemical tracer is the product of the layer thickness with the tracer concentration. In this experiment, only the concentration is perturbed while the layer thickness is the same in all members. A mismatch in the vertical grid size between the physical and biogeochemical restarts could therefore introduce an additional small perturbation in the total tracer mass. To prevent any feedback from the ocean biogeochemistry on the physical state, and to ensure that all members in BGC pert have the same physical variability, the simulations are carried out with a prescribed atmospheric CO 2 that follows a similar evolution as the one in Hist-REF.
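The remark about layer thickness and tracer mass can be made concrete with a small worked example. The sketch below is illustrative only: the three-layer column, the thicknesses, and the concentrations are made-up numbers, and `column_inventory` is a hypothetical helper, not a MICOM or HAMOCC routine.

```python
import numpy as np

def column_inventory(thickness_m, concentration):
    """Tracer inventory per unit area (mol m-2): sum over layers of
    layer thickness (m) times tracer concentration (mol m-3)."""
    return float(np.sum(np.asarray(thickness_m) * np.asarray(concentration)))

# Hypothetical three-layer isopycnal column (values are illustrative).
h_physics = np.array([50.0, 200.0, 1000.0])  # thickness in the physical restart
h_donor = np.array([60.0, 190.0, 1000.0])    # thickness the tracer field came with
dic_donor = np.array([2.00, 2.10, 2.25])     # donor tracer concentration

inventory_donor = column_inventory(h_donor, dic_donor)    # consistent pair
inventory_mixed = column_inventory(h_physics, dic_donor)  # mixed restarts
mismatch = inventory_mixed - inventory_donor
```

Here the concentrations are unchanged, yet the column inventory shifts by 1 mol m-2 (about 0.04%) because the layer thicknesses differ. This is the kind of small additional perturbation in total tracer mass that the text refers to.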
In BGC pert we have perfect knowledge of the initial state of the physics, leading to perfect predictions of the physical state when using a perfect model without biogeochemical feedbacks. This is never the case when doing real predictions because only a part of the ocean is observed and observations are imperfect. To investigate the extent to which the predictability of ocean biogeochemistry is limited by the predictability of physical drivers, we perform a second prediction experiment, PHY pert . Here, both the physical and biogeochemical initial conditions for the Truth and the predictions are taken from member one in Hist-REF, but for each of the prediction members a random perturbation on the order of 10^-10 °C is added to the SST. Consequently, all members in the PHY pert experiment have identical biogeochemical initial conditions, but physical initial conditions that are slightly different. This minute perturbation will grow rapidly with the chaotic mode of the Earth system, causing the physical state of the predictions to diverge from that of the Truth. A wide range of SST perturbations has been applied in previous perfect model experiments, from 10^-14 °C (Lovenduski et al., 2019) to 10^-2 °C (Koenigk et al., 2012). However, the system is insensitive to the exact size of the initial perturbations as they grow rapidly in the chaotic system (e.g., Lorenz, 1963, 1969). This is illustrated in Figure S3, which shows the evolution of the global mean ensemble spread of SST during one of our hindcasts in the PHY pert experiment. One month after the initialization the ensemble spread is already O(0.01 °C), and after two months it is O(0.1 °C). This shows that the size of the initial perturbation does not matter (as long as it is smaller than the typical internal variability) for the timescales we consider in this study.
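A perturbation of this kind can be sketched in a few lines. The example below is a hedged illustration rather than the procedure used in NorESM: the function name `perturb_sst`, the uniform distribution, and the toy SST field are all assumptions, since the text only states the order of magnitude of the perturbation.

```python
import numpy as np

def perturb_sst(sst, n_members=9, amplitude=1e-10, seed=0):
    """Return n_members copies of an SST field (deg C), each with
    independent random noise drawn uniformly from [-amplitude, amplitude).
    The uniform distribution is an assumption made for this sketch."""
    sst = np.asarray(sst, dtype=np.float64)  # double precision is required
    rng = np.random.default_rng(seed)
    noise = rng.uniform(-amplitude, amplitude, size=(n_members,) + sst.shape)
    return sst[np.newaxis, ...] + noise

# Toy field: a uniform 15 deg C ocean on a 3 x 4 grid (illustrative).
sst_truth = np.full((3, 4), 15.0)
members = perturb_sst(sst_truth)
spread = members.std(axis=0)  # initial ensemble spread per grid cell
```

Note that the perturbation only survives in double precision. In single precision the spacing between representable numbers near 15 °C is about 10^-6 °C, so a perturbation of 10^-10 °C would be rounded away.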
We also performed a third experiment, PHY-BGC pert , where both the physical and biogeochemical initial conditions were perturbed. The aim of that experiment was to investigate if non-linear interactions between the physical and biogeochemical perturbations can influence the predictability. The perturbations of the biogeochemical tracers were created as in the BGC pert experiment, while the physical conditions were perturbed indirectly by running the experiments with an interactive atmospheric CO 2 . When doing so, the perturbations in the biogeochemical initial conditions translate into the atmospheric CO 2 , which affects the radiation balance and consequently the SST. This way of perturbing the initial physical conditions gave similar results for the predictability as in the PHY pert experiment, where the SST was directly perturbed (not shown). This suggests that running with an interactive CO 2 has little implication for interannual-to-decadal prediction.

Data Analysis
For the analysis we focus on the predictability of air-sea CO2 exchange and export production, and its relation to the predictability of the ocean physics. Export production is defined as the flux of particulate organic carbon (POC) across 100 m depth. The predictive skill is quantified with anomaly correlation coefficients (ACC; Becker et al., 2014) between the Truth and the ensemble mean of the 9 member predictions. Before the ACC was calculated, the data from the predictions were detrended in each grid cell in order to remove the main signal of the external forcing (the increase in atmospheric CO2 due to anthropogenic emissions). This was carried out by first, in each grid cell, fitting a second-order polynomial to the time series of each variable from the ensemble mean of Hist-REF (this is illustrated for SST in Figure S4). We chose a second-order polynomial because the increase in atmospheric CO2 concentration in the simulations follows a second-order trend. Thereafter, to remove the anthropogenic signal from the 10-year long predictions and the Truth, the corresponding 10-year period of the second-order trend was subtracted. After this detrending, the ACC was calculated for each lead year (τ) and grid cell (s) in the model, according to Equation 1:

$$\mathrm{ACC}(s,\tau) = \frac{\sum_{p=1}^{P} V'(s,\tau,p)\, O'(s,\tau,p)}{\sqrt{\sum_{p=1}^{P} V'(s,\tau,p)^{2} \, \sum_{p=1}^{P} O'(s,\tau,p)^{2}}} \qquad (1)$$

where p is the prediction number and P (= 22) is the total number of predictions (Figure 1), V' is the predicted value (anomaly), and O' is the truth (anomaly). We also calculated spatially averaged ACCs following Becker et al. (2014) (Equation 2), where w(s) is a local area weight (the area of the grid cell s) and W is the global sum of the area weights. When calculating the global ACC, W is the global ocean surface area. We used a bootstrapping approach with random sampling over the starting years to compute 95% confidence intervals for the ACCs. For the spatially averaged ACCs (Equation 2) we used 1,000 bootstraps, and for the ACCs in each grid cell (Equation 1) we used 500 bootstraps due to the higher computational cost.
We consider any correlation below 0.2 as no skill, i.e., having no practical value, as it only explains 4% of the variance.
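The analysis steps described in this section (quadratic detrending, ACC over start dates, area weighting, and bootstrapping over starting years) can be sketched as below. This is a minimal sketch under stated assumptions, not the authors' code: all function names are hypothetical, the ACC is the standard uncentered anomaly correlation over start dates, and the pooled area-weighted form is one common choice that may differ in detail from Equation 2.

```python
import numpy as np

def quadratic_trend(years, series):
    """Second-order polynomial fit evaluated at the input years.
    Years are centered before fitting to keep the fit well conditioned."""
    t = np.asarray(years, dtype=float) - np.mean(years)
    return np.polyval(np.polyfit(t, series, deg=2), t)

def acc(pred_anom, truth_anom):
    """Anomaly correlation over start dates (axis 0) for one lead year."""
    num = np.sum(pred_anom * truth_anom, axis=0)
    den = np.sqrt(np.sum(pred_anom**2, axis=0) * np.sum(truth_anom**2, axis=0))
    return num / den

def area_weighted_acc(pred_anom, truth_anom, area):
    """Pooled ACC for arrays shaped (start date, grid cell), with each
    cell weighted by w(s)/W (assumed form, see the lead-in)."""
    w = area / np.sum(area)
    num = np.sum(w * pred_anom * truth_anom)
    den = np.sqrt(np.sum(w * pred_anom**2) * np.sum(w * truth_anom**2))
    return num / den

def bootstrap_ci(pred_anom, truth_anom, n_boot=1000, seed=0):
    """95% confidence interval of the ACC, resampling the start dates
    with replacement; inputs are 1-D arrays over start dates."""
    rng = np.random.default_rng(seed)
    n_start = pred_anom.shape[0]
    samples = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n_start, size=n_start)
        samples[b] = acc(pred_anom[idx], truth_anom[idx])
    return np.percentile(samples, [2.5, 97.5])
```

In use, the quadratic trend would be fitted to the Hist-REF ensemble mean in each grid cell and the matching 10-year segment subtracted from each prediction and from the Truth before calling `acc`. The 0.2 threshold follows from the squared correlation: 0.2 squared is 0.04, i.e., 4% of the variance.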
The skill of the predictions is benchmarked against the skill of a persistence forecast and an uninitialized forecast. The latter quantifies the prediction skill of the remaining external forcing that is not removed with the second-order trend (e.g., volcanic forcing), and is calculated as the ACC between member one and the ensemble mean of members 2-10 in Hist-REF for the same 10-year intervals as used in the predictions (Figure 1). The persistence forecast is calculated by using the last available observation as a forecast. In our case, this is done by using output from member 1 in the historical run. For each 10-year forecast the annual mean of the year before the initialization is used as a forecast for the next 10 years; this corresponds to the autocorrelation of the quantity.
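The persistence benchmark described above can be sketched as follows. The function name and array layout are assumptions for illustration; the input is taken to be an already detrended annual-mean anomaly series from member 1 of the historical run.

```python
import numpy as np

def persistence_acc(years, anomalies, start_years, n_leads=10):
    """ACC of a persistence forecast for lead years 1..n_leads.
    The forecast for every lead year is the annual-mean anomaly of the
    year before initialization; verification uses the same series."""
    index = {int(y): i for i, y in enumerate(years)}
    last_obs = np.array([anomalies[index[int(s) - 1]] for s in start_years])
    skill = np.empty(n_leads)
    for lead in range(1, n_leads + 1):
        verif = np.array([anomalies[index[int(s) + lead - 1]] for s in start_years])
        num = np.sum(last_obs * verif)
        den = np.sqrt(np.sum(last_obs**2) * np.sum(verif**2))
        skill[lead - 1] = num / den
    return skill

# Start dates as in the experiments: every fifth year from 1890 to 1995.
start_years = np.arange(1890, 1996, 5)
years = np.arange(1850, 2006)
```

With these start dates there are 22 forecasts, and the last verification year (2004) still falls inside the 1850-2005 historical period.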
To understand the physical mechanisms behind biogeochemical predictability, we compare it to that of some key physical variables in the model. These are SST, MLD (mixed layer depth) and sea ice concentration, all of which are important for air-sea CO2 exchange and export (primary) production. We also compare with the predictability of sea surface height (SSH), which is a good indicator of ocean dynamics (e.g., gyre strength) and advective fluxes. Finally, we compare to the predictability of an atmospheric variable, down-welling shortwave radiation (DSW), which is important for export production and highly dependent on cloud cover. In HAMOCC the photosynthetically available radiation is assumed to be 40% of the DSW, and would therefore show the same predictability as the DSW.
For the comparison of biogeochemical vs. physical predictability, we analyse the ACC for annual and seasonal means (JFM, AMJ, JAS, OND). The seasonal means were calculated from monthly mean outputs. Unfortunately, outputs of monthly means were not saved for DSW and nutrients in Hist-REF nor in the prediction experiments, prohibiting analyses of the seasonal predictability of these variables. For the analyses we divide the ocean into three areas with different physical characteristics broadly following the biome definitions suggested by Sarmiento et al. (2004): the low-latitude oceans (LL) that are permanently stratified, 35°S to 35°N; the temperate oceans that are seasonally stratified, including temperate North Pacific (tNP: 35°N-Bering Strait), temperate North Atlantic (tNA: 35-62°N), and temperate Southern Ocean (tSO: 35-57°S); and ice-covered oceans, including the Arctic Ocean (AO) and ice-covered Southern Ocean (iSO). Using this definition, there are still areas in the temperate oceans that were temporarily ice-covered during the early period of study. To ensure that we do not include any effect of the sea ice on the predictability in the temperate oceans, we have masked out areas that at any year during the historical period had an annual mean ice concentration of more than 10%.
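The seasonal averaging and the sea-ice masking can be sketched as below. This is an illustrative sketch only: the function names are hypothetical, months are weighted equally (ignoring differences in month length), and ice concentration is assumed to be given as a fraction between 0 and 1.

```python
import numpy as np

# Calendar-month indices (0 = January) for the four seasons used here.
SEASONS = {"JFM": (0, 1, 2), "AMJ": (3, 4, 5), "JAS": (6, 7, 8), "OND": (9, 10, 11)}

def seasonal_means(monthly):
    """Seasonal means from monthly means. monthly has time as its first
    axis, starts in January, and covers whole years. Returns a dict of
    arrays with one value per year along the first axis."""
    monthly = np.asarray(monthly)
    n_years = monthly.shape[0] // 12
    by_year = monthly[: n_years * 12].reshape((n_years, 12) + monthly.shape[1:])
    return {name: by_year[:, list(months)].mean(axis=1)
            for name, months in SEASONS.items()}

def ice_free_mask(annual_ice_conc, threshold=0.10):
    """True for grid cells whose annual mean ice concentration never
    exceeded the threshold in any year (first axis is year)."""
    return ~np.any(np.asarray(annual_ice_conc) > threshold, axis=0)

# Toy input: three identical years with monthly values 0..11.
toy = seasonal_means(np.tile(np.arange(12.0), 3))
```

Applying `ice_free_mask` to the historical annual means before regional averaging keeps only cells that stayed at or below 10% ice cover in every year, which mirrors the masking of the temperate regions described above.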

Impact of Perturbed BGC Initial Conditions
We start by investigating the influence of errors in the biogeochemical initial conditions on the predictability of CO2 flux and export production. The red dots in Figure 2 show the area-weighted global mean ACC between the Truth and the predictions in the BGC pert experiment. Lead year one is shown separately, while the longer lead times have been binned into 3-year intervals to reduce noise (this only made a difference in the PHY pert experiment). The correlation is close to one for both export production and air-sea CO2 flux across all lead years, but somewhat lower for lead year one, showing that the evolution of the Truth and the predictions from the initial state is nearly identical. Since the predictions and the Truth were initialized with identical physical states, but different biogeochemical ones, these results show that the variability of the different members is dominated by the physics. Figure 3 shows that there is also a very strong correlation regionally, and that the effect of the perturbations in the biogeochemical initial conditions disappears after lead year one. For the air-sea CO2 flux, the lower correlation in lead year one is mainly associated with areas of high interannual variability (Figure 3), such as the subpolar North Atlantic, the Southern Ocean and the Tropical Pacific. For export production, the areas of low correlation in the first lead year are related to the low-productive subtropical gyres, where phytoplankton growth is strongly nutrient limited due to the permanent stratification. The impact of perturbations in the biogeochemical initial conditions on the predictability of other biogeochemical quantities (e.g., nutrients and primary production) shows a similar behavior (not shown). The approach we adopted for perturbing the biogeochemistry gives perturbations that are of similar size to the natural variability of the system.
This means that the perturbations are larger in areas with large interannual variability, such as the surface North Atlantic and the Southern Ocean, than in areas with lower interannual variability, such as the deep ocean and the subtropical gyres. The idea behind this strategy is to mimic the uncertainty that arises when a model is not nudged to observations, i.e., that the model state might be out of phase with the actual state of the natural system. Our results clearly show that these uncertainties in the biogeochemical initial conditions become negligible beyond the first lead year, and that the physically driven variability dominates in the subsequent years. This suggests that a good knowledge of the physical conditions is sufficient for robust ocean biogeochemical predictions, and that assimilation of biogeochemical observations for creation of initial conditions only has the potential to add marginal skill to interannual-to-decadal predictions, at least as long as the biogeochemical fields are initialized with a reasonable climatology. It is, however, important to note that biogeochemical models are highly simplified representations of the real world, especially the ones currently used in Earth system and climate prediction models where the structure and number of tracers have to be kept relatively simple to reduce the computational cost. If the complexity were increased, for instance by including more processes, interactions, functional groups and higher trophic levels that could give effects spanning several years, or by including regionally varying parameters as in Tjiputra et al. (2007) and Gharamti et al. (2017), the role of the biogeochemical initial conditions for interannual-to-decadal predictions might be of higher importance. Further, it is possible that biogeochemical models systematically underestimate biogeochemical variability (DeVries et al., 2019), meaning that the perturbations imposed here might be lower than in the real world.
The drop in correlation in the first lead year indicates that initialization of biogeochemistry likely enhances the predictive skill on seasonal timescales.

Impact of Perturbed Physical Initial Conditions
Before investigating the physical mechanisms behind biogeochemical predictability, we will compare the predictability achieved with our experiments with the predictability presented in previously published works as an evaluation of the predictive skill in our model system.
The blue dots in Figure 2 show the global ACC between the Truth and the predictions for export production and air-sea exchange in the PHY pert experiment, indicating the predictability when having perfect knowledge of the biogeochemical initial conditions, and starting from a slightly perturbed physical state. For export production, the correlation for lead year one is lower than that of BGC pert , suggesting that even on seasonal time scales the importance of the physical processes is larger than that of biogeochemical processes for skillful prediction. PHY pert shows significant skill for the physics as its state is initialized with near perfect ocean and atmospheric initial conditions. While the prediction skill of the atmosphere decays relatively quickly, the skill of the ocean is larger. Until lead year 2-4, the correlation exceeds the 0.2 threshold and is significantly different from the uninitialized run, indicating a predictability of a few years for both export production and air-sea CO2 exchange. For global air-sea CO2 exchange, the perfect predictability lies within the predictability range of 2 years presented by Lovenduski et al. (2019) and 4-6 years (computed with two different methods) by Séférian et al. (2018). The latter calculated the predictability of both the land and ocean carbon uptake, but showed that most of the long-term predictability resides in the ocean. Figures 4, 5 show ACC maps of the annual mean export production and air-sea CO2 exchange between the Truth and the mean of the predictions in the PHY pert experiment. Similar maps for seasonal means and other biogeochemical fields are shown in Figures S18-S26. In lead year one there is a strong (> 0.5) correlation, indicating predictability almost everywhere in the global ocean for the air-sea CO2 exchange. Exceptions are the poleward halves of the subtropical gyres and the ice-covered Arctic Ocean.
The pattern looks similar for the export production, with the exception of a low predictability in the Northern Hemisphere subpolar oceans. As the predictions are initialized on the first of January, the better predictability of the Southern Hemisphere export production in lead year one is expected. The reason why there is essentially no predictability of export production at northern latitudes in lead year 1 will be explored in section 3.3.
For longer lead times, the predictability of the CO2 exchange quickly degrades, except in a few areas. In the tropical and subtropical oceans, in particular in the Pacific, there is predictability until lead years 2-4 for air-sea CO2 exchange. The longest predictability is observed in the SPG NA , where there is a relatively strong correlation until lead year 5-7. This agrees well with Li et al. (2016), who found potential predictability of the winter CO2 exchange of up to 4-7 years in the western SPG NA . Apart from the SPG NA , the predictability of air-sea CO2 exchange quickly degrades in the high latitude oceans, including the Southern Ocean. Here, a coherent pattern of predictability can only be seen until lead year 2-4. The spatially averaged ACC over the Southern Ocean indicates a predictability of 1 year (Table 2). This short predictability agrees fairly well with Lovenduski et al. (2019), who found a significant predictability of up to two years in the Southern Ocean, but is lower than that of Séférian et al. (2018), who found a predictability of 4-6 years. The difference in predictability found here and by Lovenduski et al. (2019) compared to Séférian et al. (2018) could be related to the differences in the model and simulation setup. It should be noted that while Séférian et al. (2018) made their predictions under pre-industrial forcing, the predictions in the current study and in Lovenduski et al. (2019) were carried out using historical forcing. It is possible that the biogeochemistry becomes less predictable under present forcing. Another reason could be differences in the implementation and parameterization of primary production, which has been shown to be important for regulating the air-sea CO2 fluxes in the Southern Ocean (Kessler and Tjiputra, 2016). This will be further discussed in the following sections.
For export production (Figure 4), the predictability stays significant in the low latitudes until lead year 2-4, in agreement with Séférian et al. (2014), who found a predictability of NPP (Net Primary Production) of up to 3 years in the tropical Pacific. Due to their strong linkage, the predictability of primary production and phytoplankton concentration is similar to that of export production in our model (Figures S22, S23). As with the air-sea CO 2 exchange, the predictability of the export production quickly degrades in high latitudes in PHY pert , but in contrast, almost no predictability is evident in the SPG NA .
The good agreement between the predictability achieved in our perfect model experiment with perturbed SST and the predictability found in other studies with different models suggests that our model system has a reasonable representation of the mechanisms giving rise to predictability.
The relatively short predictability of air-sea CO 2 exchange (except in the SPG NA ) and especially export production in the high latitudes, as shown in Figures 4, 5, is rather unexpected. Due to the strong control that ocean physics exerts on ocean biogeochemistry, it has been hypothesized that the long predictability that has been shown to reside in the ocean, in particular for SST and sea surface salinity (SSS) in the Southern Ocean and the North Atlantic (Pohlmann et al., 2004;Zhang et al., 2017;Buckley et al., 2019 and Figures S27-S29, S33), should result in a long predictability for ocean biogeochemistry (e.g., Li et al., 2016;Séférian et al., 2018;Lovenduski et al., 2019). The reason for this lack of predictability will be explored in section 3.3.

Sources and Limits of Predictability
To understand the physical factors that give rise to or limit the predictability of export production and air-sea CO2 exchange, we compare it to the predictability of various physical variables. The idea behind this is that the least predictable physical variable (among those known for having a considerable direct, or indirect, impact on biogeochemical variability) should set the predictability horizon of the ocean biogeochemistry. In Figures 2, 6-8 we have therefore plotted the regionally averaged ACC for SSH, SST, MLD, sea ice, and DSW, together with the ACC for air-sea CO2 exchange, for annual and seasonal means. The predictability of export production aligns closely with that of the summer air-sea CO2 exchange and is therefore shown in the supplementary material (Figures S7-S9). Correlation maps showing the skill for physical properties are shown in Figures S27-S39.
For the global annual means (Figure 2), we note that SSH has the overall highest predictability, followed by SST, MLD and sea ice, while DSW has the lowest. This is expected since the atmosphere has a shorter memory than the ocean (Koenigk et al., 2012; Roberts et al., 2016) and the DSW is influenced by the cloud cover, which is highly unpredictable. Variations in SSH are, on interannual timescales, a measure of dynamically induced density and ocean circulation changes. In regions that are less stratified, SSH represents a column-integrated signal, while in regions where the 1.5 layer approximation holds (i.e., an active less dense layer over a much thicker and denser inactive layer) SSH variations closely relate to upper ocean (density) changes (Wyrtki and Kendall, 1967; Rebert et al., 1985). This relation is strongest in the tropics where the density contrast between the two layers is the greatest. SST and MLD by definition represent surface processes, which away from the tropics are more sensitive to the atmospheric variability than SSH, and therefore are less predictable. The MLD is in general less predictable than the SST (also seen in Figures S27, S30), with the largest difference in the Southern Ocean.
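The link between SSH and upper-ocean changes under the 1.5 layer approximation can be illustrated with the standard reduced-gravity relation, in which an SSH anomaly equals the upper-layer thickness anomaly scaled by the fractional density contrast between the two layers. The following is a minimal sketch only: the function name and the density values are illustrative assumptions and are not taken from the model used in this study.

```python
def ssh_anomaly_from_layer_thickness(thickness_anomaly_m, rho_upper, rho_lower):
    """SSH anomaly (m) implied by an upper-layer thickness anomaly (m).

    Reduced-gravity (1.5 layer) relation with the deep layer at rest:
        eta' = (rho_lower - rho_upper) / rho_lower * h'
    Densities are in kg m-3; rho_lower is used as the reference density.
    """
    if rho_lower <= rho_upper:
        raise ValueError("the lower layer must be denser than the upper layer")
    return (rho_lower - rho_upper) / rho_lower * thickness_anomaly_m


# Hypothetical densities: a strong (tropical-like) and a weak density contrast.
strong_contrast = ssh_anomaly_from_layer_thickness(20.0, 1022.0, 1027.0)
weak_contrast = ssh_anomaly_from_layer_thickness(20.0, 1026.0, 1027.0)
print(f"strong contrast: {strong_contrast:.3f} m, weak contrast: {weak_contrast:.3f} m")
```

For the same displacement of the layer interface, the larger density contrast gives the larger SSH signal, which is consistent with the relation being strongest in the tropics.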
TABLE 2 | Predictability horizon in years for annual mean air-sea CO2 exchange, export production, nitrate, phosphate, sea surface height, sea surface salinity, sea surface temperature, mixed layer depth, downwelling shortwave radiation and sea ice concentration in low latitudes (LL), temperate North Atlantic (tNA), temperate North Pacific (tNP), temperate Southern Ocean (tSO), Arctic Ocean (AO), and ice-covered Southern Ocean (iSO).
Area | CO2 exch. | Exp. prod. | N | P | SSH | SSS | SST | MLD | DSW | Ice
Predictability horizon is defined as the last lead year in which the anomaly correlation is significantly (95% limit) larger than 0.2, than that of the uninitialized run, and than that of persistence. The significance was calculated with 1,000 bootstraps.
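The definition of the predictability horizon given in the note to Table 2 can be sketched as follows. This is a hedged illustration under stated assumptions, not the code used in this study: the function names, the array layout (lead year by verification case), the resampling of cases with replacement, the one-sided 5% quantile test, and the choice to stop at the first lead year that fails are all assumptions made for the example.

```python
import numpy as np


def anomaly_correlation(forecast, truth):
    """Pearson correlation between forecast and truth anomalies (ACC)."""
    f = forecast - forecast.mean()
    t = truth - truth.mean()
    denom = np.sqrt((f ** 2).sum() * (t ** 2).sum())
    return float((f * t).sum() / denom) if denom > 0 else 0.0


def is_skillful(init, uninit, persist, truth, rng, threshold=0.2,
                n_boot=1000, level=0.95):
    """True if the initialized ACC significantly exceeds all three baselines."""
    n = truth.size
    margins = np.empty((n_boot, 3))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample cases with replacement
        acc = anomaly_correlation(init[idx], truth[idx])
        margins[b] = (
            acc - threshold,
            acc - anomaly_correlation(uninit[idx], truth[idx]),
            acc - anomaly_correlation(persist[idx], truth[idx]),
        )
    # One-sided test: the lower (1 - level) quantile of each margin must be > 0.
    return bool(np.all(np.quantile(margins, 1.0 - level, axis=0) > 0.0))


def predictability_horizon(init, uninit, persist, truth, threshold=0.2,
                           n_boot=1000, level=0.95, seed=0):
    """Last consecutive lead year (1-based) with significant skill; 0 if none.

    All inputs are arrays of shape (n_lead_years, n_cases).
    """
    rng = np.random.default_rng(seed)
    horizon = 0
    for lead in range(init.shape[0]):
        if not is_skillful(init[lead], uninit[lead], persist[lead], truth[lead],
                           rng, threshold, n_boot, level):
            break
        horizon = lead + 1
    return horizon
```

Stopping at the first failing lead year is one possible reading of "last lead year"; a variant that allows skill to re-emerge at later leads would instead scan all lead years and return the largest one that passes.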

Low Latitude Oceans
In the low latitude oceans (Figure 6), the predictability is similar across the physical variables, and there is an overall perfect-model predictability of 3-5 years, even for DSW (Table 2). This is a result of the strong coupling between ocean and atmospheric variability through atmospheric convection in this region that influences the cloud cover and consequently the DSW (Yan, 2005; Sun et al., 2017). The predictability of air-sea CO2 exchange and export production is similar to that of the physical properties.
Because of the small seasonal variations, the predictability of the annual and seasonal means is similar (Figure 6 and Tables 2-4). Séférian et al. (2014) suggested that the predictability of primary production in the low latitude Pacific is tied to the poleward advection of nutrient anomalies that are initially induced by the ENSO-driven variations in upwelling of nutrients at the Equator. Another, slightly different explanation of the predictability in this region is found in Polkova et al. (2015) and Roberts et al. (2016). They suggested that the predictability of the SSH (up to 2-5 years lead time) in the subtropics is a result of baroclinic Rossby waves carrying the signal of the initial conditions westward. This would also apply to ocean biogeochemistry, as Rossby waves modify horizontal velocities and the vertical displacement of the thermocline, which affects the horizontal advection and exchange of nutrients and carbon with deep waters (e.g., Uz et al., 2001; Sakamoto et al., 2004; Charria et al., 2008). Indeed, the spatial pattern of predictability in the low latitude Pacific shown in Figures 4, 5 resembles the spatial pattern of the Rossby wave front presented in Figure 4 of Chelton and Schlax (1996). The links between the predictability and ENSO, as suggested by Séférian et al. (2014), and off-equatorial baroclinic Rossby waves, as suggested by Polkova et al. (2015) and Roberts et al. (2016), are however not incompatible, as these waves have been shown to be triggered by ENSO events (Battisti, 1989; Kessler, 1991).

Temperate Oceans
In the temperate oceans there is a large spread in the predictability of the various physical variables (Figure 7 and Table 2), in contrast to what was shown for the low latitudes in the previous section. It ranges from 0 to 1 year for the annual mean DSW, via 4-7 years for the SST and MLD, and up to 6 to >10 years for the SSH. The variability of DSW in these regions is influenced by the synoptic scale variability and mid-latitude depressions, which are highly unpredictable beyond timescales of a few days. The higher predictability of SSH, SST, and MLD than in the low latitudes is related to the stronger coupling between the surface and deep oceans through winter convection and mixing (e.g., Boer, 2004). The predictability of the annual mean CO2 exchange is 1 and 3 years in the tSO and tNP, respectively, and 5 years in the tNA. Due to the strong seasonality of air-sea CO2 exchange, an analysis of its seasonal predictability is needed to understand the underlying physical mechanisms.
From the seasonal decomposition (Figure 7 and Tables 3, 4; the decomposition into OND and AMJ means is shown in Figures S10-S15), it is clear that the predictability of the summer CO2 exchange is poor, and that the predictability of the annual mean CO2 exchange originates from a predictable winter state. The ACC of the winter CO2 exchange is similar to the ACC of the winter MLD (both in amplitude and in duration), which suggests that the predictable signal originates from the winter vertical mixing and the upwelling of DIC-rich deep waters to the surface. This is consistent with Fröb et al. (2019), who observed a strong relation between interannual variations in DIC, pCO2, and MLD in in situ measurements from the subpolar North Atlantic.
It is interesting to note that the predictability of MLD in the North Atlantic is aligned with the predictability of SST, while this relationship is much weaker in the North Pacific and in the Southern Ocean. The depth of the mixed layer is dependent on both mechanical (wind) forcing and buoyancy forcing. In parts of the ocean where there is a large heat release to the atmosphere, the oceanic memory in the form of heat content has a relatively larger impact on the mixed layer than the atmospheric forcing. This could explain the better predictability of MLD, and consequently the CO2 exchange, in the tNA in comparison to the tSO and tNP, as the heat loss to the atmosphere is larger in the North Atlantic (e.g., Talley et al., 2011 and Figure S5).
From Figure 7 it becomes clear that the low predictability of the summer CO2 exchange (and export production) can be related to an overall low predictability of the ocean physical state. This results from the stronger control that the unpredictable atmosphere exerts on the upper ocean during this season, when the mixed layer is shallow and the exchanges between upper and deep ocean water masses are limited. Interestingly, the predictability of the annual mean nutrient concentrations is better than that of export production (Table 2), suggesting that predictability of annual mean nutrient concentrations does not necessarily lead to predictability of annual mean export production in temperate oceans, i.e., the unpredictable dynamics of the summer mixed layer (including light availability and exchange of nutrients with the deep waters) are more important.

Ice-Covered Oceans
As one of the regulators of gas exchange and the amount of light and momentum reaching the upper ocean, sea ice has a considerable impact on air-sea CO2 exchange and export production. There is also an indirect control of the sea ice on these processes as its melting and formation influence the buoyancy and thus the mixed layer depth. The predictability of sea ice should therefore have an impact on the predictability of CO2 exchange and export production. Figure 8 and Table 2 show that there is a mean predictability of sea-ice concentration of 3 years in the Arctic Ocean and in the Southern Ocean, which is in agreement with other studies for the annual mean ice cover (Guemas et al., 2016) and with a conceptual model for the Barents Sea (Onarheim et al., 2015). As expected, Figure 8 shows that the predictability of the annual mean CO2 exchange agrees well with that of the sea ice in the Arctic Ocean. This is however not the case in the Southern Ocean, where the CO2 exchange is only predictable on a time horizon of one year compared to three years for sea ice.
From the seasonal decomposition of the predictability in iSO and AO, we note that the predictable signal of the CO2 exchange in these regions also comes from the winter state, and that it closely follows the predictability of sea ice concentration and MLD (Figure 8 and Tables 3, 4). As for the temperate oceans, the predictability of the summer CO2 exchange, MLD and sea ice concentration is practically absent, indicating a large influence of the atmospheric forcing during this season. The relative importance of the seasons for the annual mean predictability depends on how much the interannual variations of the summer and winter means contribute to the interannual variations in annual means, and how predictable they are. For example, in regions where the interannual variability in CO2 exchange during summer dominates over that in winter, the predictability of the annual mean CO2 exchange will be lower than that of the winter. If, however, the winter variability contributes a large part of the interannual variability (as would be the case for MLD), the predictability of the annual mean will approach that of the winter state. This could explain the differences in predictability of air-sea CO2 exchange in the Southern Ocean between the present study and Séférian et al. (2018). It could also explain why the predictability of the annual mean CO2 exchange is lower than the predictability of the annual mean sea ice cover in iSO, even though they compare well in summer and winter, i.e., interannual sea ice variability is driven by winter variations, while that of CO2 exchange is driven by summer variations.
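The dependence of the annual mean predictability on the seasonal contributions can be made concrete with a simple idealization. If the annual mean is an equal-weight average of seasonal means, the seasonal anomalies are mutually uncorrelated, and the predicted anomalies have the same variance as the true ones in each season, then the annual ACC reduces to the variance-weighted average of the seasonal ACCs. The sketch below implements this relation; the function name and all numbers are hypothetical and serve only to illustrate the argument, not to reproduce values from this study.

```python
def annual_acc_from_seasons(seasonal_acc, seasonal_variance):
    """Annual-mean ACC as the variance-weighted mean of seasonal ACCs.

    Idealized assumptions: equal-length seasons, mutually uncorrelated
    seasonal anomalies, and forecast variance equal to the true variance.
    Under these assumptions:
        r_annual = sum(r_k * var_k) / sum(var_k)
    """
    if len(seasonal_acc) != len(seasonal_variance):
        raise ValueError("need one variance per seasonal ACC")
    total_variance = sum(seasonal_variance)
    if total_variance <= 0:
        raise ValueError("total interannual variance must be positive")
    weighted = sum(r * v for r, v in zip(seasonal_acc, seasonal_variance))
    return weighted / total_variance


# Hypothetical case: skillful winter (ACC 0.8), unpredictable summer (ACC 0.0).
winter_acc, summer_acc = 0.8, 0.0
summer_dominated = annual_acc_from_seasons([winter_acc, summer_acc], [1.0, 9.0])
winter_dominated = annual_acc_from_seasons([winter_acc, summer_acc], [9.0, 1.0])
print(f"summer-dominated variability: annual ACC = {summer_dominated:.2f}")
print(f"winter-dominated variability: annual ACC = {winter_dominated:.2f}")
```

With identical seasonal skill, the annual ACC is low when summer variations dominate the interannual variability and approaches the winter ACC when winter variations dominate.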

CONCLUSIONS
In this study we have performed two perfect model experiments with the aim to (i) investigate to what extent perturbations in the initial state of biogeochemical tracers influence interannual-to-decadal predictions of ocean biogeochemistry and (ii) understand the physical mechanisms that give rise to, or limit, predictability of biogeochemical processes such as air-sea CO2 exchange and export production.
Perturbations in biogeochemical initial conditions only degrade predictions in the first lead year. In the following lead years the ocean biogeochemistry re-adjusts to the physics, and the influence of perturbations in the initial conditions becomes negligible. This suggests that initialization of biogeochemistry through, e.g., assimilation of biogeochemical observations only brings marginal improvements to interannual-to-decadal biogeochemical predictions, while the initialization of the physics is of high importance, at least in climate prediction models of similar complexity to the one used in the current study. The results may change if a more complex biogeochemical model, e.g., one that includes higher trophic levels, is used. To further assess the robustness of our findings, similar experiments should be conducted with other models, in particular those with enhanced complexity. Further, assimilation of biogeochemical observations would be useful, for example, for parameter estimation, and could be used to improve the initial physical fields (Yu et al., 2018), especially in remote parts of the oceans that are poorly sampled. Despite our results, it is therefore still important to develop this technique.
The predictability horizon that we achieve in our perfect model experiment with perturbed SST agrees overall with other studies using different models, suggesting that our model setup has a reasonable representation of the mechanisms giving rise to predictability. For export production, we found the longest predictability in low latitude oceans, with a similar time scale of predictability as baroclinic Rossby waves (Polkova et al., 2015). In seasonally stratified oceans, there is almost no predictability of export production, even in areas that show strong predictability of annual mean ocean physics and nutrients. This is related to the low predictability of the summer mixed layer, which is under strong influence of the unpredictable atmosphere. As a result, the summer air-sea CO2 exchange, which is predominantly driven by biological productivity in temperate seasonally stratified oceans (Tjiputra and Winguth, 2008; Tjiputra et al., 2014), also shows weak predictability. The predictability of the winter air-sea CO2 exchange shows a strong relation to the winter mixed layer depth in temperate oceans. In the Southern Ocean and the North Pacific the predictability of the winter mixed layer is weaker than that of the sea surface temperature, which we attribute to its higher sensitivity to the wind (Figure S6). The relatively long predictability of air-sea CO2 exchange (and mixed layer depth) in the temperate North Atlantic is explained by the strong impact the ocean thermal memory has on the heat release and on the buoyancy forcing in this area.
To conclude, our results call into question the utility of biogeochemical observations for initialization of biogeochemical predictions. It is however important to note that this is based on results from perfect model experiments with one model system only. Similar experiments should be performed using models with more complex, and potentially more correct, representations of marine biology and biogeochemistry. Further, we have shown that the predictability of the mixed layer depth is overall lower than that of ocean temperature and salinity and therefore puts an important constraint on the predictability of export production and air-sea CO2 fluxes. A more thorough investigation of the predictability of the mixed layer depth (in particular during summer) in the real world, which is not very well explored, is therefore needed to better understand real world biogeochemical predictability.

DATA AVAILABILITY STATEMENT
The datasets generated for this study will be made available at the NorStore research data archive upon publication (https://archive.sigma2.no/).