Review of algorithms estimating export production from satellite-derived properties

Although the vertical transport of biomass from productive surface waters to the deep ocean (the biological pump) is a critical component of the global carbon cycle, its magnitude and variability are poorly understood. Global-scale estimates of ocean carbon export vary widely, from ∼5 to ∼20 Gt C y⁻¹, owing to uncertainties in methods and unclear definitions. Satellite-derived properties such as phytoplankton biomass, sea surface temperature, and light attenuation at depth provide information about the oceanic ecosystem with unprecedented coverage and resolution in time and space. These products have been the basis of an intense effort over several decades to constrain biogeochemical production rates and fluxes in the ocean. One critical challenge in this effort has been to estimate the magnitude of the biological pump from satellite-derived properties by establishing how much of the primary production is exported out of the euphotic zone, a flux called export production. Here we review existing algorithms for estimating export production from satellite-derived properties, the in-situ datasets available for testing the algorithms, and earlier evaluations of the proposed algorithms. The satellite-derived products used in the algorithm evaluation are largely based on the Ocean Colour Climate Change Initiative (OC-CCI) products and the carbon products derived from them. The different resources are combined in a meta-analysis.


Introduction
The recirculation of major nutrients and carbon in the ocean is strongly controlled by the vertical export of particulate organic matter from the surface ocean to the ocean's interior (Figure 1 and e.g. Falkowski et al., 1998; Sabine et al., 2004; Honjo et al., 2008; Siegel et al., 2022). Marine phytoplankton transform CO2 to organic carbon via photosynthesis with light as the energy source (Eppley, 1972; Geider et al., 1998), a critical biological process that is the foundation of most marine ecosystems (Sarmiento and Bender, 1994; Pauly and Christensen, 1995). The resulting chemical energy bound as organic carbon is used in marine food webs to build other types of biomass and as energy for autotrophic and heterotrophic organisms. While carbon fixation by phytoplankton (Primary Production, PP) is of vital significance in marine ecosystems (e.g. Platt et al., 1989; Pauly and Christensen, 1995; Fasham, 2003; Marinov et al., 2008; Chavez et al., 2011), there has been a longstanding debate about how to quantify its magnitude (Platt et al., 1989; Quay and Karl, 2010; Duarte et al., 2013; Williams et al., 2013). Methods to observe or infer different components of primary production have been developed (Bender et al., 1987; Cullen, 2001; Fasham, 2003) that are valid over varying spatial and temporal domains (Balch et al., 2022), and researchers differ significantly in how they define biological production (Williams, 1993; Cullen, 2001). Most of the biomass generated by PP in the euphotic (sunlit) zone is consumed by heterotrophs and remineralized in the upper ocean. The remaining part is called Net Community Production (NCP) and, if aggregated over sufficiently large temporal and spatial scales, equates to Export Production (EP). Organic carbon resulting from EP is transported to deeper waters by, among other pathways, the downward vertical flux of Particulate or Dissolved Organic Carbon (POC, DOC), often referred to as the "biological pump" or Biological Carbon Pump (BCP, Volk and Hoffert, 1985). See Siegel et al. (2022) for an exhaustive discussion of the BCP and other processes that sequester carbon from the surface ocean to deeper waters. As with PP, understanding of the magnitude and spatiotemporal variability of the biological pump remains limited (Burd et al., 2010; Britten and Primeau, 2016).
Satellite-based ocean color products have provided an unprecedented resource to study ocean biogeochemistry and biological oceanography with high spatiotemporal resolution and coverage (Groom et al., 2019; McClain et al., 2022), and significant effort has gone into assessing the biological pump from space, with limited success (Siegel et al., 2022). One critical challenge has been to quantify community respiration (Westberry et al., 2012) and to establish the fraction of PP that is exported out of the euphotic zone (Britten and Primeau, 2016; Siegel et al., 2016; Siegel et al., 2022). The large uncertainties associated with satellite-based EP products have led to global-scale estimates of ocean carbon export that vary from ∼5 to 20 Gt C y⁻¹ (Dunne et al., 2007; Henson et al., 2011; Laws et al., 2011; Siegel et al., 2016; Siegel et al., 2022).

Fluxes and relationships
The main approach to estimating EP from remotely sensed products is based on empirical correlations, identified by regression analysis, between in-situ observations of vertical POC fluxes and properties that can be derived from satellite (e.g. Stukel et al., 2015). This method has so far generated algorithms with arguably limited ability to predict EP (e.g. Stukel et al., 2015; Palevsky et al., 2016). The many challenges in estimating export fluxes from satellite-derived properties are further complicated by differences and inconsistencies in how EP and export fluxes are defined and quantified. We describe the most common definitions in the following sections and summarize them in Table 1.

Gross primary production
Gross primary production (GPP) is the total rate of carbon production by autotrophic organisms before correction for losses due to excretion or respiration; in other words, the gross conversion of inorganic carbon to its organic state by autotrophs (Cullen, 2001; Fasham, 2003). GPP can in theory be derived from first principles (e.g. Lawrenz et al., 2013, and references therein).

Net primary production
Net primary production (NPP) is the net rate at which autotrophic organisms assimilate carbon. It is normally defined as GPP minus the fraction used by primary producers for cellular respiration and maintenance (Bender et al., 1987; Platt et al., 1989; Williams, 1993; Behrenfeld and Falkowski, 1997; Cullen, 2001; Fasham, 2003). NPP is also the portion of carbon fixed by photosynthesis that is available to heterotrophic organisms in the ecosystem (Chavez et al., 2011). NPP has primarily been measured in-situ using the 14C method developed by Nielsen (1952), in which collected samples are incubated with a known amount of radioactive 14C-bicarbonate that labels the dissolved inorganic carbon pool (e.g. Platt and Jassby, 1976; Bender et al., 1987; Cullen, 2001; Fasham, 2003). Other approaches to estimating NPP are based on measuring changes of O2 in light-dark incubations and on different isotopic methods (e.g. 18O, 13C; Bender et al., 1987; Cullen, 2001; Chavez et al., 2011). Typically, the shorter the incubation (of order 1 hour), the closer the measurement is considered to approach GPP, whereas longer incubations (of order 10 hours) lead to estimates of NPP.
One major development has been the ability to estimate PP from satellite-derived properties (e.g. Eppley et al., 1985; Platt et al., 1988; Sathyendranath et al., 1991; Behrenfeld and Falkowski, 1997; Friedrichs et al., 2009), providing depth-integrated estimates with unprecedented spatial resolution and coverage. Note that, in principle, the method of Platt and Sathyendranath, based on short (1-2 h) photosynthesis-irradiance experiments, may be considered to estimate GPP, whereas the method of Behrenfeld and colleagues, based on in-situ incubations of one day, approaches NPP. A common approach to quantifying PP in the surface ocean from satellite-derived properties combines stocks of carbon biomass or chlorophyll with auxiliary properties to estimate rates of photosynthesis (e.g. Platt et al., 1988; Behrenfeld and Falkowski, 1997; Arrigo et al., 1998; Westberry et al., 2008). Another approach uses Inherent Optical Properties (IOPs) to estimate NPP by combining satellite-based proxies for energy absorption in the water column with inferences of the efficiency with which absorbed energy is converted into carbon biomass (Antoine et al., 1996; Lee et al., 1996; Smyth et al., 2005; Silsbe et al., 2016).

Net community production
NCP represents the net increase of biomass or carbon in the ecosystem of interest, that is, NPP minus the community respiration of all heterotrophs (Williams, 1993; Cullen, 2001; Fasham, 2003). NCP estimates must be constrained to a defined domain in time and space to be of practical use. A method that aggregates results over the mixed layer can give diametrically different results for a specific region compared with one that also includes the part of the photic zone below the base of the mixed layer, or parts of the mesopelagic. Likewise, NCP over short timescales should be interpreted very differently from annual averages (Fasham, 2003).
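The bookkeeping behind the GPP, NPP, and NCP definitions can be sketched as two subtractions. This is a minimal illustration, assuming all rates are expressed in the same unit (mg C m⁻² d⁻¹ here) and integrated over the same domain in time and space; the function names and the numbers in the example are hypothetical.

```python
def npp_from_gpp(gpp: float, autotrophic_respiration: float) -> float:
    """NPP: GPP minus carbon used by the primary producers for respiration and maintenance."""
    return gpp - autotrophic_respiration


def ncp_from_npp(npp: float, heterotrophic_respiration: float) -> float:
    """NCP: NPP minus the respiration of all heterotrophs in the same domain.

    A negative result means the domain is net heterotrophic over the chosen period.
    """
    return npp - heterotrophic_respiration


# Hypothetical, internally consistent rates (mg C m^-2 d^-1) for one domain.
gpp = 1200.0
npp = npp_from_gpp(gpp, autotrophic_respiration=500.0)
ncp = ncp_from_npp(npp, heterotrophic_respiration=560.0)
print(f"NPP = {npp:.0f}, NCP = {ncp:.0f}")  # NPP = 700, NCP = 140
```

Equating the resulting NCP with EP additionally requires the steady-state and timescale conditions discussed in the next section.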

Export production
Export Production (EP, Laws, 1991) is the net production of organic carbon above a specified horizon and is assumed to be equivalent to NCP when the system is in steady state and all temporal lags are accounted for. EP is an important property for the global carbon cycle because it constrains the sequestration of organic carbon to deeper waters. By definition, EP is only valid over timescales significantly longer than those of the processes directly controlling production and respiration. Hence, in-situ measurements of mixed layer NCP cannot be converted directly to EP, since the newly produced biomass might be consumed before it can be exported to the aphotic zone. It is also not yet possible to derive mechanistic relationships between EP and satellite-based products. EP serves as the upper bound for the transport of POC from the euphotic zone to the bathypelagic (e.g. Platt et al., 1989; Siegel et al., 2016; Siegel et al., 2022).

Export flux
If EP reflects the aggregated production of carbon above a depth horizon that is available for export to deeper waters, the export flux (Eflux) represents the direct or indirect measurement of this transport. Eflux is defined as the flux of material across a depth horizon and is normally quantified via sediment traps (Dunne et al., 2005; Buesseler et al., 2007) or by measuring the deficit of particle-reactive 234Th relative to its longer-lived parent 238U in the water column (e.g. Bisson et al., 2018, and references therein). The 234Th method determines the downward flux of POC by integrating the deficit of 234Th in the upper water column and coupling it to the POC/234Th ratio in sinking particles. Samples can be collected with much higher vertical resolution than traps, allowing the POC flux to be estimated at or very near Zeu without the need for a common reference depth. In contrast to EP, Eflux can be estimated across any temporal or spatial scale. However, the factors controlling the regional, temporal, and depth variations of POC/234Th ratios are poorly understood (Puigcorbe et al., 2020). Other sources of uncertainty arise from neglecting physical processes and from the necessary assumption of steady state in the Th isotope system (Buesseler et al., 2006).

Figure 1. Conceptual model of the relationships between different terms describing carbon fluxes in the ocean. Each term is defined in the text and Table 1.
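The 234Th deficit calculation can be illustrated with a one-dimensional, steady-state sketch that, like the simplest form of the method, neglects physical transport. It integrates the 238U-234Th activity deficit from the shallowest sample down to the export horizon with the trapezoidal rule, multiplies by the 234Th decay constant (half-life of about 24.1 days), and converts the 234Th flux to a POC flux with a POC/234Th ratio. The profile values, the ratio, and the function names are hypothetical.

```python
import math

TH234_HALF_LIFE_DAYS = 24.1
LAMBDA_TH234 = math.log(2) / TH234_HALF_LIFE_DAYS  # decay constant, d^-1
LITERS_PER_M3 = 1000.0


def th234_flux(depths_m, u238_dpm_l, th234_dpm_l, horizon_m):
    """Steady-state 234Th export flux (dpm m^-2 d^-1) at horizon_m.

    Assumes depths are sorted from shallow to deep and no advection or
    diffusion; only samples at or above the horizon are integrated.
    """
    profile = [(z, u - th) for z, u, th in zip(depths_m, u238_dpm_l, th234_dpm_l)
               if z <= horizon_m]
    if len(profile) < 2:
        raise ValueError("need at least two samples above the export horizon")
    deficit = 0.0  # dpm L^-1 m
    for (z0, d0), (z1, d1) in zip(profile, profile[1:]):
        deficit += 0.5 * (d0 + d1) * (z1 - z0)
    return LAMBDA_TH234 * deficit * LITERS_PER_M3


def poc_flux_mmol(th_flux_dpm, poc_th_umol_per_dpm):
    """Convert a 234Th flux to a POC flux (mmol C m^-2 d^-1)."""
    return th_flux_dpm * poc_th_umol_per_dpm / 1000.0


# Hypothetical profile: 238U uniform, 234Th depleted near the surface.
depths = [0.0, 50.0, 100.0]
u238 = [2.4, 2.4, 2.4]
th234 = [1.8, 2.0, 2.4]
th_flux = th234_flux(depths, u238, th234, horizon_m=100.0)
poc = poc_flux_mmol(th_flux, poc_th_umol_per_dpm=5.0)
print(round(th_flux), round(poc, 2))
```

The POC/234Th ratio enters linearly, so any uncertainty in that ratio propagates one-to-one into the POC flux.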

Export efficiency
The fraction of PP that is exported out of the euphotic zone (EP/NPP) can be described as the carbon export efficiency (Ef). This non-dimensional ratio describes how inefficient the ecosystem is at retaining carbon in the upper ocean. The more efficiently the pelagic ecosystem recycles carbon, the lower the export efficiency, to the limit where all carbon is recycled and no carbon is exported (Buesseler, 1998).

e-ratio
A special case of Ef is the e-ratio: the flux of particulate organic carbon at the base of the euphotic zone divided by NPP (Murray et al., 1996).

f-ratio
Eppley and Peterson (1979) characterized export efficiency as the ratio of new to total photosynthetic production, or the f-ratio. This idea is based on distinguishing NPP according to the origin of the nitrogen compounds that fuel it. New production is fueled by nutrients (usually NO3⁻) recently introduced into the euphotic zone, either from deeper waters or via lateral processes (Dugdale and Goering, 1967), in contrast to production fueled by rapidly recycled compounds such as ammonium. Export production would then equal new production if the system is in steady state and all transformations between ammonium and NO3⁻ occur outside the euphotic zone (Laws et al., 2011). The f-ratio was originally considered significant because it was thought to be directly related to Ef, but this interpretation relied on the assumption that nitrification mainly occurs below the euphotic zone, which more recent measurements have questioned (Dore and Karl, 1996; Yool et al., 2007). Platt et al. (1989) also suggested that elevated new production is directly driven by perturbations in the physical forcing, which challenges the necessary assumption of steady state.

ef-ratio
Laws et al. (2000) combined the e- and f-ratios into an ef-ratio, based on the assumption that new production should balance export production if a system is in steady state. This ratio makes it possible to combine measurements of new and export production.

pe-ratio
A more precise definition of the e-ratio is the pe-ratio, "the ratio between the export of rapidly sinking particulate matter (particle export) and the total production of organic matter by photosynthesis (primary production)" (Murray et al., 1996; Dunne et al., 2005). On global scales the pe-ratio shows spatial patterns similar to those of the f-ratio, especially in distinguishing eutrophic from oligotrophic regions.
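The ratios defined in this section differ mainly in which flux appears in the numerator and which production term appears in the denominator. A minimal sketch, assuming all fluxes are in the same unit and refer to the same site and period; the function names and values are hypothetical, and the steady-state caveats discussed above apply to any interpretation of the results.

```python
def _ratio(numerator: float, denominator: float) -> float:
    """Dimensionless ratio; the production term must be positive."""
    if denominator <= 0:
        raise ValueError("production term must be positive")
    return numerator / denominator


def export_efficiency(ep, npp):
    """Ef = EP / NPP."""
    return _ratio(ep, npp)


def e_ratio(poc_flux_at_zeu, npp):
    """e-ratio: POC flux at the base of the euphotic zone / NPP."""
    return _ratio(poc_flux_at_zeu, npp)


def f_ratio(new_production, total_production):
    """f-ratio: new (e.g. NO3- fueled) production / total production."""
    return _ratio(new_production, total_production)


def pe_ratio(particle_export, primary_production):
    """pe-ratio: export of rapidly sinking particles / primary production."""
    return _ratio(particle_export, primary_production)


# Hypothetical station values, all in mmol C m^-2 d^-1.
npp = 50.0
print(export_efficiency(10.0, npp), e_ratio(7.5, npp), f_ratio(15.0, npp))  # 0.2 0.15 0.3
```

Under the steady-state assumption behind the ef-ratio, the values returned by `export_efficiency` and `f_ratio` would be expected to coincide; a mismatch in real data points to non-steady conditions or to nitrification within the euphotic zone.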

Structure of the review
With this review we aim to assess how different published algorithms that use satellite-derived properties to calculate EP perform. Our approach has been to leverage existing evaluation studies in which different models have been compared with each other, and to compare algorithms or validation datasets directly only when no such information is available. Earlier studies evaluating EP algorithms each used slightly different approaches and were often conducted to compare newly developed algorithms with existing ones. We address these differences with a form of meta-analysis in which different evaluations and validation datasets are compared together with the respective algorithms. We have followed the guidelines for systematic reviews prescribed in Khan et al. (2003) where applicable.
The review begins with a brief description of existing EP algorithms, followed by a presentation of datasets that can be useful for evaluating satellite-based EP algorithms. Next, we discuss earlier studies that evaluate EP algorithms, including our own comparison in which we use the Dunne et al. (2005), Bisson et al. (2018), and Mouw et al. (2016a) datasets to assess the different EP algorithms. Finally, we discuss the different evaluations and provide a recommendation about which algorithm to use.

Export production algorithms
A number of approaches to constrain and scale EP using different satellite-derived properties have been proposed over the years. Most algorithms are developed to provide some kind of export efficiency ratio that can then be scaled with PP estimates to generate properties comparable to observations of EP. It is often unclear whether satellite-derived PP products or EP algorithms assume that biological production is defined as GPP, NPP, or something in between, and this is not always clear from in-situ measurements either (Balch et al., 2022). We therefore use the term PP to designate primary production without specifying whether it is GPP or NPP. All relationships except those of Betzer et al. (1984), Pace et al. (1987), and the re-parametrization of Siegel et al. (2014) by Stukel et al. (2015) are designed to provide global estimates of EP. Terms used in the algorithms are summarized in Table 2.

Eppley and Peterson, 1979
The seminal paper by Eppley and Peterson (1979) is, to our knowledge, the first study to suggest a quantitative relationship between PP and EP. They base their algorithm solely on PP, with two different scaling factors depending on whether PP is above or below 200 mg C m⁻² d⁻¹.
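A two-regime PP-to-EP relationship of this kind can be sketched as below. The coefficients are an assumption: they follow the form in which the Eppley and Peterson (1979) relationship is often restated (an export ratio of PP/400 up to the 200 threshold, and 0.5 above it), and they should be checked against the original paper, including its PP units, before use. The function name is hypothetical.

```python
PP_THRESHOLD = 200.0  # threshold from the text, mg C m^-2 d^-1


def eppley_peterson_ep(pp: float) -> float:
    """Export production from PP with a two-regime export ratio.

    ASSUMED coefficients: ratio = PP / 400 at or below the threshold and
    0.5 above it, which makes EP continuous at PP = 200 (EP = 100).
    """
    if pp < 0:
        raise ValueError("PP must be non-negative")
    ratio = pp / 400.0 if pp <= PP_THRESHOLD else 0.5
    return ratio * pp


print(eppley_peterson_ep(100.0), eppley_peterson_ep(200.0), eppley_peterson_ep(300.0))
# 25.0 100.0 150.0
```

Because EP depends on PP alone, a relationship of this form cannot respond to temperature or euphotic zone depth, which is consistent with the low explained variance later reported for it.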

Suess, 1980
Suess (1980) uses a single scaling factor for PP and adds a depth dependency to predict the organic carbon flux across any depth horizon below the base of the euphotic zone. The algorithm was derived from sediment trap data.

Betzer et al., 1984
The Betzer et al. (1984) relationship was derived from 14C-based PP and POC flux observations using a free-drifting sediment trap at 900 m. The trap was deployed at four locations between 12°N and 6°S at 153°W in the Pacific Ocean.

Pace et al., 1987
$$EP = 3.523\, z^{-0.734}\, PP^{1.000} \qquad (5)$$
Pace et al. (1987) expanded on Suess (1980) by including the vertical flux of both POC and particulate organic nitrogen (PON), based on in-situ observations from the Vertical Transport and Exchange (VERTEX) program in the north-east Pacific Ocean.
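Depth-dependent relationships of this type can be evaluated as below. The Pace et al. (1987) coefficients are taken from Equation 5; the Suess (1980) form, flux = PP / (0.0238 z + 0.212), is included as it is commonly quoted in the literature and should be checked against the original. Depth z is assumed to be in meters and the fluxes are returned in the units of PP; the function names are hypothetical.

```python
def pace_1987_flux(pp: float, z_m: float) -> float:
    """Equation 5: EP = 3.523 * z^-0.734 * PP^1.000."""
    if z_m <= 0:
        raise ValueError("depth must be positive")
    return 3.523 * z_m ** -0.734 * pp ** 1.0


def suess_1980_flux(pp: float, z_m: float) -> float:
    """Assumed Suess (1980) form: flux = PP / (0.0238 z + 0.212)."""
    if z_m <= 0:
        raise ValueError("depth must be positive")
    return pp / (0.0238 * z_m + 0.212)


for z in (100.0, 500.0, 1000.0):
    # Fraction of a unit PP reaching depth z under each relationship.
    print(z, round(pace_1987_flux(1.0, z), 3), round(suess_1980_flux(1.0, z), 3))
```

Printing the fraction of a unit PP at several depths makes it easy to see how strongly the two relationships can disagree at a given horizon.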

Laws et al., 2000
This algorithm is based on a relationship between Ef, SST, and the f-ratio derived from data in Table 3 of Laws et al. (2000).We use the equation as described by Henson et al. (2011).

Laws et al., 2011
The algorithms in Laws et al. (2011) […]. Equation 12 is based on contours in Figure 2 of Laws et al. (2000) and was evaluated in Stukel et al. (2015); both Equations 12 and 13 are evaluated in Li and Cassar (2016).

Westberry et al., 2012
Westberry et al. (2012) use a number of empirical relationships between PP and respiration (R) to assess NCP and EP. Part of their analysis is to generate regional PP-R relationships by dividing available observations into broad latitudinal zones with different nutrient dynamics.

Siegel et al., 2014
The Siegel et al. (2014) algorithm divides EP into different size classes, based on the assumed ability to assess the community structure of phytoplankton assemblages from satellite-derived properties. The terms are specified as follows: AlgEP is the total vertical flux of sinking algal cells and aggregates, and FecEP is the total vertical flux of sinking fecal material released by zooplankton grazers. f_Alg is the fraction of microphytoplankton production that sinks out of the base of the euphotic zone (assumed by Siegel et al. (2014) to be 0.1), and PP_M is the PP of microphytoplankton. f_FecM and f_FecS are the fractions of grazing on microphytoplankton and on smaller (<20 μm) phytoplankton, respectively, that contribute to fecal matter export from the euphotic zone (assumed by Siegel et al. (2014) to be 0.3 and 0.1, respectively). G_M and G_S are the grazing rates on microphytoplankton and small phytoplankton and are derived from phytoplankton mass balance budgets.
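Combining the terms defined above, the structure of the Siegel et al. (2014) model can be sketched as EP = AlgEP + FecEP, with AlgEP = f_Alg × PP_M and FecEP = f_FecM × G_M + f_FecS × G_S. The default fractions are the values quoted above; the grazing rates G_M and G_S, which Siegel et al. (2014) obtain from phytoplankton mass balance budgets, are simply passed in here rather than computed. Function and argument names are hypothetical, and all rates must share one unit.

```python
from typing import NamedTuple


class SiegelExport(NamedTuple):
    alg_ep: float    # sinking algal cells and aggregates
    fec_ep: float    # sinking zooplankton fecal material
    total_ep: float


def siegel2014_ep(pp_micro: float, grazing_micro: float, grazing_small: float,
                  f_alg: float = 0.1, f_fec_m: float = 0.3,
                  f_fec_s: float = 0.1) -> SiegelExport:
    """Size-partitioned export: algal sinking plus fecal export from grazing."""
    alg_ep = f_alg * pp_micro
    fec_ep = f_fec_m * grazing_micro + f_fec_s * grazing_small
    return SiegelExport(alg_ep, fec_ep, alg_ep + fec_ep)


# Hypothetical rates (mg C m^-2 d^-1): PP_M, G_M, G_S.
result = siegel2014_ep(pp_micro=200.0, grazing_micro=150.0, grazing_small=400.0)
print(result)
```

Returning the two pathways separately keeps the algal and fecal contributions visible, which is useful when re-parameterizing the fractions as Stukel et al. (2015) did.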

Li & Cassar, 2016
The Li and Cassar (2016) model was developed using a Genetic Programming approach to statistically optimize the Laws et al. (2000) and Henson et al. (2011) relationships using O2/Ar-based NCP estimates.

In situ data for evaluation
The Dunne et al. (2005) compilation of in-situ pe-ratios is based on 122 field observations from approximately 40 oceanographic studies with global distribution. The dataset includes estimates of PP, Chl a, new production, nutrients, oxygen- or carbon-based estimates of EP, particle export estimates based on sinking flux from sediment traps and/or 234Th, and the carbon-to-chlorophyll ratio. Physical parameters include mixed layer temperature and the depth of the euphotic zone (the minimum of the 1% light level or the sampling zone). The data coverage is presented in Figure 3. Dunne et al. (2005) find that, in general, pe-ratios are high (>0.4) in polar regions, moderate (0.3-0.4) in coastal regions and in open ocean regions supporting phytoplankton blooms, and low (0.05-0.2) elsewhere. The data can be accessed as supplementary information to the Dunne et al. (2005) publication.
The Stukel et al. (2015) datasets are based on 32 Lagrangian process studies in which shallow-drifting sediment traps were combined with 238U-234Th measurements to quantify EP (Buesseler et al., 2007). These Lagrangian studies lasted between 2 and 5 days and were conducted within either the California Current Ecosystem (CCE) Long Term Ecological Research (LTER) program or the Costa Rica Dome (CRD) FLUx and Zinc Experiments (FLUZiE) program. Drifters were drogued at 15 m depth and tracked by satellite, with either experimental incubation bottles or VERTEX-style sediment traps attached below (Stukel et al., 2013). This experimental setup allowed for simultaneous measurement of carbon export, food web processes (PP, protozoan grazing, mesozooplankton grazing, size spectra of the phytoplankton community), and net changes of in-situ Chl. The datasets consist of observations from 7 cruises (Figure 2) and can be located via the acknowledgments section of Stukel et al. (2015) or as supplementary information to the publication.
The Li and Cassar (2016) algorithm development and evaluation used a global dataset of mixed layer O2/Ar-based NCP estimates, either from discrete samples analyzed in the lab or from continuous underway measurements (Reuer et al., 2007; Cassar et al., 2009; Jönsson et al., 2013). NCP can be derived from O2/Ar measurements by assuming a mass balance of biological O2 in the mixed layer. Oxygen saturation at the ocean surface is influenced by biological (i.e., PP) and physical processes (e.g., bubble injection and temperature changes). Ar and O2 have similar temperature dependencies (Craig and Hayward, 1987), and combined with their similar solubilities, they respond almost equivalently to processes such as temperature or air pressure change and bubble-mediated gas exchange. The physical contribution to oxygen saturation can therefore be accounted for by measuring the O2/Ar saturation state.
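The O2/Ar mass-balance approach can be sketched as follows, assuming steady state in the mixed layer and negligible vertical exchange: the biological O2 supersaturation Δ(O2/Ar) = (O2/Ar)meas / (O2/Ar)sat − 1 is multiplied by the O2 saturation concentration and a gas transfer velocity k. Deriving k from wind speed is outside this sketch, so k is an input. The O2/C = 1.4 conversion and the 1.0 mmol O2 m⁻² d⁻¹ cutoff follow the values given for the Li and Cassar (2016) comparison; the function names and example values are hypothetical.

```python
from typing import Optional

O2_TO_C = 1.4      # mol O2 per mol C (Laws, 1991)
MIN_NCP_O2 = 1.0   # mmol O2 m^-2 d^-1, lower cutoff


def delta_o2_ar(o2_ar_measured: float, o2_ar_saturation: float) -> float:
    """Biological O2 supersaturation relative to Ar."""
    return o2_ar_measured / o2_ar_saturation - 1.0


def ncp_o2(k_m_per_day: float, o2_sat_mmol_m3: float, d_o2_ar: float) -> float:
    """Mixed-layer NCP in mmol O2 m^-2 d^-1 from the steady-state O2 mass balance."""
    return k_m_per_day * o2_sat_mmol_m3 * d_o2_ar


def ncp_carbon(ncp_o2_value: float) -> Optional[float]:
    """Convert O2-based NCP to mmol C m^-2 d^-1, or None below the cutoff.

    Negative values (e.g. from vertical mixing of O2-undersaturated water)
    and values under MIN_NCP_O2 are discarded.
    """
    if ncp_o2_value < MIN_NCP_O2:
        return None
    return ncp_o2_value / O2_TO_C


# Hypothetical sample: 2 % biological supersaturation, k = 3 m/d, O2sat = 250 mmol m^-3.
d = delta_o2_ar(o2_ar_measured=20.4, o2_ar_saturation=20.0)
ncp = ncp_o2(k_m_per_day=3.0, o2_sat_mmol_m3=250.0, d_o2_ar=d)
print(round(ncp, 2), round(ncp_carbon(ncp), 2))
```

Returning None rather than a filtered value keeps the exclusion explicit, so downstream averaging can count how many samples were dropped.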
The dataset contains observations from 1999 to 2009 (n = 689,566) averaged to a 0.083° × 0.083° grid, yielding n = 14,795 samples with a mean coefficient of variation (CV) of 0.12 per grid cell (Figure 4). The O2/Ar supersaturation is converted to an NCP proxy using QSCAT/NCEP blended wind speeds (Reuer et al., 2007). Samples with negative NCP are removed because of potential biases associated with vertical mixing of O2-undersaturated waters (Reuer et al., 2007; Jönsson et al., 2013).

[…] Where available, the flux of other minerals is also reported. Of the observations, which span the globe, 85% are concentrated in the Northern Hemisphere and time series sites account for 36% of the data, while 71% of the data are measured at ≥500 m, with the most common deployment depths between 1000 and 1500 m. The dataset is archived in the PANGAEA data repository (Mouw et al., 2016b).
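The gridding step described above for the Li and Cassar (2016) O2/Ar dataset, averaging point measurements into 0.083° cells and reporting a per-cell coefficient of variation, can be sketched as below. Treating 0.083° as exactly 1/12° and using the population standard deviation for the CV are assumptions, as are the function names and the example samples.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev

CELL_DEG = 1.0 / 12.0  # ~0.083 degrees (assumed exact 1/12 degree)


def cell_index(lat: float, lon: float, cell: float = CELL_DEG):
    """Integer (row, column) of the grid cell containing a point."""
    return math.floor((lat + 90.0) / cell), math.floor((lon + 180.0) / cell)


def grid_average(samples, cell: float = CELL_DEG):
    """Map each cell to (mean, CV, n) for (lat, lon, value) samples.

    CV is NaN for cells with a single sample or a zero mean.
    """
    cells = defaultdict(list)
    for lat, lon, value in samples:
        cells[cell_index(lat, lon, cell)].append(value)
    gridded = {}
    for key, values in cells.items():
        m = mean(values)
        cv = pstdev(values) / m if len(values) > 1 and m != 0 else math.nan
        gridded[key] = (m, cv, len(values))
    return gridded


# Hypothetical NCP samples: two fall in the same cell, one elsewhere.
samples = [(10.01, -20.01, 10.0), (10.02, -20.02, 14.0), (45.5, 150.2, 5.0)]
gridded = grid_average(samples)
print(len(gridded))
```

Keeping the sample count per cell alongside the mean and CV makes it possible to reproduce summary statistics such as the mean per-cell CV reported for the dataset.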
The Bisson et al. (2018) […] points together with appropriate metadata, including geographic location, date, and sample depth. When available, water temperature, salinity, 238U (over 18,200 data points), and particulate organic nitrogen are included. Data source and method information (including for 238U and 234Th) is also detailed, along with valuable information for future data analysis such as bloom stage and steady- or non-steady-state conditions at the time of sampling. While not directly applicable in this study, this dataset provides a valuable resource for future EP algorithm development and evaluation.

Algorithm evaluations
The influential Dunne et al. (2005) study provided not only relationships between PP or Chl and EP that are widely used in ecosystem modeling, but also comparisons of a variety of empirical parameterizations with the data synthesis described in section 3.1. The observed pe-ratios were combined with in-situ observations of mixed layer temperature, depth-integrated chlorophyll, depth-integrated PP, new production, particle export, depth of the euphotic zone (the minimum of the 1% light level or the sampling zone), and the carbon-to-chlorophyll ratio. They found that the Eppley and Peterson (1979) algorithm (Figure 5A) has the lowest coefficient of determination (9%), which they attributed to the parameterization relying on the integral of PP alone. The Baines et al. (1994) algorithm (Figure 5B) added euphotic zone depth to the depth-integral of PP and accounted for a higher fraction of the variance (38%), while not improving the relative uncertainty (64%). A different approach was used by Baines et al. (1994) (Figure 5C), where chlorophyll concentration was used as the predictive variable. This parameterization accounted for slightly more of the variance (40%) while also decreasing the relative uncertainty (46%), but showed a strong low bias at higher pe-ratios. Dunne et al. (2005) found that the Laws et al. (2000) algorithm (Figure 5D) succeeded in reproducing large-scale structures in the data and accounted for nearly half of the variance (47%), while decreasing the relative error to 43%. The major shortcoming of this algorithm was in reproducing variability in pe-ratios at high temperatures. Dunne et al. (2005) suggested that "a weaker temperature dependence for phytoplankton and bacterial metabolism than for zooplankton metabolism" accounts for this misfit.
The algorithm developed by Dunne et al. (2005) (Figures 5E, F) provided a reasonable fit to the compiled dataset of observations, with an R² of 58% and a relative uncertainty of 33%. The algorithm had low skill in areas with the highest pe-ratios and at sites combining very high pe-ratios with low to moderate PP. Using biomass instead of PP as the predictor compensated for this discrepancy, improving R² to 61% with a relatively low relative error (35%). Dunne et al. (2005) explained this improvement by biomass integrating ecosystem processes over time better than PP does.
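The skill statistics quoted in these comparisons can be computed with a short helper. This sketch assumes R² is the squared Pearson correlation between observed and predicted values and uses the mean absolute relative error as a stand-in for "relative uncertainty"; Dunne et al. (2005) may define the latter differently, so the second function is an assumption rather than their metric. Names and values are hypothetical.

```python
def r_squared(observed, predicted):
    """Squared Pearson correlation between observations and predictions."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length series with at least 2 values")
    mo = sum(observed) / n
    mp = sum(predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_p = sum((p - mp) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)


def mean_relative_error(observed, predicted):
    """Mean of |predicted - observed| / |observed| (observations must be nonzero)."""
    return sum(abs(p - o) / abs(o) for o, p in zip(observed, predicted)) / len(observed)


obs = [0.10, 0.20, 0.30, 0.40]   # hypothetical pe-ratios
pred = [0.12, 0.18, 0.33, 0.41]
print(round(r_squared(obs, pred), 3), round(mean_relative_error(obs, pred), 3))
```

A squared correlation rewards predictions that are linearly related to the observations even when they are biased, which is why the spread around the 1:1 line is examined separately in the comparisons discussed here.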
Li and Cassar (2016) evaluated a number of the algorithms described in section 2 by matching O2/Ar-derived NCP observations (see section 3.3) with satellite-derived 8-day, 0.083° × 0.083° SeaWiFS Chl and PAR, VGPM NPP, and AVHRR SST. At the time Li and Cassar (2016) conducted their evaluation, the standard SeaWiFS Chl algorithm had been shown to underestimate [Chl] by a factor of 2 to 3 in the Southern Ocean (Kahru and Mitchell, 2010), so the Chl estimates were improved using the blending scheme presented by Kahru and Mitchell (2010). VGPM NPP was based on the recalculated Chl product. For the algorithm of Siegel et al. (2014), phytoplankton size composition was derived using Li et al. (2013) and VGPM NPP, together with the other parameters as presented in Siegel et al. (2014). See Li and Cassar (2016) for more detailed descriptions of the data sources.
Li and Cassar (2016) used the satellite-derived data to calculate export production with the Eppley and Peterson (1979), Betzer et al. (1984), […], and Siegel et al. (2014) algorithms. They also used the data, together with observed O2/Ar-NCP, to develop the Li and Cassar (2016) algorithm. The main assumption here was that O2/Ar-NCP is a valid proxy for EP. One could expect the Li and Cassar (2016) algorithm to outperform the others, since the observational dataset was used to train it, but this was not the case (Figure 6). Instead, all EP-predicting algorithms showed surprisingly similar results. Eppley and Peterson (1979), Betzer et al. (1984), and Baines et al. (1994) showed almost identical distributions in the regressions against observations, with R² values between 0.58 and 0.65. The algorithms of Eppley and Peterson (1979) and Betzer et al. (1984) in particular tended to overestimate low NCP values. The different Laws et al. (2000) and Laws et al. (2011) algorithms all showed a smaller spread around the 1:1 line and a slightly better R² (0.64-0.7). The algorithms of Laws et al. (2000) also overestimated low NCP, while those of Laws et al. (2011) (Equations 12 and 13) showed symmetrical distributions. The Dunne et al. (2005) algorithm, on the other hand, tended to underestimate NCP. This tendency was even stronger for Westberry et al. (2012) and Siegel et al. (2014), which also had among the lowest R² values (0.62 and 0.55, respectively). This is particularly surprising for the algorithm of Westberry et al. (2012), which was developed within a framework based on the assumption that NCP is a good general proxy for EP. The two algorithms developed by Li and Cassar (2016) showed, as expected, good skill in predicting O2/Ar-NCP. Their Support Vector Regression (SVR) approach had the highest R², but appeared to have a floor below which values were not predicted, possibly a consequence of how the SVR was configured. That all approaches showed similar and relatively good skill in predicting O2/Ar-NCP and EP is surprising, as the various algorithms model different components of the biological pump.
The Stukel et al. (2015) comparison of EP algorithms is based on a Lagrangian approach in which in-situ rates of NPP, EP, and auxiliary parameters were observed concurrently in the same water mass. They compiled results from Lagrangian process studies in the northeastern Pacific Ocean (see Stukel et al., 2015, and section 3.2 for details). One main advantage of this approach is the ability to separate errors associated with inaccuracies in remote sensing products (e.g., PP, Chl, and SST) from errors associated with the model used to estimate EP. Their intention was not to conduct a definitive comparison of competing satellite algorithms, but rather to begin a process that assesses, and hopefully improves, the different assumptions and parameterizations in current satellite algorithms, especially that of Siegel et al. (2014).
Satellite algorithms for EP are generally designed to predict export at either Zeu or 100 m. Although the observations used by Stukel et al. (2015) were within 30 m of Zeu, they scaled all data to Zeu using the ambient PAR at the sampling depth. All EP algorithms were evaluated using in-situ input properties (e.g., SST, Chl, PP), as the goal was not to assess the corresponding satellite products. For methodological reasons, all water column rates and standing stock measurements were depth-integrated, except when models made explicit reference to sea surface values. Stukel et al. (2015) […] Figure 7 shows the resulting comparisons between satellite algorithms and in-situ measurements. At first glance, no algorithm appears to perform significantly better or worse than any other. Dunne et al. (2005) […] (2011) were closest to the observations but showed a 3-fold difference and no clear trends. Henson et al. (2011) consistently provided lower export estimates than the observations. Stukel et al. (2015) attribute this to the stronger temperature dependency of Henson et al. (2011), which leads to low export fluxes (< 2 mmol C m⁻² d⁻¹) throughout the study area. They also observe a significant overestimation of EP by Dunne et al. (2005) in their equatorial […]

Figure 5. Comparison of particle export ratio estimates by Dunne et al. (2005) of various models described in section 2 using data described in section 3.1. Panels (A-D) show results based on algorithms described in sections 2.1, 2.3, 2.5, and 2.6, and panels (E, F) the two algorithms described in section 2.7. Symbols are grouped by temperature into less than 14°C (crosses) and greater than 14°C (dots). Image from Dunne et al. (2005).

Evaluating the different EP models using three in situ databases
The evaluations described so far all promote a new algorithm (Dunne et al., 2005; Li and Cassar, 2016) or re-parameterize an existing one (Stukel et al., 2015). While the different approaches are thorough and the results are consistent among the three studies, we believe it is worthwhile to evaluate the different algorithms from a neutral starting point. For this, we use […] (Kulk et al., 2021, BICEP). It should be noted that different satellite-based PP products vary considerably (Bisson et al., 2018; Siegel et al., 2022), which can affect the calculated EP estimates (Bisson et al., 2018). We found, however, that different PP products had only a limited influence on the skill of the EP algorithms evaluated in this study, and we therefore chose to present results based on only one PP product.

Figure 6. Comparison of satellite algorithms of carbon export production by Li and Cassar (2016). O2/Ar-derived NCP was converted to C using a stoichiometry of O2/C = 1.4 (Laws, 1991). Samples with O2/Ar-NCP estimates < 1.0 mmol O2 m⁻² d⁻¹ were excluded. Phytoplankton size composition was derived using Li et al. (2013) and VGPM NPP for the algorithm developed by Siegel et al. (2014), together with the other parameters as presented in Siegel et al. (2014). Image from Li and Cassar (2016).
We begin by calculating global EP flux estimates averaged over the years 1998-2020 for all algorithms using the earlier mentions satellite-derived products (all values presented in Table 3).The flux estimates are between 1 and 140 Gt C y −1 , a much larger range of uncertainty than normally presented for EP.If only algorithms with one of the top three skills scores in any of our evaluations (red colors in Tables 3, 4, 5) are included, the range is 1-9 Gt C y −1 , This result is more in line with earlier published estimates (Dunne et al., 2007;Henson et al., 2011;Laws et al., 2011;Siegel et al., 2016;Siegel et al., 2022).Dunne et al. (2005) does not report any information about sampling dates, which means that we are not able to match satellitederived properties to the dataset.Instead, we rely on properties included in the dataset (n=125), all which we believe to be based on in-situ observations.There is 487 satellite matchups for Mouw et al. (2016b) observations between 100 and 200 meter, and 1,058 matchups for Bisson et al. (2018).All observations used in the comparisons are shown in Figure 3.The three datasets covers similar regions, but do not necessarily include the same observations.This is to be expected since our use of the Mouw et al. (2016b) and Bisson et al. (2018) datasets are limited to the time of ocean color satellite coverage (1998 to present) whereas the Dunne et al. (2005) dataset has a cutoff some years before publication.Bisson et al. (2018) have several long transects included that are not part of Mouw et al. (2016b).
When visually comparing in-situ POC fluxes with predicted EP calculated using the Dunne et al. (2005); Laws et al. (2000), and Li and Cassar (2016) algorithms (Figures 8-10) we see similar patterns.Comparing the Dunne et al. (2005) algorithm in its corresponding in-situ dataset results, as expected, in the same correlation as reported by the paper.A more interesting finding is that the two other algorithms have similarly good skill in predicting EP.The main exception is a slight offset from the 1:1 line by Li and Cassar (2016).When using in-situ POC flux from Mouw et al. (2016b) dataset together with matched satellite properties, we see quite different results where neither observations from deep waters (purple markers) nor data from shallow water at less than 200 m depth (blue markers) are predicted particularly well.The main issue seems to be that low observed values are not predicted as low values by the algorithm, which results in EP being significantly overestimated compared to observations.Here, the relationship Comparison between satellite algorithms and in-situ measurements by Stukel et al. (2015).2018) is trending higher than Mouw et al. (2016b), but EP predicted by Dunne et al. (2005) falls within the same range for both in-situ datasets, leading to a notable underestimation by the algorithm.This pattern can be found for Li and Cassar (2016), but is less pronounced, whereas Laws et al. (2000) generates EP predictions that are quite symmetrically distributed around the 1:1 line.Figures for the other algorithms can be found in the Supplementary Materials.
Finally, we compare log transformed predictions of EP from the different satellite-based models to in-situ observations of POC flux using a number of metrics: coefficient of determination (R 2 , Wright, 1921), Mean Absolute Error (MAE, Chicco et al., 2021), Root Mean Square Error (RMSE, Nevitt and Hancock, 2000), Mean Absolute Percent Error (MAPE, Myttenaere et al., 2016), symmetric Mean Absolute Percentage Error (sMAPE Makridakis, 1993), and Bias.Please see Chicco et al. (2021) and Seegers et al. (2018) for a more detailed discussion about each metric and their utility.All values are presented in Tables 3-5.We find that most models have limited to very limited skill when evaluated with R2 against the Mouw et al. (2016b) or Bisson et al. (2018) datasets, whereas several models perform better with Dunne et al. (2005); Li and Cassar (2016), and   6 Discussion and conclusions The EP algorithms described here assume different definitions of export efficiency, are based on different products for deriving PP from satellite products (who themselves have different assumptions All EP values are log-transformed.Higher is better for R 2 while lower is better for the other metrics.Red colors denote the three algorithms with highest skill according to each metric.about PP), and are developed using different in-situ datasets.Still, the skill of predicting export production is surprisingly similar among the different algorithms.Both the Dunne et al. (2005) and Li and Cassar (2016) algorithm evaluations showed that their own model provides the best results, which is not too surprising since they were developed using the evaluation data.The advantage is, however, modest for Dunne et al. (2005) and insignificant for Li and Cassar (2016).The Stukel et al. (2015) evaluation used a Lagrangian in-situ dataset collected with the Siegel et al. 
(2014) algorithm in mind and performed a re-parameterization of said algorithm, but only achieved a modest improvement in skill measured as R 2 .The authors argued that other statistical methods are more useful to evaluate EP algorithms and Siegel et al. (2014) showed a larger improvement by those metrics.
There is only a slight correlation between how complex an algorithm is and how well it performs.Siegel et al. (2014) is arguably the most complex approach and showed good results in the Stukel et al. (2015) study, but was performing rather poorly in Li and Cassar (2016).The simplest approach is by Eppley and Peterson (1979), which is the only algorithm evaluated that uses PP as the sole independent input feature.It performed worse than other algorithms in Dunne et al. (2005) but reasonably well in Li and Cassar (2016).This might suggest that SST is a more important factor when estimating carbon fluxes at depth than EP from the euphotic zone.
We find that using the Mouw et al. (2016b) dataset together with satellite-derived properties provide a poor correlation between observed POC flux and predicted EP for the Dunne et al. (2005) algorithm, the reason for this is not entirely clear.Some possible explanations are problems with the satellite-derived products used or differences in how the Dunne et al. (2005) and Mouw et al. (2016b) datasets represent the global ocean.Another possible reason is (we assume) that all properties used in the Dunne et al. (2005) dataset are specifically sampled in connection to the POC flux observations.One could expect a better connection between surface processes and thermocline fluxes when observed over appropriate temporal and spatial scales.This suggestion would also explain the good correlations found by Stukel et al. (2015) and Li and Cassar (2016), the latter by not relying of thermocline fluxes in the evaluation.
A future step to better understand the contrasting results seen in this study is to re-evaluate all models with all available datasets.It is a reasonable assumption that empirical relationships between available satellite-derived products and EP differ significantly between different regions of the ocean (Sathyendranath et al., 1991;Stukel et al., 2015;Britten and Primeau, 2016;Li and Cassar, 2016).Recent syntheses of in-situ observations within the BICEP and EXPORTS projects have created the potential to re-parametrize existing algorithms and to perform new regression analyses on regional scales.An alternative promising approach to estimate EP from space is to use satellitederived properties for data assimilation in biogeochemical models.Two recent examples are studies by DeVries and Weber (2017) and Nowicki et al. (2022) where they quantify the biological pump by using satellite and oceanographic tracer observations to constrain rates and patterns of organic matter production, export, and remineralization in an inverse model framework.
2.10 Westberry et al., 2012 EP = PP − 1:01 • PP 0:82 (all data) EP = PP − 0:93 • PP 0:78 (open ocean) (14) Westberry et al. ( . Note that positive NCP values may also be biased by vertical mixing where vertical mixing brings O 2 -undersaturated water to the surface and the estimates should be regarded as lower bounds on the true NCP.Conversely, positive biases in NCP could occur in regions with high biological O 2 below the mixed layer (e.g., deep chlorophyll maximum).Because of these uncertainties, O 2 /Ar NCP data below 1.0 mmol O 2 m 2 d -1 are removed from the dataset.Additional uncertainties and biases (e.g., gas exchange parameterization and lack of steady state in biological O 2 in the mixed layer) are further discussed in Jönsson et al. (2013).Data access is described in Li and Cassar (2016).The Mouw et al. (2016a) dataset consists of Particulate Organic Carbon (POC) flux estimated from sediment traps and 234 Th compiled across the global ocean including six long-term time series locations.The data set contains 15,792 individual POC flux estimates at 674 unique locations collected between 1976 and 2012 (Figure dataset is based on observations from sediment traps at depths less than 200 meters and 234 Th measurements converted to POC flux at Zeu.The data is selected to represent different sampling methodologies and spatiotemporal scales, and totals 1,719 observations from 1984 to 2014.The Puigcorbe et al. 
(2020) dataset does not include POC flux observations but POC/ 234 Th ratios that can be indirectly used to constrain EP and evaluate EP models.The collection contains of 9,318 measurements with a global coverage collected between 1989 and 2018 from the surface to > 5500 m, and divided into three size fractions (∼< 0.7 µm, ∼ 1-50 µm, ∼> 50 µm).The data has an uneven distribution with some areas highly sampled (e.g.,China Sea, Bermuda Atlantic Time Series station) while others regions are sparsely covered (the south-eastern Atlantic, the south Pacific, and the south Indian Oceans).The dataset is archived in the PANGAEA data repository (Puigcorbe, 2019).Ceballos-Romero et al. (2022) provide a comprehensive dataset of 234 Th measurements sampled across the global ocean between 1967 and 2018.The compilation includes a total of 56 631 data
(1984); Baines et al. (1994); Laws et al. (2000); Dunne et al. (2005); Laws et al. (2011); Westberry et al. (2012), and Siegel et al. ( noted that Dunne et al. (2005) and Laws et al. (2011) are parametrized to predict total EP including active transport by diel vertically migrating organisms and passive export of DOC, leading to a positive bias since sediment traps and 234 Th only measure POC fluxes.Stukel et al. (2013) estimated that the active transport by diel migration is about 19% of the total sinking flux in the CCE region, providing a lower constraint on this bias.
and Siegel et al. (2014) had R 2 coefficients of determination of 0.37, whereas the R 2 for Henson et al. (2011) and Laws et al. (2011) were 0.27.Stukel et al. (2015) reparameterized Siegel et al. (2014) using their in-situ observations and improved the R 2 to 0.38.It should be noted that all algorithms have been developed and parameterized to function in a global setting in all physical, chemical, and biological settings.This study

FIGURE 4
FIGURE 4Global map of O2/Ar measurements fromLi and Cassar (2016).Samples with positive values are color coded.Samples with negative values are shown using a gray scale.Image fromLi and Cassar (2016).
Stukel et al. (2015) performed a comparison in one specific region with a small subset of ecosystem dynamics and conditions.Puigcorbe et al. (2017) compared Dunne et al. (2005); Laws et al. (2011), and Henson et al. (2011) using PP estimates from three different satellite-derived primary production models and a regional dataset of POC fluxes based on 234 Th from the North Western Atlantic Ocean.They found that Dunne et al. (2005) and Laws et al.
the published dataset of POC fluxes by Dunne et al. (2005); Mouw et al. (2016a); Mouw et al. (2016b), and Bisson et al. (2018).We matched the Mouw et al. (2016a); Mouw et al. (2016b) and Bisson et al. (2018) data with monthly satellite-derived SST from the Group for High Resolution Sea Surface Temperature/Operational Sea Surface Temperature and Ice Analysis (UKMO, 2005, GHRSST/ OSTIA), Chl and Kd 490 from Ocean Colour Climate Change Initiative (Sathyendranath, 2021, OC-CCI), and PP from the Biological Pump and Carbon Exchange Processes project

FIGURE 6
FIGURE 6 FIGURE 7 (A) Siegel et al. (2014) vs sediment-trap-(circles and diamonds) and 234 Th-derived (squares and triangles) measurements.Circles and squares are based on results using microscopy to determine the fraction of microphytoplankton.(B-D) Dunne et al. (2005); Laws et al. (2011), and Henson et al. (2011) algorithms, respectively, with circles showing sediment trap data and squares showing 234 Th data.All panels show export normalized to the base of the Z eu , except panel D, which shows export at 100m. Green diamonds show arithmetic mean of predicted and measured export for each quartile of the measurements (35-85, 85-125, 125-205, and 205-560 mg C m -2 d -1 for base of Z eu ; 18-65, 65-89, 89-140, and 140-300 mg C m -2 d -1 for 100m).Dashed lines depict a 1:1 relationship.Error bars show one standard error (for in-situ measurements) and propagation of measurement standard error through satellite algorithms (for predictions).Image from Stukel et al. (2015).betweenPOC flux and EP based on theDunne et al. (2005) dataset could act as an upper constraint when applying the algorithms to theMouw et al. (2016b) dataset.We find a less coherent pattern when in-situ POC flux fromBisson et al. (2018) is plotted against EP from the three algorithms.The distribution of values inBisson et al. ( Laws et al. (2000) at the top, when comparing predicted to insitu POC fluxes in Dunne et al. (2005) dataset using in-situ properties only.These results are in accordance with the earlier presented visual comparisons.The other metrics show similar patterns.
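The screening and unit conversion applied to the Li and Cassar (2016) O₂/Ar data can be sketched in a few lines. This is a minimal illustration, assuming only the two numbers quoted in the text: the O₂/C = 1.4 stoichiometry (Laws, 1991) and the 1.0 mmol O₂ m⁻² d⁻¹ cutoff, applied in O₂ units before conversion. The function and constant names are hypothetical and not taken from any published code.

```python
import math
from typing import Iterable, List

O2_TO_C = 1.4      # mol O2 per mol C (Laws, 1991), as quoted in the text
MIN_NCP_O2 = 1.0   # screening threshold in mmol O2 m^-2 d^-1 (hypothetical name)


def o2ar_ncp_to_carbon(ncp_o2: Iterable[float]) -> List[float]:
    """Screen O2/Ar NCP values and convert the survivors to mmol C m^-2 d^-1.

    Values below MIN_NCP_O2 (which includes every negative value) are dropped.
    NaN compares False against the threshold, so missing values are dropped too.
    Each remaining value is divided by the O2/C ratio.
    """
    return [v / O2_TO_C for v in ncp_o2 if v >= MIN_NCP_O2]
```

For example, `o2ar_ncp_to_carbon([-2.0, 0.5, 1.4, 2.8])` keeps only the last two samples, giving approximately 1.0 and 2.0 mmol C m⁻² d⁻¹. Filtering in O₂ units first mirrors the order described in the text, so the cutoff is not affected by the choice of stoichiometry.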

TABLE 1
Different terms associated with Biological Production that are relevant for algorithm development.

TABLE 2
Input and output parameters for the different algorithms.
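As one concrete example of an algorithm's inputs and outputs, the Westberry et al. (2012) relation in Eq. (14) takes PP as its only input. The sketch below is a hedged transcription of that equation, not the authors' code: the coefficient table and function name are hypothetical, and PP is assumed to be in the same units as the original fit, since no unit conversion is performed.

```python
from typing import Dict, Tuple

# (a, b) in EP = PP - a * PP**b, transcribed from Eq. (14) (Westberry et al., 2012).
WESTBERRY_COEFFS: Dict[str, Tuple[float, float]] = {
    "all": (1.01, 0.82),         # fit to all data
    "open_ocean": (0.93, 0.78),  # fit to open-ocean data only
}


def westberry_ep(pp: float, subset: str = "all") -> float:
    """Return EP = PP - a * PP**b for a non-negative primary production value.

    `subset` selects the coefficient pair ("all" or "open_ocean").
    Units are assumed to match those of the original fit.
    """
    if pp < 0:
        raise ValueError("primary production must be non-negative")
    a, b = WESTBERRY_COEFFS[subset]
    return pp - a * pp ** b
```

With the all-data coefficients the expression turns negative (net heterotrophy) for PP below about 1.06 in the units of the fit; for instance, PP = 1 gives 1 − 1.01 = −0.01, whereas the open-ocean coefficients give +0.07 at the same PP.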

TABLE 3
Skill metrics for different export production models evaluated using the Dunne et al. (2005) dataset. All EP values are log-transformed. Higher is better for R², while lower is better for the other metrics. Red colors denote the three algorithms with the highest skill according to each metric. The final column contains global flux estimates averaged over the years 1998–2020 using each respective algorithm and the satellite-derived data described in the text.
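The global totals in the final column of Table 3 involve averaging gridded EP over time and integrating it over area. A minimal stdlib sketch of that bookkeeping is given below, assuming monthly EP fields in mmol C m⁻² d⁻¹ on a regular latitude–longitude grid, a spherical Earth with radius 6371 km, 12.011 g C per mol C, and 365.25 days per year. All names are hypothetical, and the processing actually used for Table 3 (masking, grid, and averaging choices) may differ.

```python
import math
from typing import Optional, Sequence

EARTH_RADIUS_M = 6.371e6   # assumed spherical Earth
G_C_PER_MMOL = 12.011e-3   # grams of carbon per mmol C
DAYS_PER_YEAR = 365.25
G_PER_GT = 1e15

# fields[month][lat_row][lon_col]; None marks land or missing data.
Grid = Sequence[Sequence[Sequence[Optional[float]]]]


def cell_area_m2(lat_south: float, lat_north: float, dlon_deg: float) -> float:
    """Area of a latitude-longitude cell on a sphere: R^2 * dlon * (sin(n) - sin(s))."""
    return (EARTH_RADIUS_M ** 2 * math.radians(dlon_deg)
            * (math.sin(math.radians(lat_north)) - math.sin(math.radians(lat_south))))


def global_ep_gtc_per_year(monthly_fields: Grid, lat_edges: Sequence[float],
                           dlon_deg: float) -> float:
    """Time-mean, area-integrated EP in Gt C per year.

    Row i spans lat_edges[i] to lat_edges[i + 1]. Each cell is averaged over
    the months in which it has data and then multiplied by its area; cells
    without data in any month are skipped.
    """
    total_mmol_per_day = 0.0
    for i in range(len(lat_edges) - 1):
        area = cell_area_m2(lat_edges[i], lat_edges[i + 1], dlon_deg)
        for j in range(len(monthly_fields[0][i])):
            values = [month[i][j] for month in monthly_fields if month[i][j] is not None]
            if values:
                total_mmol_per_day += area * sum(values) / len(values)
    return total_mmol_per_day * G_C_PER_MMOL * DAYS_PER_YEAR / G_PER_GT
```

As a sense of scale, a uniform flux of 1 mmol C m⁻² d⁻¹ over the entire sphere (land included) integrates to about 2.24 Gt C y⁻¹ under these constants.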
Only observations between 100 and 200 m are included, and all EP values are log-transformed. Higher is better for R², while lower is better for the other metrics. Red colors denote the three algorithms with the highest skill according to each metric.
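The skill metrics reported in Tables 3–5 can be computed on log-transformed fluxes roughly as sketched below. This is an illustration under stated assumptions, not the exact definitions behind the published tables: base-10 logarithms are assumed, R² is taken as the squared Pearson correlation of the log values (1 − SS_res/SS_tot is a common alternative), and MAPE and sMAPE are evaluated directly on the log values, as the text describes, which makes MAPE undefined when an observation equals exactly 1 (log10 = 0). The function name is hypothetical.

```python
import math
from statistics import mean
from typing import Dict, Sequence


def log_skill_metrics(observed: Sequence[float],
                      predicted: Sequence[float]) -> Dict[str, float]:
    """R2, MAE, RMSE, MAPE, sMAPE and Bias on log10-transformed positive fluxes."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    o = [math.log10(v) for v in observed]
    p = [math.log10(v) for v in predicted]
    diff = [pi - oi for pi, oi in zip(p, o)]  # prediction minus observation

    # Squared Pearson correlation (assumed definition of R2).
    mo, mp = mean(o), mean(p)
    cov = sum((oi - mo) * (pi - mp) for oi, pi in zip(o, p))
    var_o = sum((oi - mo) ** 2 for oi in o)
    var_p = sum((pi - mp) ** 2 for pi in p)
    r2 = cov ** 2 / (var_o * var_p)

    return {
        "R2": r2,
        "MAE": mean(abs(d) for d in diff),
        "RMSE": math.sqrt(mean(d * d for d in diff)),
        # Relative errors are taken against the log-transformed observations.
        "MAPE": 100 * mean(abs(d / oi) for d, oi in zip(diff, o)),
        "sMAPE": 100 * mean(abs(d) / ((abs(pi) + abs(oi)) / 2)
                            for d, pi, oi in zip(diff, p, o)),
        "Bias": mean(diff),
    }
```

For observations of 10, 100, and 1000 with predictions ten times larger, every log difference is 1, so MAE, RMSE, and Bias all equal 1 while R² is still 1. This illustrates why R² alone cannot detect a systematic offset and why the error and bias metrics are reported alongside it.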