# COLOUR AND LIGHT IN THE OCEAN

EDITED BY: Victor Martinez-Vicente, Astrid Bracher, Didier Ramon, Shubha Sathyendranath and Tiit Kutser

PUBLISHED IN: Frontiers in Marine Science

### Frontiers eBook Copyright Statement

The copyright in the text of individual articles in this eBook is the property of their respective authors or their respective institutions or funders. The copyright in graphics and images within each article may be subject to copyright of other parties. In both cases this is subject to a license granted to Frontiers. The compilation of articles constituting this eBook is the property of Frontiers.

Each article within this eBook, and the eBook itself, are published under the most recent version of the Creative Commons CC-BY licence. The version current at the date of publication of this eBook is CC-BY 4.0. If the CC-BY licence is updated, the licence granted by Frontiers is automatically updated to the new version.

When exercising any right under the CC-BY licence, Frontiers must be attributed as the original publisher of the article or eBook, as applicable.

Authors have the responsibility of ensuring that any graphics or other materials which are the property of others may be included in the CC-BY licence, but this should be checked before relying on the CC-BY licence to reproduce those materials. Any copyright notices relating to those materials must be complied with.

Copyright and source acknowledgement notices may not be removed and must be displayed in any copy, derivative work or partial copy which includes the elements in question.

All copyright, and all rights therein, are protected by national and international copyright laws. The above represents a summary only. For further information please read Frontiers' Conditions for Website Use and Copyright Statement, and the applicable CC-BY licence.

ISSN 1664-8714 ISBN 978-2-88963-421-7 DOI 10.3389/978-2-88963-421-7

# COLOUR AND LIGHT IN THE OCEAN

Topic Editors: Victor Martinez-Vicente, Plymouth Marine Laboratory, United Kingdom; Astrid Bracher, Alfred Wegener Institute Helmholtz Centre for Polar and Marine Research (AWI), Germany; Didier Ramon, Hygeos, France; Shubha Sathyendranath, Plymouth Marine Laboratory, United Kingdom; Tiit Kutser, University of Tartu, Estonia

### CLEO publications in *Frontiers in Marine Science*

### Foreword

Josef Aschbacher, Director of ESA's Earth Observation Programmes

Satellite data have drastically changed the view we have of the oceans. Covering about 70% of Earth's surface, oceans play a unique role for our planet and for our life – but large areas remain unexplored and are difficult to reach.

Since the 1980s, Earth-orbiting satellites have helped to observe what is happening at the ocean surface. Sensors like CZCS, AVHRR, SeaWiFS and MODIS provided the first ocean colour data from space. Starting in 2002, ESA's Medium Resolution Imaging Spectrometer (MERIS), on board the environmental satellite Envisat, provided detailed information on phytoplankton biomass and concentrations of other matter in the global oceans.

These satellite observations laid the groundwork for studying the marine environment and how it responds to climate change, and the research community has since delivered information on the variability of marine ecosystems.

Part of this work is reflected in this stunning collection of peer-reviewed publications presented at the workshop Colour and Light in the Ocean from Earth Observation (CLEO), held at ESA's ESRIN site in Frascati, Italy, on 6–8 September 2016. The event attracted more than 160 participants from all over the world, including remote sensing experts, marine ecosystem modelers, in-situ observers and users of Earth observation data. Scientifically, the meeting covered applications ranging from climate studies, primary productivity and ocean dynamics to pools of carbon and phytoplankton diversity at global and regional scales. It also demonstrated the potential of Earth observation and its contribution to modern oceanography. Looking to the future, new satellites developed by ESA under the coordination of the European Commission will further our scientific and operational observations of the seas.

With Sentinel-3A in orbit and its twin Sentinel-3B following in 2017, there is a new category of data available for operational oceanographic applications and climate studies for years to come. These data are free and easy to access by anyone interested. Looking at the role of oceans in our daily lives, I am sure that this collection of scientific excellence will be valued by scientists of today and will inspire the next generation to carry these ideas into the future.

Citation: Martinez-Vicente, V., Bracher, A., Ramon, D., Sathyendranath, S., Kutser, T., eds. (2020). Colour and Light in the Ocean. Lausanne: Frontiers Media SA. doi: 10.3389/978-2-88963-421-7

# Table of Contents


Astrid Bracher, Heather A. Bouman, Robert J. W. Brewin, Annick Bricaud, Vanda Brotas, Aurea M. Ciotti, Lesley Clementson, Emmanuel Devred, Annalisa Di Cicco, Stephanie Dutkiewicz, Nick J. Hardman-Mountford, Anna E. Hickman, Martin Hieronymi, Takafumi Hirata, Svetlana N. Losa, Colleen B. Mouw, Emanuele Organelli, Dionysios E. Raitsos, Julia Uitz, Meike Vogt and Aleksandra Wolanin


Tanya Churilova, Vyacheslav Suslin, Olga Krivenko, Tatiana Efimova, Nataliia Moiseeva, Vladimir Mukhanov and Liliya Smirnova

*Uncertainty in Ocean-Color Estimates of Chlorophyll for Phytoplankton Groups*

Robert J. W. Brewin, Stefano Ciavatta, Shubha Sathyendranath, Thomas Jackson, Gavin Tilstone, Kieran Curran, Ruth L. Airs, Denise Cummings, Vanda Brotas, Emanuele Organelli, Giorgio Dall'Olmo and Dionysios E. Raitsos


Martin Hieronymi, Dagmar Müller and Roland Doerffer


Pierre Gernez, David Doxaran and Laurent Barillé

*Regional Empirical Algorithms for an Improved Identification of Phytoplankton Functional Types and Size Classes in the Mediterranean Sea Using Satellite Data*

Annalisa Di Cicco, Michela Sammartino, Salvatore Marullo and Rosalia Santoleri

*Extended Formulations and Analytic Solutions for Watercolumn Production Integrals*

Žarko Kovač, Trevor Platt, Suzana Antunović, Shubha Sathyendranath, Mira Morović and Charles Gallegos

*Synergistic Exploitation of Hyper- and Multi-Spectral Precursor Sentinel Measurements to Determine Phytoplankton Functional Types (SynSenPFT)*

Svetlana N. Losa, Mariana A. Soppa, Tilman Dinter, Aleksandra Wolanin, Robert J. W. Brewin, Annick Bricaud, Julia Oelker, Ilka Peeken, Bernard Gentili, Vladimir Rozanov and Astrid Bracher


Hayley Evers-King, Victor Martinez-Vicente, Robert J. W. Brewin, Giorgio Dall'Olmo, Anna E. Hickman, Thomas Jackson, Tihomir S. Kostadinov, Hajo Krasemann, Hubert Loisel, Rüdiger Röttgers, Shovonlal Roy, Dariusz Stramski, Sandy Thomalla, Trevor Platt and Shubha Sathyendranath


Thomas Jackson, Shubha Sathyendranath and Trevor Platt

*Characterizing Spatial Variability of Ice Algal Chlorophyll* a *and Net Primary Production Between Sea Ice Habitats Using Horizontal Profiling Platforms*

Benjamin A. Lange, Christian Katlein, Giulia Castellani, Mar Fernández-Méndez, Marcel Nicolaus, Ilka Peeken and Hauke Flores

*Primary Production: Sensitivity to Surface Irradiance and Implications for Archiving Data*

Trevor Platt, Shubha Sathyendranath, George N. White III, Thomas Jackson, Stéphane Saux Picart and Heather Bouman

*Intercomparison of Ocean Color Algorithms for Picophytoplankton Carbon in the Ocean*

Víctor Martínez-Vicente, Hayley Evers-King, Shovonlal Roy, Tihomir S. Kostadinov, Glen A. Tarran, Jason R. Graff, Robert J. W. Brewin, Giorgio Dall'Olmo, Tom Jackson, Anna E. Hickman, Rüdiger Röttgers, Hajo Krasemann, Emilio Marañón, Trevor Platt and Shubha Sathyendranath

*Comparison of Seasonal Cycles of Phytoplankton Chlorophyll, Aerosols, Winds and Sea-Surface Temperature off Somalia*

Muhammad Shafeeque, Shubha Sathyendranath, Grinson George, Alungal N. Balchand and Trevor Platt

*Satellite Radiation Products for Ocean Biology and Biogeochemistry: Needs, State-of-the-Art, Gaps, Development Priorities, and Opportunities*

Robert Frouin, Didier Ramon, Emmanuel Boss, Dominique Jolivet, Mathieu Compiègne, Jing Tan, Heather Bouman, Thomas Jackson, Bryan Franz, Trevor Platt and Shubha Sathyendranath

*Optical Classification of the Coastal Waters of the Northern Indian Ocean*

S. Monolisha, Trevor Platt, Shubha Sathyendranath, J. Jayasankar, Grinson George and Thomas Jackson

# Using Optical Sensors on Gliders to Estimate Phytoplankton Carbon Concentrations and Chlorophyll-to-Carbon Ratios in the Southern Ocean

Sandy J. Thomalla<sup>1,2</sup>\*, A. Gilbert Ogunkoya<sup>3†</sup>, Marcello Vichi<sup>2,3</sup> and Sebastiaan Swart<sup>1,2,4</sup>

*<sup>1</sup> Southern Ocean Carbon and Climate Observatory, CSIR - Ocean Systems and Climate, Cape Town, South Africa, <sup>2</sup> Department of Oceanography, University of Cape Town, Cape Town, South Africa, <sup>3</sup> Marine Research Institute (Ma-Re), University of Cape Town, Cape Town, South Africa, <sup>4</sup> Department of Marine Sciences, University of Gothenburg, Gothenburg, Sweden*

### Edited by:

*Victor Martinez-Vicente, Plymouth Marine Laboratory, UK*

### Reviewed by:

*Jason R. Graff, Oregon State University, USA; Marco Bellacicco, Centre National D'Etudes Spatiales, Italy*

\*Correspondence: *Sandy J. Thomalla, sthomalla@csir.co.za*

† Present Address: *A. Gilbert Ogunkoya, Department of Ecology, Montana State University, Bozeman, MT, USA*

### Specialty section:

*This article was submitted to Ocean Observation, a section of the journal Frontiers in Marine Science*

Received: *10 October 2016* Accepted: *27 January 2017* Published: *13 February 2017*

### Citation:

*Thomalla SJ, Ogunkoya AG, Vichi M and Swart S (2017) Using Optical Sensors on Gliders to Estimate Phytoplankton Carbon Concentrations and Chlorophyll-to-Carbon Ratios in the Southern Ocean. Front. Mar. Sci. 4:34. doi: 10.3389/fmars.2017.00034*

One approach to deriving phytoplankton carbon biomass estimates (Cphyto) at appropriate scales is through optical products. This study uses a high-resolution glider data set in the Sub-Antarctic Zone (SAZ) of the Southern Ocean to compare four different methods of deriving Cphyto from particulate backscattering and fluorescence-derived chlorophyll (chl-a). A comparison of the methods showed that at low (<0.5 mg m<sup>−3</sup>) chlorophyll concentrations (e.g., early spring and at depth), all four methods produced similar estimates of Cphyto, whereas when chlorophyll concentrations were elevated one method derived higher concentrations of Cphyto than the others. The use of methods derived from particulate backscattering rather than fluorescence can account for cellular adjustments in chl-a:Cphyto that are not driven by biomass alone. A comparison of the glider chl-a:Cphyto ratios from the different optical methods with ratios from laboratory cultures and cruise data found that some optical methods of deriving Cphyto performed better in the SAZ than others and that regionally derived methods may be unsuitable for application to the Southern Ocean. A comparison of the glider chl-a:Cphyto ratios with output from a complex biogeochemical model shows that although a ratio of 0.02 mg chl-a mg C<sup>−1</sup> is an acceptable mean for SAZ phytoplankton (in spring-summer), the model misrepresents the seasonal cycle (with decreasing ratios from spring to summer and low sub-seasonal variability). As such, it is recommended that models expand their allowance for variable chl-a:Cphyto ratios to account not only for phytoplankton acclimation to low light conditions in spring but also for higher optimal chl-a:Cphyto ratios with increasing growth rates in summer.

Keywords: phytoplankton carbon, chlorophyll to carbon ratios, particulate backscattering, gliders, Southern Ocean

# INTRODUCTION

Marine phytoplankton at the global scale have an average biomass turnover time of 1 week or less (Falkowski et al., 1998). Despite their temporary existence, these living organisms can absorb carbon at a rate of 40–50 Pg C y<sup>−1</sup> and are responsible for roughly half the net primary production on Earth (Longhurst et al., 1995; Antoine et al., 1996; Field et al., 1998; Falkowski et al., 1998; Westberry et al., 2008). The Southern Ocean is responsible for 40% of global CO<sub>2</sub> uptake (Gruber et al., 2009) and the subbasin of the Sub-Antarctic Zone (SAZ) is recognized as one of the regions of higher carbon export and an effective atmospheric CO<sub>2</sub> sink (Metzl et al., 1999; McNeil et al., 2001; Trull et al., 2001; Wang et al., 2001). This region of the Southern Ocean is characterized by deep convective mixed layers (>500 m) during winter, which favors the injection of waters rich in inorganic carbon into the mixed layer (Key et al., 2004). Conversely during summer, seasonal heat flux shoals the mixed layer (Swart et al., 2015), which improves the overall availability of light for photosynthesis favoring the transformation of inorganic carbon to particulate organic carbon (POC) and the potential for an effective "biological carbon pump" (Broecker and Peng, 1982; Volk and Hoffert, 1985). The deep mixed layers formed in the SAZ during winter are subducted northwards as Sub-Antarctic Mode Water (SAMW) and Antarctic Intermediate Water (AAIW; McCartney, 1977) driving an effective solubility pump which in combination with the biological pump maintains a strong CO<sub>2</sub> sink year round (McNeil et al., 2007).

If researchers are to accurately reflect the seasonal cycle of phytoplankton production in predictive climate models and thereby improve our understanding of the sensitivities of the biological carbon pump to changes in climate forcing factors (both needed for predicting long term trends), the Southern Ocean ecosystem has to be investigated at the appropriate scales that link the physical drivers to the biogeochemistry (Lévy et al., 2001; Le Quéré et al., 2007; Klein et al., 2008; Doney et al., 2009; Thomalla et al., 2011; Racault et al., 2012; Joubert et al., 2014; Carranza and Gille, 2015; Swart et al., 2015). There is increasing evidence in the Southern Ocean that seasonal to sub-seasonal temporal scales and meso- to submeso- spatial scales play an important role in determining the response of primary producers to physical forcing (Boyd, 2002; Fauchereau et al., 2011; Thomalla et al., 2011, 2015; Lévy et al., 2012; Swart et al., 2015), which may in turn affect their sensitivity to climate change.

Satellite remote sensing provides an essential tool for investigating patterns of phytoplankton variability at high sampling frequency and with good spatial resolution globally. However, remotely detected water-leaving radiances emanate from only the first optical depth and require assumptions about their representativeness of the vertical structure of the water column. The frontier in ocean observation is adequate and sustained spatial sampling of the sub-surface ocean (Rudnick et al., 2004) conducted at an appropriate frequency to define and understand the growth timescales of phytoplankton. Autonomous platforms (e.g., floats and gliders) are able to profile the water column (0–1000 m) and characterize vertical biogeochemistry at smaller scales, but also for sufficiently long periods that may help to reduce uncertainties associated with carbon budgets at longer time scales. In addition, the volume of information that a single glider mission retrieves can be instrumental in developing and validating statistically robust parameterizations for numerical models, which are otherwise often developed with inadequate data sets generated from once-off or "classical" (low spatial and/or low temporal frequency) sampling techniques. As such, high-resolution sampling of phytoplankton biomass through the water column is key to reducing uncertainties associated with carbon budgets.

Net autotrophic primary production is ultimately a function of the standing population of phytoplankton biomass, which is a system state variable that is not easily observed in remote regions like the Southern Ocean. Phytoplankton biomass refers to the total quantity of phytoplankton in a given volume of water expressed here as weight in carbon (mg C m<sup>−3</sup>). Satellite ocean color data can provide a proxy of phytoplankton biomass through empirical combinations of radiometric information to obtain estimates of carbon and/or chlorophyll concentration (e.g., Stramski et al., 1999; Gardner et al., 2006; Blondeau-Patissier et al., 2014). In addition, production models are available to turn this information into rates of primary production (e.g., Behrenfeld and Falkowski, 1997; Arrigo et al., 1998, 2008; Moore and Abbott, 2000; Carr et al., 2006). However, chlorophyll is a dynamic property of phytoplankton (MacIntyre et al., 2002) that is influenced by shifts in the physiology of the cells, with intracellular pigments being modulated in response to changes in growth conditions (e.g., temperature, nutrients, light) (e.g., Halsey and Jones, 2015; Behrenfeld et al., 2016). In addition, different pigment content is expressed by the phylogenetic evolution of phytoplankton species with changes in accessory pigments leading to differences in light absorption per unit chlorophyll (MacIntyre et al., 2002). As such, temporal changes in chlorophyll over large ocean regions can result from physiologically or community driven modifications in cellular chlorophyll-to-carbon ratios, rather than from changes in biomass (Behrenfeld et al., 2005, 2016; Westberry et al., 2008, 2016; Mignot et al., 2014; Bellacicco et al., 2016). Such instances will have strong implications for assessments of long term trends in primary production, ecosystem trophic dynamics, and carbon export (Behrenfeld et al., 2016).

Other optical observations such as backscattering are better correlated to carbon than chlorophyll (Antoine et al., 2011) and can provide independent measures of phytoplankton biomass in open ocean waters (away from regions with highly scattering inorganic material). In addition, they can be measured both in situ and with satellite remote sensing. Unlike chlorophyll, these measures are more likely to be insensitive to changes in the intracellular concentration of pigments (Behrenfeld and Boss, 2006; Behrenfeld et al., 2016; Bellacicco et al., 2016; Westberry et al., 2016). However, the use of optical proxies for total organic carbon is complicated by the highly variable relationships reported in the literature for POC vs. backscattering, which can vary by a factor of five (Cetinić et al., 2012). Much of the variability between POC and backscattering is driven by differences inherent to the types of particles in the system. For example, the carbon density (or POC to volume ratio) varies between species, with diatoms and *Phaeocystis* typically having lower POC to volume ratios (Cetinić et al., 2012). In addition, the ability to differentiate phytoplankton specific carbon (Cphyto) from other suspended particulate matter is a major operational challenge (Lü et al., 2009). Methodological constraints result in the carbon biomass of phytoplankton being poorly identifiable (Martinez-Vicente et al., 2013) and not easy to distinguish from other types of carbon (Eppley et al., 1992; Oubelkheir et al., 2005). Nonetheless, the strong relationship observed between bio-optical observations and carbon biomass allows the use of autonomous instruments (e.g., gliders and floats) to retrieve data at high spatial and temporal resolution, increasing the capability of capturing higher frequency changes in ocean biogeochemistry (e.g., Mignot et al., 2014; Thomalla et al., 2015). Despite their advantages, however, in situ bio-optical data are still sparse on both regional and global scales, highlighting the need to prioritize their collection on future research campaigns. Given their growing importance in the trajectory of ocean ecosystem understanding, it is important to develop methods of detecting biogeochemical properties from instruments on autonomous platforms; and carbon from optical sensors is certainly one of the first candidates, given the importance of phytoplankton production in driving the carbon sink (Boss et al., 2008).

Rates of carbon production can also be derived from mathematical models that link standing biomass and environmental conditions to growth rates. Since the very first extensive discussions on models (e.g., Cullen, 1990) chlorophyll and/or carbon have been considered necessary to model irradiance-based growth. Mathematical models (empirical, semi-analytical, and analytical) are used to reconstruct primary production from satellite estimates of chlorophyll or carbon (see Behrenfeld and Falkowski, 1997 for model summary), and are similarly used in global ocean biogeochemical models to determine rates of carbon production (e.g., Moore et al., 2002; Aumont and Bopp, 2006; Vichi et al., 2007). Some of the models include mechanistic representations of phytoplankton acclimation that consider varying amounts of carbon and chlorophyll in the modeled phytoplankton (largely relying on the Geider et al., 1997 formulation). The Southern Ocean is likely to experience light-saturated conditions for a rather limited period, and therefore, the acclimation process and associated changes in chlorophyll-to-carbon ratios in the period leading to the summer blooms are very likely. The majority of global ocean biogeochemical models and Earth System Models however overestimate the magnitude of the spring-summer bloom (e.g., Doney et al., 2009; Steinacher et al., 2009; Vichi and Masina, 2009), which is usually attributed to the coarse resolution of global ocean models and their inability to simulate deep vertical mixing. However, McKiver et al. (2015) demonstrated that an increase in horizontal resolution down to 25 km (considered to be eddy-permitting and partly eddy-resolving in the Southern Ocean) did not help to improve the timing and magnitude of the bloom in the SAZ. Coarse resolution models with a 1 to 2° grid tend to have an early bloom and a much larger magnitude than observed, and the substantial increase in resolution only partly reduced these biases. They went on to suggest that this behavior might instead result from inaccurate parameterization of the chlorophyll-to-carbon ratios. This strengthens the need to obtain combined measures of chlorophyll and carbon from the world oceans and particularly from the Southern Ocean.

This work tests four different methods of deriving phytoplankton carbon from optical data collected by sensors on a glider deployed in the SAZ. Some of these methods can be adjusted using available data from the SAZ, while other relationships are specific to the region of original sampling and can only be applied to the SAZ as they are. By comparing the different methods, we aim to elucidate the consequences of choosing one method over another and their relative validity in the Southern Ocean. In addition, the chlorophyll-to-carbon (chl-a:Cphyto) ratios generated from the different methods are compared with in situ data from the literature and a model simulation to highlight their respective ranges and distribution.

# MATERIALS AND METHODS

# Glider Data

The data used for this research were collected in the framework of the Southern Ocean Carbon and Climate Observatory (SOCCO; http://www.socco.org.za) during the Southern Ocean Seasonal Cycle Experiment (SOSCEx; Swart et al., 2012). Seaglider SG573 was deployed south of Gough Island in the South-East Atlantic Ocean at 43.0°S, 11°W (**Figure 1**). The glider was deployed on 25 September 2012 and retrieved on 15 February 2013, resulting in a high-resolution transect of 143 days (∼5.5 months) covering a total distance of 1693 km (see Swart et al., 2015 for more detail). The glider measures a suite of parameters that includes conductivity (salinity), temperature, pressure, dissolved oxygen, chlorophyll-a fluorescence (proxy for phytoplankton concentration), Photosynthetically Active Radiation (PAR), and two wavelengths of optical backscattering by particles, bbp(470) and bbp(700).

The glider was programmed to profile between the surface and 1000 m continuously at a nominal vertical velocity of 10 cm s<sup>−1</sup>. Each dive cycle (which includes a descent and ascent profile) took ∼5 h to complete and covered an average horizontal distance of 2.8 km, rendering a temporal resolution of 2.5 h and a spatial resolution of 1.4 km (between each water column profile). Glider data were linearly interpolated to a 6-hourly frequency in order to grid unevenly spaced profiles, which typically ranged between 4 and 6 hourly. At the deployment and retrieval site of the glider, ship-based CTD cross-calibration casts were carried out yielding two independent inter-calibrations between the gliders, CTD sensors and bottle samples (Swart et al., 2015).
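
The gridding step described above can be sketched as a plain linear interpolation in time. This is a minimal illustration only: the function names, the sample times, and the per-profile values are hypothetical, and the source does not say how the start of the grid or the edges of the record were handled (here the grid starts at the first sample and nothing is extrapolated).

```python
from bisect import bisect_right

def interp_linear(t_new, t, y):
    """Linearly interpolate y(t) at t_new; t must be strictly increasing.
    Returns None outside the sampled range (no extrapolation; an assumption)."""
    if t_new < t[0] or t_new > t[-1]:
        return None
    i = bisect_right(t, t_new) - 1
    if i == len(t) - 1:  # t_new coincides with the last sample
        return y[-1]
    frac = (t_new - t[i]) / (t[i + 1] - t[i])
    return y[i] + frac * (y[i + 1] - y[i])

def regrid(t, y, step_h=6.0):
    """Resample an unevenly spaced series onto a regular grid of step_h hours."""
    n = int((t[-1] - t[0]) // step_h) + 1
    grid = [t[0] + k * step_h for k in range(n)]
    return grid, [interp_linear(g, t, y) for g in grid]

# Hypothetical profile times (hours since deployment) and made-up values
# at one depth level; spacing of 4 to 6 h mimics the text.
times = [0.0, 4.0, 9.0, 15.0, 19.0, 24.0]
values = [1.0, 2.0, 3.0, 1.5, 2.5, 3.5]
grid, gridded = regrid(times, values)
```

In practice the same interpolation would be repeated for every depth level and every variable, so that all profiles share one regular time axis.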

Glider fluorescence from a WETLabs ECO Puck™ (BB2Fl-470/700) was dark corrected by subtracting the median fluorescence below 300 m from all raw instrument counts. Fluorescence quenching was isolated by selecting all daylight profiles between local sunrise and sunset plus 2.5 h (as quenching was on occasion observed in the profile following sunset). Fluorescence quenching was subsequently corrected using optical backscattering (bbp), an alternate proxy for phytoplankton biomass that is not susceptible to quenching, based on the methods described in Sackmann et al. (2008). This method assumes a constant chlorophyll-to-carbon ratio throughout the surface waters and no cellular changes in chlorophyll packaging with depth as a photoadaptive strategy to low light levels. When backscattering was unavailable (intermittent sensor malfunction), quenching was corrected by extrapolating the maximum fluorescence value within the mixed layer to the surface according to Xing et al. (2012). In this method, the maximum fluorescence yield within the mixed layer is assumed to be representative of the mixed layer and implies homogeneity. This however is not necessarily the case as mixing and settling patterns between phytoplankton functional types are dynamic (Quéguiner, 2013). As such, when backscattering data are available, the Sackmann et al. (2008) method of quenching correction is preferred as it allows for small-scale variability in biomass with depth and reduces the possibility of overestimating surface chlorophyll values in the event of a subsurface chlorophyll maximum that is not biomass driven.
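
The dark correction and the fallback quenching correction (extrapolating the mixed-layer fluorescence maximum to the surface) can be sketched as follows. This is a simplified reading of the description, not the published algorithms: the function names and the sample profile are hypothetical, and the backscattering-based correction of Sackmann et al. (2008) is not reproduced here.

```python
from statistics import median

def dark_correct(depth_m, counts, dark_depth_m=300.0):
    """Subtract the median of counts deeper than dark_depth_m from every sample."""
    deep = [c for z, c in zip(depth_m, counts) if z > dark_depth_m]
    dark = median(deep)
    return [c - dark for c in counts]

def quench_correct_max(depth_m, fluor, mld_m):
    """Replace values shallower than the mixed-layer fluorescence maximum
    with that maximum (a simplified maximum-extrapolation correction)."""
    in_ml = [(f, z) for z, f in zip(depth_m, fluor) if z <= mld_m]
    if not in_ml:  # no samples inside the mixed layer: leave profile untouched
        return list(fluor)
    f_max, z_max = max(in_ml)
    return [f_max if z < z_max else f for z, f in zip(depth_m, fluor)]

# Hypothetical daytime profile: raw counts are depressed near the surface.
depth = [5, 20, 40, 60, 100, 350, 500, 800]
raw = [60, 80, 110, 100, 70, 50, 52, 48]

dark_corrected = dark_correct(depth, raw)
corrected = quench_correct_max(depth, dark_corrected, mld_m=50.0)
```

Because every value above the maximum is set to the same number, the corrected profile is uniform in the upper mixed layer, which is the homogeneity assumption discussed in the text.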

Fluorescence was converted to chlorophyll using a combination of the manufacturer's instrument specific chlorophyll conversion factor and in situ chlorophyll samples (250 ml filtered onto GF/F and extracted in 8 ml of 90% acetone for 24 h at −20°C) collected from CTD casts at glider intercept stations. This allowed data from all five gliders deployed during SOSCEx to be combined into a single statistically significant regression of 83 co-located glider chlorophyll and in situ chlorophyll samples (slope = 4.12, intercept = −0.21, r<sup>2</sup> = 0.66), with the manufacturer calibrated glider-based measurements being ∼4 times higher than the ship-based chlorophyll measurements. The slope of the regression was applied to all glider chlorophyll data to correct the manufacturer conversion to chlorophyll to one more suited to the regional characteristics of Southern Ocean chlorophyll (see Swart et al., 2015 for more detail).
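
A minimal sketch of this kind of regional calibration is given below. It rests on assumptions the text does not confirm: that glider chlorophyll was regressed on ship chlorophyll (so that applying the slope means dividing by it) and that the intercept was not used. The function names and the sample values are hypothetical and are not the 83 co-located samples.

```python
def ols_fit(x, y):
    """Ordinary least-squares fit y = slope * x + intercept; also returns r^2."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = sxy ** 2 / (sxx * syy)
    return slope, intercept, r2

def apply_slope(glider_chl, slope):
    """Rescale factory-calibrated glider chlorophyll by the regression slope
    (assumed direction: glider = slope * ship, intercept ignored)."""
    return glider_chl / slope

# Made-up co-located samples in which the glider reads exactly 4x the ship.
ship = [0.1, 0.2, 0.4, 0.8]
glider = [0.4, 0.8, 1.6, 3.2]

slope, intercept, r2 = ols_fit(ship, glider)
rescaled = [apply_slope(g, slope) for g in glider]
```

With the slope reported in the text, a factory-calibrated reading of 4.12 would map to 1.0 under these assumptions.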

Spikes from raw backscattering (λ = 470 nm) were separated using a 7-point running median filter according to Briggs et al. (2011). Raw digital counts were then converted into particulate backscattering (bbp) according to Dall'Olmo et al. (2009), following the equation:

$$b_{bp470} = 2\pi \, \chi_{p} \left[ S \left( C - D \right) - \beta_{sw} \right] \tag{1}$$

where S is the instrument specific scaling factor; C are the raw digital counts and D are the dark counts (factory value); χ<sub>p</sub> (equal to 1.1) is the factor used to convert the particulate volume scattering function into bbp (Boss and Pegau, 2001) and β<sub>sw</sub> is the volume scattering of pure water estimated using the models of Zhang and Hu (2009) and Zhang et al. (2009). Remaining spikes in particulate backscattering were removed with a threshold in shallow (bbp(470) > 0.048) and deep (bbp(470) > 0.0025) water. Profiles with high mean backscattering (bbp > 0.001) in deep waters (> 150 m) were identified as bad profiles and discarded (see Thomalla et al., 2015 for more detail).
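
The despiking, the conversion in Equation 1, and the bad-profile check can be sketched as below. All names are hypothetical, and so are the scaling factor, dark counts, and pure-water scattering value in the example; the shrinking window at the ends of the series is an assumption, since the text only specifies a 7-point running median.

```python
import math
from statistics import mean, median

def running_median(values, window=7):
    """Running median; the window shrinks at the edges (an assumption)."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        out.append(median(values[lo:hi]))
    return out

def bbp_from_counts(counts, scale, dark, beta_sw, chi_p=1.1):
    """Equation 1: bbp = 2 * pi * chi_p * [S * (C - D) - beta_sw]."""
    return 2.0 * math.pi * chi_p * (scale * (counts - dark) - beta_sw)

def is_bad_profile(depth_m, bbp, deep_m=150.0, limit=0.001):
    """Flag a profile whose mean bbp deeper than deep_m exceeds limit."""
    deep = [b for z, b in zip(depth_m, bbp) if z > deep_m]
    return bool(deep) and mean(deep) > limit

# Made-up raw counts with one obvious spike at index 4.
raw = [100, 101, 99, 100, 400, 100, 102, 101, 100]
baseline = running_median(raw)
spikes = [r - b for r, b in zip(raw, baseline)]

# Hypothetical calibration values, for illustration only.
bbp = bbp_from_counts(counts=150, scale=1e-5, dark=50, beta_sw=1e-4)
flag = is_bad_profile([50, 200, 400], [0.003, 0.0005, 0.0007])
```

Separating the signal into a smooth baseline and a spike residual keeps both parts available, so the baseline can be converted with Equation 1 while the spikes are inspected or discarded.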

Mixed layer depth (MLD) was defined following the criterion of de Boyer Montégut et al. (2004) as the depth where the difference in temperature exceeds 0.2°C in reference to the temperature at 10 m (ΔT<sub>10m</sub> = 0.2°C). The temperature-based criterion was chosen because (1) the salinity data (and hence density data) are contaminated in the final ∼5 weeks of the glider data due to gooseneck barnacles bio-fouling the conductivity cell of the CTD sensor and (2) intermittent spiking and thermal lag errors (see Garau et al., 2011) of the salinity data resulted in intermittent false MLD determinations when using a density criterion alone to determine the MLD. An investigation into the two MLD methods used on the glider data shows that they match each other closely (r = 0.86, p ≪ 0.01) throughout the experiment (Swart et al., 2015).
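
The temperature threshold criterion can be sketched as a simple scan down the profile. The function name and the sample profile are hypothetical, and two details are assumptions: the reference temperature is taken from the sample nearest 10 m rather than interpolated, and the absolute temperature difference is used.

```python
def mixed_layer_depth(depth_m, temp_c, ref_depth_m=10.0, delta_t=0.2):
    """Return the shallowest depth below ref_depth_m where the temperature
    differs from the reference temperature by more than delta_t.
    depth_m must be increasing; returns None if the threshold is never met."""
    # Reference: sample nearest to ref_depth_m (an assumption; no interpolation).
    ref_index = min(range(len(depth_m)),
                    key=lambda i: abs(depth_m[i] - ref_depth_m))
    t_ref = temp_c[ref_index]
    for z, t in zip(depth_m[ref_index + 1:], temp_c[ref_index + 1:]):
        if abs(t - t_ref) > delta_t:
            return z
    return None

# Made-up temperature profile with a thermocline between 40 and 50 m.
depth = [5, 10, 20, 30, 40, 50, 60]
temp = [8.0, 8.0, 7.95, 7.9, 7.85, 7.6, 7.2]
mld = mixed_layer_depth(depth, temp)
```

Returning the first sampled depth that exceeds the threshold ties the result to the vertical sampling interval; interpolating between the two bracketing samples would be a natural refinement.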

### Cruise Data

The in situ data used in this work were collected on four cruises to the Southern Ocean between austral winter 2012 and late summer 2013 (**Table 1**), which were typically separated into three legs: the GoodHope Line between Cape Town and Antarctica, the Buoy Run between Antarctica and South Georgia, and the marginal ice zone along the continental margin (**Figure 1**). Data consisted of CTD casts and surface underway sampling for chlorophyll and POC collected routinely on SOCCO summer research cruises. CTD casts also provided bbp<sup>470</sup> data that were co-located with Niskin bottle samples for POC. Chlorophyll samples (250 ml) were filtered onto Whatman 25 mm GF/F glass fiber filters and extracted in 8 ml of 90% acetone at −20°C for 12–24 h. Fluorescence was measured on a Turner Trilogy Laboratory Fluorometer and converted to chlorophyll using a standard chlorophyll dilution calibration. POC samples (∼2 L) were filtered onto pre-combusted 25 mm or 47 mm Whatman GF/F filters and oven dried at 50°C. Filters were acidified by fuming with concentrated hydrochloric acid for 24 h to drive off inorganic carbon and re-dried in the oven. Filters were pelleted into 5 × 8 mm tin capsules and analyzed using a CHN analyser (Parsons et al., 1984; Knap et al., 1994). Blanks were interspersed every 6 to 20 samples (typically every 12). No replicate samples were analyzed.

### Model Data

Numerical model results in the Sub-Antarctic zone were obtained from a simulation of the Biogeochemical Flux Model (BFM, http://bfm-community.eu) coupled with the NEMO ocean model (http://www.nemo-ocean.eu) at 0.25° resolution (PELAGOS025, McKiver et al., 2015). The BFM model (Vichi et al., 2007) computes phytoplankton dynamics in terms of stoichiometrically variable constituents, which also include variable chlorophyll to carbon ratios modified after Geider et al. (1997). All details of the model simulation are given in McKiver et al. (2015) and full descriptions of the biogeochemical model equations are available in Vichi et al. (2015). The variables extracted from the model results were total phytoplankton carbon and total phytoplankton chlorophyll, with the resulting dominant functional group in the region throughout the year being diatoms.

TABLE 1 | Names and dates of the various cruises indicating the different legs covered and the number of POC samples per cruise, with leg 1 between Cape Town and Antarctica along the GoodHope Line, leg 2 from Antarctica to South Georgia and leg 3 along the marginal ice zone (Figure 1).


### Estimation of Phytoplankton Carbon

Four different methods of deriving the fraction of phytoplankton-specific carbon (Cphyto) from backscattering and chlorophyll were applied to the glider optical data. They represent examples of the different types of methods available in the scientific literature.

### Linear Method (30%POC)

Linear relationships between POC and backscattering were first proposed by Stramski et al. (1999). The high correlation in open ocean waters was the result of the dominant organic particle concentration which controls changes in both POC and bbp (Stramski and Morel, 1990; Gardner et al., 1993; Stramski and Reynolds, 1993; Loisel and Morel, 1998). In this method 221 POC samples from the cruises listed in **Table 1** (except SANAE53 where no CTD POC samples were available) were plotted against co-located bbp<sup>470</sup> data (**Figure 2**). POC and bbp<sup>470</sup> were linearly correlated with one another to provide a regression equation specific to the SAZ using a total least square method to account for the uncertainty in both the POC data and the backscattering:

$$POC = \left(39418 \pm 3000\right) b_{bp470} - \left(13 \pm 6\right). \tag{2}$$

We assumed an uncertainty of 7% for both input variables of the total least square regression and report the slope with all digits, as is customary in the literature (**Table 2**), together with its standard error. The relationships between bbp and POC from various ocean basins in the literature (**Table 2**) are included in **Figure 2** for comparison.

POC however still needs to be converted into a fraction specific to phytoplankton (Cphyto). Behrenfeld et al. (2005) summarized ranges of field based Cphyto:POC ratios from different oceanic regions to an average phytoplankton contribution to total POC of ∼30%. Their 30% summary was derived from the studies of Eppley et al. (1992), Durand et al. (2001), Gundersen et al. (2001), and Oubelkheir (2001), where Cphyto:POC ratios ranged between 19 and 49% in regions that varied from eutrophic to oligotrophic. The 30%POC method as applied here converts bbp<sup>470</sup> into POC using the linear regression (Equation 2) and then uses a constant 30% fraction to represent Cphyto as the phytoplankton contribution to total POC. A comparison of glider bbp<sup>470</sup> with CTD bbp<sup>470</sup> from five collocated profiles (at glider deployment, intercept and retrieval sites) gave a slope of 0.99 (n= 104) making it possible to apply the regression from Equation (2) to the glider bbp<sup>470</sup> data to retrieve POC.
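The two steps of the 30%POC method can be written out as a short sketch. The coefficients are the central values of Equation (2) and the constant 30% fraction from the text; the function names and the example bbp value are hypothetical, the uncertainties on the coefficients are ignored, and POC is taken to be in mg C m−3 as in the Results.

```python
POC_SLOPE = 39418.0     # slope of Equation (2)
POC_INTERCEPT = -13.0   # intercept of Equation (2), mg C m^-3
PHYTO_FRACTION = 0.30   # constant Cphyto:POC fraction (Behrenfeld et al., 2005)


def poc_from_bbp(bbp470):
    """Equation (2): POC (mg C m^-3) from bbp at 470 nm (m^-1)."""
    return POC_SLOPE * bbp470 + POC_INTERCEPT


def cphyto_30poc(bbp470):
    """30%POC method: a constant 30% of POC is attributed to phytoplankton."""
    return PHYTO_FRACTION * poc_from_bbp(bbp470)


# Hypothetical surface value of bbp470.
example_bbp = 0.003
example_poc = poc_from_bbp(example_bbp)        # about 105.25 mg C m^-3
example_cphyto = cphyto_30poc(example_bbp)     # about 31.58 mg C m^-3
effective_slope = PHYTO_FRACTION * POC_SLOPE   # about 11825
```

The effective slope of roughly 11825 between bbp<sup>470</sup> and Cphyto is the figure that can be compared directly with the slopes of the other backscattering-based methods.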

### Behrenfeld et al. (2005) Method (B05)

FIGURE 2 | [...] and South Atlantic and Equatorial Pacific; (4) Loisel et al. (2001) from Mediterranean; (5) Cetinić et al. (2012) from North Atlantic; (6) Stramski et al. (1999) from the Polar Frontal Zone (PFZ); (7) Balch et al. (2010) from north and south Atlantic; (8) Stramski et al. (2008) from Atlantic/Pacific (entire data set, including upwelling). See Table 2 for regression coefficients.

Behrenfeld et al. (2005) derived a method of estimating Cphyto from remotely sensed backscattering data using a linear relationship between bbp<sup>440</sup> and chlorophyll. In this method, chlorophyll and bbp at 440 nm (m−<sup>1</sup>) were estimated using the Garver-Siegel-Maritorena (GSM) semi-analytical algorithm (Garver and Siegel, 1997; Maritorena et al., 2002; Siegel et al., 2002) applied to monthly SeaWiFS data from September 1997 to January 2002. While the bbp wavelength analyzed by Behrenfeld et al. (2005) (440 nm) differs from that used in our application of the method (B05; 470 nm), this discrepancy would only result in a small percentage difference in bbp values (Graff et al., 2015). As with Behrenfeld et al. (2005), our bbp<sup>470</sup> was corrected for contributions from non-algal particles (which also contribute to the optical backscattering signal) by subtracting a background value of 3.5 × 10−<sup>4</sup> m−<sup>1</sup>, with the assumption that this value represents the portion of non-algal particulate matter that does not co-vary with phytoplankton (Behrenfeld et al., 2005). It is worth noting, however, that the proportion of non-algal particulate matter is not constant, as was shown by a study in the Mediterranean where the non-algal contribution to particulate backscattering varied both seasonally and regionally by more than one order of magnitude (Bellacicco et al., 2016). Only 0.02% of the glider backscattering data fell below this threshold and they were discarded. The corrected bbp<sup>470</sup> was then converted into Cphyto through the equation

$$C_{phyto} = 13000 \left( b_{bp470} - 3.5 \times 10^{-4} \right). \tag{3}$$

The slope was chosen by Behrenfeld et al. (2005) to give satellite chl-a:Cphyto values within the range compiled by Behrenfeld et al. (2002) from laboratory experiments (average = 0.010, range = 0.001 to >0.06) and to give an average phytoplankton carbon contribution to total particulate organic carbon of ∼30% (range: 24 to 37%). Behrenfeld et al. (2005) did not provide any statistical assessment of the estimated parameters.
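As a sketch of how Equation (3) is applied to individual glider values, the snippet below subtracts the background and discards values that fall below it. The function name, the use of `None` to mark discarded values, and the example bbp values are assumptions made for illustration.

```python
B05_SLOPE = 13000.0      # slope of Equation (3)
B05_BACKGROUND = 3.5e-4  # background bbp (m^-1) attributed to non-algal particles


def cphyto_b05(bbp470):
    """Equation (3); values below the background are discarded (None)."""
    if bbp470 < B05_BACKGROUND:
        return None
    return B05_SLOPE * (bbp470 - B05_BACKGROUND)


# Hypothetical bbp470 values (m^-1): one surface value, one below background.
surface = cphyto_b05(0.003)    # about 34.45 mg C m^-3
discarded = cphyto_b05(0.0002)  # None
```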

### Martinez-Vicente et al. (2013) Method (M13)

TABLE 2 | Comparison of POC vs. bbp reported from literature for original wavelength [taken from Cetinić et al. (2012)] with the addition of results from Graff et al. (2015) and this study.

*<sup>a</sup>Excluding upwelling data, <sup>b</sup>for entire data set, including upwelling data, and <sup>c</sup>Antarctic Polar Frontal Zone.*

Martinez-Vicente et al. (2013) derived phytoplankton carbon from the relationship between Cphyto and in situ backscattering bbp<sup>470</sup> from the euphotic zone of a latitudinal study of the Atlantic Ocean using a total least square linear regression. Cphyto was directly estimated from flow cytometry for 6 different groups of phytoplankton using information on phytoplankton abundances, cellular carbon per unit volume and mean cell volume. The total Cphyto concentration per sample was calculated as the sum of the contributions from each phytoplankton type. A significant linear relationship was found between bbp<sup>470</sup> and Cphyto:

$$C_{phyto} = (30100 \pm 1100) \left[ b_{bp470} - (7.6 \pm 0.4) \times 10^{-4} \right] \tag{4}$$

The linear regression was limited to bbp<sup>470</sup> < 0.003 m−<sup>1</sup> because samples above this threshold value (n = 8) exhibited a shift in the relationship that was not possible to describe with a single linear function. In this study we applied the M13 Equation (4) to all glider data, even those higher than 0.003 m−<sup>1</sup> (these are all surface data representing about 3% of the total, with values up to 0.0048 m−<sup>1</sup>). In addition, the form of Equation (4) implies that the domain does not comprise bbp values lower than 7.6 × 10−<sup>4</sup> m−<sup>1</sup>. Since in this study most of the bbp<sup>470</sup> values collected by the glider sensor deeper than 150 m were lower than this threshold (about 54% of the total data), they were discarded from the analysis during the application of the method.
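The domain handling described above can be made explicit with a small sketch that returns both the Cphyto estimate and a flag for where the input sits relative to the original fit. The status labels, the function name and the example values are hypothetical; only the coefficients and the two thresholds come from the text.

```python
M13_SLOPE = 30100.0    # slope of Equation (4)
M13_OFFSET = 7.6e-4    # bbp offset of Equation (4), m^-1
M13_FIT_LIMIT = 0.003  # upper bbp limit of the original regression, m^-1


def cphyto_m13(bbp470):
    """Equation (4) with a status flag.

    Returns (cphyto, status), where status is 'discarded' below the offset,
    'within_fit' inside the fitted range, and 'extrapolated' at or above it.
    """
    if bbp470 < M13_OFFSET:
        return None, "discarded"
    status = "extrapolated" if bbp470 >= M13_FIT_LIMIT else "within_fit"
    return M13_SLOPE * (bbp470 - M13_OFFSET), status


# Hypothetical bbp470 values (m^-1) covering the three cases.
deep = cphyto_m13(0.0005)       # (None, 'discarded')
mid_range = cphyto_m13(0.002)   # about 37.3 mg C m^-3, 'within_fit'
surface = cphyto_m13(0.004)     # about 97.5 mg C m^-3, 'extrapolated'
```

Keeping the flag alongside the value makes it easy to check afterwards how much of a result depends on extrapolating Equation (4) beyond the range in which it was fitted.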

### Sathyendranath et al. (2009) Method (S09)

Sathyendranath et al. (2009) derived a method of estimating Cphyto using chlorophyll and POC observations from offshore regions of the North West Atlantic Ocean and the Arabian Sea. In their study, the authors considered that POC incorporates all types of particulate carbon in the system (including phytoplankton, bacteria, detritus, and viruses). The S09 method then assumes that for any given chlorophyll concentration, the minimum particulate carbon content represents an upper bound on phytoplankton carbon. In the S09 method, the authors log-transformed both POC and chlorophyll to linearize the relationship and to reduce the weight of the stations with high values of POC and chlorophyll in the regression analysis. They then used a 1% quantile regression to represent the upper limit on carbon content from phytoplankton alone.

This method can be applied to the available data from the South Atlantic Southern Ocean, using co-located POC and chlorophyll data from both underway and profile CTD samples collected on all cruises listed in **Table 1** and depicted in **Figure 1**. The 1% quantile regression was fitted to the log-transformed data (**Figure 3**) to derive the following equation:

$$C_{phyto} = 42 \, (chl)^{0.86} \tag{5}$$

The relationships derived by Buck et al. (1996) and Sathyendranath et al. (2009) are included in **Figure 3** for comparison and show that the SAZ data are characterized by lower carbon content per unit chlorophyll and that the values converge to the Sathyendranath et al. (2009) line at higher chlorophyll concentrations (**Figure 3**).
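Equation (5) and its log-linear form can be evaluated with the short sketch below. The function names and the example chlorophyll values are hypothetical; the coefficients are those of Equation (5), with chlorophyll in mg m−3 and Cphyto in mg C m−3.

```python
import math

S09_A = 42.0  # coefficient of Equation (5)
S09_B = 0.86  # exponent of Equation (5)


def cphyto_s09(chl):
    """Equation (5): Cphyto (mg C m^-3) from chlorophyll (mg m^-3)."""
    return S09_A * chl ** S09_B


def log10_cphyto_s09(chl):
    """Same relationship in the log-log space used for the quantile fit."""
    return math.log10(S09_A) + S09_B * math.log10(chl)


# Hypothetical chlorophyll concentrations (mg m^-3).
low_chl = cphyto_s09(0.5)    # roughly 23 mg C m^-3
high_chl = cphyto_s09(1.0)   # 42 mg C m^-3
log_intercept = math.log10(S09_A)  # about 1.62
```

The log-space intercept of about 1.62 is the straight-line form of the coefficient 42, which is the form in which a regression on log-transformed data is fitted.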

## RESULTS

### Seasonal Evolution of Chlorophyll and POC

The glider transect showed substantial seasonal changes in chlorophyll (**Figure 4A**) with relatively low chlorophyll concentrations (<0.5 mg m−<sup>3</sup>) found at the beginning of the transect between late September and October, coinciding with the deepest MLDs (down to 200 m). Periods of enhanced chlorophyll were shown to be associated with submesoscale physical forcing of the MLD by Swart et al. (2015) and Thomalla et al. (2015). The spatial scales of the physical field are evident in Supplementary Figure 1, and investigated in more detail in a study by Du Plessis et al. (in review; see also their **Figure 6**), which shows small-scale excursions of mixed layer density and surface (100 m) stratification for the glider time series. These rapid changes in density are associated with submesoscale features that actively slump horizontal density gradients (Mahadevan et al., 2012), driving enhanced stratification during periods of lighter mixed layers and weaker stratification when mixed layers are dense (Supplementary Figure 1) (Du Plessis et al., in review). The shoaling of the MLD from ∼100 to 20 m through seasonal heat flux in late November saw a concomitant increase in surface chlorophyll (>0.55 mg m−<sup>3</sup>) that was sustained throughout summer until the end of the sampling period in mid-February.

FIGURE 3 | [...] (0.55 ± 0.03) log<sub>10</sub> Chla], the Cphyto minimum carbon estimates by quantile regression [log<sub>10</sub> POC = (1.63 ± 0.05) + (0.86 ± 0.09) log<sub>10</sub> Chla, *p* = 0.01], in dark blue. The relationships between phytoplankton carbon and chlorophyll from Sathyendranath et al. (2009) (green line) and Buck et al. (1996) (light blue) are shown for comparison.

Overall, there is a rather good visual correspondence between the seasonal evolution of chlorophyll and that of POC (**Figure 4B**), the latter obtained by applying Equation (2) to the glider bbp<sup>470</sup> data. The bulk of POC was located within the MLD, as for chlorophyll, with the lowest POC concentrations (50–80 mg C m−<sup>3</sup>) observed in October, increasing in November (to about 80–120 mg C m−<sup>3</sup>) and reaching maximum concentrations in December (with values up to 150 mg C m−<sup>3</sup>) that extended through to February. The most evident differences between the datasets are in late October, where an increase in POC is not observed in chlorophyll, and in early January, when a decrease in POC is not observed in chlorophyll.

### Phytoplankton Carbon Estimates

**Figures 5B–D** compare the results from the different Cphyto methods for selected depths. Corresponding chlorophyll concentrations are also shown (**Figure 5A**). Cphyto shows a very similar seasonal distribution to that of chlorophyll and very little difference between the 10 and 40 m concentrations, highlighting the homogeneity expected in well-mixed surface waters. When applied to the glider bbp<sup>470</sup> and chlorophyll data, three of the methods of retrieving Cphyto described in Section Estimation of Phytoplankton Carbon showed similar results, despite differences in the equations used. The 30%POC and B05 methods (**Figures 5B–D**) showed almost identical distribution patterns with concentrations ranging between 0 and 45 mg C m−<sup>3</sup>. This is not surprising since the slope of the B05 equation is about 1/3 of the one found in Equation (2), and by assuming a 30% contribution due to phytoplankton carbon the numbers become similar. These slopes are also very similar to that of the more recent equation developed by Graff et al. (2015), Cphyto = 12128 (bbp<sup>470</sup> + 4.86 × 10−<sup>5</sup>) (their **Figure 3**), which utilized direct measurements of Cphyto from the temperate South Atlantic, equatorial upwelling and oligotrophic gyres. In addition, the S09 method (**Figure 5D**), the only method that uses chlorophyll rather than particulate backscattering to retrieve Cphyto, showed remarkably similar results to the methods of B05 and 30%POC.

Values of Cphyto from the M13 method were overall much larger, particularly at 10 and 40 m (**Figures 5B,C**), with values up to 80 mg C m−<sup>3</sup>. With this method, surface concentrations of Cphyto in October (∼30–40 mg C m−<sup>3</sup>) were lower relative to the remaining transect but high when compared to the other three methods (∼20 mg C m−<sup>3</sup>). Similarly, from November to February, high surface Cphyto from M13 ranged from ∼40–100 mg C m−<sup>3</sup> compared with ∼20–50 mg C m−<sup>3</sup> in the other three methods. The discrepancy between M13 and the other methods decreases with depth (80 m) and is less prominent at the beginning of the transect (October). This indicates that the difference between M13 and the other three methods is generally smaller when chlorophyll concentrations are low (<0.5 mg chl-a m−<sup>3</sup>) in early spring and at depth (>80 m).

### Seasonal Changes in Chlorophyll to Phytoplankton Carbon Ratios

In general, the surface ratios (**Figure 6**) appear lower in October (0.01–0.02), when mixed layers were deeper and chlorophyll concentrations were low, and increase toward mid-January (0.02–0.04) with decreasing mixed layer depths and increasing concentrations of phytoplankton biomass (**Figure 4**). The S09 method, despite being less variable through its dependence on chlorophyll alone, shows a similar background value to the chl-a:Cphyto ratios of the B05 and 30%POC methods, except at the beginning of the time series (last week of September through to the first week of October), where a decrease in chlorophyll is not mirrored by a corresponding decrease in Cphyto from the bbp<sup>470</sup> computed methods, resulting in low chl-a:Cphyto ratios for those methods. A similar deviation occurred in the last week of January to the first week of February, where S09 chl-a:Cphyto ratios were noticeably higher than those of B05 and 30%POC, but this time due to an increase in Cphyto (from bbp<sup>470</sup> computed methods) that was not evidenced in chlorophyll. A final example of where chl-a:Cphyto ratios from fluorescence-based and bbp<sup>470</sup>-based methods deviated is in the last week of October, where a peak in the chl-a:Cphyto ratios from bbp<sup>470</sup> computed methods (**Figure 6**) was the result of a distinct drop in carbon that was not evident in chlorophyll (see also **Figure 4B**).

The ratio obtained with the M13 method is about half that obtained from the other methods, with similar short-term variations (∼1–7 days) to the other bbp<sup>470</sup> derived methods but a different seasonal evolution (i.e., no tendency for M13 chl-a:Cphyto ratios to increase from October to mid-January). The seasonal evolution from low to high chl-a:Cphyto ratios from spring to summer is evident in the 30%POC, B05 and S09 methods.

The ratios obtained from the glider data in the SAZ are usually larger than the range obtained with global satellite data (Behrenfeld et al., 2005) as shown in **Figure 6**. This range is more in agreement with the M13 method and the ratio obtained by applying the original Sathyendranath et al. (2009) equation to the data, or by considering all POC as phytoplankton carbon. Interestingly, the model simulation by McKiver et al. (2015), which parameterizes a dynamical ratio, also shows a similar value even if the temporal trend is opposite.

## DISCUSSION

This study uses a high-resolution glider data set from the sub-Antarctic Southern Ocean to compute Cphyto using four different methods available in the literature. The ability to get sound estimates of Cphyto is important as it provides a measure of phytoplankton carbon biomass that is core to many models of net primary productivity, a key indicator of the carbon cycle. In addition, good measurements of Cphyto enable us to better understand changes in cellular chlorophyll-to-carbon ratios, which provide information on phytoplankton physiology (e.g., cellular adjustments to changes in light, temperature, and nutrients, Behrenfeld et al., 2005). As such, deriving good estimates of Cphyto and chl-a:Cphyto ratios will enable us to refine model parameterizations of phytoplankton dynamics (Sathyendranath et al., 2009). Since the biological pump in the Southern Ocean drives 33% of global organic carbon flux (Schlitzer, 2002; Lenton et al., 2013), it is of particular importance that we improve our understanding of the biological response of Southern Ocean phytoplankton to climate change.

### Comparison of the Methods

Rather than assess the quality and merit of individual methods of deriving Cphyto against each other, which would require independent phytoplankton data that would inevitably be limited in time and space, this study aims to evaluate the differences between the various methods using high frequency optical glider data from the SAZ as a benchmark. It should also be noted that the methods applied in this research were developed using different data obtained from different oceanic regions. The M13 method is the only equation that was derived from in situ Cphyto data (computed from cell counts and assumptions of carbon content per cell) but from a latitudinal study of the Atlantic; the B05 equation is also non-specific to the Southern Ocean and derived from global satellite data; the 30%POC and S09 methods were recomputed using in situ POC and chlorophyll data from the Southern Ocean. To further extend the number of methods, the original equation from the S09 method developed using data from the NW Atlantic and Arabian Sea (Sathyendranath et al., 2009) has been included in **Figures 6**, **8** for comparison (presented as S09original).

FIGURE 6 | [...] chl-a:Cphyto ratios derived from the original equation from Sathyendranath et al. (2009) presented as S09original (purple line), (2) the satellite range of ratios from Behrenfeld et al. (2005) (black dotted lines) and (3) the ratios derived from the PELAGOS025 model (McKiver et al., 2015, extracted from the model for the same geographical co-ordinates as the glider transect in time but for a year 2011 simulation, solid black line). The inset shows a detail of the daily signal for the B05 method.

The most evident result of the comparison is that three of the methods cluster together (30%POC, B05, and S09) with only one (M13) being substantially different in magnitude from the others (**Figure 5**). Despite their different origins, Cphyto from 30%POC, B05, and S09 (the only method that is derived from chlorophyll) all show similar results. The similarity in Cphyto derived using B05 (which made use of a global data set to convert bbp to POC) and 30%POC (that used a regionally specific conversion based on Southern Ocean filtered samples) implies that the conversion from bbp to POC used here is regionally robust. This is confirmed by a comparison of the POC vs. bbp relationship with previously published values (**Figure 2**, **Table 2**), which shows the slope of the regression between POC and bbp to fall well within the observed literature range from various ocean basins. The relationship from our data (from the south Atlantic Southern Ocean; **Figure 2**, **Table 2**) is very similar to that generated from the Mediterranean (Loisel et al., 2001), the North Atlantic Bloom Experiment (NABE, Cetinić et al., 2012) and the Polar Frontal Zone (PFZ, Stramski et al., 1999). Our POC vs. bbp slope is however lower than that from the North and South Atlantic and Equatorial Pacific (Graff et al., 2015), the Ross Sea (Stramski et al., 1999), and the slope that is the basis of the NASA POC algorithm (Stramski et al., 2008), using data from the Pacific and Atlantic but excluding data from the upwelling regions (Cetinić et al., 2012). On the contrary, our slope is higher than that from the Atlantic and Pacific including data from the upwelling regions (Stramski et al., 2008) and yields higher POC (for bbp >0.0027) when compared to the relationship from the north and south Atlantic (Balch et al., 2010).

The M13 method returns much higher concentrations of Cphyto, with the difference being smaller when chlorophyll concentrations are below 0.5 mg m−<sup>3</sup>, in the month of October (**Figure 5**) and at depth (∼80 m). The reason for the higher Cphyto estimates using M13 is the steep slope (30100) of Equation (4), which is more than twice the slopes used in B05 (13000) and 30%POC (11825). The comparatively steep slope of M13 was noted in Martinez-Vicente et al. (2013), who attributed the doubling of parameters between their equation and that of Behrenfeld et al. (2005) to "uncertainties in satellite and in situ estimates of bbp and/or differences in the spatio-temporal scales of the two studies." A big difference in the steepness of the slope was similarly noted between M13 (slope = 30100) and a study by Graff et al. (2015) in the North and South Atlantic and Equatorial Pacific (slope = 12128). They suggest that the difference may be due to variability in the assumed volume-based phytoplankton biomass conversions used in the M13 method to convert volume to Cphyto. This can yield an order of magnitude variability in resultant Cphyto estimates (Caron et al., 1995; Dall'Olmo et al., 2011).

Equation (4) from Martinez-Vicente et al. (2013) was generated from the typically low biomass region of the Atlantic, where in situ bbp data were generally less than 0.003 m−<sup>1</sup>. The eight data points that exceeded this threshold were removed from the regression by the authors, as they showed a shift in the relationship that could not be described with a single linear function. Including the 8 data points (where bbp was > 0.003) would drive an even steeper slope, even higher concentrations of Cphyto, and a greater disagreement between the different methods used here. These 8 data points were characteristic of eutrophic conditions (as opposed to oligotrophic conditions) where larger cells characterized the phytoplankton community (more nano- and pico-eukaryotes and fewer Prochlorococcus) with lower bbp:Cphyto ratios (see Martinez-Vicente et al., 2013, their Supplementary Figure 1). Similarly, if microphytoplankton (missed by flow cytometry) were included in their measurements, the slope would likely be even steeper. This would be consistent with optical theory that predicts a decrease in bbp:Cphyto ratios with an increase in particle size, where Cphyto is proportional to volume (Boss et al., 2004; Martinez-Vicente et al., 2013). It is worth noting, however, that this theory represents phytoplankton cells as homogeneous spheres that can overestimate the real refractive index for certain species (Vaillancourt et al., 2003). As described in Section Martinez-Vicente et al. (2013) Method (M13), the M13 equation was applied to all glider bbp data (ranging from 0.001 to 0.01 m−<sup>1</sup>), with much of the data falling above the domain of applicability of Equation (4) (bbp <0.003). It is thus possible that the overestimated Cphyto concentrations generated from the M13 method are the result of an unsuitable application of a regionally derived model from the low backscattering field of the Atlantic to the relatively high backscattering SAZ.
This possibility is supported by all four methods generating similar Cphyto concentrations when biomass was below 0.5 mg m−<sup>3</sup> (i.e., in early spring and at depth ∼ 80 m, **Figure 5**). As such, when chlorophyll concentrations are below 0.5 mg m−<sup>3</sup> it appears to make little difference which method of estimating Cphyto is chosen by the user.

The S09 method showed similar Cphyto results in range and distribution to both B05 and 30%POC, highlighting its robustness in converting chlorophyll to Cphyto. This information is particularly useful for data sets that require conversion to phytoplankton-specific carbon in the absence of backscattering measurements. However, when the chl-a:Cphyto ratios from the different methods are compared (**Figure 6**), the limitations of the S09 method are highlighted, namely the very low temporal variability in chl-a:Cphyto ratios. This can be attributed to the derivation of chl-a:Cphyto ratios from a monotonic function of chlorophyll concentration:

$$chl\text{-}a : C_{phyto} = \frac{1}{a} \left( chl \right)^{1-b} \tag{6}$$

where a= 42 and b= 0.86 are the coefficients derived from Equation (5). The implication of this is that the S09 method, though being non-linear, does not allow for a scenario where Cphyto increases or decreases without a corresponding change in chlorophyll. The remaining three methods on the other hand allow Cphyto to vary independently of chlorophyll, which accounts for the higher temporal variability observed in the chl-a:Cphyto ratios using the B05, 30%POC and M13 methods (**Figure 6**).
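The step from Equation (5) to Equation (6) is a single division, written out here as a worked example. The two chlorophyll concentrations are illustrative values only, and the numerical results are rounded:

$$\frac{chl}{C_{phyto}} = \frac{chl}{a \, (chl)^{b}} = \frac{1}{a} \left( chl \right)^{1-b}, \qquad a = 42, \; b = 0.86$$

$$chl = 1 \text{ mg m}^{-3}: \; \frac{1}{42} \approx 0.0238, \qquad chl = 0.5 \text{ mg m}^{-3}: \; \frac{1}{42} \left( 0.5 \right)^{0.14} \approx 0.0216$$

Because the exponent 1 − b = 0.14 is small, halving chlorophyll changes the ratio by only about 9%, which illustrates why the S09 ratios vary so little in time.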

### Seasonal and Sub-Seasonal Variations in the Chlorophyll-to-Carbon Ratio

How do surface chlorophyll-to-carbon ratios (chl-a:Cphyto) vary in time? In general, chl-a:Cphyto ratios tend to be lower in spring (October), higher in summer (December–January), and to decrease again in late summer (February), in particular in the backscattering methods (30%POC, B05, and M13). The absence of a strong seasonal evolution in M13 is likely a consequence of the method, which drives higher Cphyto (relative to the other methods) when chlorophyll concentrations are high and more similar Cphyto when chlorophyll is below 0.5 mg m−<sup>3</sup>, thus dampening any seasonally driven signal in surface chl-a:Cphyto ratios.

According to the literature, chl-a:Cphyto ratios tend to be highest when larger diatoms are present and lowest when smaller species dominate (e.g., Prochlorococcus) (Sathyendranath et al., 2009). As such, some of the seasonal variability observed in the chl-a:Cphyto ratios could be the result of different sized species dominating the community at different times. For example, smaller cells may dominate in spring (October), when light is potentially limiting (mean MLD = 87 m and mean PAR in MLD = 0.01 µE cm−<sup>2</sup> s−<sup>1</sup>), and in late summer (January), when nutrients are potentially limiting (depleted reservoir), whereas larger cells may dominate in early summer (December), when light (Supplementary Figure 2) and nutrients are thought to be unrestricted (mean PAR in MLD = 0.03 µE cm−<sup>2</sup> s−<sup>1</sup>). Previous studies have shown that the range of chl-a:Cphyto ratios observed in the Southern Ocean is the result of seasonal variation in the physiological response of phytoplankton to light and nutrient limitation (Behrenfeld et al., 2005). Phytoplankton chl-a:Cphyto ratios tend to decrease with increasing light conditions and decreasing temperature and nutrient concentrations (and vice versa) (Geider et al., 1997; Socal et al., 1997; Taylor et al., 1997; Behrenfeld et al., 2005; Lü et al., 2009; Halsey and Jones, 2015; Westberry et al., 2016). Increasing chl-a:Cphyto ratios as a physiological response of phytoplankton to low light conditions enables them to increase their light harvesting ability by increasing the volume of chlorophyll packed into their cells (Behrenfeld and Milligan, 2013; Halsey and Jones, 2015; Bellacicco et al., 2016). Indeed, photoacclimation to changes in the underwater light field as a bloom develops has been known to account for the entire range of seasonal variability observed in bulk chlorophyll (e.g., Winn et al., 1995; Westberry et al., 2008).
However, since Fe reservoirs are replenished in winter through deep mixing and depleted through biological uptake in the spring–summer growing season (Boyd et al., 2010), nutrients are not considered the likely driver of the observed increase in chl-a:Cphyto ratios. Similarly, both light (through increased PAR, Supplementary Figure 2, and decreased MLD, **Figure 4**) and temperature (see Thomalla et al., 2015, their **Figure 2B**) tend to increase from October to December. It is thus unlikely that these processes (light, temperature or nutrients) are responsible for the seasonal increase in chl-a:Cphyto ratios observed during this period (**Figure 6**). However, since laboratory studies consistently show a decrease in chl-a:Cphyto ratios under Fe stress (Greene et al., 1992; Sunda and Huntsman, 1995), it is possible that limiting nutrients could contribute to the observed decrease in chl-a:Cphyto ratios in late summer (February). Similar results of declining chl-a:Cphyto ratios were observed by Behrenfeld et al. (2005) in regions dominated by large spring–summer blooms in phytoplankton abundance, where a decrease in chl-a:Cphyto ratio was observed prior to the seasonal biomass crash.

The physiological response of phytoplankton to relief from Fe stress has shown increases in chl-a:Cphyto ratios as growth rate increases (Greene et al., 1992; Geider et al., 1997; Sunda and Huntsman, 1997). This response is the result of upregulation of photosynthetic machinery and light harvesting capacity (Laws and Bannister, 1980). A measure of the growth rate estimated as the rate of change of MLD integrated chlorophyll (Supplementary Figure 2) shows an increase from 0 at the beginning of the time series (October) to ∼0.017 mg chl-a m−<sup>2</sup> d−<sup>1</sup> in mid-December. Although we are unable to quantify the role of Fe in this study, the increase in chl-a:Cphyto ratios over the same time period may be a physiological response to increased growth rates when neither Fe nor light was considered limiting. This was a similar response to what was observed during bloom conditions by Westberry et al. (2013) in both natural and purposeful Fe addition experiments. In addition, phytoplankton must photoacclimate as the bloom develops to counter the effects of self-shading, further increasing chl-a:Cphyto ratios over the growing season.

Finally, an alternative explanation for the observed seasonal increase in chl-a:Cphyto ratios is that the ratio of Cphyto to total POC (and to bbp) decreases as the seasonal bloom develops. In other words, as the season progresses a greater percentage of the particulate backscattering signal is due to non-phytoplankton specific carbon (e.g., heterotrophic bacteria, detritus, viruses, ciliates) (Christaki et al., 2011). This was observed in a study in the Mediterranean, where the non-algal contribution to bbp(470) was generally larger in the more productive regions that had elevated phytoplankton abundances (Bellacicco et al., 2016). Behrenfeld et al. (2005) reported that a compiled data set of field derived Cphyto:POC ratios spanning oligotrophic to eutrophic ocean regions ranged from 19 to 49%. These differences reflect system variability in producer-consumer dynamics, processes influencing the particle field and possible differences in export efficiency (Graff et al., 2015). If this ratio varies seasonally as much as regionally (i.e., by a factor of 2.5), it will have a substantial effect on optically derived chl-a:Cphyto ratios. Of the four methods applied here, all three bbp(470) computed methods (B05, 30%POC and M13) apply a constant Cphyto:POC ratio with time. On the other hand, the S09 method may indirectly account for adjustments in Cphyto:POC through the non-linear relationship between chlorophyll and Cphyto that drives an increase in chl-a:Cphyto ratios with increasing chlorophyll. To elaborate: as the bloom develops, phytoplankton biomass increases with a concomitant decrease in the percentage of Cphyto relative to total POC. This may be reflected in the logarithmic relationship between chlorophyll and Cphyto (**Figure 3**), which would be observed as an increase in chl-a:Cphyto ratios with increasing chlorophyll concentrations in the S09 method (**Figure 6**).
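The sensitivity of the derived ratio to the assumed Cphyto:POC fraction can be illustrated with a short calculation. The sketch below is not the authors' code: it assumes Cphyto is taken as a fixed fraction of POC, and the chlorophyll and POC values are hypothetical. Only the 19%, 30% and 49% fractions come from the text.

```python
def chl_to_cphyto_ratio(chl, poc, cphyto_fraction):
    """Return chl-a:Cphyto (mg chl-a per mg C) when Cphyto = fraction * POC."""
    if not 0 < cphyto_fraction <= 1:
        raise ValueError("cphyto_fraction must be in (0, 1]")
    return chl / (cphyto_fraction * poc)

# Hypothetical surface values, chosen only for illustration.
chl = 0.5    # mg chl-a m^-3
poc = 100.0  # mg C m^-3

# Fractions quoted in the text: 19% and 49% (range), 30% (mean).
ratios = {f: chl_to_cphyto_ratio(chl, poc, f) for f in (0.19, 0.30, 0.49)}
for fraction, ratio in ratios.items():
    print(f"Cphyto:POC = {fraction:.0%} -> chl-a:Cphyto = {ratio:.4f}")

# The spread between the end members equals 0.49 / 0.19,
# independent of the chlorophyll and POC values chosen.
spread = ratios[0.19] / ratios[0.49]
```

Because chlorophyll and POC cancel in the comparison, the same measurement yields chl-a:Cphyto ratios that differ by the ratio of the assumed fractions alone.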

Over and above the characteristic seasonal cycle in chl-a:Cphyto ratios, there is strong sub-seasonal and daily variability that is likely driven by phytoplankton community responses to smaller scale physical processes and day-night physiological adjustments. In the case of the M13 method, where the seasonal cycle is dampened, one can argue that the smaller sub-seasonal scales are in fact dominating the variability. Some distinct examples of smaller scale adjustments in the chl-a:Cphyto ratios are as follows:


The drivers of sub-seasonal variability in chl-a:Cphyto ratios seen in late September (low) and late October (peak) are clearly linked to mesoscale features and associated adjustments in the depth of the mixed layer, which is deep in late September (∼150 m) and shallow in late October (∼25 m). However, the adjustments in chl-a:Cphyto ratios are not a physiological response of the phytoplankton to light. Were this the case, the opposite response would be expected, i.e., an increase in chl-a:Cphyto ratios would be observed when MLDs were deep and light was supposedly limiting. It is thus more likely that these small scale adjustments in chl-a:Cphyto ratios are the result of a shift in community structure from a small cell dominated community with low growth rates in late September to a population dominated by large cells with high growth rates under high light conditions in late October.

To test whether the diurnal variability in the chl-a:Cphyto signal (see insert in **Figure 6**) was driven by an artifact of residual uncorrected solar quenching (or the quenching correction itself), we compared in **Figure 7** the linear regressions of midnight and midday Cphyto (r<sup>2</sup> = 0.93) to midnight and midday chlorophyll (r<sup>2</sup> = 0.66). Although the coefficient of determination for chlorophyll is lower, this is what can be expected from the effects of acclimation. The normal dispersion of the data, however, suggests that the variability in chlorophyll is not biased and hence implies that there is no systematic artificial error introduced through the quenching correction (**Figure 7**). As such, it is believed that the daily signal in chl-a:Cphyto ratios is a real response of the phytoplankton community. Interpreting this diurnal variability is difficult as it depends on numerous parameters that include phytoplankton concentration, composition and physiological status together with the detrital and small heterotroph concentration, all of which are typically not known; hence diurnal variability in chl-a:Cphyto ratios remains poorly understood. The likely drivers, however, include the balance between daytime production and night time degradation, changes in particle size distribution and changes in the refractive index driven by varying internal concentrations of organic compounds (e.g., accumulation of intracellular carbon through photosynthesis) (Kheireddine and Antoine, 2014). Although this small-scale variability merits further detailed investigation, it remains outside the scope of this manuscript.
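The midnight versus midday comparison rests on three regression statistics: slope, coefficient of determination and root-mean-square error. The sketch below shows one way these could be computed with an ordinary least-squares fit. It is an illustration under stated assumptions, not the authors' analysis: the function name is hypothetical and the paired values are invented, not glider data.

```python
import numpy as np

def regression_stats(x, y):
    """Ordinary least-squares fit y = slope * x + intercept, with r^2 and RMSE."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    ss_res = np.sum(residuals ** 2)          # unexplained variance
    ss_tot = np.sum((y - y.mean()) ** 2)     # total variance
    r2 = 1.0 - ss_res / ss_tot
    rmse = np.sqrt(np.mean(residuals ** 2))
    return slope, intercept, r2, rmse

# Hypothetical paired daily surface values (mg chl-a m^-3), for illustration only.
midday = np.array([0.30, 0.45, 0.60, 0.80, 1.00])
midnight = np.array([0.28, 0.47, 0.57, 0.83, 0.97])

slope, intercept, r2, rmse = regression_stats(midday, midnight)
print(f"slope={slope:.2f}, r2={r2:.2f}, rmse={rmse:.3f}")
```

A slope near one with residuals scattered evenly about the fitted line would be consistent with the absence of a systematic day-night bias, which is the argument made above for the quenching correction.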

### Implications for Model Parameterizations

Some biogeochemical models apply a constant chl-a:Cphyto ratio globally (usually 0.02 mg chl-a mg C<sup>−1</sup>) even if the constraints are not developed equally (i.e., some regions are data poor), while others have a dynamical function that should include acclimation to prevailing light conditions. It is entirely possible that the inadequate parameterization of this ratio is the reason for the seasonal biases found in biogeochemical models (Doney et al., 2009; Steinacher et al., 2009; Vichi and Masina, 2009). In **Figure 6** we compare the chl-a:Cphyto ratios generated from a Southern Ocean data set (using the different optical methods) with those constrained in satellite data, cruise data and from one medium resolution global model (the PELAGOS025 model analyzed by McKiver et al., 2015; Model data). The chl-a:Cphyto data from the model were extracted from the same location as the glider transect and over the same dates (month and day) but from a simulation for the year 2011, due to a limitation in the availability of atmospheric forcing functions.

Since Cphyto cannot exceed POC, the chl-a:POC ratio from the cruise data (**Table 1**) was used to set a minimum chl-a:Cphyto ratio assuming all POC was phytoplankton specific (presented as 100%POC, **Figure 6**). These ratios were understandably low (<0.01 mg chl-a mg C<sup>−1</sup>) but above the minimum satellite range (0.004 mg chl-a mg C<sup>−1</sup>) of Behrenfeld et al. (2005) and oftentimes close in magnitude to the ratios generated by the M13 method. The ∼50% lower chl-a:Cphyto ratios for M13 are driven by the high Cphyto values that this method generates relative to the other methods, particularly when biomass is high (December to February). It was proposed earlier that the low chl-a:Cphyto ratios were a possible result of an unsuitable application of a regionally derived model from the Atlantic to the SAZ. The S09 method allows a direct comparison of the chl-a:Cphyto results generated from one model derived predominantly from data from the NW Atlantic (S09original) with the same model but derived from data from the Southern Ocean (S09, **Figure 6**). This comparison shows how the application of the S09original model to the SAZ data set results in much lower chl-a:Cphyto ratios that are more in line with those generated by the M13 method, which was similarly derived from cell counts from the Atlantic.

FIGURE 7 | Linear regressions of daily surface midnight vs. daily surface midday concentrations of (A) Cphyto from the glider transect using the B05 method (slope = 0.97, *r*<sup>2</sup> = 0.93, rmse = 2.49) and (B) chlorophyll (slope = 0.8, *r*<sup>2</sup> = 0.66, rmse = 0.15). The linear least-squares regression line appears in red with the one-to-one line in black.

Both the M13 and S09original methods produce chl-a:Cphyto ratios that are very similar to the 100%POC method, which outputs the minimum ratios possible assuming all POC is phytoplankton specific. This is not surprising when one considers that the slope of the M13 method (30100 mg C m<sup>−2</sup>) is 76% of the POC:bbp slope found in this study (39418 mg C m<sup>−2</sup>) (**Figure 2**). This finding implies that when the M13 method is applied to the glider data set, there is very little non-algal POC that co-varies with Cphyto. This result is unlikely if one considers that the maximum Cphyto:POC ratio reported by Behrenfeld et al. (2005) is 49% and that, in the Equatorial Upwelling and temperate spring waters of the Graff et al. (2015) study, it rarely exceeded 40% (mean = 25%). Although phytoplankton have been known to dominate POC at a high biomass coastal upwelling site (Hobson et al., 1973), this observation is in contrast to the typical low contribution of Cphyto to POC in productive offshore waters (Hobson et al., 1973; Andersson and Rudehäll, 1993). As such, it is more likely that the slope of the M13 method (and S09original) is too high for this region and that the 30%POC, S09 and B05 methods are more appropriately applied to Southern Ocean data sets, supporting the argument of an unsuitable application of a regionally derived model from the Atlantic to the SAZ. The M13 method still resolves the shorter-term variations likely driven by photo-physiology but reduces the summer increase in the ratio explained by adjustments in nutrient driven growth rates and community structure.
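The 76% figure follows directly from the two slopes quoted above. As a minimal worked check, and assuming that the ratio of the two bbp slopes can be read as an implied Cphyto:POC fraction (a simplification that ignores any regression intercepts), the arithmetic is:

```python
# Slopes quoted in the text (mg C m^-2).
M13_SLOPE = 30100.0   # Cphyto:bbp slope of the M13 method
POC_SLOPE = 39418.0   # POC:bbp slope found in this study

# Assumption: the slope ratio approximates the implied Cphyto:POC fraction.
implied_fraction = M13_SLOPE / POC_SLOPE
print(f"Implied Cphyto:POC = {implied_fraction:.0%}")

# Upper end of the Cphyto:POC range quoted from Behrenfeld et al. (2005).
MAX_REPORTED_FRACTION = 0.49
exceeds_reported_max = implied_fraction > MAX_REPORTED_FRACTION
```

The implied fraction lies well above the 49% maximum quoted in the text, which is the basis for the argument that the M13 slope is too high for this region.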

A comparison of the results with the PELAGOS025 model (**Figure 6**) implies that an overall chl-a:Cphyto ratio of 0.02 mg chl-a mg C<sup>−1</sup> seems adequate to represent the SAZ phytoplankton during spring and summer. However, the glider data indicate that the sub-seasonal and seasonal variations of this parameter are rather high and may play an important role in determining the magnitude of the bloom. In addition, the PELAGOS025 model shows an apparent misrepresentation of the seasonal cycle of chl-a:Cphyto ratios, with high ratios in winter and low ratios in summer (whereas the glider shows an increase in the chl-a:Cphyto ratios from winter through to summer). This is likely because PELAGOS025 is based on the Biogeochemical Flux Model (Vichi et al., 2007; Vichi and Masina, 2009), which uses a variation of the Geider et al. (1997) formulation for light acclimation and chlorophyll regulation. As such, the seasonal mismatch may be a result of the model assumption that all seasonal variability is purely due to acclimation to a prescribed high optimal chl-a:Cphyto ratio (deep mixed layers in winter driving increased chlorophyll synthesis through packaging responses to low light conditions). This may explain the observed seasonal biases of the models when compared with ocean color data reported in the literature. Rather, if our estimates of Cphyto based on backscattering are appropriate for the Southern Ocean (and this can only be validated with corresponding in situ estimates), then the glider data suggest that models need to account for low light adaptation in winter, which would dampen the acclimation response and allow for lower optimal chl-a:Cphyto ratios in winter. In summer, the large increase in chlorophyll may in turn need to coincide with a shift toward a community characterized by larger cells, relatively higher growth rates and higher optimal chl-a:Cphyto ratios.
Such a mechanism is currently not implemented in any of the biogeochemical models because they usually consider one single group of diatoms. Numerical models, even the ones with a more sophisticated physiology like the BFM (or PISCES) may account for acclimation to light but are still dominated by the same kind of diatoms without any additional plasticity. The PELAGOS025 model does however do a better job in capturing the range of observed chl-a:Cphyto ratios, which the satellite data do not (**Figure 6**; see satellite range from Behrenfeld et al. (2005) for the Southern Ocean 0.004 – 0.013, their **Table 1**).

### A Reference Chlorophyll-to-Carbon Ratio for the Sub-Antarctic Zone?

This research provides, for the first time, Cphyto estimates from a high-resolution glider data set, which are used to derive independent surface chl-a:Cphyto ratios for the SAZ. This is a much-needed parameter for biogeochemical models to improve the simulation of phytoplankton blooms in the region. To summarize the range of results obtained with the glider data from the different methods described in Section Materials and Methods, we compare in **Figure 8** their surface (10 m) chl-a:Cphyto ratios to several literature values and ranges (Montagnes et al., 1994; Sunda and Huntsman, 1995; Llewellyn and Gibb, 2000), as well as to the model results, using nonparametric distributions (mean, median and 5–25 percentiles).

Our analysis suggests that it is not possible to establish one single value for the conversion between chlorophyll and carbon. The large range of observed variability is driven by methodological uncertainties, regional differences and large seasonal variations. All chl-a:Cphyto ratios from the different methods applied to the glider data set (30%POC, B05, S09, M13) fell within the data set limits compiled by Behrenfeld et al. (2002), which for the global ocean range between 0.001 and >0.06 mg chl-a mg C<sup>−1</sup> (Behrenfeld et al., 2005). Similarly, all methods generated chl-a:Cphyto ratios that fell within the range of laboratory culture measurements collated here (Supplementary Table 1) and within the range of ratios generated when one applies the 30% mean and 19–49% range in Cphyto:POC ratios (Behrenfeld et al., 2005) to the cruise POC data. Worth noting here is that although the Behrenfeld et al. (2005) range of 19–49% is presented for comparison (derived from the following references: Eppley et al., 1992; Durand et al., 2001; Gundersen et al., 2001; Oubelkheir, 2001), this range has been shown to extend to a minimum of 14% by Oubelkheir et al. (2005) and as high as 75% by Martinez-Vicente et al. (2013).

The smaller range (0.02–0.025 mg chl-a mg C<sup>−1</sup>) of distribution in chl-a:Cphyto ratios from the S09 method (which does not allow for independent adjustments in Cphyto relative to chlorophyll) is clear (as in **Figure 6**). This range increases (0.008–0.018 mg chl-a mg C<sup>−1</sup>) when applying the S09original model, which is characterized by a higher phytoplankton carbon per unit chlorophyll (**Figure 3**), likely due to the different region of origin of the analyzed samples. A much greater spread is evident in the cruise data (0.002–0.17 mg chl-a mg C<sup>−1</sup>) relative to the glider data (0.006–0.05 mg chl-a mg C<sup>−1</sup>). This is likely driven by the large regional coverage of the cruises relative to the glider (**Figure 1**), with the cruises likely sampling a much wider range of communities exposed to more varied growth conditions. In line with this argument, it follows that the range of variability of the cruise data is on a similar scale to that found in the laboratory culture experiments (0.0001–0.05), which are from a large variety of phytoplankton species and growing conditions.

The high variability in observed chl-a:Cphyto ratios illustrates the potential error associated with predicting Cphyto concentrations based on chlorophyll concentrations alone.
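To give a sense of the size of this potential error, the sketch below converts a single chlorophyll concentration to Cphyto using a constant ratio of 0.02 mg chl-a mg C<sup>−1</sup> and the two ends of the glider range quoted above (0.006 and 0.05 mg chl-a mg C<sup>−1</sup>). The chlorophyll value and function name are hypothetical and serve only as an illustration.

```python
def cphyto_from_chl(chl, chl_to_c_ratio):
    """Convert chlorophyll (mg chl-a m^-3) to Cphyto (mg C m^-3)."""
    if chl_to_c_ratio <= 0:
        raise ValueError("chl_to_c_ratio must be positive")
    return chl / chl_to_c_ratio

chl = 0.5  # mg chl-a m^-3, hypothetical value for illustration

# Ratios taken from the text (mg chl-a per mg C).
cphyto_constant = cphyto_from_chl(chl, 0.02)    # constant model value
cphyto_low = cphyto_from_chl(chl, 0.05)         # upper end of glider range
cphyto_high = cphyto_from_chl(chl, 0.006)       # lower end of glider range

print(f"constant ratio: {cphyto_constant:.1f} mg C m^-3")
print(f"glider range:   {cphyto_low:.1f} to {cphyto_high:.1f} mg C m^-3")
```

For the same chlorophyll concentration, the predicted Cphyto therefore spans nearly an order of magnitude across the observed range of ratios.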

When the medians of the different data sets are rounded to two decimal places, they fall into two distinct groups: those with low median chl-a:Cphyto ratios (∼0.01 mg chl-a mg C<sup>−1</sup>), from the laboratory cultures and the glider data when the 100%POC, M13 and S09original methods are applied, and those with higher median chl-a:Cphyto ratios (∼0.02 mg chl-a mg C<sup>−1</sup>), from the cruise data, the PELAGOS025 model and the glider data with the 30%POC, B05 and S09 methods applied. Interestingly, another recent study from the north and south Atlantic, using direct measurements of Cphyto, had the same median chl-a:Cphyto ratio (0.01; range 0.002–0.029) as the M13 and S09original methods, which were similarly derived from the Atlantic (Graff et al., 2015). These results suggest that if the true median chl-a:Cphyto ratio is low (∼0.01), then the M13 and S09original methods would produce the best results when applied to the glider data set from the SAZ. On the other hand, if the true median chl-a:Cphyto ratio is higher (∼0.02), then the 30%POC, S09, and B05 methods better represent the SAZ.

Nevertheless, even assuming that the values with a median of 0.02 mg chl-a mg C<sup>−1</sup> are more realistic, the spread is appreciably large, and apparently not purely driven by acclimation given the phase discrepancy between the model results and the glider data (**Figure 6**). Further insights on the merit of one method over another can only be obtained with the aid of concurrent data on the phytoplankton community composition and their specific chlorophyll and carbon content. To this end, recent advances in technology such as sorting flow-cytometry and high sensitivity elemental analysis can allow for quantitative assessment of Cphyto (e.g., Graff et al., 2015), which will contribute to a more extensive set of field data for evaluating and validating optical methods of determining Cphyto concentrations.

# CONCLUSIONS

This study used optical data from a high-resolution glider transect in the SAZ to compare four different empirical estimates of phytoplankton carbon (three based on particulate backscattering and one on chlorophyll). The chl-a:Cphyto ratios generated from the different methods were compared with in situ data from the literature and a model simulation to inform on their comparative range and distribution. Out of the four methods used, three (30%POC, B05, and S09) showed similar Cphyto concentrations, despite their different origins, in particular when chlorophyll concentrations were below 0.5 mg m<sup>−3</sup>. The fourth (M13) derived higher concentrations of Cphyto when chlorophyll concentrations were high (>0.5 mg m<sup>−3</sup>). The S09 method produced very similar Cphyto concentrations to B05 and 30%POC, highlighting its robustness in converting chlorophyll to Cphyto in the absence of particulate backscattering. However, the S09 method does not allow for adjustments in Cphyto without proportional changes in chlorophyll and hence cannot account for cellular adjustments in chl-a:Cphyto ratios that are not driven by biomass. Methods derived from bbp, on the other hand, showed high seasonal and sub-seasonal variability in chl-a:Cphyto ratios that can be attributed to adjustments in dominant species composition, physiological adaptations to varying light and nutrient regimes, changes in growth rates and variability in the ratio of Cphyto to total POC.

All chl-a:Cphyto ratios generated from the four different methods fell within the literature range compiled by Behrenfeld et al. (2002), the range of collated laboratory culture measurements and the cruise POC data (when a 19–49% range in Cphyto:POC ratio was applied). Nonetheless, the methods derived from the North Atlantic (M13 and S09original) were shown to generate mean chl-a:Cphyto ratios that were comparatively low (0.01 mg chl-a mg C<sup>−1</sup>) and did not allow for sufficient variation of non-algal POC with Cphyto. The 30%POC, S09, and B05 methods, on the other hand, generated higher mean chl-a:Cphyto ratios (0.02 mg chl-a mg C<sup>−1</sup>) that were considered more appropriate for application to the SAZ. This highlights the potential for unsuitable application of regionally derived methods to the Southern Ocean.

Model simulations tend to overestimate the magnitude and miss the timing of the Southern Ocean spring-summer bloom, which McKiver et al. (2015) suggested was a possible result of inaccurate parameterization of the chl-a:Cphyto ratio. A comparison of the glider surface chl-a:Cphyto ratios with those from laboratory culture experiments from the literature, Southern Ocean cruise data (assuming a Cphyto:POC range of 19–49%) and the same model output used by McKiver et al. (2015) shows that although an overall chl-a:Cphyto ratio of 0.02 mg chl-a mg C<sup>−1</sup> could adequately represent the SAZ phytoplankton during spring and summer, the seasonal variation of this parameter is high and may play an important role in characterizing the bloom. It is proposed that the seasonal mismatch between the model and the glider data may result from the model's assumption that all seasonal variability is simply due to acclimation. If indeed the seasonal ramp in chl-a:Cphyto ratios observed with the glider is related to higher growth rates (and not to a decrease in % contribution of Cphyto to bbp as the seasonal bloom develops), then these results suggest that models need to accommodate a variable chl-a:Cphyto ratio that accounts for phytoplankton adaptation to low light conditions in spring (low optimal chl-a:Cphyto ratio) and higher optimal chl-a:Cphyto ratios with species-specific increasing growth rates in summer. Another option, as suggested by the work of Bellacicco et al. (2016), is that methods converting backscattering to Cphyto need to take into account the space-time variability of non-algal contributions to particulate backscattering, which can vary by more than one order of magnitude. To further our understanding of the merits of different optical methods of determining Cphyto, additional data are required on concurrent community composition and specific chlorophyll and carbon content.

### AUTHOR CONTRIBUTIONS

All authors contributed to the final manuscript. ST designed the research, processed the data, and contributed to the writing of the manuscript; AO prepared all the figures and the first draft of the manuscript; MV contributed to the analysis of results and the writing of the manuscript; and SS provided and processed the glider data. All authors discussed the results and drew the conclusions.

### ACKNOWLEDGMENTS

This work was undertaken and supported through the Southern Ocean Carbon and Climate Observatory (SOCCO) Programme. This work was partially funded by the European Commission 7th Framework Programme through the GreenSeas Collaborative Project (265294 FP7-ENV-2010). This work was supported by CSIR's Parliamentary Grant funding (SNA2011112600001) and NRF SANAP grants (SNA2011120800004 and SNA14071475720). In addition, MV was funded by the SANAP grant TRAIN-SOPP (SNA14072880912, grant no. 93089), University of Cape Town. We thank the South African Maritime Safety Authority (SAMSA), the South African National Antarctic Programme (SANAP), and the captain and officers of the MV SA Agulhas and the RV SA Agulhas II for the successful completion of the voyages pertaining to this research. We acknowledge the dedication and professionalism shown by the oceanographic engineers of Sea Technology Services. Thanks are extended to the IOP team at the Applied Physics Laboratory, University of Washington and to Dr. Thomas Ryan-Keogh for his assistance in collating the laboratory culture chl-a:Cphyto ratios. We would like to thank the editor and our reviewers for their valuable contribution to the improvement of this manuscript.

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fmars.2017.00034/full#supplementary-material




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Thomalla, Ogunkoya, Vichi and Swart. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# A Consumer's Guide to Satellite Remote Sensing of Multiple Phytoplankton Groups in the Global Ocean

Colleen B. Mouw<sup>1</sup>\*, Nick J. Hardman-Mountford<sup>2</sup>, Séverine Alvain<sup>3</sup>, Astrid Bracher<sup>4,5</sup>, Robert J. W. Brewin<sup>6,7</sup>, Annick Bricaud<sup>8</sup>, Aurea M. Ciotti<sup>9</sup>, Emmanuel Devred<sup>10</sup>, Amane Fujiwara<sup>11</sup>, Takafumi Hirata<sup>12,13</sup>, Toru Hirawake<sup>14</sup>, Tihomir S. Kostadinov<sup>15</sup>, Shovonlal Roy<sup>16</sup> and Julia Uitz<sup>8</sup>

<sup>1</sup> Graduate School of Oceanography, University of Rhode Island, Narragansett, RI, USA, <sup>2</sup> CSIRO Oceans and Atmosphere, Perth, WA, Australia, <sup>3</sup> Laboratoire d'Océanologie et de Géosciences - UMR 8187 LOG, Centre National de la Recherche Scientifique, Université Lille Nord de France - ULCO, Wimereux, France, <sup>4</sup> Alfred-Wegener-Institute Helmholtz Centre for Polar and Marine Research, Bremerhaven, Germany, <sup>5</sup> Institute of Environmental Physics, University Bremen, Bremen, Germany, <sup>6</sup> Plymouth Marine Laboratory, Plymouth, UK, <sup>7</sup> National Centre for Earth Observation, Plymouth Marine Laboratory, Plymouth, UK, <sup>8</sup> Laboratoire d'Océanographie de Villefranche, Observatoire Océanologique de Villefranche, Centre National de la Recherche Scientifique, Sorbonne Universités, UPMC-Université Paris-VI, Villefranche-sur-Mer, France, <sup>9</sup> Center for Marine Biology, University of São Paulo, São Paulo, Brazil, <sup>10</sup> Bedford Institute of Oceanography, Fisheries and Oceans Canada, Halifax, NS, Canada, <sup>11</sup> Institute of Arctic Climate and Environment Research, Japan Agency for Marine-Earth Science and Technology, Yokosuka, Japan, <sup>12</sup> Faculty of Environmental Earth Science, Hokkaido University, Sapporo, Japan, <sup>13</sup> CREST, Japan Science Technology Agency, Tokyo, Japan, <sup>14</sup> Faculty of Fisheries Sciences, Hokkaido University, Hakodate, Japan, <sup>15</sup> Department of Geography and the Environment, University of Richmond, Richmond, VA, USA, <sup>16</sup> Department of Geography and Environmental Science and School of Agriculture Policy and Development, University of Reading, Reading, UK

Phytoplankton are composed of diverse taxonomical groups, which are manifested as distinct morphology, size, and pigment composition. These characteristics, modulated by their physiological state, impact their light absorption and scattering, allowing them to be detected with ocean color satellite radiometry. There is a growing volume of literature describing satellite algorithms to retrieve information on phytoplankton composition in the ocean. This synthesis provides a review of current methods and a simplified comparison of approaches. The aim is to provide an easily comprehensible resource for non-algorithm developers who wish to use these products, thereby raising the level of awareness and use of these products and lowering the barrier of expert knowledge needed to make a pragmatic selection of output products with confidence. The satellite input and output products, their associated validation metrics, as well as assumptions, strengths, and limitations of the various algorithm types are described, providing a framework for algorithm organization to assist users and inspire new aspects of algorithm development capable of exploiting the higher spectral, spatial and temporal resolutions from the next generation of ocean color satellites.

Keywords: remote sensing, ocean color, optics, phytoplankton functional types, phytoplankton size classes, particle size distribution, phytoplankton taxonomic composition, bio-optical algorithms

### Edited by:

Chris Bowler, École Normale Supérieure, France

### Reviewed by:

Michael J. Behrenfeld, Oregon State University, USA; Daniele Iudicone, Stazione Zoologica Anton Dohrn, Italy

\*Correspondence:

Colleen B. Mouw cmouw@uri.edu

### Specialty section:

This article was submitted to Ocean Observation, a section of the journal Frontiers in Marine Science

Received: 18 October 2016 Accepted: 03 February 2017 Published: 21 February 2017

### Citation:

Mouw CB, Hardman-Mountford NJ, Alvain S, Bracher A, Brewin RJW, Bricaud A, Ciotti AM, Devred E, Fujiwara A, Hirata T, Hirawake T, Kostadinov TS, Roy S and Uitz J (2017) A Consumer's Guide to Satellite Remote Sensing of Multiple Phytoplankton Groups in the Global Ocean. Front. Mar. Sci. 4:41. doi: 10.3389/fmars.2017.00041


# INTRODUCTION

The determination of phytoplankton community structure using satellite remote sensing has evolved from an aspiration to a highly active area of research, with numerous published approaches available over the past decade. Prior work had focused on the discrimination of dominant single phytoplankton groups such as coccolithophores, Trichodesmium spp., diatoms, and other harmful species such as Karenia brevis, Karenia mikimotoi, Nodularia, and Microcystis (IOCCG, 2014; see chapter 3 and references therein). A variety of approaches have emerged that attempt to discriminate "phytoplankton functional types" (PFT), which include algorithms that retrieve phytoplankton size classes (PSC), phytoplankton taxonomic composition (PTC), or particle size distribution (PSD). In this way, a PFT is an aggregation of phytoplankton, where irrespective of their phylogeny, they share similar biogeochemical or ecological roles. This broad definition lacks specificity, with no universal interpretation (Reynolds et al., 2002). Here PSC, PTC, and PSD serve as a further refinement of PFTs, where the choice of the considered functional type depends on the question at hand. Surveying the existing algorithms, with their varying inputs and outputs, can be overwhelming for nonexperts wishing to use the data products from such approaches and determine which algorithm output may be most applicable to their problem at hand. This guide serves as a synthesis of the existing methods with clear articulation of the underlying approach, satellite input and output products, assumptions, strengths, limitations, and validation metrics.

There are several recent reviews of research accomplishments of phytoplankton composition retrieval from satellite (Nair et al., 2008; Brewin R. J. et al., 2011; De Moraes Rudorff and Kampel, 2012; IOCCG, 2014). Nair et al. (2008) provide a review of single-species and multiple type retrievals, while De Moraes Rudorff and Kampel (2012) review various algorithm approaches (empirical, semi-analytical, analytic). Brewin R. J. et al. (2011) directly compare the performance of PFT and PSC algorithms. IOCCG (2014) provides a comprehensive report of PFT accomplishments to date, giving users detailed information on the various satellite PFT techniques. Yet, since the time of these reviews the literature has grown quickly. Building on the IOCCG report, the goal here is to provide a simple guide to current PFT techniques that is attractive to a broad audience of marine scientists. We provide a direct comparison of the assumptions, strengths, limitations, required satellite input and output products and performance metrics for the different approaches. The goal of this guide is to provide such a comparison in accessible form to reduce the barrier of expert knowledge needed for users to make a sound and confident selection of an algorithm or group of algorithms. To address a similar requirement for primary productivity models, Behrenfeld and Falkowski (1997) produced a "consumer's guide to primary productivity models"; this contribution seeks to address a similar need for the users of PFT satellite products. Given that phytoplankton form the base of the aquatic food web and their composition impacts the structure, function, and sustainability of the whole food web, we anticipate a broad user community, including numerical model developers, environmental and fisheries management entities, those seeking to understand climate-related changes in marine ecosystems and the carbon cycle, and members of the satellite remote sensing community who are not PFT algorithm developers.
Observationalists wanting to provide information to the broadest community are often looking for guidance on what variables or types of measurements would be of the highest value, in addition to identifying tools to put their observations into a larger context. Satellite remote sensing adds valuable synoptic observations on spatio-temporal scales impossible to sample in situ. In addition, by summarizing the parameters utilized in algorithm development, as well as satellite inputs and outputs, we aim to motivate identification of non-exploited parameter space and new algorithm development for extended PFT capability into the future.

Here, we focus on global open ocean methods solely dependent on inputs from ocean color radiance or its derived products. Thus, we exclude ecologically based methods that require additional physical and spatio-temporal information (e.g., Raitsos et al., 2008; Palacz et al., 2013). We utilize all of the algorithms that Kostadinov et al. (2017) directly compare plus three additional algorithms (Hirata et al., 2008; Devred et al., 2011; Li et al., 2013).

Unlike the "consumer's guide to primary productivity models" (Behrenfeld and Falkowski, 1997), where net primary production was the single common output between all compared models, satellite PFT algorithms have a variety of phytoplankton classes, units, and satellite product outputs. This presents an additional layer of challenge, precluding direct comparison of algorithm performance and explicit "how to" instructions as found in Behrenfeld and Falkowski (1997). Instead, other metrics, such as phenological cycles, are being explored as a way to intercompare PFT algorithms (Kostadinov et al., 2017). It is not our purpose here to inter-compare algorithm performance, rather we seek to provide users with a simplified "go to" reference to understand existing algorithm types, their associated strengths and limitations, input requirements and output products, to aid in selecting the satellite PFT model that may best fit their application.

# ALGORITHM OVERVIEW

Here, we focus on the four algorithm types that derive PFTs that are classified according to their theoretical basis, and include abundance-, radiance-, absorption-, and scattering-based approaches (**Figure 1**). The underlying assumptions and basic constructs for each of these algorithm types are described. We begin with the satellite inputs, followed by the outputs, then describe how they were derived (algorithm basis) and how successful the algorithm has been shown so far at retrieving the desired products (validation). A summary of notation can be found in **Table 1**.

# Understanding Satellite Data Inputs: Ocean Color Radiometry

A satellite ocean color radiometer measures light (radiance) at the top of the atmosphere. On the global scale, the atmosphere alone typically accounts for >90% of this signal (Mobley, 1994). After atmospheric correction, the primary measured variable is spectral remote sensing reflectance [Rrs(λ)] or normalized water-leaving radiance [nLw(λ)]. [Note that these variables are related via Rrs(λ) = nLw(λ)/F0(λ), where F0(λ) is the extraterrestrial solar irradiance centered at wavelength λ (Thuillier et al., 2003).] In open ocean waters, the threshold of uncertainty acceptance for Rrs(λ) is 5% (Bailey and Werdell, 2006). All other ocean color variables are estimated from Rrs(λ) (**Figure 2**). This means that the inherent optical properties (IOPs, i.e., absorption and scattering/backscattering), which are independent of the ambient light field, as well as biogeochemical variables such as chlorophyll-a concentration, [Chl], are estimated from Rrs(λ), not measured directly from space. Approximate relationships between Rrs(λ) and IOPs were presented by Gordon et al. (1988) so that:

$$R_{rs}(\lambda) = \Re \frac{f(\lambda)}{Q(\lambda)} \frac{b_b(\lambda)}{a(\lambda) + b_b(\lambda)} \tag{1}$$

where a(λ) is the spectral total absorption coefficient, bb(λ) is the spectral total backscattering coefficient, ℜ is a factor that accounts for reflection and refraction at the air-water interface, and f/Q accounts for the bidirectional nature of reflectance (Morel et al., 2002). The IOPs absorption and backscattering are functions of biological/biogeochemical variables. Phytoplankton abundance, composition and physiological status impact [Chl], PSD, light absorption, and backscattering, and thus Rrs(λ). The algorithms that utilize absorption and backscattering satellite inputs obtain these IOP parameters from a variety of different semi-analytical inversion algorithms that are all fundamentally derived from the basic construct of Equation (1) (Werdell et al., 2013; **Figure 2**). In contrast, the Phytoplankton Differential Optical Absorption Spectroscopy (PhytoDOAS) algorithm uses top of atmosphere satellite reflectance directly as input, to fit (and separate) simultaneously all absorbers in the atmosphere and ocean, accounting for atmospheric effects within the algorithm (Bracher et al., 2009; Sadeghi et al., 2012a).
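The forward relationship of Equation (1), together with the conversion Rrs(λ) = nLw(λ)/F0(λ), can be sketched in a few lines. This is only a minimal illustration: the function names are hypothetical, and the default values for ℜ and f/Q are placeholder magnitudes assumed for the example, not values taken from the text.

```python
def nlw_to_rrs(nlw, f0):
    """Rrs = nLw / F0 at one wavelength (F0: extraterrestrial solar irradiance)."""
    if f0 <= 0:
        raise ValueError("F0 must be positive")
    return nlw / f0


def rrs_from_iops(a, bb, f_over_q=0.09, interface_factor=0.53):
    """Equation (1) at one wavelength.

    a  : total absorption coefficient (m^-1)
    bb : total backscattering coefficient (m^-1)
    f_over_q, interface_factor : ASSUMED placeholder values for f/Q and the
        air-water interface factor; both vary with conditions in practice.
    """
    if a < 0 or bb < 0 or (a + bb) == 0:
        raise ValueError("a and bb must be non-negative and not both zero")
    return interface_factor * f_over_q * bb / (a + bb)


# Example: more absorption at fixed backscattering lowers the reflectance.
clear = rrs_from_iops(a=0.02, bb=0.002)
absorbing = rrs_from_iops(a=0.10, bb=0.002)
```

Semi-analytical inversion algorithms work in the opposite direction, solving for a(λ) and bb(λ) given Rrs(λ); that step is not shown here.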

Abundance-based algorithms use [Chl] as a satellite input (**Figure 2**). To date, all published abundance-based models utilize [Chl] derived by an empirical approach (O'Reilly et al., 1998),

$$\log_{10}\left[\text{Chl}\right] = a_0 + \sum_{i=1}^{4} a_i \left[\log_{10}\left(\frac{R_{rs}(\lambda_{blue})}{R_{rs}(\lambda_{green})}\right)\right]^i \tag{2}$$

where a0–a4 are sensor-specific coefficients and Rrs(λblue) is the greatest of several input Rrs(λ) values. However, within the constructs of the PFT algorithms, there is no reason why semi-analytically determined [Chl] could not be used in place of empirically determined [Chl]. The sensor-specific coefficients and bands are available at: http://oceancolor.gsfc.nasa.gov/cms/atbd/chlor_a. The level of acceptable uncertainty for [Chl] is 35% (Bailey and Werdell, 2006).

### TABLE 1 | Summary of notation (units in parentheses, where applicable).

Table sections: Optical parameters; Pigments.
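The empirical band-ratio form of Equation (2) can be sketched as follows. The function name is hypothetical, and the coefficient tuple is an illustrative placeholder only; the operational, sensor-specific coefficients and bands should be taken from the NASA page cited above.

```python
import math

# ASSUMED illustrative coefficients (a0..a4); check the sensor-specific
# values published by NASA before any real use.
EXAMPLE_COEFFS = (0.3272, -2.9940, 2.7218, -1.2259, -0.5683)


def chl_band_ratio(rrs_blue_bands, rrs_green, coeffs=EXAMPLE_COEFFS):
    """Equation (2): fourth-order polynomial in log10 of a blue/green ratio.

    rrs_blue_bands : Rrs at the candidate blue bands; the greatest is used
    rrs_green      : Rrs at the green band
    Returns [Chl] in mg m^-3.
    """
    if rrs_green <= 0 or max(rrs_blue_bands) <= 0:
        raise ValueError("Rrs values must be positive")
    x = math.log10(max(rrs_blue_bands) / rrs_green)
    log_chl = sum(c * x ** i for i, c in enumerate(coeffs))
    return 10.0 ** log_chl


# Example: a bluer spectrum (higher blue/green ratio) maps to lower [Chl].
chl_blue_water = chl_band_ratio([0.010, 0.008, 0.006], 0.002)
chl_green_water = chl_band_ratio([0.002, 0.002, 0.002], 0.002)
```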

Within the portion of the satellite Rrs(λ) signal that is attributed to phytoplankton (absorption by pigments and scattering by cellular material), pigment abundance is primarily responsible for first-order magnitude variability in Rrs(λ), while spectral shape differences associated with diversity in the taxonomic composition are secondary (Ciotti et al., 1999). Therefore, it is important to consider the overall phytoplankton contribution to total absorption and scattering budgets. Mouw et al. (2012) quantified this by looking at model output over the range of optical variability encountered in the global ocean considering scenarios where phytoplankton size did and did not vary. They find the magnitude of the [Chl] contribution to Rrs(443) (443 nm is the wavelength where greatest phytoplankton absorption occurs) is much greater than the contribution of phytoplankton taxonomic composition to Rrs(443) variability (see their Figures 6–8). This is due to the fact that chlorophyll-a, a pigment ubiquitous to all phytoplankton, has maximum absorption at 443 nm. PFT algorithms that exploit these second-order characteristics, after accounting for the presence of colored dissolved organic matter (CDOM) and non-algal particles (NAP), are therefore subject to limitations due to the relatively low signal-to-noise ratio of the residuals; that is, they operate near the limits of what is retrievable by the current state-of-the-art (e.g., Evers-King et al., 2014). Conversely, PFT algorithms that use the dominant abundance signal, such as [Chl], phytoplankton absorption, or particulate backscatter, are less impacted but have to face other limitations such as uncertainty in relationships between these properties and phytoplankton grouping.

# Understanding Satellite PFT Outputs: PSC, PTC, and PSD

Here, we seek to summarize and simplify the satellite phytoplankton functional type algorithm products or outputs. The PSC output is most commonly grouped as pico- (0.2–2 µm), nano- (2–20 µm), and/or microplankton (>20 µm) following the size classification scheme proposed by Sieburth et al. (1978). However, a few models allow for multicomponent size classes not constrained by the traditional size groupings (Roy et al., 2013; Brewin et al., 2014b). The PSD satellite output (Kostadinov et al., 2009, 2010; Roy et al., 2013) can conform to the Sieburth et al. (1978) size classification. The PTC algorithms have a variety of outputs, dictated largely by the resolution of information available from in situ calibration and/or validation datasets. The PHYSAT approach (Alvain et al., 2005, 2008; Ben Mustapha et al., 2014) retrieves nanoeukaryotes, haptophytes (a major component of the nanoflagellates), Prochlorococcus, Synechococcus-like cyanobacteria, diatoms, coccolithophores, and Phaeocystis-like phytoplankton. The Hirata et al. (2011) approach retrieves pico-eukaryotes, prymnesiophytes (synonymous with haptophytes), diatoms, prokaryotes, green algae (chlorophytes), dinoflagellates, and Prochlorococcus sp., in addition to the main pico, nano, and micro size classes. The PhytoDOAS algorithm (Bracher et al., 2009; Sadeghi et al., 2012a) retrieves cyanobacteria, diatoms, coccolithophores, and dinoflagellates. We group similar classes together for clarity and simplicity. For example, haptophytes retrieved by Alvain et al. (2005, 2008) and Sadeghi et al. (2012a) are grouped with prymnesiophytes retrieved by Hirata et al. (2011). Prochlorococcus and Synechococcus, along with the broader prokaryotes class obtained by Hirata et al. (2011), are grouped as cyanobacteria (**Table 2**). Algorithm abbreviations follow those established by the algorithm's author(s), are consistent with those in Kostadinov et al. (2017), and are noted in **Figure 1** and **Table 2**.

These PFT output products are similar but not identical, and are defined by distinct units. These include dominance, [Chl] for each group (mg m⁻³), fractional [Chl] (%), fractional biovolume (%), absorption (m⁻¹) of each group, and a continuous size parameter varying from 0 to 1 (see Equation 4 and **Table 3**). We also simplify output with regard to units. All phytoplankton groups or size classes, regardless of units, are grouped together in **Table 2** and **Figure 3**, which provide an overview of all algorithms. While users will most certainly require unit information, the overview table allows easy identification of the citations for the outputs of interest. For greater depth of information regarding units, a full list of output products, their validation source, and validation metrics are provided in **Table 3**.

An important consideration is the aspect of phytoplankton group dominance. Alvain et al. (2005, 2008) and Hirata et al. (2008) retrieve the dominant group for a given satellite image pixel. Alvain et al. (2005) define dominance as situations in which a given phytoplankton group is the major contributor to the radiance anomaly. This contribution is retrieved as dominant when the ratio (biomarker pigment concentration/[Chl]) is at least equal to 50% of the value that would be observed if the phytoplankton group were alone in the sample. This approach allows an empirical relationship between radiance anomalies and in situ information. For this reason, PHYSAT interpretation needs to be carefully considered in terms of the in situ data used to give a name to the remotely sensed signal. Alvain et al. (2005) classify daily images and compile monthly maps of the most frequent dominant phytoplankton group. The group present in more than half of the daily images is assigned as dominant in the monthly compilation. When no group remains dominant over the whole month, pixels are labeled as unidentified. Hirata et al. (2008) determine PSCs from diagnostic pigments and relate them to phytoplankton absorption at 443 nm [aph(443)] to retrieve PSCs from satellite imagery. In the development stage of relating diagnostic pigments to aph(443) in situ, a PSC is defined as dominant if the marker pigment to diagnostic pigment ratio is >45%. However, in applying the approach to aph(443) imagery, PSCs are determined based on threshold ranges of aph(443); as such, for a given pixel, only a single dominant type is classified, regardless of the temporal resolution of the satellite imagery. These are considerations users need to be aware of, as they can affect interpretation and use. Further, when comparing satellite algorithms with biogeochemical model outputs, dominance (highest percentage of group) will vary depending on whether one considers dominance of [Chl], aph(λ), bbp(λ), or carbon, requiring care to ensure comparisons are done on the same terms.
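The two dominance rules described for the PHYSAT approach can be illustrated with a short sketch. The function names are hypothetical, and the treatment of unclassified days (ignored, with "more than half" counted over the classified days only) is an assumption made for this example rather than a detail given in the text.

```python
from collections import Counter


def is_pigment_dominant(marker_conc, chl, pure_group_ratio, threshold=0.5):
    """True when marker/[Chl] reaches `threshold` (50%) of the ratio expected
    if the group were alone in the sample (`pure_group_ratio`)."""
    if chl <= 0:
        raise ValueError("[Chl] must be positive")
    return (marker_conc / chl) >= threshold * pure_group_ratio


def monthly_dominant_group(daily_labels, unidentified="unidentified"):
    """Assign a monthly label to one pixel from its daily dominant-group labels.

    daily_labels : sequence of group names, with None for unclassified days.
    ASSUMPTION: None entries are dropped before counting.
    """
    valid = [label for label in daily_labels if label is not None]
    if not valid:
        return unidentified
    group, count = Counter(valid).most_common(1)[0]
    # The group must appear in strictly more than half of the daily images.
    return group if count > len(valid) / 2 else unidentified


month = ["diatoms", None, "diatoms", "haptophytes", "diatoms", None]
label = monthly_dominant_group(month)
```

Because the Hirata et al. (2008) output comes from threshold ranges on a single aph(443) image, it has no equivalent temporal voting step.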

### Algorithm Basis

Abundance-based algorithms are based on the general observation that in the global open ocean a change in [Chl] is associated with a change in phytoplankton composition or size structure. The basis of this approach is that there is an upper limit of [Chl] in small cells imposed from genotypic and phenotypic constraints. Beyond this value, larger phytoplankton are responsible for an increase in [Chl] (Yentsch and Phinney, 1989; Chisholm, 1992).

Morel and Berthon (1989) suggested near surface [Chl] is related to water column-integrated chlorophyll content and its vertical distribution. Extending this work, Uitz et al. (2006) proposed quantitative relationships between the near surface [Chl] and (i) the water-column integrated chlorophyll content, (ii) its vertical distribution, and (iii) its community composition in terms of three pigment-based PSCs. The relationships were established from the analysis of a large high-performance liquid chromatography (HPLC) pigment database, covering a broad range of trophic conditions in the global open ocean. Uitz et al. (2006) used a modified version of the diagnostic pigment indices of Vidussi et al. (2001) (described in the Algorithm Validation Section) to determine the depth-resolved contribution to the total chlorophyll biomass of three PSCs (pico-, nano-, and microphytoplankton). The resulting PSC-specific vertical profiles of [Chl] from stratified waters were discriminated from those sampled in well-mixed waters based on the ratio of the euphotic layer depth (calculated from the vertical [Chl] profile following Morel and Maritorena, 2001) and the mixed layer depth (extracted from a global monthly climatology). For the stratified and mixed waters, the [Chl] profiles of pico-, nano-, and microphytoplankton were sorted into trophic categories, defined by successive intervals of surface [Chl]. For each trophic category, average profiles of [Chl] associated with the pico-, nano-, and microphytoplankton were calculated. The shape and magnitude of these profiles showed regular changes along the trophic gradient and, thus, could be parameterized as a function of surface [Chl]. Applied in a continuous manner to any given satellite-derived surface [Chl], the resulting empirical parameterization makes it possible to derive a vertical profile of [Chl] for each of the three pigment-based PSCs.

where other size classes could be inferred but are not directly retrieved are indicated with "(x)". Notation for column headers can be found in Table 1.

### FIGURE 3 | Continued

circles: radiance (red), chlorophyll concentration (green), absorption (blue), and scattering (yellow). Overlapping circles indicate two or more satellite input products are utilized. (C) Overview of algorithm satellite output by PFT types. The colored circles indicate the PFT type of the output products [phytoplankton taxonomic class (PTC, green), phytoplankton size class (PSC, yellow), and particle size distribution (PSD, blue)]. Overlapping circles indicate where a given algorithm produces two or more satellite output product types. The color of the text in all subplots indicates the algorithm type: abundance (green), radiance (red), absorption (black) and scattering (blue). Algorithm abbreviations are as in Figure 1 and Tables 2, 3.

Hirata et al. (2011) estimate fractions of three PSCs and seven PTCs from empirical relationships between [Chl] and diagnostic pigments of various phytoplankton groups (see equations and coefficients in Hirata et al., 2011), based on global observations that abundance and composition of phytoplankton are not necessarily independent/de-coupled on synoptic scale. Brewin et al. (2010), extending the model proposed by Sathyendranath et al. (2001), describe the exponential functions that relate [Chl] to the fractional contribution of various PSCs,

$$[\text{Chl}]_{p,n} = C^{m}_{p,n} \left[1 - \exp\left(-S_{p,n} [\text{Chl}]\right)\right] \tag{3a}$$

$$[\text{Chl}]_{p} = C^{m}_{p} \left[1 - \exp\left(-S_{p} [\text{Chl}]\right)\right] \tag{3b}$$

$$[\text{Chl}]_{n} = [\text{Chl}]_{p,n} - [\text{Chl}]_{p} \tag{3c}$$

$$[\text{Chl}]_{m} = [\text{Chl}] - [\text{Chl}]_{p,n} \tag{3d}$$

where subscripts p, n, and m refer to pico- (>0.2–2 µm), nano- (>2–20 µm), and microplankton (>20 µm), respectively, and the subscript p,n denotes the combined pico- and nanoplankton. $C^{m}_{p,n}$ and $C^{m}_{p}$ are asymptotic maximum values for the associated size classes, and $S_{p,n}$ and $S_{p}$ determine the increase in size-fractionated [Chl] (parameter values can be found in Table 2 of Brewin et al., 2015); these parameters have been found to vary with environmental conditions (Brewin et al., 2015; Ward, 2015). Brewin et al. (2010, 2012), Brewin R. J. W. et al. (2011), and Hirata et al. (2011) all utilize the continuum of [Chl] (please see Figure 2 in Hirata et al., 2011 and Figure 4A in Brewin et al., 2010).
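A minimal sketch of Equations (3a–3d) is given below. The function name is hypothetical and the default parameter values are illustrative placeholders assumed for this example; the published values (e.g., Table 2 of Brewin et al., 2015) should be used in practice, and they vary with environmental conditions.

```python
import math


def size_fractionated_chl(chl, cm_pn=1.057, s_pn=0.851, cm_p=0.107, s_p=6.801):
    """Partition total [Chl] (mg m^-3) into pico, nano, and micro fractions.

    cm_pn, cm_p : asymptotic maxima for pico+nano and pico (ASSUMED values)
    s_pn, s_p   : initial slopes for pico+nano and pico (ASSUMED values)
    """
    if chl < 0:
        raise ValueError("[Chl] must be non-negative")
    chl_pn = cm_pn * (1.0 - math.exp(-s_pn * chl))  # Eq. 3a
    chl_p = cm_p * (1.0 - math.exp(-s_p * chl))     # Eq. 3b
    chl_n = chl_pn - chl_p                          # Eq. 3c
    chl_m = chl - chl_pn                            # Eq. 3d
    return {"pico": chl_p, "nano": chl_n, "micro": chl_m}


# Example: small cells dominate at low [Chl], large cells at high [Chl].
oligotrophic = size_fractionated_chl(0.05)
eutrophic = size_fractionated_chl(5.0)
```

By construction the three fractions sum to the input [Chl], which is what ties this family of algorithms to the total abundance signal.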

Radiance-based algorithms classify PFTs based on the shape and/or magnitude of the satellite-observed Rrs(λ) or nLw(λ). Radiance-based approaches assume that, after normalization, changes in radiance coincide with changes in PFT composition, as opposed to other in-water constituents such as CDOM or NAP that may or may not covary with the phytoplankton (e.g., Siegel et al., 2005). Alvain et al. (2005, 2008) normalize Rrs(λ) to [Chl] and identify characteristic spectral bounds for several PTCs in terms of shape and amplitude (Ben Mustapha et al., 2014): nanoeukaryotes, Prochlorococcus, Synechococcus-like cyanobacteria, diatoms, Phaeocystis-like cells, and coccolithophores. More recently, based on theoretical relationships between radiance anomalies and specific phytoplankton groups, PHYSAT has been shown to potentially detect phytoplankton assemblages of several PTCs as opposed to a single dominant one (Rêve et al., in revision). Alternatively, Li et al. (2013) consider a variety of spectral features of surface reflectance and use machine learning to select the most significant of these. They find that continuum-removed and spectral curvature features are the most significant, with particular importance around 440–555 nm; these features isolate absorption characteristics and measure non-linearity. They utilize these results with support vector regression to estimate PSCs.

### TABLE 3 | Algorithm retrieval parameters and validation metrics.

… the exception of CB06 (Brazil continental shelf) and FUJI11 (Arctic and sub-Arctic) were developed for global extent. CB06 was later verified for global use by Bricaud et al. (2012).

Absorption-based algorithms comprise by far the majority of existing approaches. All of the approaches have some level of dependence on the spectral magnitude or shape of phytoplankton absorption [aph(λ)]. The magnitude of aph(λ) is related to pigment composition and total pigment concentration, dominated by [Chl] at the peak wavelength (for oceanic waters) of 443 nm. Size information is contained in the absorption spectrum due to pigment packaging (e.g., Bricaud and Morel, 1986). Some of the approaches utilize chlorophyll-specific phytoplankton absorption in which phytoplankton absorption is normalized to [Chl] (Bracher et al., 2009; Mouw and Yoder, 2010a; Sadeghi et al., 2012a; Roy et al., 2013), either for a specific wavelength or to derive a spectral shape or slope that is related to second order signals including pigment composition and packaging. Several of the approaches (Ciotti and Bricaud, 2006; Mouw and Yoder, 2010a; Bricaud et al., 2012) stem from the theoretical underpinning of Ciotti et al. (2002) who identify that, despite the physiological and taxonomic variability, variation in aph(λ) spectral shape can be defined by changes in the dominant size class. They determine chlorophyll-specific phytoplankton absorption ($a^{*}_{ph}$) as weighted between normalized mean pico- ($\bar{a}^{*}_{ph,pico}$) and microplankton ($\bar{a}^{*}_{ph,micro}$) chlorophyll-specific absorption spectra.

$$a_{ph}^{*}(\lambda) = \left[S_{f} \times \bar{a}_{ph,pico}^{*}(\lambda)\right] + \left[\left(1 - S_{f}\right) \times \bar{a}_{ph,micro}^{*}(\lambda)\right] \tag{4}$$

where $S_f$ is a dimensionless index constrained to vary between 0 and 1, specifying the relative contributions of microphytoplankton ($S_f$ = 0) and picophytoplankton ($S_f$ = 1) to phytoplankton absorption. Equation (4) is based on the fact that the shape of the phytoplankton absorption spectrum flattens with increasing cell size. This relationship results from pigments being contained within particles (rather than in solution), known as the "discreteness effect," and secondarily from how pigments are packaged within the cell, known as the "packaging effect" (Morel and Bricaud, 1981). Small cells have little cellular material between the chloroplast and cell wall, making them highly efficient absorbers, resulting in higher magnitude and more peaked absorption. With large cells, light has to penetrate more cellular material to reach the chloroplast after passing through the cell wall, resulting in muted absorption affinity and in some cases self-shading (see Figure 7E in Ciotti et al., 2002). Note that the shape of the phytoplankton absorption spectrum (and therefore the $S_f$ value) can be affected by variations in pigment composition and intracellular pigment concentration resulting from photoacclimation, independent of cell size. Ciotti and Bricaud (2006) proposed a new $\bar{a}^{*}_{ph,pico}$ vector based on an oceanic data set, and Bricaud et al. (2012) utilize this relationship directly to retrieve $S_f$, absorption due to non-algal particles and colored dissolved organic matter at 443 nm [adg(443)], and the spectral slope of adg(λ) through an inversion model. Mouw and Yoder (2010a) modify Equation (4) to vary with the percentage of microplankton (Sfm) rather than picoplankton. They develop an optical look-up-table (LUT) that contains ranges of Sfm, [Chl], and adg(λ) from which Rrs(λ) is calculated from radiative transfer. They utilize satellite [Chl] and adg(443) to narrow the search space within the LUT, then find the closest match between satellite Rrs(λ) and LUT Rrs(λ) and retrieve the associated Sfm from the LUT.
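The two-end-member mixing of Equation (4) can be illustrated as follows. This is a sketch only: the end-member spectra are made-up numbers chosen to be "peaked" (pico) and "flat" (micro), not the published basis vectors, and the closed-form least-squares fit is a simple stand-in for the inversion models and look-up-table searches used by the published algorithms.

```python
# HYPOTHETICAL end-member spectra at five bands (m^2 mg^-1); not real data.
PICO = [0.055, 0.070, 0.045, 0.008, 0.022]
MICRO = [0.018, 0.020, 0.015, 0.006, 0.012]


def mix_aph_star(sf, pico=PICO, micro=MICRO):
    """Equation (4): chlorophyll-specific absorption for a given Sf."""
    if not 0.0 <= sf <= 1.0:
        raise ValueError("Sf must lie in [0, 1]")
    return [sf * p + (1.0 - sf) * m for p, m in zip(pico, micro)]


def fit_sf(aph_star, pico=PICO, micro=MICRO):
    """Least-squares Sf for an observed spectrum, clipped to [0, 1].

    Minimizes sum((aph_star - [Sf*pico + (1-Sf)*micro])**2) over Sf,
    which has a closed-form solution because the model is linear in Sf.
    """
    diff = [p - m for p, m in zip(pico, micro)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("end-member spectra are identical")
    numer = sum((a - m) * d for a, m, d in zip(aph_star, micro, diff))
    return min(1.0, max(0.0, numer / denom))


# Round trip: build a spectrum with Sf = 0.3 and recover the index.
observed = mix_aph_star(0.3)
recovered = fit_sf(observed)
```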

Hirata et al. (2008) do not use the Ciotti et al. (2002) construct (i.e., Equation 4) that utilizes multiple wavelengths to characterize the spectral shape of aph(λ). Instead, they identify a tight relationship between the magnitude of phytoplankton absorption at a single wavelength [aph(443)], related to [Chl], and the slope of aph(443) to aph(510), which is influenced by pigment packaging and composition. When this approach is applied to satellite data, it only uses aph(443), and determines the dominant size class using boundaries in aph(443).

The approaches of Devred et al. (2011) and Brewin R. J. W. et al. (2011) are similar to that of Brewin et al. (2010), applying the constructs of Equations (3a–3d). Devred et al. (2011) use Equation (3) to derive chlorophyll-specific absorption coefficients for three PSCs. When this approach is applied to satellite data, it uses a semi-analytic inversion algorithm together with the derived chlorophyll-specific absorption coefficients to estimate size-fractionated [Chl], not using Equation (3) at this stage; hence this approach does not assume covariance between total [Chl] and size-fractionated [Chl] (as with an abundance-based approach). Fujiwara et al. (2011) is the only absorption-based approach that also uses backscatter as an input, which they determine empirically from Rrs(λ) band ratios. They estimate PSC utilizing empirical relationships with phytoplankton absorption-spectra ratios and the particulate backscatter slope.

Roy et al. (2011) developed a semi-analytical algorithm based on phytoplankton absorption at a red wavelength (676 nm) to compute the equivalent spherical diameter of phytoplankton. Roy et al. (2013) further extended the algorithm to heterogeneous phytoplankton populations, where they utilized phytoplankton absorption at 676 nm to compute the PSD corresponding to the phytoplankton cells alone, and derived the power-law exponent/slope of the phytoplankton size spectrum. Knowing the slope of the phytoplankton cell-size distribution, the proportions of [Chl] within any diameter range of PSCs can be calculated.

The PhytoDOAS algorithm (Bracher et al., 2009; Sadeghi et al., 2012a) uses hyperspectral top of atmosphere reflectances to identify spectral features associated with PTCs. This approach requires hyperspectral satellite data and has been applied to the SCanning Imaging Absorption SpectroMeter for Atmospheric CHartographY (SCIAMACHY) onboard ENVISAT (more details in Bovensmann et al., 1999), which has limited spatial coverage and resolution, with a 6-day revisit and 30 by 60 km pixel size. The differential optical absorption spectroscopy (DOAS) technique exploits sharp spectral features and, when extended to phytoplankton, differentiates on the spectral specific-absorption features of major PTCs. The DOAS method utilizes observed backscattered radiation, normalized to the solar irradiance, at the top of the atmosphere and absorption cross sections (i.e., specific absorption coefficients) of all important absorbing constituents varying spectrally in the atmosphere-ocean system. The method uses non-linear optimization to fit these "differential absorption cross sections" of different phytoplankton groups, water vapor, and atmospheric trace gases [O3, O4, NO2, glyoxal (CHOCHO), iodine oxide (IO)], together with spectral features caused by filling-in of Fraunhofer lines due to Raman scattering. The contributions of broad-band scattering and absorption features, such as Mie and Rayleigh scattering in the atmosphere or NAP and CDOM in water, are approximated by a second-order polynomial in each fit. Bracher et al. (2009) adopted DOAS within 429–495 nm to retrieve absorption and biomass of cyanobacteria and diatoms independently. Sadeghi et al. (2012a) extended the method further to simultaneously retrieve diatoms, coccolithophores, and dinoflagellates over the 429–521 nm spectral range.

To date, only two scattering-based algorithms have been published. Backscattering approaches retrieve information on all particles rather than just phytoplankton. Generally, the backscattering coefficient decreases according to a power law function with increasing wavelength. Smaller particles have a greater backscattering slope (η) than larger particles. Montes-Hugo et al. (2008) were the first to estimate phytoplankton size by considering the backscattering slope. They demonstrated their approach near the western shelf of the Antarctic Peninsula. Kostadinov et al. (2009) were the first to demonstrate the approach globally. They estimate spectral particulate backscattering [bbp(λ)] from Rrs(λ) and then calculate η from bbp(λ) based on Loisel et al. (2006). Using η, the PSD slope and reference abundance of particles are retrieved from a look-up-table that is constructed based on theoretical Mie scattering computations. These parameters are then used to estimate the number and volume concentrations for pico-, nano-, and micro-sized particles. Assuming the relative proportions of biovolume are roughly constant across size classes, Kostadinov et al. (2010) validate the Kostadinov et al. (2009) approach with pigments and confirm pigment-based micro-, nano-, and pico-sized phytoplankton approximately represent micro-, nano-, and pico-sized particles derived from backscattering. Kostadinov et al. (2016a) further develop the KSM09 approach by using existing allometric relationships (Menden-Deuer and Lessard, 2000) to convert biovolume calculated from the PSD to phytoplankton carbon (C) in these three PSCs. These PSCs are the only carbon-based PFT retrievals available to date. The approach can be used to estimate phytoplankton carbon concentrations (absolute and fractional) in any size class in the 0.5–50 µm diameter range. It is desirable to express the PFTs in terms of carbon because it is relatively insensitive to variations in phytoplankton physiology, unlike [Chl]; it is relevant to the carbon cycle and other biogeochemical cycles; and carbon is the unit used for PFTs in climate models (Hood et al., 2006).
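Two steps of the scattering-based chain lend themselves to a short sketch: estimating η from a power-law bbp(λ) spectrum, and partitioning particle volume among size classes once a PSD slope is known. The function names are hypothetical, the PSD is assumed to follow N(D) ∝ D^(−ξ) with spherical particles, and the mapping from η to ξ (done with a Mie-based look-up-table in the published approach) is deliberately not reproduced.

```python
import math


def backscatter_slope(wavelengths_nm, bbp):
    """Slope eta of bbp ~ wavelength**(-eta), from a log-log least-squares fit."""
    xs = [math.log(w) for w in wavelengths_nm]
    ys = [math.log(b) for b in bbp]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return -sxy / sxx


def _power_integral(d_lo, d_hi, exponent):
    """Integral of D**exponent dD from d_lo to d_hi."""
    if abs(exponent + 1.0) < 1e-12:
        return math.log(d_hi / d_lo)
    p = exponent + 1.0
    return (d_hi ** p - d_lo ** p) / p


def volume_fractions(xi, edges_um=(0.5, 2.0, 20.0, 50.0)):
    """Fraction of particle volume in pico, nano, and micro classes.

    ASSUMES N(D) proportional to D**(-xi) and volume proportional to D**3,
    so the volume density scales as D**(3 - xi).
    """
    parts = [_power_integral(lo, hi, 3.0 - xi)
             for lo, hi in zip(edges_um[:-1], edges_um[1:])]
    total = sum(parts)
    return dict(zip(("pico", "nano", "micro"), (p / total for p in parts)))


# Synthetic check: a spectrum built with eta = 1.5 returns that slope.
bands = [443.0, 490.0, 510.0, 555.0]
eta = backscatter_slope(bands, [0.002 * (w / 443.0) ** -1.5 for w in bands])
fractions = volume_fractions(4.0)
```

Steeper PSD slopes shift volume toward the pico class, which is the behavior the size-class retrieval relies on.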

### Algorithm Validation

Nearly all algorithms are validated against estimates of phytoplankton size and composition determined from in situ measurements of pigment concentrations with the HPLC technique (**Table 3**). The chemotaxonomic approach provides a means to quantify phytoplankton taxonomic composition utilizing a set of biomarker pigments (e.g., Jeffrey et al., 1997; Roy et al., 2011). Claustre (1994) and Vidussi et al. (2001) further proposed to utilize groupings of biomarker pigments to estimate phytoplankton size structure. They identified a set of seven diagnostic pigments specific to phytoplankton taxa, which were then assigned to one of the three size classes (micro-, nano-, and pico-) depending on the average cell size of the organisms. The diagnostic pigment-based approach enables estimating the contribution of the three phytoplankton size classes to the total chlorophyll-a biomass as follows (Equation 5):

$$\Sigma DP = \sum_{i=1}^{7} W_i P_i \tag{5a}$$

$$f_{micro} = \frac{\sum_{i=1}^{2} W_i P_i}{\Sigma DP} \tag{5b}$$

$$f_{nano} = \frac{\sum_{i=3}^{5} W_i P_i}{\Sigma DP} \tag{5c}$$

$$f_{pico} = \frac{\sum_{i=6}^{7} W_i P_i}{\Sigma DP} \tag{5d}$$

where ΣDP is the sum of all the diagnostic pigments multiplied by the weight coefficients ($W_i$, values discussed below), $f_{micro}$, $f_{nano}$, and $f_{pico}$ are the fractions of the micro-, nano-, and pico-plankton size classes to [Chl], and $P_i$ are the pigments' concentrations (P = {fucoxanthin; peridinin; 19′-hexanoyloxyfucoxanthin; 19′-butanoyloxyfucoxanthin; alloxanthin; chlorophyll-b and divinyl chlorophyll-b; zeaxanthin}). The most widely used coefficients are those proposed by Uitz et al. (2006) (W = {1.41; 1.41; 1.27; 0.35; 0.6; 1.01; 0.86}), which were derived from a global HPLC pigment database.
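Equation (5) with the Uitz et al. (2006) weights quoted above can be written out directly. The dictionary keys and function name are hypothetical labels chosen for this sketch, and the pigment concentrations in the example are invented.

```python
# Pigment order and weights follow the lists given in the text.
PIGMENTS = ("fucoxanthin", "peridinin", "hex_fuco", "but_fuco",
            "alloxanthin", "total_chl_b", "zeaxanthin")
WEIGHTS = (1.41, 1.41, 1.27, 0.35, 0.60, 1.01, 0.86)


def dp_size_fractions(conc):
    """Fractions of micro-, nano-, and picoplankton from diagnostic pigments.

    conc : dict of pigment concentrations (mg m^-3) keyed by PIGMENTS
           (key names are hypothetical; total_chl_b = chl-b + divinyl chl-b).
    """
    weighted = [w * conc[name] for w, name in zip(WEIGHTS, PIGMENTS)]
    total = sum(weighted)                      # Eq. 5a
    if total <= 0:
        raise ValueError("weighted diagnostic pigment sum must be positive")
    return {
        "micro": sum(weighted[0:2]) / total,   # Eq. 5b, pigments 1-2
        "nano": sum(weighted[2:5]) / total,    # Eq. 5c, pigments 3-5
        "pico": sum(weighted[5:7]) / total,    # Eq. 5d, pigments 6-7
    }


# Invented sample for illustration only.
sample = {"fucoxanthin": 0.05, "peridinin": 0.01, "hex_fuco": 0.08,
          "but_fuco": 0.03, "alloxanthin": 0.01, "total_chl_b": 0.04,
          "zeaxanthin": 0.06}
fractions = dp_size_fractions(sample)
```

The modified coefficient schemes discussed in the following paragraph would change the weights or reassign part of a pigment to another class; the structure of the calculation stays the same.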

While diagnostic pigments have been widely used for validation due to the availability of extensive datasets of HPLC pigments across the global ocean (Peloquin et al., 2013), there are important limitations to consider. The diagnostic pigment-based approach does not necessarily reflect the true size structure of the phytoplankton communities because some taxonomic groups may spread over a broader size range (e.g., diatoms are typically found in the micro- but could also occur in the nano-size and sometimes in the pico-size classes) and some diagnostic pigments are shared by different taxonomic groups (e.g., fucoxanthin is the main carotenoid of diatoms but may also be found in prymnesiophytes). Recently, modifications to the Vidussi et al. (2001) and Uitz et al. (2006) approach were proposed that account for the presence of some diagnostic pigments in more than one taxon. Hirata et al. (2011) and Devred et al. (2011) proposed further adjustments to the fucoxanthin pigment coefficient, to assign a portion of this pigment to nanoplankton. To address the analogous issue that prymnesiophytes predominate within the nanophytoplankton but can also be present in the pico-eukaryote population, Brewin et al. (2010) modified the 19′-hexanoyloxyfucoxanthin (19′-hex) coefficient to attribute a portion of this pigment to the picoplankton in low [Chl] waters. Furthermore, the DP for diatoms, fucoxanthin, is the precursor pigment for 19′-hex, leading to some prymnesiophytes being classified as diatoms. For the algorithms that utilize HPLC pigments in their development, it should be noted that direct comparisons need to be considered carefully, as not all output products are developed and validated with the same set of diagnostic pigment coefficients. There are ongoing efforts to verify HPLC methods (i.e., Equation 5) through comparison with other techniques (e.g., Brewin et al., 2014a). As Nair et al. (2008) pointed out, any single method alone may not be entirely dependable, thus incorporating various methodologies leads to a more complete diagnosis of phytoplankton groups. Future efforts are necessary to complement HPLC methods with independent information on PFTs, for instance carbon-based size classes (Kostadinov et al., 2016a).

Not all validation approaches are based on HPLC pigment data; some instead use information on either absorption coefficients (Ciotti and Bricaud, 2006; Brewin R. J. W. et al., 2011) or size-fractionated [Chl] (Fujiwara et al., 2011). In addition, the study of Sadeghi et al. (2012b) does not perform a true validation, but rather compares numerical model results to other satellite products (**Table 3**).

Validation results are not reported using uniform metrics across algorithms, which adds a layer of complication when comparing algorithm performance. While it would be better to provide consistent validation measures across all algorithms, as mentioned previously, differences in satellite product outputs and units, and the use of varying development and validation datasets and coefficients, preclude this. Reported validation measures are compiled in **Table 3**. Ideally, root mean square error (RMSE) (IOCCG, 2006) should be reported for matchups carried out according to the methods of Bailey and Werdell (2006), which specify the mean value of a five-by-five pixel box, at the highest available pixel resolution measured by the sensor, surrounding the location and within ±3 h of an in situ observation. In many cases, 9 km global area coverage satellite data are used to infer PFT classification. Thus, the spatial resolution is already coarse for validation matchups. However, algorithms can also be applied to full resolution (1 km) imagery, improving validation efforts. In the case of PhytoDOAS, the input requires hyperspectral resolution and has been developed for use with SCIAMACHY, which has a resolution of 30 by 60 km. Spatial resolution differences between in situ point observations and the large SCIAMACHY pixels present a limitation for validation using matchups (Bracher et al., 2009), since only very few in situ observations are within a homogeneous area of the size of a SCIAMACHY pixel. However, Aiken et al. (2007) point out that in the open ocean phytoplankton assemblages may be homogeneously distributed over 50–100 km and smaller scales are possible for specific communities. In Sadeghi et al. (2012b), PhytoDOAS coccolithophore [Chl] was validated by comparison with satellite-derived particulate inorganic carbon (Balch et al., 2005).
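The matchup procedure outlined above (a five-by-five pixel box mean, a ±3 h time window, and RMSE as the summary metric) might be sketched as follows. This is a simplified illustration under stated assumptions: the image is a plain list of rows with `None` marking masked pixels, all function names are hypothetical, and the additional exclusion criteria of Bailey and Werdell (2006) are not reproduced here.

```python
import math
from datetime import datetime, timedelta


def box_mean(image, row, col, half_width=2):
    """Mean of the valid pixels in a (2*half_width + 1)^2 box centred on (row, col).

    `image` is a list of rows; masked pixels are None. Returns None if the
    box contains no valid pixel.
    """
    values = []
    for r in range(row - half_width, row + half_width + 1):
        for c in range(col - half_width, col + half_width + 1):
            if 0 <= r < len(image) and 0 <= c < len(image[0]):
                if image[r][c] is not None:
                    values.append(image[r][c])
    return sum(values) / len(values) if values else None


def within_window(t_satellite, t_in_situ, hours=3):
    """True if the overpass and the in situ sample are within +/- `hours`."""
    return abs(t_satellite - t_in_situ) <= timedelta(hours=hours)


def rmse(satellite, in_situ):
    """Root mean square error between paired satellite and in situ values."""
    pairs = list(zip(satellite, in_situ))
    return math.sqrt(sum((s - o) ** 2 for s, o in pairs) / len(pairs))
```

Whether the paired values are log-transformed before computing RMSE is a choice left to the user; the sketch operates on whatever values it is given.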

## ALGORITHM SELECTION

Users often select satellite products that most closely align with their application. When there are several satellite product choices for a given PFT type with varying facets and complexity, the optimal choice may not be clear. To help users determine what might be best suited for their purpose, in addition to the satellite inputs, outputs, and validation metrics described above, we compile a comparative list of assumptions, strengths, and limitations (**Table 4**). It is possible that merged products produce the best output beyond any individually selected algorithm (Palacz et al., 2013), yet an understanding of the underlying inputs into a merged product is always desirable.

Abundance-based algorithms assume a change in size and taxonomic structure with a change in chlorophyll. To the first order, and for large time and space scales, this holds true, but there are exceptions. Deviations from the mean state of the data in which the relationship is developed may occur (Hirata et al., 2011). This is particularly challenging at regional scales and in optically complex water where CDOM and NAP also complicate the retrieval of [Chl]. In a changing ocean, if shifts toward different phytoplankton assemblages with similar [Chl] occur, empirical relationships will require recalibration (Hirata et al., 2011). Abundance-based algorithms begin with uncertainty associated with the input satellite [Chl] product in addition to the uncertainty in relationships between [Chl] and phytoplankton grouping (**Figure 2**). Typically, band-ratio estimation of [Chl] (O'Reilly et al., 1998) has an accepted 35% uncertainty (Bailey and Werdell, 2006), which has recently been documented to be much less in the open ocean (16%) (Brewin et al., 2016), but becomes worse in coastal waters. Some semi-analytical inversions that retrieve [Chl] also have similar uncertainty across global scales (Brewin et al., 2015), but may maintain accuracy in coastal waters due to their ability to account for other in-water constituents contributing to the IOPs that vary independently of each other. However, PFT approaches, which are broadly characterizing phytoplankton, may ultimately result in less uncertainty than the starting [Chl] product. The attractiveness of the abundance-based approaches is their ease of implementation and that they exploit the first-order signal in Rrs. [Chl] is a primary biological variable that is routinely measured in situ, thus enabling extensive association of PFT fields with the abundance of in situ [Chl] that has accumulated across the globe. Once [Chl] is known, PSC or PTC estimates are a simple calculation.
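To illustrate how simple the final calculation can be, the sketch below partitions total [Chl] into three size classes using exponential saturating functions, a form used by some published abundance-based models. Both the functional form as written here and all four parameter values are assumptions made for illustration; they are placeholders, not the coefficients of any algorithm discussed in this review.

```python
import math

# Placeholder parameters, NOT published coefficients.
# CM_*: asymptotic maximum [Chl] (mg m^-3); S_*: initial slope.
CM_PICO_NANO, S_PICO_NANO = 1.0, 0.9  # combined pico + nano population
CM_PICO, S_PICO = 0.1, 7.0            # pico population alone


def size_class_chl(chl):
    """Partition total [Chl] (mg m^-3) into micro, nano, and pico contributions."""
    pico_nano = CM_PICO_NANO * (1.0 - math.exp(-S_PICO_NANO * chl))
    pico = CM_PICO * (1.0 - math.exp(-S_PICO * chl))
    return {
        "micro": chl - pico_nano,  # remainder once small cells saturate
        "nano": pico_nano - pico,
        "pico": pico,
    }


low = size_class_chl(0.05)   # oligotrophic example
high = size_class_chl(5.0)   # eutrophic example
```

With these placeholder values the microplankton share of [Chl] rises as total [Chl] increases, which is the first-order behavior that abundance-based approaches exploit.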

Radiance-based approaches assume that after normalization to [Chl], changes in radiance coincide with changes in PFTs. They utilize Rrs(λ) [or nLw(λ)], the fundamental parameter observed by a satellite radiometer, which has uncertainty thresholds of 5% (Bailey and Werdell, 2006). Thus, the strength of radiance-based approaches is that they do not require, or have limited dependence on, products derived from Rrs(λ). However, any normalization of the signal to derive the second-order relationships that tend to underpin these approaches will inevitably suffer from reduced signal to noise. Furthermore, when [Chl] is used in normalization (e.g., PHYSAT), the uncertainty associated with [Chl] is introduced (**Figure 2**). These algorithms are dependent on empirical relationships between radiance and PTCs or PSCs; thus, as with the empirical [Chl] dependencies described above, they require recalibration for long-term analyses. As with absorption- and scattering-based approaches, and abundance-based approaches when using [Chl] determined from a semi-analytical model, radiance-based approaches can account for other optically active in-water constituents (CDOM and NAP), as these also impact the spectral radiance (Alvain et al., 2012). This aspect allows potential development by users who have their own in situ datasets: it is possible to empirically associate a specific radiance anomaly to phytoplankton assemblages or specific composition (Alvain et al., 2012; Rêve et al., in revision). This highlights the importance of continued investment in detailed in situ databases to allow future development and use of remotely sensed phytoplankton groups. Radiance-based approaches are also influenced by physiological variability; however, the variability likely represents a larger proportion of the signal in normalized quantities.
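One way to picture the normalization step is as a ratio between the observed nLw(λ) and a reference spectrum expected for the same [Chl]. The sketch below assumes that ratio form; the lookup table, its [Chl] bins, the wavelengths, and every number in it are hypothetical placeholders, and the function name is invented for this example.

```python
# Hypothetical reference table: (upper [Chl] bound in mg m^-3,
# {wavelength in nm: reference nLw}). All values are placeholders.
REFERENCE_NLW = [
    (0.1, {443: 1.80, 490: 1.40, 555: 0.30}),
    (1.0, {443: 0.90, 490: 0.95, 555: 0.35}),
    (10.0, {443: 0.35, 490: 0.50, 555: 0.45}),
]


def radiance_anomaly(nlw, chl):
    """Ratio of observed nLw to the reference nLw for the matching [Chl] bin.

    A value of 1.0 at every wavelength means the spectrum is exactly what the
    reference predicts for that [Chl]; departures from 1.0 are the
    second-order signal that would then be associated with phytoplankton groups.
    """
    for upper_bound, reference in REFERENCE_NLW:
        if chl <= upper_bound:
            return {wl: nlw[wl] / reference[wl] for wl in reference}
    raise ValueError("[Chl] outside the range of the reference table")


observed = {443: 1.08, 490: 0.95, 555: 0.28}  # hypothetical pixel
anomaly = radiance_anomaly(observed, chl=0.5)
```

Because the reference is selected using [Chl], any error in the [Chl] product shifts the bin and therefore the anomaly, which is the uncertainty propagation noted in the text.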

PhytoDOAS (Bracher et al., 2009; Sadeghi et al., 2012a) has so far only been applied to a single sensor that has sufficient spectral resolution, precluding it from studies of phytoplankton composition where 30 km spatial resolution would be limiting. However, this is expected to improve in the near future: adaptations of the algorithm to similar high spectrally resolved satellite data with improved spatial coverage and resolution are currently ongoing. The Ozone Monitoring Instrument (OMI) (since 2004), with 13 km by 24 km, and TROPOMI (tropospheric OMI, to be launched in early 2017), with 3.5 km by 7 km global spatial resolution, are, or will be, used with PhytoDOAS. In addition, ocean color sensors are planned for the future with significantly increased spectral resolution (Mouw et al., 2015) that may allow a wider adoption of the PhytoDOAS method to even smaller spatial scales. For example, NASA's planned Plankton, Aerosol, Cloud, and ocean Ecosystem (PACE) mission with a hyperspectral ocean color sensor payload is expected to revolutionize the ability to use algorithms, such as PhytoDOAS, on more adequate spatiotemporal scales.

The number of existing absorption-based algorithms indicates the clear impact phytoplankton cell size and pigment composition have on the shape of the spectral absorption coefficient. These relationships have been reported in the literature for decades (e.g., Bricaud et al., 1988, 1995; Ciotti et al., 1999). The strengths of this type of algorithm include the ability to begin with inherent optical properties rather than [Chl] as the satellite input product, thus starting with reduced uncertainty at the outset. However, the assumed spectral shapes and coefficients utilized in semi-analytical approaches cannot fully capture natural variability across a variety of conditions, resulting in uncertainties. These uncertainties are a balance of spectral accuracy and the accuracy of particular parameters over others (Werdell et al., 2013). As with Rrs approaches, those that require normalization by [Chl] inevitably reduce signal to noise and also reintroduce uncertainty associated with [Chl]. A limitation of absorption-based approaches is that they are sensitive to physiological variability associated with light and nutrient histories, and these are likely to be of more influence when normalized quantities are used. Furthermore, small changes in the spectral shape of phytoplankton absorption can be difficult to retrieve from ocean color (Garver et al., 1994; Wang et al., 2005), such that identifying and distinguishing different PFTs may not always be successful. Problems can also occur when trying to discriminate different phytoplankton groups with similar absorption signatures.

The scattering-based approaches presented here assume the PSD has a power-law shape and relative proportions of biovolume are roughly constant across size classes. Conversion to phytoplankton carbon for the carbon-based PFTs requires additional assumptions (Kostadinov et al., 2016a). The models assume a relationship between the PSD and the spectral slope of bbp(λ). The use of bbp(λ) makes the approach less sensitive to physiological variability than other approaches. However, the particle size classes include all particles, not just phytoplankton, and the relationship between bbp(λ) and phytoplankton cell size is still a matter of active debate (Stramski et al., 2004; Vaillancourt et al., 2004; Dall'Olmo et al., 2009; Whitmire et al., 2010). In addition, the sources of backscattering are still uncertain (Stramski et al., 2004) and applicability of Mie theory to particles and/or phytoplankton assemblages in seawater has its limitations (e.g., Dall'Olmo et al., 2009). It has been suggested that this approach represents phytoplankton carbon more closely (see Martinez-Vicente et al., 2013), and backscattering has been used to retrieve total phytoplankton carbon (Behrenfeld et al., 2005).
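The spectral slope of bbp(λ) that these models rely on can be estimated by a straight-line fit in log-log space, assuming the common power-law form bbp(λ) = bbp(λ<sub>0</sub>)(λ/λ<sub>0</sub>)<sup>−η</sup>. The sketch below shows only that fitting step; the function name, the choice of 443 nm as reference wavelength, and the synthetic test spectrum are assumptions for illustration, and the subsequent mapping from η to a PSD slope is not shown.

```python
import math


def fit_bbp_slope(wavelengths, bbp, ref_wavelength=443.0):
    """Least-squares fit of log(bbp) = log(bbp_ref) - eta * log(lambda / lambda_ref).

    Returns (eta, bbp_ref), where bbp_ref is the fitted bbp at ref_wavelength.
    """
    x = [math.log(w / ref_wavelength) for w in wavelengths]
    y = [math.log(b) for b in bbp]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -slope, math.exp(intercept)


# Synthetic spectrum built from a known slope, used only to check the fit.
bands = [412.0, 443.0, 490.0, 510.0, 555.0]
synthetic = [0.002 * (w / 443.0) ** (-1.5) for w in bands]
eta, bbp_443 = fit_bbp_slope(bands, synthetic)
```

With noisy retrievals the fitted η inherits the uncertainty of bbp(λ) at each band, so in practice the number and spacing of bands matter.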


Products were only available for SeaWiFS at time of writing for KSM09, BR10, PHYSAT, and MY2010 and for SCIAMACHY for PhytoDOAS.

Users are more focused on the satellite outputs, which they can use for various applications, rather than the intricacies of the type of algorithm used to produce the output. For ease of information identification, we have provided the validation metrics reported by algorithm type and satellite output types (**Table 3**). However, our purpose here is not to intercompare or validate algorithms. It is important to point out that the algorithms all use different approaches, datasets, and validation metrics. To be able to properly assess algorithm performance, one would have to carry out a comprehensive inter-comparison using the same validation data and consider errors of omission and commission (see Brewin R. J. et al., 2011), which is outside the scope of the present work. A validation effort is planned as part of the International Satellite Phytoplankton Functional Type Algorithm Inter-comparison Project (Hirata et al., 2012; http://pft.ees.hokudai.ac.jp/satellite/index.shtml) while an intercomparison based on phenology has been carried out by Kostadinov et al. (2017).

It is important to point out that many of these methods have been developed for the global open ocean. The optical complexity encountered in coastal waters is quite different from that found in the global datasets used to develop these algorithms. Additionally, the assumptions made by some are only valid for the global open ocean. The relationship between [Chl], CDOM absorption, and particulate backscatter is more variable in coastal water than the open ocean. For example, riverine sources, resuspension, and mixing may cause CDOM and NAP to vary independently of phytoplankton. For these reasons, band-ratio [Chl] estimates that utilize the blue and green regions of the spectrum are plagued with problems in coastal waters (Matthews, 2011). Thus, it is not advisable to apply open ocean abundance-based algorithms to coastal systems. Relationships would need to be assessed and likely redeveloped using a regionally specific dataset. Similar limitations would be expected for radiance-based methods. While the atmospheric correction can be a challenge over some coastal waters (Goyens et al., 2013), even if Rrs(λ) is accurately retrieved, the dynamic range of CDOM and NAP, which impart a significant signal to Rrs(λ), requires the empirical relationships and thresholds defined for various PFTs and PSCs to be reestablished. The approaches that build upon semi-analytic expressions that first retrieve IOPs from Rrs(λ) and then PFTs from the retrieved IOPs have the greatest ability to accommodate dynamic environments. These approaches parse the contributions of NAP, CDOM and phytoplankton before the phytoplankton IOPs are associated with a PFT. Similarly, this is done within the PhytoDOAS method by accounting for all relevant absorbers (from water and atmosphere) within the fitting of hyperspectral top of atmosphere reflectance. Accordingly, Brewin R. J. et al. (2011) find absorption-based approaches show an improvement over abundance-based approaches in coastal waters. However, the thresholds of detectability of approaches targeting optical signatures will not allow PFT retrieval in all cases.

The limitation of the ability to retrieve PFTs in some cases needs to be acknowledged. For example, Mouw and Yoder (2010a) are careful to consider the change in Rrs(λ) produced by PSCs, [Chl] and CDOM absorption in relation to the radiometric sensitivity of the satellite sensor. They find that when [Chl] or CDOM absorption is too high, the impact of size on Rrs(λ) is masked. Likewise, when [Chl] is too low, the spectral response of Rrs(λ) due to size is too small to differentiate from noise. Additionally, PHYSAT in its first version (Alvain et al., 2005) did not classify pixels where no phytoplankton group dominated, due to the use of biomarker pigment thresholds during the first empirical anomaly labeling steps. However, recent developments of PHYSAT have shown its capability to detect more than dominance cases utilizing detailed in situ data (Alvain et al., 2012; Ben Mustapha et al., 2014; Rêve et al., in revision).

The accessibility of products is another reason why users may select a given algorithm over another. Algorithms where simple calculations extracted from the publication can be quickly applied are far more likely to be utilized than those that require multiple complicated steps. In the latter situation, the difficulty has often been remedied by the algorithm developer hosting the final output product for download by users. The PFT products that are currently accessible online are listed in **Table 5**. Further, the PFT products compiled for phenological comparison (Kostadinov et al., 2017) are intended to be released in the near future. The availability of PFT products is anticipated to grow substantially as future missions that have specified PFT products as part of their mission goals come online.

PFT algorithm development thus far has been focused on retrieving global distributions of PFTs. The next challenge is to detect change in these distributions over time. The temporal anomaly of PFTs can be a smaller signal than the bulk composition retrievals achieved thus far. The anomalies are critical for understanding climate change issues and testing ecosystem model predictions. However, detecting change is confounded by inter- and intra-algorithm uncertainties and the relatively short record length of satellite data. Further, critical to this consideration are changes in phytoplankton physiology. Behrenfeld et al. (2016) show the importance of accounting for photoacclimation in temporal chlorophyll variability, as light-driven changes in chlorophyll can be associated with constant or increased photosynthesis. This finding of the necessity to account for physiological plasticity also directly impacts the PFT methods described here, most acutely for abundance-, radiance-, and absorption-based methods. While satellite PFT time series data have already been used to assess regional PFT variability and trends (Brewin et al., 2012; Sadeghi et al., 2012b; Alvain et al., 2013; Soppa et al., 2016), there is a need to characterize physiological plasticity in PFT retrievals to more accurately quantify phytoplankton compositional response to a changing ocean.
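A temporal anomaly of the kind referred to above is commonly computed by removing a mean seasonal cycle from the time series. The sketch below assumes a monthly climatology built from the record itself; the data layout, the function name, and the example values are hypothetical.

```python
from collections import defaultdict


def monthly_anomalies(series):
    """Subtract the mean seasonal cycle from a monthly time series.

    `series` is a list of (year, month, value) tuples, for example the
    fractional contribution of one PFT at a single location. Returns a list
    of (year, month, anomaly) tuples in the same order.
    """
    by_month = defaultdict(list)
    for _, month, value in series:
        by_month[month].append(value)
    climatology = {m: sum(v) / len(v) for m, v in by_month.items()}
    return [(y, m, v - climatology[m]) for y, m, v in series]


# Hypothetical two-year record of a microplankton fraction.
record = [
    (2003, 1, 0.20), (2003, 7, 0.40),
    (2004, 1, 0.24), (2004, 7, 0.36),
]
anomalies = monthly_anomalies(record)
```

Because the anomalies are small differences between retrievals, retrieval uncertainty of comparable size can easily mask them, which is why a short satellite record makes trend detection difficult.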

# CONCLUSIONS

At the global scale, the current PFT algorithms demonstrate proof of concept in retrieving phytoplankton composition from satellite radiometry, opening the door for further development and expanding the use of satellite observations. While there are a variety of algorithm approaches, all agree on a broad understanding of PFT distribution at large spatial-temporal scales, which is forced mainly by bathymetry and climatic regions. Larger cells and taxa tend to be found near coastal regions, especially under upwelling regimes, while the smallest cells and taxa dominate in the center of oceans. Temperate regions are likely to present seasonal blooms of large cell sizes in spring and/or fall, while a less variable size distribution of phytoplankton is expected in tropical and subtropical areas and in the oligotrophic gyres.

Continual PFT algorithm development is anticipated, particularly with the expansion of sensor capability with future missions. Planned capability will expand spectral, spatial, and temporal resolution, in addition to radiometric sensitivity (Mouw et al., 2015). Increased spectral resolution will provide the ability to exploit more spectral signatures of PFTs (Isada et al., 2015; Wolanin et al., 2016). In addition to increased spectral resolution, increased spatial resolution may lend clarity to coastal processes and phytoplankton response to finer scale physical features. Improved temporal resolution on geostationary platforms will allow multiple views per day to investigate diurnal phytoplankton variability. Improved radiometric sensitivity will expand threshold detection required to detect the secondary impact of PFTs on radiometric variability. All of the potential capability in expanded satellite PFT products with the next generation of satellite sensors hinges on continued and increased investment in in situ observations to allow further algorithm development and validation. In addition to HPLC pigments that so many of these approaches are validated upon, training datasets also need to include unambiguous metrics of community composition that include particle size distribution and taxonomy (from imaging technologies) (Bracher et al., 2015). Exploiting compilations of abundance and biomass (Leblanc et al., 2012) and connections to genetically determined community composition (Malviya et al., 2016) are potentially rich resources for expanding training and validation datasets. In addition, coincident optical [i.e., Rrs(λ) and IOPs] observations will be highly important to connect to the signals observed by satellite radiometers. The expanding optical sensors on Bio-Argo floats may also provide a valuable data stream for PFT development, particularly for vertical structure of phytoplankton communities (Mignot et al., 2014). It is important to expand the capability to measure phytoplankton carbon in situ (Graff et al., 2012, 2015) so future definitions of PFTs can be more carbon-relevant (Kostadinov et al., 2016a).

This document provides an overview of the primary components used in developing, implementing, and using satellite PFT products. While we do not provide direct recommendations for particular applications, our hope is that providing an accessible overview of the primary components of PFT algorithms will aid users in more confidently selecting products for a given application and ignite future conversations between satellite product developers and a variety of user communities. The satellite PFT literature is rapidly expanding, and these tables and figures will require updating and, eventually, developing anew. In addition to the value we hope this brings to the user community, we equally hope this summary provides a framework for algorithm organization to inform where possible new approaches could be investigated in the future.

# AUTHOR CONTRIBUTIONS

CM carried out the synthesis of algorithms, developed the organization, and prepared the manuscript with guidance from NH. CM prepared all figures and tables. All other co-authors contributed to ensuring accuracy of their algorithm description, overall synthesis, and editing.

# FUNDING

The National Aeronautics and Space Administration (NASA) provided financial support for CM (NNX13AC34G) and TK (NNX13AC92G) for this effort. The contribution of AsB, RB, and AnB was partly funded via the ESA SEOM SY-4Sci Synergy project SynSenPFT.

# ACKNOWLEDGMENTS

We acknowledge the International Satellite Functional Type Algorithm Intercomparison Project (http://pft.ees.hokudai.ac.jp/satellite/) for facilitation of a series of meetings and workshops that led to the development of this manuscript, with funding from JAXA and the UK National Centre for Earth Observation and support from the International Ocean Colour Coordination Group (IOCCG).

# REFERENCES


and inverse modelling approaches. Opt. Express 22, 11536–11551. doi: 10.1364/OE.22.011536


waters in a new diagnostic ecological indicator model. Biogeosciences 10, 7553–7574. doi: 10.5194/bg-10-7553-2013


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The handling Editor declared a shared affiliation, though no other collaboration, with several of the authors SA, AB, and JU and states that the process nevertheless met the standards of a fair and objective review.

Copyright © 2017 Mouw, Hardman-Mountford, Alvain, Bracher, Brewin, Bricaud, Ciotti, Devred, Fujiwara, Hirata, Hirawake, Kostadinov, Roy and Uitz. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Obtaining Phytoplankton Diversity from Ocean Color: A Scientific Roadmap for Future Development

Astrid Bracher<sup>1,2</sup>\*, Heather A. Bouman<sup>3</sup>, Robert J. W. Brewin<sup>4,5</sup>, Annick Bricaud<sup>6,7</sup>, Vanda Brotas<sup>8</sup>, Aurea M. Ciotti<sup>9</sup>, Lesley Clementson<sup>10</sup>, Emmanuel Devred<sup>11</sup>, Annalisa Di Cicco<sup>12</sup>, Stephanie Dutkiewicz<sup>13</sup>, Nick J. Hardman-Mountford<sup>14</sup>, Anna E. Hickman<sup>15</sup>, Martin Hieronymi<sup>16</sup>, Takafumi Hirata<sup>17,18</sup>, Svetlana N. Losa<sup>1</sup>, Colleen B. Mouw<sup>19</sup>, Emanuele Organelli<sup>4</sup>, Dionysios E. Raitsos<sup>4</sup>, Julia Uitz<sup>6,7</sup>, Meike Vogt<sup>20</sup> and Aleksandra Wolanin<sup>1,2,21</sup>

### Edited by:

Laura Lorenzoni, University of South Florida, USA

### Reviewed by:

Matthew J. Oliver, University of Delaware, USA Catherine Mitchell, Bigelow Laboratory for Ocean Sciences, USA

### \*Correspondence:

Astrid Bracher astrid.bracher@awi.de

### Specialty section:

This article was submitted to Ocean Observation, a section of the journal Frontiers in Marine Science

Received: 30 November 2016 Accepted: 15 February 2017 Published: 03 March 2017

### Citation:

Bracher A, Bouman HA, Brewin RJW, Bricaud A, Brotas V, Ciotti AM, Clementson L, Devred E, Di Cicco A, Dutkiewicz S, Hardman-Mountford NJ, Hickman AE, Hieronymi M, Hirata T, Losa SN, Mouw CB, Organelli E, Raitsos DE, Uitz J, Vogt M and Wolanin A (2017) Obtaining Phytoplankton Diversity from Ocean Color: A Scientific Roadmap for Future Development. Front. Mar. Sci. 4:55. doi: 10.3389/fmars.2017.00055

<sup>1</sup> Phytooptics Group, Climate Sciences, Alfred-Wegener-Institute Helmholtz Centre for Polar and Marine Research, Bremerhaven, Germany, <sup>2</sup> Department of Physics and Electrical Engineering, Institute of Environmental Physics, University Bremen, Bremen, Germany, <sup>3</sup> Department of Earth Sciences, University of Oxford, Oxford, UK, <sup>4</sup> Remote Sensing Group, Plymouth Marine Laboratory, Plymouth, UK, <sup>5</sup> National Centre for Earth Observation, Plymouth Marine Laboratory, Plymouth, UK, <sup>6</sup> Sorbonne Universités, UPMC-Université Paris-VI, UMR 7093, LOV, Observatoire Océanologique, Villefranche/Mer, France, <sup>7</sup> Centre National de la Recherche Scientifique, UMR 7093, LOV, Observatoire Océanologique, Villefranche/Mer, France, <sup>8</sup> Faculdade de Ciencias da Universidade de Lisboa, MARE, Lisboa, Portugal, <sup>9</sup> CEBIMar, Universidade de São Paulo, São Paulo, Brazil, <sup>10</sup> CSIRO Oceans and Atmosphere, Hobart, TAS, Australia, <sup>11</sup> Department of Fisheries and Oceans, Bedford Institute of Oceanography, Dartmouth, NS, Canada, <sup>12</sup> Institute of Atmospheric Sciences and Climate, Italian National Research Council (CNR-ISAC), Rome, Italy, <sup>13</sup> Department of Earth, Atmospheric and Planetary Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA, <sup>14</sup> CSIRO Oceans and Atmosphere, Perth, WA, Australia, <sup>15</sup> Ocean and Earth Science, National Oceanography Centre Southampton, University of Southampton, Southampton, UK, <sup>16</sup> Department of Remote Sensing, Institute of Coastal Research, Helmholtz-Zentrum Geesthacht, Geesthacht, Germany, <sup>17</sup> Faculty of Environmental Earth Science, Hokkaido University, Sapporo, Japan, <sup>18</sup> CREST, Japan Science and Technology Agency, Tokyo, Japan, <sup>19</sup> Graduate School of Oceanography, University of Rhode Island, Narragansett, RI, USA, <sup>20</sup> Department of Environmental Systems Science, Institute for Biogeochemistry and Pollutant Dynamics, ETH Zürich, Zürich, Switzerland, <sup>21</sup> GeoForschungsZentrum Potsdam, Potsdam, Germany

To improve our understanding of the role of phytoplankton for marine ecosystems and global biogeochemical cycles, information on the global distribution of major phytoplankton groups is essential. Although algorithms have been developed to assess phytoplankton diversity from space for over two decades, so far the application of these data sets has been limited. This scientific roadmap identifies user needs, summarizes the current state of the art, and pinpoints major gaps in long-term objectives to deliver space-derived phytoplankton diversity data that meet the user requirements. These major gaps in using ocean color to estimate phytoplankton community structure were identified as: (a) the mismatch between satellite, in situ and model data on phytoplankton composition, (b) the lack of quantitative uncertainty estimates provided with satellite data, (c) the spectral limitation of current sensors to enable the full exploitation of backscattered sunlight, and (d) the very limited applicability of satellite algorithms determining phytoplankton composition for regional, especially coastal or inland, waters. Recommendations for action include but are not limited to: (i) increased communication and round-robin exercises among and within the related expert groups, (ii) the launching of higher spectrally and spatially resolved sensors, (iii) the development of algorithms that exploit hyperspectral information, and (iv) the development of techniques to merge and synergistically use the various streams of continuous information on phytoplankton diversity from various satellite sensors and in situ data to ensure long-term monitoring of phytoplankton composition.

Keywords: ocean color, phytoplankton functional types, algorithms, satellite sensors, roadmap

# USER NEEDS FOR PHYTOPLANKTON DIVERSITY FROM SPACE

Marine phytoplankton play an important role in the global carbon cycle via the biological carbon pump (e.g., IPCC, 2013) and contribute about 50% to the global primary production (Field et al., 1998). Over the past 30 years, ocean color remote sensing has revolutionized our understanding of marine ecosystems and biogeochemical processes by providing continuous global estimates of surface chlorophyll a concentration (chl-a, mg m<sup>−3</sup>), a proxy for phytoplankton biomass (e.g., McClain, 2009). However, chl-a alone does not provide a full description of the complex nature of phytoplankton community structure and function. Phytoplankton have different morphological (size and shape) and physiological characteristics (growth and mortality rates, nutrient uptake kinetics, temperature, and light requirements) as well as different biogeochemical and ecological functions (e.g., silicification, calcification, nitrogen fixation, aggregation and sinking rates, lipid production, energy transfer; e.g., Le Quéré et al., 2005). Phytoplankton community structure is thus important to many fundamental biogeochemical processes, including: nutrient uptake and cycling, energy transfer through the marine food web, deep-ocean carbon export, and gas exchange with the atmosphere. Phytoplankton community composition also has important consequences for fisheries (e.g., fish recruitment) and specific species (Harmful Algal Blooms, HABs; a list of all abbreviations is given in **Table 1**) can directly impact human health (e.g., Cullen et al., 1997).

The ability to observe the spatial-temporal distribution (including phenology) and variability of different phytoplankton groups is a scientific priority for understanding the marine food web, and ultimately predicting the ocean's role in regulating climate and responding to climate change on various time scales. Thus, identifying the drivers of phytoplankton composition on global and regional scales is required to assess climate-ecosystem interactions and to increase our understanding of the role of the ocean's biodiversity for marine ecosystem service provision. Coasts are especially vulnerable to major human threats caused by harmful algal blooms, eutrophication, hypoxia, and other processes deteriorating water quality. High resolution data on phytoplankton diversity are urgently needed for many socioeconomic applications (e.g., fisheries, aquaculture, and coastal management, see IOCCG, 2009).

Some fishery models (e.g., Jennings et al., 2008) already utilize information on phytoplankton biomass derived from ocean color satellites; however, information on size and taxonomic composition from satellite is highly desirable to improve stock assessments (IOCCG, 2009). To better represent the variable biogeochemical state of the ocean, Earth System and climate models (including those used in the IPCC assessments) have increasingly included a larger amount of biological complexity in their ocean biogeochemistry modules. To simplify the representation of the vast planktonic diversity, plankton have been grouped into plankton functional types according to their biogeochemical functions (e.g., Le Quéré et al., 2005). Biogeochemical models now commonly include 3–10 plankton functional types (e.g., Bopp et al., 2013; Laufkötter et al., 2015), with a few models including up to 100 or more types (Follows et al., 2007; Dutkiewicz et al., 2015; Masuda et al., 2017). Since in situ observations on plankton biogeography and abundance are scarce and many vast oceanic regions are too remote to be routinely monitored, biogeochemical modelers rely on surface ocean estimates of phytoplankton composition from satellite observations to evaluate model simulations and help to develop and validate their models. Increased biological realism in these models has been suggested as a means to reduce the large uncertainty in future projections of net primary production and carbon export (Bopp et al., 2013; Laufkötter et al., 2015). Information on global phytoplankton community composition from ocean color satellites is therefore highly desirable for Earth system model development and the quantification of key processes related to present and future global biogeochemical cycles. Particularly for the quantification of carbon fluxes in the world's ocean, high quality remote sensing data on phytoplankton community composition are a first priority (see science plan of the EXPORT project, Siegel et al., 2016).

### TABLE 1 | Abbreviations and acronyms used throughout the text.

Thus, continuous, global-scale, high-resolution satellite ocean color products that go beyond bulk chl-a and provide information on phytoplankton diversity are urgently needed to improve near-real time and forecasting models for marine services facilitating the above-mentioned applications. User requests for satellite data on phytoplankton diversity as an essential ocean/climate variable are providing impetus for its incorporation into international climate change initiatives and mission (capability) planning. In this article, the current state of the art regarding algorithms, their validation, and their application is reviewed; the gaps to meeting user requirements are then discussed; and finally, detailed recommendations for future medium- and long-term actions are provided.

### STATE OF THE ART

Diversity of phytoplankton, often represented by species richness and evenness, can be characterized in multiple dimensions (e.g., taxonomic, phylogenetic, morphological, or functional diversity, among others). This diversity is staggeringly large, and even within a species there is often a large range of ecotypes with different environmental niches, life stages, and/or morphological and physiological characteristics (e.g., Bouman et al., 2006). For almost all purposes, scientists tend to cluster species into groups specific to the purposes of their research. For instance, climate scientists and marine biogeochemists define phytoplankton functional types (PFT) based on their biogeochemical functions (e.g., diatoms as silicifying PFT). Based on satellite products, we here refer to any clustering of species (and ecotypes) as "Phytoplankton Groups" (PG). PG defined based on taxonomic criteria are referred to as phytoplankton types (PT), and PG defined based on their size range are referred to as phytoplankton size classes (PSC).

Satellite ocean-color remote sensing is unsurpassed in its ability to characterize the state of the surface ocean biosphere at high temporal and spatial scales. Beyond chl-a, increasing efforts have been invested internationally over the last two decades to develop ocean color algorithms to retrieve information on phytoplankton composition and size structure (see recent summary in IOCCG, 2014 and list of global approaches applied to satellite data in **Table 2**). These developments provide an opportunity to yield new operational satellite products. Ocean color algorithms to assess phytoplankton diversity make use of information originating from phytoplankton abundance, cell size, and bio-optical properties (such as pigment composition, absorption, and backscattering characteristics) to differentiate PG (**Table 2**, **Figure 1** left). The abundance-based approaches of Uitz et al. (2006), Brewin et al. (2010), Brewin et al. (2015), and Hirata et al. (2011) use satellite chl-a as input to derive PSC or PT based on empirical relationships linking in situ marker pigments, determined using high performance liquid chromatography (HPLC), to chl-a. By using satellite chl-a as input, these approaches exploit the largest signal in water-leaving radiance to extract the variability due to PG from chl-a. The calculation is simple and can be applied easily to chl-a products from different sensors. However, abundance-based approaches cannot predict atypical associations and may not hold in a future ocean.
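As an illustration of how simple an abundance-based calculation can be, the following minimal sketch partitions total chl-a into three size classes using a saturating-exponential form of the kind used in three-component abundance-based models. The function name and the parameter values are assumptions chosen for illustration only; they are not taken from this article, and any real application would need regionally tuned, published parameters.

```python
import math

# Illustrative parameter values (assumptions for this sketch, not from the text):
# asymptotic maximum chl-a (mg m^-3) and initial slope for each population.
CM_PICO_NANO, S_PICO_NANO = 1.057, 0.851   # pico + nano combined
CM_PICO, S_PICO = 0.107, 6.801             # pico alone


def size_class_chl(total_chl):
    """Split total chl-a (mg m^-3) into pico, nano and micro contributions."""
    if total_chl <= 0:
        raise ValueError("total_chl must be positive")
    # Small cells saturate as total chl-a rises; the remainder goes to larger cells.
    pico_nano = CM_PICO_NANO * (1.0 - math.exp(-S_PICO_NANO * total_chl))
    pico = CM_PICO * (1.0 - math.exp(-S_PICO * total_chl))
    return {"pico": pico, "nano": pico_nano - pico, "micro": total_chl - pico_nano}


if __name__ == "__main__":
    for chl in (0.05, 0.5, 5.0):
        parts = size_class_chl(chl)
        fractions = {name: round(value / chl, 2) for name, value in parts.items()}
        print(chl, fractions)
```

Because the only input is chl-a, the same function can be applied to a chl-a product from any sensor. For the same reason, it can only reproduce the average relationship it was fitted to, which is the limitation noted above for atypical associations.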

Another class of algorithms relies on spectral features in reflectance, absorption, and/or backscattering spectra caused by the variation in phytoplankton structure and pigment composition (Brown and Yoder, 1994; Subramaniam et al., 2002; Alvain et al., 2005, 2008; Westberry et al., 2005; Ciotti and Bricaud, 2006; Devred et al., 2006, 2011; Hirata et al., 2008; Bracher et al., 2009; Kostadinov et al., 2009, 2016; Mouw and Yoder, 2010; Fujiwara et al., 2011; Bricaud et al., 2012; Moore et al., 2012; Sadeghi et al., 2012a; Li et al., 2013; Roy et al., 2013; Ben Mustapha et al., 2014; Werdell et al., 2014). Spectral-based approaches exploit as much of the backscattered spectrum observed by satellite as necessary to extract the signatures of specific PG in ocean color. Generally, these methods are computationally much more expensive and require specific adaptations for each sensor. However, these algorithms rely on far fewer empirical relationships than the abundance-based approaches and are based on physical principles (radiative transfer). The algorithms differ in their satellite inputs (e.g., radiance, absorption, backscattering) and in their underlying principles (for a comprehensive overview see Mouw et al., 2017). Another approach incorporates various environmental parameters to predict PT based on their ecological preferences (Raitsos et al., 2008; Palacz et al., 2013). This method uses artificial neural networks to link the different biological and physical data sets. While the approach of Raitsos et al. (2008) was regionally developed for the North Atlantic, the approach by Palacz et al. (2013) is not purely based on remote sensing data but also requires coupling to a dynamic plankton model.

### TABLE 2 | A compilation of global algorithms to retrieve phytoplankton composition from satellite data.

Products obtained from the PG algorithms (**Table 2**) are typically dominance (Brown and Yoder, 1994; Alvain et al., 2005; Moore et al., 2012; Ben Mustapha et al., 2014), presence or absence of a certain PT (Westberry et al., 2005; Werdell et al., 2014), fraction or concentration of chl-a of the three PSC (Devred et al., 2006, 2011; Uitz et al., 2006; Hirata et al., 2008, 2011; Kostadinov et al., 2009, 2016; Brewin et al., 2010, 2015; Fujiwara et al., 2011; Li et al., 2013; Roy et al., 2013), or a size factor characterizing the contribution of pico- (or micro-) phytoplankton to the phytoplankton community (Ciotti and Bricaud, 2006; Mouw and Yoder, 2010; Bricaud et al., 2012). Currently, only the products OC-PFT (Hirata et al., 2011) and PhytoDOAS (Bracher et al., 2009; Sadeghi et al., 2012a) enable the simultaneous determination of chl-a for several PT. PhytoDOAS retrieves the imprints of absorption characteristics of specific phytoplankton groups among all other atmospheric and oceanic absorbers from top of atmosphere data of the hyperspectral satellite sensor SCIAMACHY (Scanning Imaging Absorption Spectrometer for Atmospheric Chartography). All other satellite-based PG algorithms (**Table 2**) have been applied to water-leaving reflectance data from multispectral sensors [e.g., SeaWiFS (Sea-viewing Wide Field-of-view Sensor), MERIS (Medium Resolution Imaging Spectrometer), MODIS (Moderate Resolution Imaging Spectroradiometer)].

To be able to detect unexpected changes in phytoplankton community composition, satellite PG data that exploit spectral signatures and rest on limited empirical assumptions are preferred. In the last few years, radiative transfer models (RTM) have been used to develop and assess the sensitivity of analytical (spectral) PG retrievals or to find suitable spectral characteristics necessary for ocean color sensors to retrieve PG. Werdell et al. (2014) and Wolanin et al. (2016) used the GIOP (Generalized Inherent Optical Property) model software (Werdell et al., 2013) to invert reflectance spectra (either water-leaving or top of atmosphere), and Wolanin et al. (2015) used the coupled ocean-atmosphere RTM SCIATRAN (Rozanov et al., 2014) to test the sensitivity of a PT retrieval (PhytoDOAS). Evers-King et al. (2014) and Xi et al. (2015) used the ocean RTM HydroLight (Sequoia Scientific) to specifically model the variation in composition of PSC or certain (dominant) PT, respectively, and assessed the potential of retrievals in different water types. Werdell et al. (2014) optimized the inversion scheme of GIOP to retrieve the absence or presence of Noctiluca miliaris from MODIS data, while Wolanin et al. (2016) used this method to identify optimal band placements for multi- and hyperspectral satellite data for successful retrievals of certain PT. The results of the latter study indicate that four additional bands (381, 473, 532, and 594 nm) for the Ocean and Land Colour Instrument (OLCI) would potentially enable absorption-based quantitative retrievals of diatoms, cyanobacteria, and coccolithophores. Recent methods have been developed to retrieve PG from in situ hyperspectral algal or particulate absorption coefficients, and validated using in situ measurements (Moisan et al., 2013; Organelli et al., 2013; Zhang et al., 2015). 
As absorption coefficients can be estimated from satellite measurements using inverse bio-optical models, this opens the way to applications of these methods to satellite data.

Some PG algorithms (most of the ones listed in **Table 2**) have been inter-compared at the global scale: firstly using in situ PSC (derived from HPLC pigments) in terms of dominance (Brewin et al., 2011) and secondly, under the 2nd Satellite PFT Algorithm International Intercomparison Project: http://pft.ees.hokudai.ac.jp/satellite/index.shtml. The initiative strengthens the links between algorithm developers at a global scale, which will also help to guide modelers and policy makers on the specific assumptions underlying each product: the intercomparison among most algorithms presented in **Table 2** and with an ensemble mean of Earth System Models is presented in Kostadinov et al. (2017). A user guide on the most common algorithms for application to open ocean waters (Mouw et al., 2017) explains the current global PG algorithms and their associated uncertainties and also includes a discussion of the advantages and disadvantages of these algorithms. A global in situ dataset of HPLC and optical properties is being developed to further evaluate these algorithms. This initiative organized and held breakout sessions at the International Ocean Color Symposia (IOCS) in 2013 and 2015 and at a specific expert workshop sponsored by the International Ocean-Colour Coordinating Group (IOCCG) and the National Aeronautics and Space Administration (NASA) in 2014, which focused on PG algorithm development, validation, and user needs. Each meeting resulted in a written summary of recommendations for community actions and the planning of future activities (see IOCS, 2013; Bracher et al., 2015a; IOCS, 2015), which also form the baseline for Section Recommendations toward Operational Products of Phytoplankton Diversity from Space.

To date, the majority of existing PG satellite retrieval approaches have relied on HPLC pigment data to derive in situ PG data: for developing and validating algorithms, large in situ PT (e.g., Alvain et al., 2005; Hirata et al., 2011; Soppa et al., 2014; Swan et al., 2016) and PSC (e.g., Uitz et al., 2006; Brewin et al., 2010) data sets have been compiled, complemented by the global pigment data set compiled under the MAREDAT project (Peloquin et al., 2013) and recent submissions to public data bases: e.g., SEABASS (http://seabass.gsfc.nasa.gov/), BODC (http://www.bodc.ac.uk), LTER Network Data Portal (https://portal.lternet.edu/nis/home.jsp), PANGAEA Data Publisher for Earth & Environmental Science (https://www.pangaea.de). Among all available in situ PG data sets, HPLC-phytoplankton pigment data contain the largest number of observations, resulting in the greatest spatial coverage with standardized quality control protocols (Hooker et al., 2012). However, size fractionated in situ data of chl-a serve as a more direct validation data set for assessing satellite retrievals of PSC (e.g., Brewin et al., 2014). The long-term and spatially extended Continuous Plankton Recorder (CPR) data sets have been used for constructing and evaluating ecological algorithms focusing on larger phytoplankton (Raitsos et al., 2008). The CPRs, especially with the recent global data effort (Global Alliance of CPR Surveys, http://www.globalcpr.org/), may provide a unique platform delivering taxonomic information to modern satellite sensors for several oceanic regions around the globe. Inline (coupled) flow cytometry and microscopy techniques have been developed and enable a more precise classification of the phytoplankton groupings than HPLC marker pigments (e.g., Sosik and Olson, 2007). 
In addition, phytoplankton group specific Inherent Optical Properties (IOPs, i.e., absorption and backscattering) determined in the field have been used as algorithm inputs for several spectral approaches (Ciotti and Bricaud, 2006; Bracher et al., 2009; Mouw and Yoder, 2010; Fujiwara et al., 2011; Sadeghi et al., 2012a). Hyperspectral IOP measurements, when obtained via continuous measurements (e.g., Boss et al., 2013), can help validate satellite-derived PG by increasing the number of match-ups, assessing variability within a satellite pixel, and quantifying the uncertainties in the two-step satellite methods (i.e., from water-leaving reflectance to IOP to PG).

Satellite PG time series data have already been used to assess regionally and globally the variability and trend of phytoplankton community composition (e.g., Brewin et al., 2012; Sadeghi et al., 2012b) and PG phenology (Alvain et al., 2013; Soppa et al., 2016a) also linking to environmental variables. In addition, satellite PG data were used to assess globally particulate organic carbon export (Mouw et al., 2016), for detection of regional HAB events (e.g., Kurekin et al., 2014), the estimation of recruitment of juvenile fish (Trzcinski et al., 2013) and for inferring globally oceanic emissions of volatile organic compounds (Arnold et al., 2009; Booge et al., 2016).

All ocean color data are limited in coverage to sunlit, cloud-free, and ice-free conditions, and only deliver information on the surface ocean (the first optical depth, which is 4.6 times shallower than the euphotic depth). Therefore, for many applications, additional methods have to be used to resolve variability and trends of phytoplankton community structure and abundance. Satellite-derived algorithms are increasingly compared not only to in situ data but also to global model output from Earth System Models. Starting over two decades ago, biogeochemical models began incorporating multiple PT (e.g., Baretta et al., 1995; Le Quéré et al., 2005; Gregg and Casey, 2007), mainly to represent their biogeochemical relevance. As a first step, models incorporated a "diatom" group given their importance in the silica cycle, but also given their potentially important role in carbon export compared to other PT (e.g., Chai et al., 2002). As models became more sophisticated and started to simulate nitrogen biogeochemistry, many added a "diazotroph" class. Given the biogeochemical importance of these groups of phytoplankton, the modeling community refers to them as PFT (see **Figure 1** right). This classification is the closest to ocean color PT products as defined above. Though less common, models can also group phytoplankton in terms of size: the model of Ward et al. (2012) includes 25 size classes of phytoplankton. The advantage of such an approach is that it can use empirical allometric relationships of key growth parameters (e.g., maximum growth rates). Such model output is more compatible with ocean color PSC products.
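The factor of 4.6 quoted above for the first optical depth can be reproduced with a short worked calculation. It assumes (these assumptions are ours, not stated in the text) a vertically uniform diffuse attenuation coefficient $K_d$ and a euphotic depth $z_{eu}$ defined as the depth at which downwelling irradiance $E_d$ falls to 1% of its surface value:

$$
E_d(z) = E_d(0)\,e^{-K_d z}, \qquad z_{90} = \frac{1}{K_d}, \qquad z_{eu} = \frac{\ln(100)}{K_d} \approx \frac{4.6}{K_d}, \qquad \frac{z_{eu}}{z_{90}} = \ln(100) \approx 4.6 .
$$

Here $z_{90}$ denotes the first optical depth. In stratified waters where $K_d$ varies with depth, the ratio will deviate from this value.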

Since 2009, marine ecosystem modelers have collaborated systematically with the remote sensing community in the MARine Ecosystem Model Intercomparison Project (MAREMIP). MAREMIP fosters the development of models based on PFT. Complementary to the Coupled Model Intercomparison Project Phase 5 and Phase 6 efforts (see http://cmip-pcmdi.llnl.gov/), MAREMIP specifically targets the inter-comparison of the representation of current and future marine biology in global ocean models, and promotes interactions between modelers and observationalists and the development of targeted observations. MAREMIP, as well as many single model studies conducted by marine ecosystem modelers worldwide (e.g., Ye et al., 2012; Dutkiewicz et al., 2015), has been using satellite-derived PT products for the evaluation of model performance in terms of plankton biogeography and global biogeochemical cycling (e.g., Hashioka et al., 2013; Vogt et al., 2013; Laufkötter et al., 2015). Initial studies have shown that models and satellite estimates of phytoplankton biogeography diverge, for example (a) in the timing of the phytoplankton bloom (Hashioka et al., 2013), (b) in phytoplankton dominance patterns and the global contribution of diatoms to total phytoplankton biomass (Vogt et al., 2013), and (c) in net primary production (Laufkötter et al., 2015).

# GAP ANALYSIS

Current satellite data sets on phytoplankton composition (PG) are not generally available in a format readily adoptable by a wide user community. Some potential users (e.g., fishery managers) still use chl-a rather than satellite PG data, in part due to a lack of confidence in the PG products, and climate modelers use only a limited fraction of the currently existing products, due to the lack of uncertainty estimates associated with each product, and issues related to the compatibility between model and satellite output. In the following section, we detail the gaps, which need to be addressed if we want to respond to user needs and promote the use of a wider range of new remote sensing products (see summary in **Table 3**, left columns).

# Gap 1: Information Mismatch between Satellite-Derived Phytoplankton Composition Products and User Group Target Variables

At present, there is a mismatch between the PG detected from satellite (which differ between algorithms) and the groupings required by the user community. **Figure 1** illustrates PFT as they may be found in the environment, where they respond to environmental conditions based on the interplay of different variables (nutrients, temperature, salinity, light, and others). Optical (size, morphology, pigmentation, fluorescence) and non-optical (e.g., nutrient requirements, stoichiometry) properties of phytoplankton allow for distinctive groupings. The optical properties include photo-physiological responses which are driven by photoadaptation, associated with certain PG, and photoacclimation, which is mostly independent of PG. From ocean color (**Figure 1** top-level left), absorption, scattering, and fluorescence properties of different PG can be derived. Coupled biogeochemical ocean models (**Figure 1** top-level right) often use groupings in terms of functional groups (e.g., calcifiers, nitrogen fixers, etc.) which do not necessarily link to the optically based PG. The latter are, for example, either picophytoplankton (PSC algorithms), prokaryotic phytoplankton (e.g., Bracher et al., 2009), Synechococcus-like cyanobacteria (e.g., Alvain et al., 2005), prochlorophytes (e.g., Hirata et al., 2011), or Trichodesmium only (e.g., Westberry et al., 2005), and not just nitrogen fixers. This highlights the need to enhance linkages between optical and functional PG to improve our knowledge. The algorithms listed above also provide results in different units (e.g., size factor, fraction of total chl-a, chl-a, dominance, or just presence of a PG), which do not always match user requirements (e.g., a numerical model might require carbon biomass). There are also substantial differences in the PG definitions among the users themselves (see IOCCG, 2014). 
While biogeochemical and RT models require a quantitative assessment of PT or PSC, end users for coastal environmental management need PG products as indicators for water quality, HAB presence, eutrophication, and fisheries stock assessment. To help users select the appropriate PG data sets, the work already accomplished by inter-comparing global satellite PG algorithms (Kostadinov et al., 2017) and by setting up a user guide (Mouw et al., 2017) needs to be extended to new algorithms and to more explicit recommendations on which algorithm is best suited for specific users and science questions. The latter can only be done when the uncertainties of these algorithms have been evaluated more consistently (see Gap 2). Improvements are also needed in terms of the representation of PG in the current generation of models to better constrain present and future projections of marine biogeochemistry. Furthermore, as the community is moving toward biogeochemical models of increased complexity, information on phytoplankton community composition from space including all PT, or other indices of biodiversity (pigments, size), will provide valuable resources for the next generation of modelers. Thus, there is a need for ongoing product development along with effective communication between remote-sensing scientists, biological oceanographers, and modelers to ensure that future developments are consistent and comparable between parties and ultimately improve climate predictions.

### TABLE 3 | Summary of gap analysis for phytoplankton composition from space: gap (left), status of existing work (second left), and recommendations for actions (right columns).

Some actions are related to several gaps but are only stated at the medium term (second right) with agency supported activities embedded in long-term actions at international level (right); abbreviations in Table 1.

# Gap 2: Lack of Traceability of Uncertainties in PG Algorithms

The quantitative assessment of uncertainty in PG satellite products is still insufficient. This is due to the above-mentioned mismatch in definitions (see Gap 1), the limited theoretical background for connecting optical signatures to the diversity of phytoplankton communities across different environments, and limitations in appropriate in situ data.

At the cellular level, a detailed understanding of how pigment packaging (a function of cell size and intracellular pigment concentration; Morel and Bricaud, 1981) and pigment composition together govern the shape and magnitude of chl-a specific absorption (especially in the blue-green region of the spectrum, which is commonly used in PG algorithms) requires further work. Both reconstruction (Bidigare et al., 1990) and decomposition (Hoepffner and Sathyendranath, 1993) methods are often applied separately to bio-optical datasets to explore the link between pigments and phytoplankton absorption. Reconstruction approaches conventionally apply a single pigment-specific absorption coefficient to a particular pigment or pigment type (e.g., photosynthetic and photoprotective carotenoids), often obtained from measurements of extracted pigments in solvent. Only a handful of studies have examined the absorptive properties of pigment-protein complexes (e.g., Johnsen and Sakshaug, 2007), yet differences in the spectral shape once pigments are embedded in proteins can be significant. Improved models of phytoplankton photoacclimation combined with new approaches to determining cell size should assist in improving our understanding of how pigment packaging influences the spectral signature of natural phytoplankton assemblages. Efforts inverting hyperspectral reflectance and absorption spectra to obtain PG have shown limited success, leading to identification of certain PT with no quantification (Werdell et al., 2014; Kudela et al., 2015; Xi et al., 2015), of PSC fractions (Organelli et al., 2013), or quantification of accessory pigments in addition to chl-a (Chase et al., 2013; Moisan et al., 2013; Bracher et al., 2015b). PT specific absorption properties are available, but large spectral variability is related to algal culturing and variations in size, pigment composition, and pigment packaging due to physiological responses of PT. 
In contrast, owing to high measurement uncertainties, spectral scattering properties (including backscattering and the volume scattering function) are still less well known (Tan et al., 2015; Harmel et al., 2016). Thus, PG related specific IOPs are not adequately represented in RTMs. This further limits the tracing of uncertainty in algorithms, pointing to the need for coincident IOP observations along with expanded in situ datasets of phytoplankton composition.

Other errors in algorithms are also difficult to assess, for instance the accuracy of in situ data used as input or for validation of algorithms due to mostly non-standardized acquisition (see details below), the above mentioned mismatch definition (see Gap 1), and the spatial and temporal upscaling of specific PT and PSC signatures of diverse communities. Several studies have demonstrated that adding spectrally-resolved optics to biogeochemical models improves model skill (e.g., Dutkiewicz et al., 2015) as well as comparability to observed optical properties (e.g., Fujii et al., 2007; Baird et al., 2016). A minority of global numerical models resolve the bio-optical properties of different PG (Gregg and Casey, 2007; Dutkiewicz et al., 2015; Baird et al., 2016). These advancements may provide a way forward to investigate the biological realism of phytoplankton biogeography using a larger range of satellite PG products.

In addition, the reliance on HPLC for development and validation presents challenges to quantitatively assess the uncertainty of PG satellite products. However, inputs to HPLC PG datasets are (diagnostic) accessory pigment concentrations, which are only to a certain degree congruent with taxonomy or phytoplankton size. Size can vary considerably within certain functional or taxonomic groups, e.g., diatoms can range from 3 to 500 µm but are characterized by the same diagnostic pigment (fucoxanthin) across this size range. Similarly, grouping by accessory pigments can be problematic as there is substantial variability in pigment concentration as a function of physiological response to the environmental conditions and more importantly a given biomarker pigment is present in several PT (e.g., fucoxanthin in diatoms and haptophytes). Some PT, e.g., coccolithophores, cannot be inferred from HPLC pigments. In consideration of the expanding satellite sensor capabilities, there is a need for coordinated efforts to compile and generate comprehensive in situ datasets (not just HPLC) for assessing phytoplankton composition. There is also a need to provide best practice guidance to merge the different types of datasets (e.g., HPLC, microscopy, flow cytometry) into an integrated product that encompasses different ways of grouping phytoplankton species.

# Gap 3: Missing Capabilities of Current Ocean Color Satellite Measurements

Differences among PT in their spectral absorption are small: many PT contain, despite specific marker pigments, the same suites of pigments or pigments of similar absorptive properties (note that besides the pigment absorption properties, spectral absorption is also governed by the algal community size structure). Given their limited number of wavebands and broad band resolution, current multispectral sensors can provide only limited information on the variability in phytoplankton spectral absorption caused by shifts in community structure (Bricaud et al., 2004; Organelli et al., 2011). This restricts all multispectral satellite phytoplankton composition products based on spectral principles to indicating dominance or presence of PT, or to identifying major size class fractions within the total phytoplankton community, with a high level of uncertainty.

Satellite instruments with a very high spectral resolution (1 nm and better, originally designed for atmospheric applications) provide additional opportunities for distinguishing multiple PT based on their optical properties. The capability to retrieve major PT groups quantitatively based on their optical signature has been clearly shown with the PhytoDOAS method (Bracher et al., 2009; Sadeghi et al., 2012a) in the open ocean using hyperspectral satellite data from the atmospheric sensor SCIAMACHY. However, the exploitation of hyperspectral satellite data for ocean color has so far been very limited because hyperspectral sensors like SCIAMACHY (spectral resolution <0.5 nm) do not provide operational water-leaving radiance products and have very large footprints (30 by 60 km per pixel) and low global coverage (6 days). This is a major constraint on assessing the retrieval's accuracy with in situ point measurements. It also limits the application of such PT satellite data sets. The difficulty of working with SCIAMACHY data is that one has to handle strong atmospheric absorbers (true for all hyperspectral satellite data) and the heterogeneity of big pixels; hence, the PhytoDOAS algorithm was designed to retrieve three PT directly from top of atmosphere radiances, by separating their high frequency absorptions from each other and from relevant atmospheric absorbers, while accounting for broad band effects by using a low order polynomial. This method requires high spectral resolution (<1 nm). SCIAMACHY data acquisition ended with the loss of contact with the ENVISAT satellite (April 2012). First results from adapting PhytoDOAS to the Ozone Monitoring Instrument (OMI) sensor (measuring since 2004) are very promising (Oelker et al., 2016) and will enable the extension of the spectrally derived PT data into the future with much improved global coverage (daily) and a smaller footprint (13 × 24 km). 
OMI is also the precursor instrument to the TROPOspheric Monitoring Instrument (TROPOMI) on Sentinel-5-Precursor (S-5-P), launched in 2017, and to the Ultra-violet/Visible/Near-Infrared (UVN) instruments on Sentinel-4 and Sentinel-5, to be launched in the 2020s (all with a pixel size of 3.5 × 7 km).
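The spectral separation that PhytoDOAS relies on, as described above, can be illustrated with a generic differential-absorption least-squares fit: narrow-band absorber signatures are fitted together with a low order polynomial that absorbs the broad band effects. The sketch below is not the PhytoDOAS implementation. All spectra are synthetic, and the names (`narrow_features`, `ref_a`, `ref_b`) are hypothetical. A real retrieval would work on top of atmosphere radiances with measured absorption spectra.

```python
import numpy as np

# Synthetic wavelength grid (nm); every spectrum below is made up for illustration.
wl = np.arange(430.0, 500.0, 0.5)
x = (wl - wl.mean()) / (wl.max() - wl.min())  # scaled axis keeps the polynomial well conditioned


def narrow_features(centers, width=1.5):
    """Sum of narrow Gaussian bands standing in for a differential absorption spectrum."""
    return sum(np.exp(-0.5 * ((wl - c) / width) ** 2) for c in centers)


# Hypothetical differential spectra of two absorbers (e.g., one PT and one trace gas).
ref_a = narrow_features([440.0, 468.0])
ref_b = narrow_features([452.0, 485.0])

# Synthetic "measured" optical depth: narrow-band absorbers plus a smooth background.
true_a, true_b = 0.8, 0.3
optical_depth = true_a * ref_a + true_b * ref_b + (0.5 - 0.2 * x + 0.1 * x**2)

# Design matrix: one column per reference spectrum, then polynomial terms up to degree 3.
design = np.column_stack([ref_a, ref_b] + [x**k for k in range(4)])
coeffs, *_ = np.linalg.lstsq(design, optical_depth, rcond=None)
fit_a, fit_b = coeffs[:2]

if __name__ == "__main__":
    print(f"fit factors: a={fit_a:.3f}, b={fit_b:.3f}")
```

The fit only separates the absorbers because their features are much narrower than anything the polynomial can represent, which is why the method depends on high spectral resolution. With broad multispectral bands the reference columns and the polynomial become nearly collinear.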

The Hyperspectral Imager for the Coastal Ocean (HICO) provided data with high spatial (100 m) and spectral (∼6 nm) resolution but limited coverage (only a restricted number of scenes globally). However, the lack of a robust atmospheric correction for HICO (see current implementation in http://seadas.gsfc.nasa.gov/) has so far prevented the exploitation of the full spectrum. As a result, not much more than the standard phytoplankton information available from multispectral data (chlorophyll, fluorescence line height) has been derived (Ryan et al., 2014). Providing a spectrally consistent, high quality atmospheric correction for PG retrievals remains a major challenge.

The new ocean-color sensor OLCI on Sentinel-3 already provides two more bands in the visible range than its predecessor MERIS. It is anticipated that the number of bands will further increase for future multispectral ocean-color sensors. In addition, hyperspectral missions like Pre-Aerosol, Clouds, and ocean Ecosystem (PACE; global, high coverage, 1 km pixels, launch 2022) and the Environmental Mapping and Analysis Program mission (EnMAP; regional, low coverage, 30 m pixels, launch 2019) are planned to operate in the near future. However, hyperspectral instruments like EnMAP or PACE with 5 nm resolution are still very different from atmospheric instruments like SCIAMACHY. Hence, algorithms will have to be developed (or adapted) to retrieve PT from these new instruments.

To monitor marine ecosystems and assess their vulnerability to future anthropogenic and climate change, long-term time series are needed, in addition to good spatial and temporal resolution of existing data, to monitor trends in phytoplankton community structure and its variability on inter-annual to decadal time scales. The average number of cloud-free observations per pixel for an ocean color sensor is only 100 per year for the temperate and tropical zones (Werdell et al., 2007), while it is much lower for high latitudes (e.g., 12 per year for the East Greenland Sea, Cherkasheva et al., 2014). Yet, merged ocean-color products significantly increase this temporal coverage (Maritorena et al., 2010; Racault et al., 2015). The development of long time series of PG satellite products, covering more than a decade, has just started (e.g., references given in Mouw et al., 2017). Such data sets are necessary to respond to user needs. Efforts have been made to apply the multispectral PG algorithms not only to SeaWiFS but also to MODIS and MERIS. Synergistic use of multiple sensors will enable the creation of long-term time series moving from monthly to daily resolution, and also provides an opportunity to improve the performance of individual retrievals. The ESA project SynSenPFT is an example: an algorithm was developed by synergistically using PT information from SCIAMACHY-PhytoDOAS and Ocean Color Climate Change Initiative chl-a-OC-PFT retrievals. This was done to obtain PT chl data at high spatial and temporal resolution using their spectral imprints retrieved from satellite data of high spectral resolution, and a global PT data set was developed for 2002 to 2012 at 4 by 4 km daily resolution (Soppa et al., 2016b).

# Gap 4: Lack of Regional Capability of PG Algorithms

Thus far, most PG algorithms work globally, and some have been validated in restricted regions, but nearly all are limited to open ocean conditions. Spatial pattern matching between modeled and satellite PG showed a larger discrepancy on smaller spatial scales than on larger scales, especially around continental shelves (Hirata et al., 2013). However, satellite PG products are needed especially for coastal areas and inland waters, where water quality and HAB issues are most urgent. In these optically complex waters, optical constituents vary independently, making ocean color retrievals challenging. In waters with extremely high colored dissolved organic matter (CDOM) and low scattering, CDOM absorption dominates the whole visible spectrum, resulting in very low water-leaving reflectance (<1%), and thus the phytoplankton signal itself is weak. By contrast, the main problems in highly scattering waters are the masking of pigment absorption by non-algal (mineral) particle absorption and significant near-infrared water reflectance (IOCCG, 2000). Successful results in these types of water are hampered by the limited spatial and spectral resolution of sensors. This already makes it difficult to achieve accurate atmospheric correction and obtain reliable ocean color standard products. It also inhibits the observation of the patchy distribution of phytoplankton communities. Deriving certain PT beyond size- and/or pigment-based discrimination of phytoplankton requires developing empirical methods that rely on covariation: by exploiting additional data (light, temperature, nutrients, ...), retrievals and optical modeling for specific regions could be further constrained (and optimized), as for example in the study by Brewin et al. (2015), where information on the ambient light field extracted from satellite information was combined with an abundance-based PSC approach.

# RECOMMENDATIONS TOWARD OPERATIONAL PRODUCTS OF PHYTOPLANKTON DIVERSITY FROM SPACE

In the following we give recommendations on how to fill the gaps identified in the previous chapter. **Table 3** summarizes the mid- and long-term actions that are detailed below. Note that several actions will address several gaps simultaneously. We recommend that these actions be implemented in communication and collaboration between ocean color scientists, observationalists, numerical modelers, and other users. This will ensure that products are aligned to new in situ and satellite observational techniques and fulfill the ever-changing needs of the wide range of user communities.

# Improving Match between Satellite PG and Users' Needs

A mechanistic framework needs to be developed which draws on the complementary use of the various PG data and links them to PFT (**Figure 1**). This will ensure that users are aware of the actual specific groups in the different satellite products and how they compare to the groups they require in their specific application. Such a framework requires an international effort and funding, including experts in in situ measurements (HPLC, microscopy, flow cytometry, genetics, bio-optics), algorithm developers and representatives of the user communities (modeling, marine services). Certain medium-term actions should be taken:


# Curation and Acquisition of In situ Data for Improving and Assessing PG Retrievals

Within international cooperation of space agencies the curation of existing measurements of in situ PG abundance (HPLC, microscopy, flow cytometry, particle imaging, genomics, ...) and corresponding optical [IOPs, apparent optical properties (AOPs)] data needs to be secured:


# Theoretical Background to Further Develop PG Retrievals and Assess Their Uncertainty

To fill Gap 2, the development of a framework for clear traceability of uncertainties in PG satellite products needs to be supported by the specific assessment of mismatch definition, in situ error, retrieval error, or errors due to the spatial and temporal upscaling of specific PG signatures in diverse communities. This requires the steps mentioned in Section Improving Match between Satellite PG and Users' Needs and in Section Curation and Acquisition of In situ Data for Improving and Assessing PG Retrievals, but also steps linked to improving PG algorithms (see also Section Sustaining Long-Term PG Satellite Data) which require a solid theoretical background.

On the one hand, inverse modeling needs to be optimized by further developing the theoretical background to connect optical signatures to the diversity of phytoplankton communities across different environments (especially in optically complex waters). The degree of independent information in hyperspectral wavelength signals will depend on the water type (e.g., waters optically dominated by phytoplankton alone, or other particles, or CDOM) and will determine whether different phytoplankton products can be independently derived from a given hyperspectral spectrum. This statistical-informational problem needs to be considered in the application of (global or regional) inversion algorithms. Measurements of spectral specific IOPs (in particular scattering properties) on natural and cultured samples will lead to a better description of optics in RTM. On the other hand, developing a mechanistic understanding of the spectral properties of PG to retrieve bio-optical indices of diversity requires a better utilization of expertise crossing a wide range of fields, including taxonomy and molecular ecology in connection with optically-derived PG. The use of global numerical models which resolve the bio-optical properties of different PG will provide a way forward for connecting more specifically a larger range of satellite PG products (highlighted in **Figure 1** with red-blue arrow connecting optical PG with PFT). Models could group their "model phytoplankton-analogs" according to more dimensions of diversity (e.g., accessory pigments, scattering characteristics, etc.; see optical PG in **Figure 1**) that link closer to the satellite PG definitions than the more classical PFT designations (e.g., nitrogen fixers, silicifiers, ...). Models that include spectrally resolved optics and bio-optical properties of phytoplankton could also prove to be a powerful tool for exploring the interdependency and regionally varying skill of different satellite PG approaches.

# Sustaining Long-Term PG Satellite Data

As outlined in Section Gap 3: Missing Capabilities of Current Ocean Color Satellite Measurements, long-term data sets of sufficient spatial and temporal resolution need to be established that are also adequate for regional applications (see Section Gap 4: Lack of Regional Capability of PG Algorithms).

First, the exploitation of hyperspectral data needs to be intensified so that these data sets can be based on the spectral imprints of phytoplankton groups in ocean color:

– In preparation for the exploitation of future hyperspectral ocean color sensor missions [PACE, EnMAP or Hyperspectral InfraRed Imager (HyspIRI), and hopefully more], much more effort needs to be put into the development of atmospheric correction for hyperspectral satellite data; methods should be developed over open ocean and complex waters, with the help of RTM, and considering multispectral atmospheric correction methods. Current hyperspectral satellite data sets, such as SCIAMACHY, HICO, OMI (and from 2017 also TROPOMI), should also be explored further.


In addition a framework is needed at an international level for integrating PG information from different sensors (hyper- /multispectral, global coverage/high spatial, and/or temporal resolution) to meet user requirements across different scales with special focus to regional applications (in order to fill Gap 4, see Section Gap 4: Lack of Regional Capability of PG Algorithms).


Based on the outcome of the activities to foster hyperspectral data exploitation, synergistic use of new satellite sensors for detection of phytoplankton diversity across all oceanic, coastal, and inland water environments should allow for merged PG products. This will secure the prolongation of PG data time series as climate data records.

# CONCLUSIONS

Synoptic observations on phytoplankton diversity, obtained from satellite ocean color data, have the potential to improve models for assessing and predicting climate change and for managing marine services, and they are currently the only means available for high resolution, long-term monitoring of changes in marine ecosystem structure at the regional to global scale. Yet, to meet the requirements of an essential ocean/climate variable (highly accurate and error-characterized) further scientific investment into existing and further developed methods is needed. In particular, the satellite phytoplankton group products should: (i) match those requested by the user communities; (ii) provide quantitative per-pixel uncertainty; (iii) exploit past, current and future hyperspectral remote-sensing; (iv) be tuned for regional applications (including coastal and inland-water regions); and (v) exploit better the various streams of satellite information, from the various sensors in space. Improved understanding of how the optical signatures (inherent optical properties) of phytoplankton groups vary will also aid algorithm development. These actions can only be achieved with coordinated and sustained investment, across national and international agencies, and through interdisciplinary co-operation between satellite algorithm developers, in situ experts and end users (e.g., ecosystem modelers).

## REFERENCES


# AUTHOR CONTRIBUTIONS

AB organized the specific CLEO session, led its discussion, and led the writing team which set up the outline of this manuscript. She wrote the draft version and synoptically implemented the revisions by all other coauthors. HB, VB, SD, AH, MH, TH, SL, JU, MV, and AW planned and participated in the writing session after the workshop and substantially helped to revise several versions of the manuscript. In addition, several experts who could not participate in the discussion and writing sessions provided substantial comments and suggestions to improve the manuscript: ABri, RB, AC, LC, ED, AD, NH, CM, EO, and DR.

# FUNDING

The ESA SEOM SY-4Sci Synergy project (No. 400112410/14/ I-NB) funded partially the contribution of AB, ABri, RB, SL, and AW. SD acknowledges NASA (NNX13AC34G) for funding. CM acknowledges funding from NNX13AC34G for this effort.

# ACKNOWLEDGMENTS

We thank ESA/ESRIN within the context of activities of the ESA Scientific Exploitation of Operational Missions (SEOM) for funding the "Phytoplankton Diversity at Global and Regional Scale" Session at the Colour and Light from Earth Observation (CLEO) workshop and International Satellite Functional Type Algorithm Intercomparison Project (http://pft.ees.hokudai.ac.jp/satellite/), with funding from JAXA, UK National Centre for Earth Observation, and the IOCCG for facilitation of a series of meetings that led to the development of this manuscript.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Bracher, Bouman, Brewin, Bricaud, Brotas, Ciotti, Clementson, Devred, Di Cicco, Dutkiewicz, Hardman-Mountford, Hickman, Hieronymi, Hirata, Losa, Mouw, Organelli, Raitsos, Uitz, Vogt and Wolanin. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Simulating PACE Global Ocean Radiances

### Watson W. Gregg\* and Cécile S. Rousseaux

*NASA Global Modeling and Assimilation Office, Greenbelt, MD, USA*

The NASA PACE mission is a hyper-spectral radiometer planned for launch in the next decade. It is intended to provide new information on ocean biogeochemical constituents by parsing the details of high resolution spectral absorption and scattering. It is the first of its kind for global applications and as such, poses challenges for design and operation. To support pre-launch mission development and assess on-orbit capabilities, the NASA Global Modeling and Assimilation Office has developed a dynamic simulation of global water-leaving radiances, using an ocean model containing multiple ocean phytoplankton groups, particulate detritus, particulate inorganic carbon (PIC), and chromophoric dissolved organic carbon (CDOC) along with optical absorption and scattering processes at 1 nm spectral resolution. The purpose here is to assess the skill of the dynamic model and derived global radiances. Global bias, uncertainty, and correlation are derived using available modern satellite radiances at moderate spectral resolution. Total chlorophyll, PIC, and the absorption coefficient of CDOC (aCDOC), are simultaneously assimilated to improve the fidelity of the optical constituent fields. A 5-year simulation showed statistically significant (*P* < 0.05) comparisons of chlorophyll (*r* = 0.869), PIC (*r* = 0.868), and aCDOC (*r* = 0.890) with satellite data. Additionally, diatoms (*r* = 0.890), cyanobacteria (*r* = 0.732), and coccolithophores (*r* = 0.716) were significantly correlated with in situ data. Global assimilated distributions of optical constituents were coupled with a radiative transfer model (Ocean-Atmosphere Spectral Irradiance Model, OASIM) to estimate normalized water-leaving radiances at 1 nm for the spectral range 250–800 nm. These unassimilated radiances were within −0.074 mW cm$^{-2}$ µm$^{-1}$ sr$^{-1}$ of MODIS-Aqua radiances at 412, 443, 488, 531, 547, and 667 nm. This difference represented a bias of −10.4% (model low).
A mean correlation of 0.706 (*P* < 0.05) was found with global distributions of MODIS radiances. These results suggest skill in the global assimilated model and resulting radiances. The reported error characterization suggests that the global dynamical simulation can support some aspects of mission design and analysis. For example, the high spectral resolution of the simulation supports investigations of band selection. The global nature of the radiance representations supports investigations of satellite observing scenarios. Global radiances at bands not available in current and past missions support investigations of mission capability.

### Edited by:

*Shubha Sathyendranath, Plymouth Marine Laboratory, UK*

### Reviewed by:

*Tim Moore, University of New Hampshire, USA Hajo Krasemann, Helmholtz-Zentrum Geesthacht Centre for Materials and Coastal Research (HZ), Germany*

> \*Correspondence: *Watson W. Gregg watson.gregg@nasa.gov*

### Specialty section:

*This article was submitted to Ocean Observation, a section of the journal Frontiers in Marine Science*

Received: *13 December 2016* Accepted: *17 February 2017* Published: *06 March 2017*

### Citation:

*Gregg WW and Rousseaux CS (2017) Simulating PACE Global Ocean Radiances. Front. Mar. Sci. 4:60. doi: 10.3389/fmars.2017.00060*

Keywords: PACE, ocean color, water-leaving radiances, biogeochemical model, radiative transfer model

# INTRODUCTION

The now 19-year time series of routine global ocean color observations from space has led to advancements in the science of ocean biology beyond expectations. From chlorophyll interannual variability to inherent optical properties to physical-biological coupling, the time series has been an invaluable resource for scientists in a broad range of ocean and atmosphere-related fields. As is often the case in science, the proliferation of information from these moderate resolution missions has raised as many questions as it has answered. Coupled with improvements in detector technology, the time is now right for advancement of ocean biogeochemical science from space using higher spectral resolution missions.

Higher spectral resolution can potentially improve detection of optical constituents in the oceans that have important effects on biology, biogeochemistry, and light transmission. One major objective is the determination of phytoplankton groups from space. Research to detect phytoplankton groups from space has been going on for some time using the fleet of moderate spectral resolution sensors (e.g., Kamykowski et al., 2002; Alvain et al., 2005; Aiken et al., 2007; Bracher et al., 2009; Brewin et al., 2010, 2011; Kostadinov et al., 2010; Masotti et al., 2010; Hirata et al., 2011). Methods to identify size classes have also been pursued (e.g., Loisel et al., 2006; Brewin et al., 2011) but these only loosely relate to phytoplankton functionality/taxonomy. Several phytoplankton discrimination methods resolve dominant groups only (Sathyendranath et al., 2004; Alvain et al., 2005, 2008; Hirata et al., 2008; Raitsos et al., 2008). Hirata et al. (2011) provide taxonomic classifications, with relative and even absolute abundances quantified. Because it uses satellite ocean chlorophyll concentrations rather than radiances, this empirical methodology essentially assumes that abundance reflects taxonomy, which is valid in many instances but not always (Rousseaux et al., 2013).

Moderate resolution ocean color sensors containing only a few discrete spectral bands, such as the global missions flown to date, do not contain sufficient spectral information to enable unequivocal phytoplankton functional/taxonomic discrimination. Many phytoplankton species/groups have subtle, but distinct spectral signatures. Use of hyper-spectral remote retrievals with many bands spanning the visible and ultraviolet spectrum holds potential for resolving these spectral distinctions (e.g., Bracher et al., 2009; Sadeghi et al., 2012; Palacios et al., 2015; Neukermans et al., 2016).

To close this knowledge gap, NASA has proposed the PACE mission, a global hyper-spectral sensor to test the ability to retrieve phytoplankton population distributions, as well as other important ocean constituents with optical signatures. The mission, proposed for launch in the early 2020s, can potentially demonstrate the feasibility and capability of hyperspectral observations from space and enable scientists to observe and quantify these important ocean biological features. PACE is intended to follow the planned hyperspectral missions PRISMA (Meini et al., 2015) and EnMAP (Foerster et al., 2015) with extended spectral range into the ultraviolet, faster observational repeat times, and emphasis on global ocean observational capability.

Since there is no global observational precedent, many mission development activities, design tradeoff assessments, operational strategies, and other issues are speculative. Here we develop a dynamic global model at extreme hyper-spectral resolution (1 nm) to provide a platform that approximates realistic ocean conditions, and to understand whether such a simulation can help resolve at least some of the issues that inevitably arise in the design and testing of a new mission. The objective of this effort is to quantitatively assess the skill of a global model using a forward radiance representation to simulate global ocean water-leaving radiances. The skill is evaluated spectrally with explicit error characterization.

# METHODS

### Global Ocean Physical-Biogeochemical Model Configuration

The underlying biogeochemical constituents are simulated by the NOBM, which is coupled to a global ocean circulation model, Poseidon (Schopf and Loughe, 1995). It spans the domain from −84° to 72° latitude in increments of 1.25° longitude by 2/3° latitude, including only open ocean areas, where bottom depth is >200 m. NOBM incorporates global coupled physical-biological processes, including four phytoplankton groups (diatoms, chlorophytes, cyanobacteria, and coccolithophores), which span much of the functionality of the global oceans, four nutrients (nitrate, ammonium, silicate, and dissolved iron), three detrital components (particulate organic carbon, silicate, and iron), and two carbon components (dissolved organic and inorganic carbon). It is a three-dimensional representation of coupled circulation/biogeochemical processes in the global oceans (Gregg et al., 2003; Gregg and Casey, 2007).

**Abbreviations:** aCDM, absorption coefficient of Chromophoric Dissolved and particulate organic Matter; aCDOC, absorption coefficient of CDOC; BIOSOPE, Biogeochemistry and Optics South Pacific Experiment; CDOC, Chromophoric Dissolved Organic Carbon; CZCS, Coastal Zone Color Scanner; DOC, Dissolved Organic Carbon; EnMAP, Environmental MAPping and Analysis Program; GMAO, Global Modeling and Assimilation Office; MAP, Modeling, Analysis and Prediction; MERRA, Modern-Era Retrospective Analysis for Research and Applications; MODIS, MOderate Resolution Imaging Spectroradiometer; NIR, Near InfraRed; NOBM, NASA Ocean Biogeochemical Model; OASIM, Ocean-Atmosphere Spectral Irradiance Model; PACE, Plankton, Aerosol, Cloud and ocean Ecosystems; PIC, Particulate Inorganic Carbon; PRISMA, PRecursore IperSpettrale della Missione Applicativa; PSU, Practical Salinity Units; SeaWiFS, Sea-viewing Wide Field-of-view Sensor; S-NPP, Suomi National Polar-orbiting Partnership.

Optically-active constituents have been added to NOBM to improve realism and complexity of the ocean simulation and better represent the ocean optical variability that will be observed by PACE. We have added particulate inorganic carbon (PIC) and chromophoric dissolved organic carbon (CDOC) as prognostic state variables. PIC is produced by coccolithophores as detached coccoliths and is lost via sinking and dissolution. PIC is produced as a fraction (25%) of the coccolithophore growth rate (Gregg and Casey, 2007) minus respiration. The PIC sinking rate is represented here as an exponential function of concentration, assuming that large concentrations of PIC are associated with larger coccolith size.

$$w_s(\mathrm{PIC}) = a_0 \exp(a_1\,\mathrm{PIC}) \tag{1}$$

where $w_s$ is the PIC sinking rate (m d$^{-1}$), PIC is in units of µgC l$^{-1}$, $a_0$ = 0.1 m d$^{-1}$, and $a_1$ = 2.0 l µgC$^{-1}$ (Gregg and Rousseaux, 2016). Dissolution follows Buitenhuis et al. (2001), except that no dissolution is allowed for depths shallower than the calcium carbonate compensation depth, which we define as 3500 m.
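As a quick numerical illustration of Equation (1), the sinking rate can be evaluated directly from the stated constants. This is a minimal sketch; the function and constant names are hypothetical and not taken from the NOBM code.

```python
import math

A0_M_PER_DAY = 0.1   # a0 in Eq. (1), m d^-1
A1_L_PER_UGC = 2.0   # a1 in Eq. (1), l ugC^-1


def pic_sinking_rate(pic_ugc_per_l):
    """PIC sinking rate (m d^-1) for a PIC concentration in ugC l^-1, Eq. (1)."""
    return A0_M_PER_DAY * math.exp(A1_L_PER_UGC * pic_ugc_per_l)


# The rate starts at a0 for vanishing PIC and grows exponentially with concentration
rate_low = pic_sinking_rate(0.0)
rate_high = pic_sinking_rate(0.5)
```

With these constants, doubling the exponent argument squares the exponential factor, so the sinking rate is very sensitive to high PIC concentrations.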

Chromophoric dissolved organic carbon (CDOC) represents the biogeochemical constituent necessary for the simulation of absorption by aCDOC(λ), the absorption coefficient, which is an optical quantity. CDOC is formed and destroyed in the same way as DOC, using Aumont et al. (2002) with an assumed DOC:CDOC production/loss ratio of 0.5. It is additionally destroyed by the absorption of spectral irradiance. We follow the methodology of Gregg and Rousseaux (2016) for photo-destruction (photolysis) of CDOC per unit irradiance quanta, with a different quantum yield $\phi_{CDOC}$ of 3.0E-6 (µM µmol photons absorbed m$^{-3}$), which gives results in reasonable agreement with MODIS-Aqua data (Maritorena et al., 2010).

### Ocean-Atmosphere Spectral Irradiance Model

NOBM is coupled to OASIM (Gregg and Carder, 1990; Gregg, 2002; Gregg and Casey, 2009) to simulate the propagation of downward spectral irradiance in the oceans and the upwelling irradiance/radiance. The irradiance pathways for OASIM are shown in **Figure 1**. The atmosphere and ocean portions of the downwelling and upwelling irradiance are implemented at 25 nm spectral resolution. Higher spectral resolution is impractical for global models that integrate at 30 min time steps in our case. Upwelling radiance is produced at 1 nm resolution, however. Biases and uncertainties in the atmospheric component of OASIM have been characterized for clear sky high spectral resolution (1 nm; Gregg and Carder, 1990) and under mixed cloudy and clear skies for integrated spectral resolution (Gregg and Casey, 2009). We elaborate here on the ocean optical calculations.

### Optical Properties of Ocean Constituents

The coupled NOBM-OASIM model includes optically active constituents, including seawater, phytoplankton, detritus, PIC, and CDOC each with unique spectral characteristics (**Figure 2**). All are prognostic state variables, with independent sources and sinks. The optical properties of each constituent are taken from various efforts in the peer reviewed literature.

### Water

The spectral absorption and scattering properties of seawater were reported by Smith and Baker (1981) for the 200–800 nm spectral domain. Pope and Fry (1997) revised this for the range 380–720 nm, but this was for pure water. Morel et al. (2007) derived new data for absorption and scattering for the spectral range 300–500 nm using information in the clearest ocean waters of the South Pacific (although absorption values >420 nm were taken from Pope and Fry, 1997). Lee et al. (2015) reported absorption coefficients in the range 350–550 nm derived using remote sensing reflectance algorithms for the same clear ocean water data used by Morel et al. (2007). Finally, Mason et al. (2016) used laboratory observations to obtain new absorption coefficients for the spectral range 250–550 nm. Like those of Pope and Fry (1997), their results were specific to pure water.

Water absorption data used here are from Smith and Baker (1981) for 200–300 nm and 730–800 nm, Morel et al. (2007) for 300–350 nm, Lee et al. (2015) for 350–550 nm, Pope and Fry (1997) for 550–720 nm, Curcio and Petty (1951) for 800 nm–2.5 µm, and Maul (1985) for 2.5–4 µm. Water scattering is from the method of Zhang et al. (2009), which accounts for temperature and salinity dependence. The backscattering-to-total scattering ratio $\tilde{b}_{bw}$ for water is 0.5.

### Phytoplankton

Phytoplankton optical properties are obtained from various sources. Chlorophyll-specific absorption coefficients $a^*_p(\lambda)$ are derived by taking reported spectra and normalizing to the absorption at 440 nm [$a^*_p(440)$]. Normalized specific absorption spectra $[a^*_p(\lambda)]^N$ are computed for each of the four phytoplankton groups: diatom and chlorophyte $[a^*_p(\lambda)]^N$ are taken from Sathyendranath et al. (1987), cyanobacteria from Bricaud et al. (1988), and coccolithophores from Morel and Bricaud (1981). Then the specific spectral $a^*_p(\lambda)$ values are derived using mean values at 440 nm. Diatom $a^*_p(440)$ represents the mean of 5 observations containing 4 different spp., chlorophytes 6 observations from 4 spp., cyanobacteria 5 observations from 3 spp., and coccolithophores 3 observations of 1 sp.
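The two-step procedure described above (normalize a reported spectrum to its value at 440 nm, then rescale by the mean of the available 440 nm observations) can be sketched as follows. The function name and all numerical values in the example are hypothetical placeholders, not the published spectra.

```python
import numpy as np


def specific_absorption(wavelengths, reported_spectrum, a440_observations):
    """Rescale a reported absorption spectrum to a mean 440 nm value.

    Step 1: normalize the spectrum by its value at 440 nm.
    Step 2: multiply by the mean of the observed specific absorption at 440 nm.
    """
    a_at_440 = np.interp(440.0, wavelengths, reported_spectrum)
    normalized = np.asarray(reported_spectrum, dtype=float) / a_at_440
    return normalized * np.mean(a440_observations)


# Hypothetical inputs for illustration only
wl = np.array([400.0, 440.0, 480.0])
spectrum = np.array([2.0, 4.0, 1.0])        # arbitrary units
observations_440 = [0.01, 0.03]             # m^2 mg^-1, made-up values
a_star = specific_absorption(wl, spectrum, observations_440)
```

By construction the rescaled spectrum equals the mean observed value at 440 nm, while the spectral shape is inherited from the reported spectrum.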

Phytoplankton specific scattering coefficients $b^*_p(\lambda)$ are obtained from measurements at 590 nm and extended to the entire spectrum from specific attenuation coefficients (Bricaud et al., 1988). Diatom and chlorophyte specific scattering coefficients at 590 nm, $b^*_p(590)$, are the mean of 5 observations and 6 observations, respectively, from Morel (1987), Bricaud and Morel (1986), and Bricaud et al. (1988). Cyanobacteria $b^*_p(590)$ is the mean of 8 observations from Morel (1987), Bricaud and Morel (1986), Bricaud et al. (1988), and Ahn et al. (1992). Coccolithophore $b^*_p(590)$ is derived from the mean of 3 observations from Bricaud and Morel (1986), Bricaud et al. (1988), and Ahn et al. (1992).

We assume no spectral dependence in the backscattering-to-total scattering ratio $\tilde{b}_{bp}$. Ahn et al. (1992) suggested a spectral dependence for cyanobacteria but generally none for the other groups. Reported values for $\tilde{b}_{bp}$ are 0.002 for diatoms (Morel, 1988), 0.00071 for chlorophytes, 0.0032 for cyanobacteria (Ahn et al., 1992), and 0.00071 for coccolithophores (Morel, 1988). Some of these values have come under question based on non-sphericity of many natural phytoplankton populations (Vaillancourt et al., 2004; Whitmire et al., 2010). Based on these results, we increased $\tilde{b}_{bp}$ for chlorophytes and coccolithophores by a factor of 10, but kept them as reported for diatoms and cyanobacteria.
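The resulting backscattering ratios can be tabulated from the reported values and the factor-of-10 adjustment. This is a small bookkeeping sketch; the dictionary layout and names are assumptions made for illustration.

```python
# Reported backscattering-to-total scattering ratios (dimensionless)
REPORTED_BBP_RATIO = {
    "diatoms": 0.002,
    "chlorophytes": 0.00071,
    "cyanobacteria": 0.0032,
    "coccolithophores": 0.00071,
}

# Groups whose ratio is increased by a factor of 10, as stated in the text
ADJUSTED_GROUPS = {"chlorophytes", "coccolithophores"}
ADJUSTMENT_FACTOR = 10.0


def adjusted_bbp_ratios(reported=REPORTED_BBP_RATIO):
    """Return the ratios used in the simulation after the adjustment."""
    return {
        group: value * ADJUSTMENT_FACTOR if group in ADJUSTED_GROUPS else value
        for group, value in reported.items()
    }


bbp_ratio = adjusted_bbp_ratios()
```

After the adjustment, chlorophytes and coccolithophores have a ratio of 0.0071, which is larger than the unchanged diatom and cyanobacteria values.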

### Detritus

Detritus both absorbs and scatters light (**Figure 2**). Absorption is typically considered an exponential function of wavelength (Roesler et al., 1989; Gallegos et al., 2011).

$$a_d(\lambda) = D\,a_d^* \exp[-S_d(\lambda - 440)] \tag{2}$$

where $a_d(\lambda)$ is the absorption coefficient of detritus (m$^{-1}$), $D$ is the concentration of detritus (µg C m$^{-3}$), $S_d$ = 0.013 nm$^{-1}$ (Gallegos et al., 2011), and $a^*_d$ is the mass-specific absorption coefficient of detritus, which is set to 8.0E-5 m$^2$ mg$^{-1}$ for small detritus as typically found in oceanic waters (Gallegos et al., 2011). Only organic carbon detritus in the model is used for detrital optics.
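Equation (2) can be evaluated with a few lines of code. This is a minimal sketch with hypothetical names; it assumes the detritus concentration has already been expressed in mg C m$^{-3}$, so that its product with the mass-specific coefficient (m$^2$ mg$^{-1}$) yields m$^{-1}$. A concentration given in other mass units would need converting first.

```python
import math

S_D_PER_NM = 0.013            # spectral slope S_d, nm^-1
A_D_STAR_M2_PER_MG = 8.0e-5   # mass-specific absorption a*_d, m^2 mg^-1
REFERENCE_NM = 440.0          # reference wavelength in Eq. (2)


def detritus_absorption(wavelength_nm, detritus_mg_c_per_m3):
    """Detrital absorption coefficient (m^-1) following Eq. (2).

    Assumes the concentration is in mg C m^-3 (unit-consistency assumption).
    """
    decay = math.exp(-S_D_PER_NM * (wavelength_nm - REFERENCE_NM))
    return detritus_mg_c_per_m3 * A_D_STAR_M2_PER_MG * decay


a_d_440 = detritus_absorption(440.0, 10.0)
a_d_550 = detritus_absorption(550.0, 10.0)
```

At the reference wavelength the exponential term is 1, so the absorption reduces to the product of concentration and mass-specific coefficient; it then decays toward longer wavelengths.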

Detritus scattering is also taken from Gallegos et al. (2011).

$$b_d(\lambda) = D\,b_d^* (550/\lambda)^{0.5} \tag{3}$$

where $b_d$ is the total scattering coefficient, and $b^*_d$ is the mass-specific scattering coefficient, which is set as 0.00115 m$^2$ mg$^{-1}$, and the backscattering-to-total scattering ratio $\tilde{b}_{bd}$ is 0.005.
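The detrital scattering relation, together with the backscattering ratio, can be sketched in the same way. Names are hypothetical; the concentration is again assumed to be in mg C m$^{-3}$ for unit consistency with the mass-specific coefficient, and the backscattering coefficient is taken as the product of the stated ratio and the total scattering coefficient.

```python
B_D_STAR_M2_PER_MG = 0.00115   # mass-specific scattering b*_d, m^2 mg^-1
BBD_RATIO = 0.005              # backscattering-to-total scattering ratio


def detritus_scattering(wavelength_nm, detritus_mg_c_per_m3):
    """Total detrital scattering coefficient (m^-1): D * b*_d * (550/lambda)^0.5."""
    return detritus_mg_c_per_m3 * B_D_STAR_M2_PER_MG * (550.0 / wavelength_nm) ** 0.5


def detritus_backscattering(wavelength_nm, detritus_mg_c_per_m3):
    """Detrital backscattering coefficient (m^-1) from the fixed ratio."""
    return BBD_RATIO * detritus_scattering(wavelength_nm, detritus_mg_c_per_m3)


b_d_550 = detritus_scattering(550.0, 1.0)
b_d_400 = detritus_scattering(400.0, 1.0)
b_bd_550 = detritus_backscattering(550.0, 1.0)
```

The inverse square-root wavelength dependence means scattering increases only gently toward the blue, in contrast to the exponential decay of detrital absorption.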

### PIC

PIC optical properties have been evaluated by Gordon et al. (2009). We adopt this formulation for our simulation. PIC scatters irradiance but does not absorb

$$\mathbf{b}\_{\rm PIC}(\lambda) = \text{PIC}\,\mathbf{b}\_{\rm PIC}^\*(\lambda)\tag{4}$$

where PIC is the concentration of PIC (mgC m−<sup>3</sup>) and b<sup>∗</sup><sub>PIC</sub>(λ) is the PIC-specific spectral scattering coefficient from Gordon et al. (2009) in units of m<sup>2</sup> mgC−<sup>1</sup>. The backscattering-to-total scattering ratio b̃<sub>bpic</sub> is from Balch et al. (1996), using their lower bound of 0.01.

### CDOC

As a dissolved component, CDOC only absorbs and does not scatter. Its spectral absorption is similar to detritus but with a different slope

$$\mathbf{a}\_{\rm CDOC}(\lambda) = \mathbf{a}\_{\rm cdoc}^\* \exp[-\mathbf{S}\_{\rm cdoc}(\lambda - 443)] \tag{5}$$

where a<sup>∗</sup><sub>cdoc</sub> is the mass-specific absorption coefficient of CDOC (m<sup>2</sup> mg−<sup>1</sup>) and S<sub>cdoc</sub> = 0.014 nm−<sup>1</sup> (Bricaud et al., 1981, 2010). This slope is at the low end of the range of observations in surface waters of the Equatorial Atlantic (Andrew et al., 2013) but only slightly lower than those observed in the Mediterranean Sea (Organelli et al., 2014). There are few reports of the mass-specific absorption coefficient of CDOC, a<sup>∗</sup><sub>cdoc</sub>. We have found three observations in the literature (Carder et al., 1989; Yacobi et al., 2003; Tzortziou et al., 2007). The more recent two are in agreement, at 2.98 × 10−<sup>4</sup> m<sup>2</sup> mg−<sup>1</sup> in 4 rivers in Georgia, USA (Yacobi et al., 2003) and 2.78 × 10−<sup>4</sup> m<sup>2</sup> mg−<sup>1</sup> as the mean of 4 stations in the Rhode River, Maryland, USA (Tzortziou et al., 2007). Carder et al. (1989) reported a mean nearly an order of magnitude lower in the Gulf of Mexico (4.74 × 10−<sup>5</sup> m<sup>2</sup> mg−<sup>1</sup>). We choose the Yacobi et al. (2003) value for our simulation.

### Upwelling Spectral Radiance

OASIM uses 25-nm spectral resolution in the 350–700 nm range in the coupled model for downwelling and upwelling irradiance needed for phytoplankton growth and CDOC destruction. For enhanced realism of the PACE simulation of upwelling radiance we increase the spectral resolution to 1 nm. Since all of the optical properties data are available at 5 nm resolution or finer, it is reasonable to simply interpolate the 5 nm data. The computation of upwelling spectral radiance LwN(λ) is derived from the coupled expressions of downwelling and upwelling irradiance by Aas (1987) as modified by Ackleson et al. (1994).
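The interpolation step can be illustrated with a short Python sketch. The text does not state which interpolation scheme is used, so linear interpolation is assumed here, and the function name and the sample values are hypothetical.

```python
def interpolate_to_1nm(wavelengths_nm, values):
    """Linearly interpolate tabulated optical properties onto a 1 nm grid.

    wavelengths_nm: ascending integer wavelengths (for example, every 5 nm).
    values: property values at those wavelengths.
    Returns (grid_nm, grid_values) covering the same range at 1 nm spacing.
    """
    grid_nm, grid_values = [], []
    for i in range(len(wavelengths_nm) - 1):
        w0, w1 = wavelengths_nm[i], wavelengths_nm[i + 1]
        v0, v1 = values[i], values[i + 1]
        for w in range(w0, w1):
            fraction = (w - w0) / (w1 - w0)
            grid_nm.append(w)
            grid_values.append(v0 + fraction * (v1 - v0))
    # Close the grid with the last tabulated point.
    grid_nm.append(wavelengths_nm[-1])
    grid_values.append(values[-1])
    return grid_nm, grid_values


if __name__ == "__main__":
    # Made-up values at 5 nm spacing, for illustration only.
    grid, interpolated = interpolate_to_1nm([400, 405, 410], [1.0, 2.0, 4.0])
    print(list(zip(grid, interpolated)))
```

The tabulated points are reproduced exactly on the 1 nm grid, so the interpolation adds no information beyond the 5 nm data; it only provides values at the intermediate wavelengths.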

$$\frac{\mathrm{d}\mathbf{E\_{d}}(\lambda)}{\mathrm{d}z} = -\mathrm{C\_{d}(\lambda)\mathbf{E\_{d}}(\lambda)}\tag{6}$$

$$\frac{\mathrm{d}\mathrm{E}\_{\mathrm{s}}(\lambda)}{\mathrm{d}\mathrm{z}} = -\mathrm{C}\_{\mathrm{s}}(\lambda)\mathrm{E}\_{\mathrm{s}}(\lambda) + \mathrm{B}\_{\mathrm{u}}(\lambda)\mathrm{E}\_{\mathrm{u}}(\lambda) + \mathrm{F}\_{\mathrm{d}}(\lambda)\mathrm{E}\_{\mathrm{d}}(\lambda) \tag{7}$$

$$\frac{d\mathcal{E}\_{\rm u}(\lambda)}{d\mathbf{z}} = -\mathcal{C}\_{\rm u}(\lambda)\mathcal{E}\_{\rm u}(\lambda) - \mathcal{B}\_{\rm s}(\lambda)\mathcal{E}\_{\rm s}(\lambda) - \mathcal{B}\_{\rm d}(\lambda)\mathcal{E}\_{\rm d}(\lambda) \tag{8}$$

where E<sub>d</sub>(λ) is the spectral downwelling direct irradiance at the bottom of a model layer, E<sub>s</sub>(λ) is the downwelling diffuse irradiance, and E<sub>u</sub>(λ) is the upwelling diffuse irradiance. The attenuation terms C<sub>x</sub> (where x indicates the irradiance pathway: d for direct downwelling, s for diffuse downwelling, and u for diffuse upwelling), backscattering terms B<sub>x</sub>, and forward scattering terms F<sub>x</sub> differ for each of the irradiance pathways because of different shape factors (Aas, 1987; Ackleson et al., 1994) and mean cosines.

$$\mathbf{C}\_{\mathbf{d}}(\lambda) = \left[ \mathbf{a}(\lambda) + \mathbf{b}(\lambda) \right] / \overline{\mu}\_{\mathbf{d}} \tag{9}$$

$$\mathbf{C}\_{\mathbf{s}}(\lambda) = \left[ \mathbf{a}(\lambda) + \mathbf{r}\_{\mathbf{s}} \mathbf{b}\_{\mathbf{b}}(\lambda) \right] / \overline{\mu}\_{\mathbf{s}} \tag{10}$$

$$\mathbf{C}\_{\mathbf{u}}(\lambda) = \left[ \mathbf{a}(\lambda) + \mathbf{r}\_{\mathbf{u}} \mathbf{b}\_{\mathbf{b}}(\lambda) \right] / \overline{\mu}\_{\mathbf{u}} \tag{11}$$

$$\mathbf{B}\_{\mathbf{d}}(\lambda) = \mathbf{b}\_{\mathbf{b}}(\lambda) / \overline{\mu}\_{\mathbf{d}} \tag{12}$$

$$\mathbf{B}\_{\mathbf{s}}(\lambda) = \mathbf{r}\_{\mathbf{s}} \mathbf{b}\_{\mathbf{b}}(\lambda) / \overline{\mu}\_{\mathbf{s}} \tag{13}$$

$$\mathbf{B}\_{\mathbf{u}}(\lambda) = \mathbf{r}\_{\mathbf{u}} \mathbf{b}\_{\mathbf{b}}(\lambda) / \overline{\mu}\_{\mathbf{u}} \tag{14}$$

$$\mathbf{F}\_{\mathbf{d}}(\lambda) = (1 - \mathbf{b}\_{\mathbf{b}}') \mathbf{b}(\lambda) / \overline{\mu}\_{\mathbf{d}} \tag{15}$$

where a is the absorption coefficient, b is the total scattering coefficient, b<sub>b</sub> is the backscattering coefficient, b′<sub>b</sub> is the ratio of backscattering to total scattering, and µ is the mean cosine (constant for diffuse irradiance, but varying with solar zenith angle for direct irradiance). The shape factors are indicated by the r<sub>x</sub> terms, and are specified as in Ackleson et al. (1994). Equation 6 can be solved a priori, and its solution can then be used as a boundary condition, greatly simplifying the solution of the coupled Equations 7 and 8.
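The equation for direct downwelling irradiance (Equation 6) involves only E<sub>d</sub>, so within a layer of uniform optical properties it has the closed-form solution E<sub>d</sub>(z + Δz) = E<sub>d</sub>(z) exp(−C<sub>d</sub> Δz). The Python sketch below steps this solution through a stack of layers. The layer discretization, the function name, and the input values are assumptions made for illustration and do not reproduce the OASIM code.

```python
import math


def direct_irradiance_profile(ed_surface, absorption, scattering,
                              layer_thickness_m, mean_cosine_direct):
    """Return direct downwelling irradiance at the bottom of each layer.

    absorption, scattering: per-layer a and b (m^-1) at one wavelength.
    mean_cosine_direct: mean cosine for the direct beam (depends on sun angle).
    Assumes optical properties are uniform within each layer.
    """
    profile = []
    ed = ed_surface
    for a, b, dz in zip(absorption, scattering, layer_thickness_m):
        c_d = (a + b) / mean_cosine_direct   # attenuation of the direct beam
        ed *= math.exp(-c_d * dz)            # analytical solution in the layer
        profile.append(ed)
    return profile


if __name__ == "__main__":
    # Two 10 m layers with made-up coefficients, sun overhead (mean cosine 1).
    print(direct_irradiance_profile(1.0, [0.02, 0.02], [0.03, 0.03],
                                    [10.0, 10.0], 1.0))
```

Once this profile is known, the direct-beam terms in the two diffuse equations become known forcing, which is why solving for E<sub>d</sub> first simplifies the coupled system.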

Equation 8 can be simplified for normalized upwelling radiance since by its definition the surface downwelling irradiance does not include attenuation effects of the atmosphere and the solar zenith angle is assumed to be 0° with overhead sun (Gordon, 1997). Substituting the mean extraterrestrial irradiance (Thuillier et al., 2004) for downwelling irradiance, we can obtain upwelling normalized water-leaving radiance solving the Aas (1987) expressions and correcting for surface reflectance.

$$\mathcal{L}\_{\text{w}}\mathcal{N}(\lambda) = \mathcal{F}\_{\text{o}}(\lambda, 0^{-})(1 - \rho)/(\text{n}^{2}\text{Q})\tag{16}$$

where F<sub>o</sub> is the mean extraterrestrial irradiance (mW cm−<sup>2</sup> µm−<sup>1</sup>) just below the ocean surface (0−) derived using Aas (1987), ρ is the surface reflectance (0.021), n is the index of refraction (1.341) and Q is the radiance:irradiance distribution function (= π for normalized surface irradiance).
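Equation 16 is a simple scaling once the spectral term just below the surface is known. A minimal Python sketch, using the constants stated above (ρ = 0.021, n = 1.341, Q = π) and a hypothetical function name, might look like the following. The input is whatever the Aas (1987) solution yields for F<sub>o</sub>(λ, 0−); computing that term is outside the scope of this sketch.

```python
import math

SURFACE_REFLECTANCE = 0.021   # rho
REFRACTIVE_INDEX = 1.341      # n
Q_FACTOR = math.pi            # radiance:irradiance distribution function


def normalized_water_leaving_radiance(f_below_surface):
    """Equation 16: scale the below-surface term to LwN.

    f_below_surface: F_o(lambda, 0-) in mW cm^-2 um^-1.
    Returns LwN in mW cm^-2 um^-1 sr^-1.
    """
    transmission = 1.0 - SURFACE_REFLECTANCE
    return f_below_surface * transmission / (REFRACTIVE_INDEX ** 2 * Q_FACTOR)


if __name__ == "__main__":
    # The scaling factor applied to a unit input.
    print(normalized_water_leaving_radiance(1.0))
```

With these constants the scaling factor is a little over 0.17 sr−1, and it is independent of wavelength, so all spectral structure in LwN comes from the below-surface term.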

Using 1 nm spectral resolution LwN not only supports testing PACE sensor and mission concepts, it also simplifies comparison with MODIS-Aqua LwN by virtue of avoiding band mismatches. The pathways of optical constituents to optical properties to upwelling normalized water-leaving radiances as represented by the NOBM-OASIM global coupled physical-biogeochemical-optical model are depicted in **Figure 3**.

### Data Assimilation

Global total chlorophyll from MODIS is assimilated into NOBM using the method described in Gregg (2008). Additionally, global PIC from MODIS (Balch et al., 2005) is assimilated, using the same methodology except that the data are not log-transformed before assimilation. CDOC is also assimilated, but it requires a transformation before the process is executed. There are no satellite data available for CDOC, but a satellite product called aCDM is available (Garver and Siegel, 1997; Maritorena and Siegel, 2005; Maritorena et al., 2010). We use the products from MODIS-Aqua in this effort. This product represents the absorption of both CDOM and detritus (hence the usage of CDM to minimize confusion about its nature). Siegel et al. (2002) estimated the detrital contribution as 12%. We assume this is globally constant and apply a correction of 0.88 to the aCDM(443) data fields prior to assimilation. We recognize this is a potential source of error, but it is difficult to separate the two in a reflectance inversion methodology because the spectral slopes of absorption are quite similar. The satellite aCDM(443) is assimilated with model aCDOC(443), which is then easily converted to CDOC using the mass-specific absorption coefficient of CDOC (Yacobi et al., 2003).
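The transformation applied to the satellite product can be sketched in a few lines of Python. The 12% detrital fraction and the Yacobi et al. (2003) coefficient come from the text; treating the conversion to CDOC as a division by the mass-specific absorption coefficient is an assumption implied by the units, and the function names and the sample aCDM value are hypothetical.

```python
DETRITAL_FRACTION = 0.12   # detrital share of aCDM (Siegel et al., 2002)
A_CDOC_STAR = 2.98e-4      # m^2 mg^-1 (Yacobi et al., 2003)


def acdm_to_acdoc(acdm_443):
    """Remove the assumed globally constant detrital share from aCDM(443)."""
    return (1.0 - DETRITAL_FRACTION) * acdm_443


def acdoc_to_cdoc(acdoc_443):
    """Convert absorption (m^-1) to concentration (mg m^-3).

    Assumes absorption is the product of concentration and the
    mass-specific absorption coefficient.
    """
    return acdoc_443 / A_CDOC_STAR


if __name__ == "__main__":
    acdm = 0.01  # m^-1, made-up value for illustration
    acdoc = acdm_to_acdoc(acdm)
    print(acdoc, acdoc_to_cdoc(acdoc))
```

Because both steps are linear, any error in the assumed 12% detrital fraction propagates proportionally into the assimilated CDOC field.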

Upwelling radiances are not assimilated. They are computed using the distributions of optical constituents in the model, their optical properties (**Figure 2**), and Equation 16 at 1 nm spectral resolution.

### Model Setup

The model is integrated for 35 years from an initial state using climatological atmospheric forcing, with the new variables PIC and CDOC initialized to 0 concentrations. The model is then run forward in time from 2003 through 2007 using transient atmospheric forcing from MERRA (Rienecker et al., 2011) and assimilating MODIS-Aqua total chlorophyll, PIC, and CDOC.

### Statistical Comparison

The optical constituents of the NOBM-OASIM assimilation model are compared to in situ and/or satellite (MODIS) monthly data where and when available. Phytoplankton groups are compared to in situ data while total chlorophyll, PIC, and aCDOC are compared to satellite estimates. The statistics are aggregated over the 12 basins of the global oceans, mean differences (biases) computed, and then correlations computed over the basins. This provides an estimate of large scale correlations and is very stringent considering the low number of observations. The major ocean basins are divided into 3 main regions: high latitudes (poleward of ±40° latitude), comprising the North Atlantic, North Pacific, and Southern Ocean; mid-latitudes (between ±40° and ±10° latitude), comprising the North Central Atlantic and Pacific, South Atlantic, Pacific and Indian, and North Indian; and tropical basins (between ±10° latitude), comprising the Equatorial Atlantic, Pacific, and Indian. Comparison of assimilated model results with the data used for assimilation is typically insufficient for assessing assimilation performance (Gregg et al., 2009). However, in this case the objective is to simulate dynamic global water-leaving radiances to support a proposed mission, not to assess the assimilation methodology. Here, knowledge of the biases and uncertainties in the underlying ocean optical constituents derived from the assimilation model is best achieved using the satellite data inputs for assimilation. Normalized water-leaving radiances using OASIM and the computed optical constituent distributions are compared to MODIS at the available MODIS bands, 412, 443, 488, 531, 547, and 667 nm. Using 1 nm upwelling radiances at the center of MODIS bands, we can evaluate the simulated bias and uncertainty with MODIS data and avoid model/data band misalignment. These statistics are not aggregated by basin.
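One way to read the basin-aggregated statistics is sketched below in Python. The details are assumptions: values are first averaged within each basin, the bias is expressed as the percent difference between the across-basin means, and the correlation is the Pearson coefficient computed across the basin means. The function names and the three-basin sample data are hypothetical.

```python
import math


def basin_means(values_by_basin):
    """Average all co-located values within each basin."""
    return {basin: sum(v) / len(v) for basin, v in values_by_basin.items()}


def percent_bias(model_by_basin, data_by_basin):
    """Percent difference of the across-basin model mean from the data mean."""
    model, data = basin_means(model_by_basin), basin_means(data_by_basin)
    basins = sorted(data)
    model_mean = sum(model[b] for b in basins) / len(basins)
    data_mean = sum(data[b] for b in basins) / len(basins)
    return 100.0 * (model_mean - data_mean) / data_mean


def basin_correlation(model_by_basin, data_by_basin):
    """Pearson correlation between basin-mean model and data values."""
    model, data = basin_means(model_by_basin), basin_means(data_by_basin)
    basins = sorted(data)
    x = [model[b] for b in basins]
    y = [data[b] for b in basins]
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    covariance = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    return covariance / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up values for three basins, for illustration only.
    model = {"A": [1.0, 1.0], "B": [2.0, 2.0], "C": [3.0, 3.0]}
    data = {"A": [2.0, 2.0], "B": [4.0, 4.0], "C": [6.0, 6.0]}
    print(percent_bias(model, data), basin_correlation(model, data))
```

With only 12 basin means entering the correlation, a significant result requires a fairly high coefficient, which is consistent with the description of the test as stringent.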

# RESULTS

We evaluate ocean optical constituents, specifically phytoplankton, total chlorophyll, PIC, and aCDOC, the latter three of which are provided as data sets from MODIS-Aqua. Water is a constant background and we are not aware of global data on detritus. We evaluate water-leaving radiances by comparing model upwelling radiances at MODIS-Aqua wavelengths with those MODIS-Aqua radiance data.

TABLE 1 | Comparison of simulated optical constituents in NOBM-OASIM with data (in situ or satellite).

*NS indicates not significant at 95% confidence. NA indicates data not available for comparison. The satellite comparison uses MODIS-Aqua and model data used are co-located and coincident with monthly mean MODIS data.*

### Global Ocean Optical Constituents

Total chlorophyll from the assimilated NOBM-OASIM model is within −35.9% of satellite data (model low), with a correlation across basins of 0.869 (P < 0.05; **Figure 4**; **Table 1**). The model is low because of uncorrected aCDM in the satellite data, especially near coasts and river mouths, which artificially drives up the estimates of chlorophyll.

Phytoplankton group relative abundances are positively correlated with in situ data for diatoms, cyanobacteria, and coccolithophores (P < 0.05) but chlorophytes are not correlated (**Table 1**). All four groups have relative abundance biases < ±20% compared to in situ data, with diatoms the largest at 17%.

Assimilated PIC is correlated with satellite estimates (P < 0.05) and concentrations are within −28.5% (**Figure 5**; **Table 1**). Simulated PIC is overestimated and more widespread in the Southern Ocean in December, but otherwise exhibits similar variability as indicated by the correlation coefficient (r = 0.868). It is unable to capture the localized extreme high concentrations in June in the northern high latitudes, which leads to model underestimates globally. Model comparison of aCDOC (443 nm) is within −24.6% of satellite estimates of aCDM (443 nm) (**Table 1**), which represents the combined absorption of dissolved matter and particulate matter (detritus). A basin correlation coefficient of 0.890 (P < 0.05) is obtained (**Table 1**). Maps of global distributions for June and December 2007 illustrate the comparison between model and data (**Figure 6**). Although river discharge is not included in the model, high aCDOC is produced at major river mouths (e.g., Amazon, Orinoco, Congo) via the assimilation of aCDM (see **Figure 6**).

### Global Normalized Water-Leaving Radiances

The mean of the global median difference of model normalized water-leaving radiances with MODIS-Aqua radiances for all 6 bands for the period 2003–2007 is −0.074 mW cm−<sup>2</sup> µm−<sup>1</sup> sr−<sup>1</sup> (−10.4%) with a mean semi-interquartile range of 0.077 and a significant correlation of 0.706 (P < 0.05). All of the simulated radiances are positively and significantly correlated with satellite data (**Figure 7**). The largest relative difference (−30%) and lowest correlation (r = 0.48) occur in the longest MODIS band, 667 nm (**Figure 7**). Band 1 (412 nm) has the largest absolute difference (−0.19 mW cm−<sup>2</sup> µm−<sup>1</sup> sr−<sup>1</sup>; **Figure 7**), but only the third largest relative difference with a mean of −12.5%, and it has a high correlation of 0.946. All simulated radiances are low relative to data (**Figure 7**). Correlations of the longer visible wavelengths, 531, 547, and 667 nm, are much lower than those of the shorter wavelengths.
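The median difference and semi-interquartile range quoted above are robust statistics of the model-minus-data differences. A minimal Python sketch of how they could be computed for one band follows. The function name and sample values are hypothetical, and the quartile convention (the default of the standard library `statistics.quantiles`) is an assumption, since the text does not specify one.

```python
import statistics


def median_difference_and_siqr(model, data):
    """Return the median of (model - data) and its semi-interquartile range.

    model, data: co-located values for one band, in the same units.
    The semi-interquartile range is half the distance between the
    first and third quartiles of the differences.
    """
    differences = [m - d for m, d in zip(model, data)]
    q1, _, q3 = statistics.quantiles(differences, n=4)
    return statistics.median(differences), (q3 - q1) / 2.0


if __name__ == "__main__":
    # Made-up radiances for five co-located grid points.
    model = [0.7, 0.8, 0.9, 1.0, 1.1]
    data = [1.0, 1.0, 1.0, 1.0, 1.0]
    print(median_difference_and_siqr(model, data))
```

Averaging the per-band medians and semi-interquartile ranges over the 6 MODIS bands would then give summary values of the kind reported in the text.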

Global maps of water-leaving radiances illustrate the spatial agreement and discrepancies between the model and satellite data (**Figures 8**–**10**). The spatial distributions reflect the biases and correlations shown in **Figure 7**. Low biases in model radiances are apparent for all bands, but the locations differ. Low model radiances are most apparent for the shorter wavelengths (412 and 443 nm) in the central gyres (**Figure 8**). Mid-range bands (531 and 547 nm) show low model biases in the northern high latitudes (**Figures 9**, **10**). The map for the longest MODIS band (667 nm) does not visibly exhibit the model bias shown in **Figure 7**, because the bias is below the resolution of the map.

Maps of normalized water-leaving radiances at various wavelengths from the 1 nm hyper-spectral resolution capability are shown in **Figures 11**, **12**. The radiance wavelengths are broken into the two figures to capture variability over the widely ranging radiance values shown. The second set of radiance maps (**Figure 12**) uses a different scale for radiance values; otherwise, spatial variability in these radiances would not be visible.

Two locations in the North Pacific Ocean are selected to show hyperspectral variability in different oceanic environments (**Figure 13**). One is a central gyre location southwest of Hawaii, characterized by low chlorophyll, PIC, and CDOC. The other is in the high-latitude North Pacific just south of the Aleutian Islands, where high chlorophyll, PIC, and CDOC prevail. Hyperspectral 1 nm normalized water-leaving radiances show considerable differences in magnitude and local spectral slopes, suggesting the potential for discrimination of ocean constituents from PACE.

# DISCUSSION

We have described a comprehensive model of optical constituents and their influences on hyper-spectral upwelling radiance in the global oceans. The model contains a representation of major optical constituents, namely, water, total chlorophyll, four major phytoplankton taxonomic/functional groups, organic detritus, PIC, and CDOC. All except water are prognostic variables in the model with individual sources and sinks, and with full dynamical capability arising from advection and diffusion processes in the global oceans.

Normalized water-leaving radiances from the global distributions of optical constituents have been quantitatively compared to MODIS-Aqua radiances for the 6 wavelengths available at 412, 443, 488, 531, 547, and 667 nm. These 6 discrete wavelengths provide only a partial basis for estimating the potential of a global dynamical model to represent the hyper-spectral capability of the next generation PACE mission. Thus, the error estimation is incomplete, and relevance to PACE and its ability to simulate future global hyper-spectral radiances is unconfirmed. However, the comparison of the model with the 6 MODIS bands suggests a level of skill sufficient to support some analysis of mission capability and design, and the level of caution necessary to proceed in these activities is quantified here.

### Global Ocean Optical Constituents

The global ocean biology model is optically comprehensive, but it is not complete. There are optical constituents in the oceans that are not included in the model. Some can be important, sometimes globally but most often regionally. For example, bacteria and virus scattering is not present in the model. Bacteria scattering is considered an important component of the scattering from the living part of the particulate pool, possibly dominating the phytoplankton (Balch et al., 2002; Stramski et al., 2004). However, the scattering contributions from the living components are estimated to be small relative to detritus (Stramski et al., 2004). We assume here that bacteria covary with detritus. Virus scattering is disputed. Balch et al. (2002) suggest it may be important while Stramski et al. (2004) consider it negligible.

Minerals/suspended sediments are not included. These are most important near river mouths at times of high discharge, but they also occur from particulate deposition from the atmosphere, such as desert dust (Wozniak and Stramski, 2004) or organic carbon from biomass burning. Absorption by mycosporine-like amino acids (Moisan and Mitchell, 2001) is not included in the model. This is most important in the ultraviolet spectrum, and casts doubt on the simulated representations of water-leaving radiances in this spectral region by the model. PACE is nominally expected to detect as low as 350 nm (PACE Mission Science Definition Team Report, 2012), but there may be interest in expanding that range if it is technically and economically feasible. The most recent configuration concept is to expand the detection limit to 320 nm. The effects of mycosporine-like amino acids should be included in future improvements of the biological global model.

Finally, four phytoplankton groups cannot possibly represent the range and complexity of the phytoplankton taxa living in the oceans. Unfortunately, detailed knowledge of the optical, physical, and physiological properties of the world's ocean phytoplankton, which is required to parameterize our coupled optical, physical, and biological model, is not available. We recognize our four groups as a shortcoming, but they do capture a substantial range of functionality. Diatoms represent the fast growing, fast sinking component particularly important in the carbon and silicon cycles. Cyanobacteria represent the functional opposite, as a slow growing, nearly floating, very small phytoplankton that occupy the nutrient-desolate vast ocean gyres, and additionally have a limited nitrogen-fixing capability (Rousseaux et al., 2013). Coccolithophores represent a unique category of calcium-producing phytoplankton, which scatter light out of the oceans effectively and play a role in the carbon cycle by affecting alkalinity in addition to photosynthesis and respiration processes. Finally, chlorophytes represent (or at least are intended to represent) intermediate phytoplankton with characteristics between diatoms and cyanobacteria. It is this intermediate category that is most under-represented here and is where much of the diversity of the global ocean arises.

The fact that chlorophytes are not significantly correlated with in situ data in the model is particularly important because they are the only group in the model representative of the diverse phytoplankton component between the functional extremes of diatoms and cyanobacteria, save for the unique coccolithophore class. This is a deficiency in the model as it pertains to PACE and we acknowledge that their lack of correlation with data is important. However, in the model we assume chlorophytes represent a very wide range of phytoplankton, often referred to as nanoplankton. Since in situ data sets rarely specifically identify chlorophytes, we compare our model chlorophytes to in situ data reports of nanoplankton, non-diatoms or non-pico-prokaryotes, representing this middle ground between diatoms and cyanobacteria. We note that most of the lack of correlation with in situ data occurs in the high latitudes, where chlorophytes are not common, but other types of nanoplankton are sometimes abundant. The abundance of these reported nanoplankton in the high latitudes, coupled with the near-absence of chlorophytes in the model, is the cause of the lack of correlation. The model representation of chlorophyte abundance corresponds much more closely with reported observations of nanoplankton in the lower latitudes, suggesting that simulation of PACE radiances in these basins is likely to be more realistic.

### Using Data Assimilation to Improve the Representation of Global Optical Constituents

The assimilation of chlorophyll has been demonstrated to improve the representation of distributions regionally and globally (Hu et al., 2012; Fontana et al., 2013; Gregg and Rousseaux, 2014). Assimilation of PIC and aCDM has not been attempted globally, to our knowledge. Our purpose in assimilating PIC and aCDM is not novelty but fidelity. The optical properties of PIC have been established (Balch et al., 1996; Gordon et al., 2009) and one can find models of production and dissolution in the literature (Buitenhuis et al., 2001; Gangsto et al., 2011; Barrett et al., 2014). Our parameterization of sinking processes is a matter of trial and error using global satellite fields of PIC from MODIS-Aqua. Assimilation of aCDM is a larger challenge. Although assimilation of optical properties, in particular the diffuse attenuation coefficient, has shown value (Ciavatta et al., 2014), the assimilation of aCDM is more problematic because there are few examples of its use in coupled physical-biogeochemical models (e.g., Buitenhuis et al., 2001; Xiu and Chai, 2014; Dutkiewicz et al., 2015). We approach the problem in a bottom-up fashion, adding a dynamical tracer to the biogeochemical model suite, i.e., CDOC, which has the optical properties of aCDOC(λ). The characterization of the biological production and loss terms for CDOC is more or less straightforward, as it can be related to those from the optically inert DOC (e.g., Aumont et al., 2002). Loss of CDOC via the absorption of spectral irradiance is more difficult. Although the absorption characteristics are well-established, how that relates to CDOC concentration and subsequent destruction is difficult to quantify. There is regional information on defining a quantum yield for CDOC photolysis, ϕcdoc (e.g., Reader and Miller, 2012, 2014), but we require a global spectrally integrated solution. We consider our parameterization of ϕcdoc to be tenuous, but we take consolation that the assimilation guides us to a reasonable result in the end, and even rectifies the absence of river input in the model, which is a major source of CDOC to the oceans. For the present purpose of providing a model to assist in the early stages of development of a future mission, we believe our approach has support as an initial step. The statistical comparison of CDOC distributions with satellite data supports this approach as well (**Table 1**; **Figure 6**).

### Global Water-Leaving Radiances

The comparison of model water-leaving radiances with MODIS-Aqua at the 6 MODIS bands suggests some skill in the simulation: the mean of the global median difference is −0.074 ± 0.077 mW cm−<sup>2</sup> µm−<sup>1</sup> sr−<sup>1</sup> (−10.4%). All of the simulated radiances are significantly correlated with satellite data (**Figure 7**), although some of the correlation coefficients are low. We emphasize that the radiances are not assimilated. Rather, they are the result of the distribution of optical constituents in the coupled model.

The longer visible wavelengths, 531, 547, and 667 nm have lower correlations with satellite data than the shorter ones. There is much less spatial variability in the longer wavelengths (**Figures 9**, **10**). Ocean color sensors have much larger uncertainty in these wavelengths (Mélin et al., 2016) which contributes to the decrease in correlation of these radiances here.

The model is always low relative to the MODIS normalized water-leaving radiances. The low model radiances occur in different regions for the different bands. For the shortest MODIS wavelengths, 412 and 443 nm, largest biases occur in the ocean gyres (**Figure 8**), where ocean biological optical constituents are at their lowest magnitudes. The 412 nm band has a larger model-data discrepancy than the 443 nm band (**Figure 7**). For the mid-range bands 531 and 547 nm, the model-data discrepancies occur in the northern high latitudes.

The model low bias for LwN(412) and LwN(443) in the central gyres suggests either missing scattering in the model or overestimated absorption. These regions are biologically the most barren regions in the global ocean, where the main optical constituent is water. The southeast Pacific gyre has been the subject of an intensive field campaign (BIOSOPE), and several investigators have relied upon this data set to revise the understanding of the optical properties of seawater (Morel et al., 2007; Lee et al., 2015), CDOM and particulate detrital absorption (Bricaud et al., 2010), and total particulate backscattering (Twardowski et al., 2007). The Lee et al. (2015) seawater absorption revision reduced the absorption coefficients, thus producing more scattering, which has helped in our model here, since the revision is used in our calculations. Residual underestimation of scattering and/or overestimation of absorption still prevails in the simulation.

It is possible that the exclusion of mineral scattering in the model is important in the central gyres. However, this argument would be more persuasive for the North Central Pacific and North Central Atlantic gyres than the South Pacific gyre, since there are few atmospheric depositions to this region. One cannot neglect the possibility of radiative model error as well. Perhaps the use of empirical constants in a remote sensing reflectance algorithm, such as Lee et al. (2002) or Gordon et al. (1988), would improve radiances. However, this would sever connections in the radiative modeling system, which uses an analytical model for simulation of both irradiance transmittance in the ocean and the irradiance and radiance re-emerging to and above the surface.

Finally, the spectral slope of detrital absorption Sd(λ) used here, 0.013 nm−<sup>1</sup> , which was derived from assessment of small particulates in the Chesapeake Bay (Gallegos et al., 2011), is higher than that derived from the southeast Pacific by Bricaud et al. (2010), 0.0094 nm−<sup>1</sup> . This could lead to the higher absorption and subsequent lower backscatter, especially in the shorter wavelengths, as we observe here. How much will depend upon the concentration of detritus in this region and the other central gyres.
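To give a sense of scale, assume the same detritus concentration and the same mass-specific absorption at the 440 nm reference wavelength for both slopes (an assumption made only for this illustration). Equation 2 then implies the following ratio of detrital absorption at 412 nm:

$$\frac{\mathbf{a}\_{\rm d}(412;\ \mathbf{S}\_{\rm d} = 0.013)}{\mathbf{a}\_{\rm d}(412;\ \mathbf{S}\_{\rm d} = 0.0094)} = \exp[(0.013 - 0.0094)(440 - 412)] = \exp(0.1008) \approx 1.106$$

Under these assumptions the steeper slope yields roughly 11% more detrital absorption at 412 nm, with the difference shrinking toward zero at 440 nm and reversing sign at longer wavelengths.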

The model also exhibits low radiances compared to MODIS for the 531 and 547 nm bands (**Figures 9**, **10**), except these are mostly located in the northern high latitudes. These discrepancies appear to be related to the distributions of PIC (**Figure 5**). Model PIC distributions here largely correspond with satellite distributions, although local maxima in the southern central North Pacific and the Greenland Sea are subdued in the model (**Figure 5**). These two locations are responsible for the largest disagreements. However, additional local maxima in satellite PIC occur in the northern Bering Sea and western Sea of Okhotsk (**Figure 5**) that are not accompanied by high water-leaving radiances in the MODIS 531 and 547 nm bands (**Figures 9**, **10**). High chlorophyll (**Figure 4**) and aCDOM (**Figure 6**) in the model and MODIS likely suppress the scattering of PIC in the northern Bering Sea and Sea of Okhotsk. But the lack of representation of the high scattering by PIC in the south-central North Pacific and Greenland Sea results from the spatially smoother PIC distributions in the model compared to MODIS (**Figure 5**). Overall, the widespread higher radiance dispersed throughout the northern basins is likely due to inadequate PIC scattering in the model, considering the correspondence between model and satellite PIC distributions. Excessive absorption by other constituents in the model can contribute to the differences in radiances between model and data here. Such high absorption would likely be due to phytoplankton (particularly diatoms, which are predominant in the North Pacific), or coccolithophores, which are prevalent in the North Atlantic.

Global maps of selected normalized water-leaving radiances other than those coincident with MODIS-Aqua show considerable spectral and spatial variability (**Figures 11**, **12**). The figures are divided into two groups because the spectral range is so large that different scales must be utilized. **Figure 11** shows radiances from two ultraviolet-b bands (300 and 320 nm), an ultraviolet-a band (340 nm), and 13 bands through the mid-range visible (360–560 nm). There is a steady increase in radiance intensity as we progress from shorter to longer wavelengths until about 400–410 nm, then a slow decline to 560 nm. An exception to this trend is the radiance at 430 nm, which shows a sharp decline relative to its neighbors at 420 and 440 nm (**Figure 11**). This is due to a local minimum in the extraterrestrial irradiance that is employed at 1 nm resolution (Thuillier et al., 2004). These local minima and maxima occur occasionally in the radiance spectrum and represent a potential issue when choosing band locations for PACE. There can be very large swings in signal strength in short wavelength segments.

The second selection of radiance wavelengths, at extreme ultraviolet-b along with the long end of visible and 3 near infrared wavelengths (**Figure 12**), shows increasing intensity from 250 through 270 nm, and another from 600 to 630 nm, before reversing from 650 to 720 nm. There is very little normalized water-leaving radiance at 720 nm and spatial variability will require another scale change to be visible. There is another anomaly, this time a local maximum, at 270 nm, again due to the high spectral variability in the extraterrestrial irradiance. This set of radiances, with the possible exceptions of the shorter 600 nm bands, suggests that ocean signal detection from a satellite will be challenging. The longer 600 nm wavelengths are conventionally used for atmospheric correction since there is so little ocean contribution to the normalized water-leaving radiance (e.g., Gordon, 1997) while NIR bands (e.g., Wang et al., 2016) have shown additional promise for the rare conditions when the ocean does contribute here.

### Potential Uses for PACE Mission Design and Analysis

The hyper-spectral 1 nm resolution ocean model presented here suggests skill for simulating global normalized water-leaving radiances, as shown by the comparison with the moderate resolution bands for MODIS-Aqua. Quantitative error characterization shows the limits of usefulness in the MODIS bands and the potential for simulating radiances outside the current satellite observational capability. This suggests at least some usefulness for pre-launch PACE design and analysis activities, guided by due caution of the limits of the simulation.

Representation of remotely-sensed normalized water-leaving radiances may be approached using airborne (e.g., Airborne Visible/Infrared Imaging Spectrometer, Portable Remote Imaging SpectroMeter), or in situ data, or coastal spaceborne imagers, such as the Hyperspectral Imager for the Coastal Ocean. However, the global observing simulation capability of the present assimilated model can contribute in other important ways that airborne, in situ and limited spaceborne data cannot.

The most important attribute that separates PACE from previous ocean color missions is its global hyper-spectral resolution capability. The global simulation described here at 1 nm can help clarify questions about band selection, specifically the choice of bands, band widths, number of bands and their center locations. Variability over orbital tracks encountering a range of solar and satellite angles complicates band selection decisions in ways that in situ and most airborne activities cannot resolve. The global seasonal nature of the simulation assists in understanding potential signal strength issues over the diverse regions and seasons encountered in a global mission. It is possible to sample the simulated 1 nm bands in various scenarios to observe and optimize their location and widths, subject to the viewing constraints of an orbiting platform. Optical effects, such as the spectral response function, can be included in the analysis. As mission design and construction proceed, issues can arise and tradeoffs must be assessed. These often include signal-to-noise ratios, detector saturation effects, gain selection and operation (if applicable), stray light, and bright target recovery. The simulation described here can provide numerical answers from an orbital perspective, even if approximate, as these issues emerge. The limitations of the model are quantitatively characterized here and can be factored into the decisions on how to proceed. A much more modest simulation, using only a single global map of ocean color data derived from the entire CZCS mission (Gregg et al., 1997), proved helpful in designing and managing the SeaWiFS mission, which, like PACE, had no global observational precedent.
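As a rough illustration of how simulated 1 nm radiances might be sampled into candidate bands, the sketch below averages a spectrum over a spectral response function. The Gaussian response shape, the function name `band_average`, and the synthetic spectrum with a narrow dip at 430 nm are assumptions made for this example only; they are not taken from the simulation or from any PACE instrument specification.

```python
import numpy as np

def band_average(wavelengths_nm, radiance, center_nm, fwhm_nm):
    """Average a 1 nm spectrum over a Gaussian spectral response (assumed shape)."""
    # Convert full width at half maximum to the Gaussian standard deviation.
    sigma = fwhm_nm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    srf = np.exp(-0.5 * ((wavelengths_nm - center_nm) / sigma) ** 2)
    # Response-weighted mean of the radiance.
    return float(np.sum(srf * radiance) / np.sum(srf))

# Hypothetical flat spectrum on a 1 nm grid with a narrow local minimum at 430 nm.
wavelengths = np.arange(250.0, 801.0, 1.0)
flat = np.ones_like(wavelengths)
dip = flat.copy()
dip[wavelengths == 430.0] = 0.5

narrow = band_average(wavelengths, dip, center_nm=430.0, fwhm_nm=2.0)
wide = band_average(wavelengths, dip, center_nm=430.0, fwhm_nm=20.0)
print(f"2 nm band: {narrow:.3f}, 20 nm band: {wide:.3f}")
```

In this toy case the narrow band is affected far more strongly by the local minimum than the wide band, which is the kind of tradeoff between band location and band width that the 1 nm simulation could be used to explore.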

The second most important feature of this simulation is to provide a platform for algorithm development activities. Although the phytoplankton differentiation in the model is necessarily simplified, it can be used in coarse algorithm activities. At worst, algorithms that cannot differentiate among the simple phytoplankton assemblage in the simulation would likely have difficulties in actual ocean observations, where the phytoplankton diversity is enormous.

The simulation can also assist in studies of data collection strategies on orbit. Seasonal variability in phytoplankton/PIC/CDOC distributions is explicitly incorporated in the simulation to include a full representation of optical combinations as seen to date with current missions. If coupled with a similarly comprehensive and hyper-spectral atmospheric simulation, and an orbital viewing platform, the combined models can be used to explore signal retrieval at the sensor and help maximize the ability to meet the challenging goals of this ambitious mission.

### AUTHOR CONTRIBUTIONS

WG was responsible for writing and organizing the manuscript. CR was responsible for deriving hyper-spectral data and assisting in the writing and reviewing of the manuscript.


### ACKNOWLEDGMENTS

We thank the NASA/MERRA Project, the MODIS Ocean Color Processing Team, and the algorithm developers for PIC and aCDM for the data sets and public availability. We also thank the members of the PACE Science Team for optics parameters and data and two reviewers. This work was supported by NASA PACE, S-NPP, CMS, and MAP Programs. Hyperspectral model radiances are available at the GMAO web site https://gmao.gsfc.nasa.gov/research/oceanbiology/data.php.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Gregg and Rousseaux. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Light Absorption by Phytoplankton in the Upper Mixed Layer of the Black Sea: Seasonality and Parametrization

Tanya Churilova<sup>1</sup>\*, Vyacheslav Suslin<sup>2</sup>, Olga Krivenko<sup>1</sup>, Tatiana Efimova<sup>1</sup>, Nataliia Moiseeva<sup>1</sup>, Vladimir Mukhanov<sup>1</sup> and Liliya Smirnova<sup>1</sup>

*<sup>1</sup> A.O. Kovalevsky Institute of Marine Biological Research, Sevastopol, Russia, <sup>2</sup> Department of Ocean Dynamics, Marine Hydrophysical Institute, Sevastopol, Russia*

The standard NASA ocean color algorithm OC4 was developed on the basis of ocean optical data and, while appropriate for Case 1 oceanic waters, cannot be adequately applied to Black Sea waters because of their different bio-optical properties. The OC4 algorithm is shown to overestimate chlorophyll concentration (Chl-*a*) in summer and to underestimate Chl-*a* during early spring phytoplankton blooms in the Black Sea. For correct conversion of satellite data to Chl-*a*, primary production and other indicators, regional algorithms should be developed that take into account the bio-optical properties of Black Sea waters. Light absorption by phytoplankton pigments, *aph*(λ), has been measured in open-sea and shelf waters of the Black Sea in different seasons since 1998. It was shown that the first optical depth was located within the upper mixed layer (UML) for most of the year, with the exception of spring, when seasonal stratification was developing. As a result, the spectral features of water-leaving radiance were determined by the optical properties of the UML. Significant seasonal differences in the Chl-*a* specific light absorption coefficients of phytoplankton within the UML have been revealed. These differences were caused by adaptive changes in pigment composition and intracellular pigment concentration in response to variable environmental conditions, mainly light intensity. Empirical relationships between *aph*(λ) and Chl-*a* were derived by least squares fitting to power functions for different seasons. Incorporation of these results will refine the regional ocean color models and provide improved, seasonally adjusted estimates of chlorophyll *a* concentration, downwelling radiance and primary production in the Black Sea based on satellite data.

Keywords: phytoplankton, light absorption, parameterization, chlorophyll a concentration, upper mixed layer, the Black Sea

### Edited by:

*Shubha Sathyendranath, Plymouth Marine Laboratory, UK*

### Reviewed by:

*Bob Brewin, Plymouth Marine Laboratory, UK; Toru Hirawake, Hokkaido University, Japan*

> \*Correspondence: *Tanya Churilova tanya.churilova@gmail.com*

### Specialty section:

*This article was submitted to Ocean Observation, a section of the journal Frontiers in Marine Science*

Received: *29 December 2016* Accepted: *15 March 2017* Published: *04 April 2017*

### Citation:

*Churilova T, Suslin V, Krivenko O, Efimova T, Moiseeva N, Mukhanov V and Smirnova L (2017) Light Absorption by Phytoplankton in the Upper Mixed Layer of the Black Sea: Seasonality and Parametrization. Front. Mar. Sci. 4:90. doi: 10.3389/fmars.2017.00090*

# INTRODUCTION

Visible spectral radiometric data are widely used to assess water productivity (Saba et al., 2011) and to study the effect of climate change on ocean productivity (Behrenfeld et al., 2006). The optical scanners of the Sea-viewing Wide Field-of-view Sensor (SeaWiFS), the MEdium Resolution Imaging Spectrometer (MERIS) and the Moderate Resolution Imaging Spectroradiometer aboard the Terra and Aqua satellites (MODIS-Aqua/Terra) measure water-leaving radiance at several spectral bands (RRS) (Feldman and McClain, 2013). The spectral distribution of RRS is influenced by particulate scattering and by absorption of solar radiation by all in-water optically active components: phytoplankton, non-algal particles (NAP), colored dissolved organic matter (CDOM) and pure water (Kirk, 1994). Light absorption by particles (ap(λ)), phytoplankton (aph(λ)), NAP (aNAP(λ)) and CDOM (aCDOM(λ)) has been studied in different regions of the global ocean since the 1980s (Hoepffner and Sathyendranath, 1992; Bricaud et al., 1995, 1998; Cleveland, 1995; Babin et al., 2003) to develop algorithms for assessment of water productivity based on remote sensing. Inherent optical properties (IOPs) vary throughout the world ocean. Because of the high variability in light absorption and scattering by optically active components, the world ocean needs to be subdivided into provinces based on regional IOPs, whose features could be used to improve remote-sensing algorithms for each province (Hoepffner and Sathyendranath, 1992; Lutz et al., 1996; Suzuki et al., 1998). Originally the standard NASA algorithm could be applied if there was a high correlation between aph(λ) and absorption by colored dissolved and suspended organic matter (aCDM(λ)) (Morel and Prieur, 1977). Although NASA standard algorithms are continually being updated (O'Reilly et al., 2000), the latest versions (OC4 for SeaWiFS, and OC3M for MODIS-Aqua/Terra) do not provide an adequate assessment of chlorophyll a concentration (Chl-a) in Black Sea waters (Suslin and Churilova, 2016), which belong to Case 2 waters (Suslin et al., 2007). Berthon et al. (2008) underlined that important uncertainties in the retrieval of marine products like Chl-a still persisted in areas (including the Black Sea) with relatively high CDOM absorption, where the optically active water constituents CDOM and NAP do not co-vary in a predictable manner with Chl-a.

For correct conversion of optical scanner signals into water productivity indices, regional algorithms need to be developed that take into account the bio-optical properties of the Black Sea. Assessing Chl-a requires deriving aph(λ) from the total light absorption by all optically active components and then estimating Chl-a from the relationship between aph(λ) and Chl-a. This relationship is also required for the development of regional spectral algorithms for downwelling radiance and primary production. Early versions of the regional bio-optical algorithms (Suslin et al., 2008; Churilova et al., 2009; Churilova and Suslin, 2010) were based on the limited amount of bio-optical data then available. The bio-optical properties of the Black Sea (namely aph(λ), aNAP(λ) and aCDOM(λ)) have been studied since 1995 in open and coastal waters (Churilova, 2001; Churilova and Berseneva, 2004; Churilova et al., 2004; Chami et al., 2005; Berthon et al., 2008; Dmitriev et al., 2009). Variability in aph(λ) spectral distributions and coefficient values has been demonstrated in coastal (Churilova and Berseneva, 2004; Chami et al., 2005; Dmitriev et al., 2009) and open waters (Churilova et al., 2004; Berthon et al., 2008), but seasonal variability in Chl-a specific phytoplankton light absorption coefficients is still not known in detail. The bio-optical data measured in the deep waters of the Black Sea from 2011 to 2015 are examined in this study.

The aim of the current research is to analyze the seasonal variability of the relationship between phytoplankton light absorption coefficients and chlorophyll a concentrations in the upper mixed layer (UML) of the Black Sea and to derive season-specific modeling parameters.

### MATERIALS AND METHODS

### Sampling

Bio-optical measurements were carried out during 7 cruises of the RV "Professor Vodyanitsky" in different seasons during 2011–2015 in the deep-water areas (deeper than the 100 m isobath) of the Black Sea (**Table 1**, **Figure 1**). Water samples were collected at 5–7 depths within the euphotic zone with 10 liter Niskin bottles on a CTD/rosette system, either MARK-III (Neil Brown Ocean Sensors, Inc) or SBE-911plus (Sea Bird Electronics). Sampling depths were chosen based on water transparency, given by the Secchi disc depth (Zs), and on temperature and salinity profiles measured by the CTD system. The depth of the euphotic zone (Zeu), defined as the penetration depth of 1% of photosynthetically available radiation (PAR), was calculated from the attenuation of light (I) with depth (z) (Kirk, 1994):

$$I(z) = I(0) \times e^{(-K\_d \times z)},\tag{1}$$

where Kd is the light attenuation coefficient averaged over the euphotic layer and over the visible range (400–700 nm). Setting I(Zeu)/I(0) = 0.01 in Equation (1) gives Zeu = ln(100)/Kd, so that:

$$Z\_{eu} = \frac{4.6}{K\_d} \tag{2}$$

Values of Kd were estimated from the relationship between Kd and Zs obtained for the Black Sea (Vedernikov, 1989). The average light intensity in the UML (PARUML) was estimated following Babin et al. (1996):

$$PAR\_{UML} = PAR(0) \times \frac{1 - e^{-4.6 \times \frac{Z\_{uml}}{Z\_{eu}}}}{4.6 \times \frac{Z\_{uml}}{Z\_{eu}}},\tag{3}$$

where PAR(0) is PAR at the sea surface (data from Suslin et al., 2015) and Zuml is the UML depth, determined using a temperature difference criterion (0.5°C) with the mean water temperature at 0–3 m as the reference level. The optical depth (ζ) of Zuml was assessed using Kd calculated from Zs (Vedernikov, 1989).
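Equations (2) and (3) can be sketched in a few lines of code. The function names and the input values below (Kd = 0.12 m<sup>−1</sup>, a 12 m mixed layer and a surface PAR of 52 E m<sup>−2</sup> d<sup>−1</sup>) are illustrative assumptions chosen to be of the same order as typical summer conditions; the optical depth is taken here as the product of Kd and depth, which is a standard definition but is not written out in the text.

```python
import math

def euphotic_depth(kd):
    """Depth (m) at which PAR falls to 1% of its surface value, Equation (2)."""
    return math.log(100.0) / kd  # ln(100) is approximately 4.6

def optical_depth(kd, z):
    """Optical depth of a layer of thickness z (m); assumes zeta = Kd * z."""
    return kd * z

def mean_par_uml(par0, z_uml, z_eu):
    """Mean PAR within the upper mixed layer, Equation (3)."""
    x = 4.6 * z_uml / z_eu
    return par0 * (1.0 - math.exp(-x)) / x

# Illustrative (assumed) inputs.
kd = 0.12      # m^-1
z_uml = 12.0   # m
par0 = 52.0    # E m^-2 d^-1

z_eu = euphotic_depth(kd)
zeta = optical_depth(kd, z_uml)
par_uml = mean_par_uml(par0, z_uml, z_eu)
print(f"Zeu = {z_eu:.1f} m, zeta = {zeta:.2f}, PAR_UML = {par_uml:.1f} E m^-2 d^-1")
```

Because 4.6 × Zuml/Zeu is essentially Kd × Zuml, Equation (3) is the depth average of the exponential profile in Equation (1) over the mixed layer.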

### Pigment Analysis

For chlorophyll and phaeopigment concentration analysis, 1–2 L water samples were gently vacuum filtered (<25 kPa) onto 25 mm diameter Whatman GF/F glass fiber filters. Filters were wrapped in aluminum foil and stored in liquid nitrogen until analysis in the laboratory. Filters were placed in 5 ml of 90% acetone in a 10 ml glass centrifuge tube, treated for 20 s with a vibration mixer (FALK Falc instruments, Italy), extracted at 5°C or below for at least 10 h and then centrifuged. The procedure was repeated with an additional 5 ml of 90% acetone for a more complete extraction of phytoplankton pigments. The second extraction contributed 15% on average to the total concentration values. The extracts were then analyzed for pigment content by the spectrophotometric method (Lorenzen, 1967; Jeffrey and Humphrey, 1975) using a Lambda 35 spectrophotometer (Perkin Elmer). The proportion of non-photosynthetic pigments in the total phytoplankton pigment content (NPP) was determined from the relationship between the environmental light condition (PAR) and NPP proposed by Babin et al. (1996).

### Phytoplankton Light Absorption

Optical densities of particulate matter were determined by the filter pad technique ("wet filter technique") (Yentsch, 1962; Mitchell and Kiefer, 1988). aph(λ) was determined by the difference between ap(λ) and aNAP(λ):

$$a\_{ph}(\lambda) = a\_{p}(\lambda) - a\_{NAP}(\lambda) \tag{4}$$

Values of aph(λ) were obtained from optical densities after correction for differential scattering (setting the mean absorption between 740 and 750 nm to zero) and for the path length amplification factor, converting decimal to natural logarithms, taking into account the volume filtered and the filtration area of the filter, and subtracting aNAP(λ) (Churilova, 2001). The sample optical densities were measured from 350 to 750 nm on a Perkin Elmer Lambda 35 spectrophotometer equipped with an integrating sphere. aNAP(λ) values were determined experimentally using the chemical method (bleaching with NaClO solution) (Tassan and Ferrari, 1995). The path length amplification factor (beta-correction) was estimated by applying the quadratic equation described by Mitchell (1990). To obtain the Chl-a specific light absorption coefficients of phytoplankton (a\*ph(λ)), the values of aph(λ) (m<sup>−1</sup>) were divided by the sum of chlorophyll a and phaeopigment concentrations (Chl-a) (mg m<sup>−3</sup>). Relationships between aph(λ) and Chl-a were derived by least squares fitting to power functions over the visible spectral domain (400–700 nm) with 1 nm resolution.
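A power-function fit of this kind can be sketched as an ordinary least squares regression on log-transformed data. Whether the fits in this study were carried out in log space is an assumption made here; the function name `fit_power_law` and the synthetic, noise-free data are hypothetical and serve only to show the mechanics at a single wavelength.

```python
import numpy as np

def fit_power_law(chl, aph):
    """Fit aph = A * chl**B by linear least squares on log10-transformed data.

    Returns (A, B, r2), where r2 is the determination coefficient
    computed on the log-transformed values.
    """
    x = np.log10(np.asarray(chl, dtype=float))
    y = np.log10(np.asarray(aph, dtype=float))
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    r2 = 1.0 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)
    return 10.0 ** intercept, slope, r2

# Synthetic example: data generated from a known power law (assumed values).
chl = np.array([0.15, 0.2, 0.3, 0.5, 1.0, 2.0])   # mg m^-3
aph = 0.05 * chl ** 0.8                            # m^-1
A, B, r2 = fit_power_law(chl, aph)
print(f"A = {A:.3f}, B = {B:.2f}, r2 = {r2:.3f}")
```

Repeating the fit for each wavelength between 400 and 700 nm would give spectra of the two coefficients.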

### Phytoplankton

Identification of phytoplankton species (micro- and nano-size fractions), cell counting and cell size measurements were performed with an Ergaval transmission microscope (Carl Zeiss Jena) using a Naumann chamber. Water samples (2–5 L) were concentrated by the inverse filtration method using Nuclepore filters with 1 µm pore diameter. The concentrated samples were fixed with a solution (4% final concentration) of 25 g paraformaldehyde dissolved in 100 ml of hot (80°C) 25% glutaraldehyde, clarified with a few drops of 1 N NaOH solution. Cells were sized and their volumes were assessed using geometrical figures (sphere, ellipsoid or cylinder) corresponding to the cell shapes. Phytoplankton analysis was conducted only at selected stations in August 2011, September 2014, and December 2014 and 2015.
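The geometric approximation of cell volume can be illustrated with the standard formulas for the three shapes named above. The function names and the example dimensions are hypothetical; the text does not state which linear dimensions were measured for each species.

```python
import math

def sphere_volume(diameter):
    """Volume of a sphere from its diameter."""
    return math.pi * diameter ** 3 / 6.0

def ellipsoid_volume(axis_a, axis_b, axis_c):
    """Volume of an ellipsoid from its three full axis lengths."""
    return math.pi * axis_a * axis_b * axis_c / 6.0

def cylinder_volume(diameter, height):
    """Volume of a cylinder from its diameter and height."""
    return math.pi * diameter ** 2 * height / 4.0

# Hypothetical cell dimensions in micrometres; volumes are in cubic micrometres.
print(round(sphere_volume(10.0)))            # a 10 um spherical cell
print(round(cylinder_volume(20.0, 100.0)))   # an elongated cylindrical cell
```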

In August 2011, September 2014, and December 2014 and 2015, flow cytometric analysis was performed with a Cytomics FC 500 flow cytometer (Beckman Coulter, USA) equipped with a single-phase argon laser (488 nm) (Marie et al., 1999; Schapira et al., 2010). For all detected particles, phycoerythrin fluorescence emission (575 nm) and chlorophyll fluorescence emission (675 nm) were measured. The samples were fixed with formaldehyde (final concentration 2%) immediately after sampling, then frozen in liquid nitrogen (−80°C) and stored at −20°C until analysis in the laboratory. The cytometer measurements were calibrated by the addition of a known concentration of Flow-Check™ Fluorospheres (Beckman Coulter). Cytometric data were analyzed using CXP software (Beckman Coulter).

# RESULTS

### Chlorophyll a Concentration

Chl-a in the surface layer of the deep-water regions of the Black Sea was low in summer. In the deep western part of the sea in August 2011 (**Figure 1**), Chl-a in the UML was in the range 0.15–0.30 mg m<sup>−3</sup>. At this time, the seasonal thermocline was well developed, with a maximum temperature gradient of 3.3 ± 1.1°C m<sup>−1</sup> at a depth of 12 ± 2.3 m, where the optical depth (ζ) was 1.5 ± 0.42 (**Table 2**). In August water transparency was high: Zs and Kd were 16 ± 2.1 m and 0.12 ± 0.013 m<sup>−1</sup>, respectively, and Zeu was 37 ± 4.0 m. Vertical Chl-a profiles were characterized by a rather homogeneous Chl-a distribution within the UML and a deep Chl-a maximum (DCM) located near the bottom of the euphotic zone, with Chl-a values 5–10 times higher than in the UML (**Figure 2**). In the surface layer of the deep eastern part of the Black Sea in September 2014 and 2015 (**Figure 1**), Chl-a values (0.21–0.35 mg m<sup>−3</sup>) were very similar to those measured in summer. In September the maximum temperature gradient (4.3 ± 1.2°C m<sup>−1</sup>) and its location (9.5 ± 2.7 m, with ζ = 1.1 ± 0.40) were similar to those observed in summer. The vertical Chl-a distribution was similar to that observed in August (**Figure 2**), but with less contrast: Chl-a in the DCM was 3 times higher than in the UML, compared with 5–10 times higher in August. Water transparency in August and September was comparable (Zs = 16 ± 1.4 m; Kd = 0.12 ± 0.0073 m<sup>−1</sup>; Zeu = 38 ± 2.6 m).

In the western deep part of the Black Sea in late autumn (November 2015) the seasonal thermocline had largely broken down. This resulted in a deepening of the UML (28 ± 3.4 m), which became ∼3 times deeper than in summer (**Figure 2**). Chl-a in the surface layer in November 2015 varied from 0.54 to 1.4 mg m<sup>−3</sup>, and the water was less transparent (Zs = 13 ± 1.0 m; Kd = 0.15 ± 0.009 m<sup>−1</sup>) (**Table 2**). The maximum temperature gradient (1.5 ± 0.38°C m<sup>−1</sup>) was located at an optical depth of 4.2 ± 0.72. Consequently, Zeu (31 ± 2.0 m) was close to the UML depth. Chl-a was distributed homogeneously within the UML and decreased sharply in the thermocline (**Figure 2**). Thus, the vertical pigment distribution in November contrasted with that in summer, when the seasonal thermocline divided the euphotic zone into two quasi-isolated layers with different environments. In effect, in late autumn phytoplankton was present in the UML only. In December 2014 and 2015, in the surface layer of the eastern deep-water part of the Black Sea (**Figure 1**), Chl-a varied from 1.0 to 2.0 mg m<sup>−3</sup> (1.3 ± 0.25 mg m<sup>−3</sup>). The UML depth was 32 ± 7.0 m. The vertical distribution of Chl-a was homogeneous within the UML, similar to the Chl-a profiles observed in western waters in November 2015. In December Zs and Kd were close to the November 2015 values (12 ± 2.5 m and 0.15 ± 0.023 m<sup>−1</sup>, respectively). The maximum temperature gradient was 0.93 ± 0.45°C m<sup>−1</sup> and was located at an optical depth of 4.9 ± 1.2 (**Table 2**). Consequently, in December the euphotic zone (30 ± 4.9 m) lay within the UML, as observed in November. In both December and November phytoplankton was present within the UML only.

### Phytoplankton

In August 2011 phytoplankton in the UML of the western deep waters of the Black Sea was dominated by dinoflagellates. The wet biomass of phytoplankton was 540 ± 310 mg m<sup>−3</sup> on average. Assuming an intracellular organic carbon (C) content of 10% of wet biomass, the C to Chl-a ratio (C/Chl-a) was 145 ± 76 mg mg<sup>−1</sup>. The biomass of photosynthetic picoplankton was 2.7 ± 0.46 mg m<sup>−3</sup> on average, and the contribution of picoplankton to total phytoplankton biomass was <1%. In September 2014 phytoplankton biomass in the UML was assessed at selected stations. Wet biomass was ∼450 mg m<sup>−3</sup>. The phytoplankton was dominated (50–70%) by dinoflagellates of the genus *Gymnodinium* (*Gymnodinium fungiforme* and *Gymnodinium paululum*). The C/Chl-a ratio was ∼110 mg mg<sup>−1</sup>. Photosynthetic picoplankton biomass was 1.7 ± 1.0 mg m<sup>−3</sup> on average and its contribution to total phytoplankton biomass was <1%. In December 2014 and 2015 wet phytoplankton biomass in the UML varied from 190 to 430 mg m<sup>−3</sup>. In 2014 *Proboscia alata* dominated the phytoplankton community by biomass. In 2015 phytoplankton was represented mainly by the large diatom *Pseudosolenia calcar-avis*, with cell volumes of 19000–83000 µm<sup>3</sup>. The C/Chl-a ratio was ∼25–40 mg mg<sup>−1</sup>. In December 2014 and 2015 the biomass of photosynthetic picoplankton was 11.0 ± 4.9 and 13.0 ± 4.4 mg m<sup>−3</sup> on average, respectively, and the picoplankton contribution to total phytoplankton biomass was ∼5%.

### Phytoplankton Light Absorption

Phytoplankton light absorption spectra measured in the UML are presented in **Figure 3**. To examine the relationship between aph(λ) and Chl-a in the UML, the results were grouped into 2 datasets: (1) a summer dataset that included results from August 2011 and September 2014 and 2015; (2) a winter dataset with results from November 2015 and December 2014 and 2015 (**Table 1**). September 2014 and 2015 were considered part of the summer season because of the persistence of strong seasonal stratification with the typical "summer" type of vertical pigment distribution. November 2015 was considered part of winter because the water column structure was similar to that in December 2014 and 2015. In November the depths of the UML and the euphotic zone were close and all phytoplankton was present within the UML, as it was in December.

In the aph(λ) spectra two main peaks were observed, in the blue (near 440 nm) and red (near 678 nm) spectrum domains (**Figure 3**). The seasonal differences in phytoplankton light absorption were manifested in both the spectral shapes and the values of the chlorophyll a specific coefficients. In summer a\*ph(λ) values were relatively high in the blue spectrum domain. The ratio between the blue and red peaks (R) was 3.4 ± 0.61 on average in summer, which was significantly higher than in winter (2.2 ± 0.45) (**Figure 3**). In both winter and summer R values decreased as Chl-a increased. The variations of aph(λ) as a function of Chl-a are shown in **Figure 4** at two wavelengths (∼440 and 678 nm) corresponding to the blue and red peaks of the spectra. To describe the relationship between aph(λ) and Chl-a a power function was used (**Figure 4**):

$$a\_{ph}(\lambda) = A(\lambda) \times (\text{Chl-}a)^{B(\lambda)},\tag{5}$$

where A(λ) is the spectral coefficient, equal to a\*ph(λ) when Chl-a is 1 mg m<sup>−3</sup>, and B(λ) is the spectral exponent.

TABLE 2 | Hydrophysical characteristics: maximum temperature gradient (ΔT) and depth (Ztc)/optical depth (ζ) of its location; Secchi disc depth (Zs); euphotic zone depth (Zeu); diffuse attenuation coefficient for downwelling irradiance over Zeu (Kd); photosynthetically available radiation incident on the Black Sea surface [PAR(0)] and averaged over the upper mixed layer [PAR(UML)].

For the two data sets the following fit equations were obtained (**Figure 4**):

(1) In summer:

$$a\_{ph}(440) = 0.076 \times (\text{Chl-}a)^{0.84} \quad (r^2 = 0.66) \tag{6}$$

$$a\_{ph}(678) = 0.024 \times (\text{Chl-}a)^{0.95} \quad (r^2 = 0.63) \tag{7}$$

(2) In winter:

$$a\_{ph}(440) = 0.045 \times (\text{Chl-}a)^{0.81} \quad (r^2 = 0.78) \tag{8}$$

$$a\_{ph}(678) = 0.021 \times (\text{Chl-}a)^{0.95} \quad (r^2 = 0.88) \tag{9}$$

To infer the aph(λ) spectral distribution from Chl-a, the relationship between these parameters needs to be determined for the entire visible spectrum (400–700 nm). Based on the two empirical data sets, the aph(λ) vs. Chl-a dependencies were parameterized using Equation (5) for summer and winter. The results of the parameterization, performed from 400 to 700 nm with 1 nm spectral resolution, are presented in **Figure 5** and in **Tables 3, 4**. It is evident that a\*ph(λ) values are higher in summer than in winter (**Figure 3**). This seasonal difference is more pronounced in the blue spectrum domain. For summer phytoplankton the value of the A(λ) coefficient at 440 nm is about 1.7 times that for winter (0.076 vs. 0.045 in Equations 6 and 8).
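As a minimal sketch, the coefficients of Equations (6)–(9) can be used to evaluate aph at the two peak wavelengths for a given Chl-a, and Equation (5) can be rearranged to estimate Chl-a from aph. The function names are hypothetical, and the algebraic inversion shown here is only an illustration; it is not the regional algorithm itself, which must first separate aph(λ) from total absorption.

```python
# Coefficients (A, B) taken from Equations (6)-(9).
FITS = {
    ("summer", 440): (0.076, 0.84),
    ("summer", 678): (0.024, 0.95),
    ("winter", 440): (0.045, 0.81),
    ("winter", 678): (0.021, 0.95),
}

def aph_from_chl(season, wavelength_nm, chl):
    """Phytoplankton absorption (m^-1) from Chl-a (mg m^-3), Equation (5)."""
    a, b = FITS[(season, wavelength_nm)]
    return a * chl ** b

def chl_from_aph(season, wavelength_nm, aph):
    """Chl-a (mg m^-3) from aph (m^-1) by inverting Equation (5)."""
    a, b = FITS[(season, wavelength_nm)]
    return (aph / a) ** (1.0 / b)

def blue_red_ratio(season, chl):
    """Ratio R = aph(440)/aph(678) implied by the seasonal fits."""
    return aph_from_chl(season, 440, chl) / aph_from_chl(season, 678, chl)

for season in ("summer", "winter"):
    ratios = [round(blue_red_ratio(season, c), 2) for c in (0.2, 1.0, 2.0)]
    print(season, ratios)
```

Because the exponent at 440 nm is smaller than at 678 nm in both seasons, the ratio implied by the fits decreases as Chl-a increases, and it is higher in summer than in winter at the same Chl-a.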

Photosynthetically available radiation incident on the Black Sea surface [PAR(0)] varied seasonally (Suslin et al., 2015). In August and September PAR(0) was on average 52 ± 1.2 and 38 ± 2.6 E m<sup>−2</sup> d<sup>−1</sup>, respectively (**Table 2**). In November and December PAR(0) was 17 ± 2.5 and 12 ± 1.7 E m<sup>−2</sup> d<sup>−1</sup>, about 3 times lower than in the warm months. PAR in the UML depends not only on PAR(0) but also on water transparency and on the ratio between ZUML and Zeu. In winter, waters were less transparent than in summer. Moreover, ZUML was comparable with Zeu in winter, while in summer ZUML was located between the first and second optical depths. As a result, PAR in the UML differed more between summer and winter (∼10 times) than PAR(0) did (**Table 2**). PAR in the UML was 27 ± 8.1 and 23 ± 2.9 E m<sup>−2</sup> d<sup>−1</sup> in August and September, respectively (**Table 2**). In November and December PAR in the UML was 3.9 ± 1.2 and 2.4 ± 0.8 E m<sup>−2</sup> d<sup>−1</sup>, respectively (**Table 2**).

## DISCUSSION

The first optical depth, which determines the spectral patterns of water-leaving radiance (Gordon and McCluney, 1975) detectable by remote scanners, is located within the UML in the deep open waters of the Black Sea. It should be noted that Kd averaged over the euphotic zone was used in our assessment. However, the dominance of CDOM in total light absorption in the Black Sea results in Kd values that decrease sharply with depth. In the subsurface layer Kd values were estimated to be ∼1.6 times higher than the mean Kd for the euphotic zone (Churilova et al., 2009). Therefore, the optical depth of Zuml was underestimated in our assessment by using the average Kd, which only strengthens the conclusion that the first optical depth lies within the UML. Consequently, the bio-optical properties of the UML in the Black Sea determine remotely sensed optical signals and could be used for the development and refinement of regional models of productivity indicators.

and winter (blue lines), in comparison with data (gray lines) from Bricaud et al. (1995).

Analysis of the link between phytoplankton light absorption coefficients and chlorophyll a concentration revealed seasonal differences in the UML (**Figures 3**, **4** and **Tables 3, 4**), which were related to the difference in a\*ph(λ) values between summer and winter. The difference was more pronounced in the blue spectrum domain (**Figure 3**). In summer the values of the parameterization coefficient A(λ) at the red and blue peaks were on average 15 and 70% higher, respectively, than those for winter (**Figure 5**). Seasonal differences in the Chl-a normalized light absorption capacity of phytoplankton were related to strong changes in environmental conditions in the UML, mainly in the average PAR within the UML (**Figure 6**). The observed seasonal dynamics of the UML and the euphotic zone in the deep open waters of the Black Sea are consistent with the intra-annual changes of these parameters (Zuml and Zeu) outlined earlier (Ivanov and Belokopytov, 2011; Agirbas et al., 2014).

In winter PAR(0) decreased while the ratio between ZUML and Zeu increased in comparison with summer. As a result, the average light level within the UML in winter was almost 10-fold lower than PARUML in summer. Seasonal changes in environmental conditions in the UML caused ∼5–7-fold variability in the C/Chl-a ratio between winter and summer. The observed C/Chl-a variability agrees with the change in intracellular chlorophyll a concentration (MacIntyre et al., 2002; Behrenfeld et al., 2005) caused by physiological acclimation of algal cultures and phytoplankton to a decrease in light intensity over the same range as PAR varied in the UML of the Black Sea. Intracellular pigment concentration defines the degree of pigment packaging, which in turn affects a\*ph(λ) (Morel and Bricaud, 1981; Bidigare et al., 1990; Hoepffner and Sathyendranath, 1991; Kirk, 1994; Fujiki and Taguchi, 2002). The current research showed that a\*ph(λ) and C/Chl-a were significantly lower in winter than in summer, which is consistent with the effect of "pigment packaging" on a\*ph(λ) at the red peak (∼678 nm), where light quanta are absorbed by chlorophyll a and phaeopigments only (Jeffrey et al., 1997). At shorter wavelengths (in the blue spectrum domain) the seasonal variation in a\*ph(λ) was more pronounced than at red wavelengths (**Figure 3**). In the blue part of the spectrum, accessory pigments absorb light quanta in addition to chlorophyll a (Bidigare et al., 1990; Jeffrey et al., 1997), which leads to "smoothing" of the spectra due to accessory pigment "packaging" when Chl-a specific absorption coefficients are considered. The ratio of accessory pigments to Chl-a changes with photoacclimation of algae (MacIntyre et al., 2002; Grant and Louda, 2010), mainly through changes in photoprotective (i.e., non-photosynthetic) pigments (NPP).
In review of photoacclimation of different microalgae taxons it was shown that ∼order increase of light intensity resulted in ∼3–4 times increase of photoprotective xanthophyll to Chl-a ratio on average (Figure 9 in MacIntyre et al., 2002). Investigations of phytoplankton accessory pigments variability have demonstrated that photoprotective pigments tend to be greater in the surface low Chl-a waters at latitudes where radiance incident on the sea surface is relatively high (Stuart et al., 1998; Barlow et al., 2004; Sathyendranath et al., 2005). Variability of R (aph(440) /aph(678)) was shown to be correlated with NPP to Chl-a ratio (Lutz et al., 2003). Altogether the increase of a ∗ ph(440) is related to low Chla waters with lower intracellular concentrations of pigments, and a greater proportion of photoprotective pigments occurred in stratified, high light, nutrient-limited regions (Bricaud et al., 1995; Cleveland, 1995; Aguirre-Hernandez et al., 2004). In general these results are in a good agreement with the Black Sea observations. Although in current research pigment composition


TABLE 3 | Spectral values of the constants obtained when fitting the variations of aph(λ) vs. the (chlorophyll a + phaeopigment) concentration (Chl-a) to power laws of the form *aph(*λ*)* = *A(*λ*) (Chl-a)^B(*λ*)*, and determination coefficients on the log-transformed data r<sup>2</sup> (summer).

was not analyzed, a rough assessment of NPP (the share of photoprotective pigments in the total weight of all pigments), based on the dependence of NPP on light intensity (Babin et al., 1996), showed that NPP in the UML was ∼5 times higher in summer than in winter (**Figure 6**).

The observed seasonal phytoplankton succession is typical of the deep-water ecosystem of the Black Sea (Georgieva, 1993; Berseneva et al., 2004; Mikaelyan et al., 2005). In general, phytoplankton biomass consists of Bacillariophyceae, Dinophyceae, and Prymnesiophyceae (represented mainly by coccolithophores). Monitoring every two weeks at fixed stations in the western deep-water part of the Black Sea showed a change in phytoplankton species composition within a year (Berseneva et al., 2004): in general, diatoms dominated in winter and in early spring "blooms," whereas dinoflagellates and coccolithophores prevailed in the community in summer. Coccolithophores "bloom" in May–June.

A shift in the species dominating the phytoplankton community is associated with changes in the size and shape of the cells. Cell size affects pigment packaging within the cells, which decreases a ∗ ph(λ) through self-shading of pigments within large cells (Morel and Bricaud, 1981; Sosik and Mitchell, 1994; Fujiki and Taguchi, 2002). In different ocean regions, variability in a ∗ ph(λ) has been related to changes in phytoplankton species composition and cell size (Bricaud et al., 1995; Cleveland, 1995; Millan-Nunez et al., 2004). The package effect caused by cell size is detected as a decrease in a ∗ ph(678) (Fujiki and Taguchi, 2002), because at shorter wavelengths a ∗ ph(λ) is also affected by accessory pigments. Values of a ∗ ph(678) decreased in winter by ∼15% compared with summer, owing to both C/Chl-a and phytoplankton variability (**Figures 3**, **5**), even though large diatoms (Pseudosolenia calcar-avis) dominated the phytoplankton community.

The cells of Pseudosolenia calcar-avis are cylindrical, unlike those of dinoflagellates, which are closer to ellipsoids. The volume of a Pseudosolenia calcar-avis cell exceeds that of dinoflagellate (Gymnodinium spp.) cells by ∼2 orders of magnitude. However, for cylindrical cells the large volume is not critical for the cell's capacity to absorb


TABLE 4 | Spectral values of the constants obtained when fitting the variations of aph(λ) vs. the (chlorophyll a + phaeopigment) concentration (Chl-a) to power laws of the form *aph(*λ*)* = *A(*λ*) (Chl-a)^B(*λ*)*, and determination coefficients on the log-transformed data r<sup>2</sup> (winter).

light. The light absorption capacity of cylindrical cells is determined by the diameter of their cross-section (Kirk, 1976). Therefore, despite the difference in cell volume, the optically significant size of cylindrical diatoms (10–30 µm) was similar to that of dinoflagellates (10–40 µm). Consequently, in this case the effect of cell size (volume) on the degree of pigment packaging is not as significant as in the case of large spherical cells (Morel and Bricaud, 1981). This explains the weak (∼15%) seasonal difference in a ∗ ph(678) observed in the Black Sea (**Figure 5**).

## CONCLUSIONS

Seasonal differences in chlorophyll-a specific phytoplankton light absorption coefficients are caused by the annual dynamics of environmental conditions in the UML and by the adaptive response of algal cells/populations (via variation in pigment composition and intracellular concentration) and of the phytoplankton community (via shifts in species composition, with associated changes in cell size and shape). Consequently, parameterizing the relationship between phytoplankton light absorption coefficients and chlorophyll a concentration separately for different seasons (summer and winter) will make it possible to refine the regional algorithm for Chl-a assessment based on remote sensing (Suslin and Churilova, 2016). Because light absorption by dissolved organic matter in the Black Sea is relatively high and is not correlated with phytoplankton absorption or chlorophyll a concentration, the regional Chl-a algorithm requires splitting light absorption into aph(λ) and aCDM(λ) (Suslin and Churilova, 2016); Chl-a is then retrieved from aph(λ) at 490 nm. The relationships between Chl-a and aph(λ) obtained for summer and winter conditions in the Black Sea differ in the coefficients A(λ) of the power equation, whereas the coefficients B(λ) are practically the same (**Figure 5**). Consequently, the values of the A(λ) coefficient define the seasonal difference in the retrieval of Chl-a from aph(λ). The values of A(λ) at 490 nm are 0.048 and 0.031 m<sup>2</sup> mg<sup>−1</sup> for summer and winter conditions, respectively, in the UML of the Black Sea (**Tables 3, 4**). For instance, using the summer relationship between Chl-a and aph(490) with summer aph(490) values, one obtains Chl-a of 0.2–0.3 mg m<sup>−3</sup>, but using the winter relationship between these parameters or the relationship

obtained for different regions of the world ocean (Bricaud et al., 1995) one obtains Chl-a of 0.36–0.53 or 0.34–0.55 mg m<sup>−3</sup>, respectively. Consequently, the retrieved Chl-a values are almost two times lower if one takes into account the Black Sea summer conditions and the corresponding relationship between aph(λ) and Chl-a. Undoubtedly, the accuracy of splitting light absorption into aph(λ) and aCDM(λ) also affects the accuracy of Chl-a assessment (Suslin and Churilova, 2016).
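The sensitivity of the retrieval to the seasonal A(λ) coefficient can be illustrated by inverting the power law aph(490) = A (Chl-a)^B. The sketch below is illustrative only: the A values are those quoted in the text, whereas the exponent `B_ASSUMED`, the input `aph_490` and the function name `chl_from_aph` are assumptions introduced for the example (the fitted B(λ) values belong to Tables 3 and 4).

```python
def chl_from_aph(aph, a_coef, b_exp):
    """Invert aph = A * Chl**B and return Chl-a (mg m^-3)."""
    if aph <= 0 or a_coef <= 0 or b_exp <= 0:
        raise ValueError("aph, A and B must all be positive")
    return (aph / a_coef) ** (1.0 / b_exp)


A_SUMMER = 0.048   # A(490) for summer, as quoted in the text
A_WINTER = 0.031   # A(490) for winter, as quoted in the text
B_ASSUMED = 0.75   # assumption: same exponent in both seasons, value illustrative
aph_490 = 0.015    # hypothetical summer absorption at 490 nm (m^-1)

chl_summer = chl_from_aph(aph_490, A_SUMMER, B_ASSUMED)
chl_winter = chl_from_aph(aph_490, A_WINTER, B_ASSUMED)

# With a shared exponent, the ratio depends only on the two A coefficients.
ratio = chl_winter / chl_summer
print(f"summer fit: {chl_summer:.2f}, winter fit: {chl_winter:.2f}, ratio: {ratio:.2f}")
```

Under these assumed inputs the winter coefficients return a Chl-a value close to twice the summer one for the same aph(490), which is the behavior described above.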

Moreover, accounting for the seasonal difference in the relationships between aph(λ) and Chl-a could provide a more accurate assessment of downwelling radiance and primary production in the Black Sea using spectral approaches (Churilova et al., 2016). However, it should be noted that the applicability of the obtained aph(λ) parameterization is limited by the rather narrow range of Chl-a measured in the deep waters. This relatively narrow range of Chl-a caused the high values of the B(λ) coefficients in comparison with those obtained from data measured in different regions of the World Ocean, with Chl-a covering the range 0.02–25 mg m<sup>−3</sup> (**Figure 5**; Bricaud et al., 1995).

The parameterization obtained from bio-optical data measured in deep waters is unlikely to be correct for coastal waters. Coastal waters may differ from deep waters in nutrient availability, transparency and turbulence. These different environmental conditions would result in changes in intracellular pigment concentration and phytoplankton species composition, which in turn affect a ∗ ph(λ). For this reason, bio-optical properties have been investigated in the Crimean coastal waters in different seasons since 2014. These new data will be merged with previously measured summer results (Churilova and Berseneva, 2004; Dmitriev et al., 2009) and then analyzed to determine the seasonality in the aph(λ) parameterization.

# AUTHOR CONTRIBUTIONS

TC, idea, management and writing. VS, mathematical processing and data interpretation. OK, contribution to the discussion. TE, chlorophyll and absorption measurements, calculations. NM, chlorophyll and absorption measurements, calculations. VM, phytoplankton data analysis. LS, phytoplankton species identification.

## FUNDING

The work presented in this paper was carried out in the framework of the RF state task, according to the plan of scientific research of the A.O. Kovalevsky Institute of Marine Biological Research (theme # 0828-2014-0016) and of the Marine Hydrophysical Institute (theme # 0827-2014-0011). The work was partially supported by RFBR (project 17-05-00113).

# ACKNOWLEDGMENTS

The authors are very thankful to ESA and PML for inviting us to attend the workshop "Color and Light in the Ocean (CLEO)" held at ESRIN, Frascati, Italy on 6–8 September, 2016. The authors would like to thank the reviewers for their helpful and constructive comments that greatly contributed to improving the paper. The authors thank ESA very much for covering the article processing charges.

# REFERENCES


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The reviewer BB and handling Editor declared their shared affiliation, and the handling Editor states that the process nevertheless met the standards of a fair and objective review.

Copyright © 2017 Churilova, Suslin, Krivenko, Efimova, Moiseeva, Mukhanov and Smirnova. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Uncertainty in Ocean-Color Estimates of Chlorophyll for Phytoplankton Groups

Robert J. W. Brewin<sup>1,2</sup>\*, Stefano Ciavatta<sup>1,2</sup>, Shubha Sathyendranath<sup>1,2</sup>, Thomas Jackson<sup>1</sup>, Gavin Tilstone<sup>1</sup>, Kieran Curran<sup>1</sup>, Ruth L. Airs<sup>1</sup>, Denise Cummings<sup>1</sup>, Vanda Brotas<sup>3</sup>, Emanuele Organelli<sup>1</sup>, Giorgio Dall'Olmo<sup>1,2</sup> and Dionysios E. Raitsos<sup>1,2</sup>

<sup>1</sup> Plymouth Marine Laboratory, Plymouth, UK, <sup>2</sup> National Centre of Earth Observation, Plymouth Marine Laboratory, Plymouth, UK, <sup>3</sup> Faculdade de Ciências, Marine and Environmental Sciences Centre, Universidade de Lisboa, Lisboa, Portugal

### Edited by:

Chris Bowler, École Normale Supérieure, France

### Reviewed by:

Ramaiah Nagappa, National Institute of Oceanography, India Salvatore Marullo, National Agency For New Technologies, Energy and Sustainable Economic Development, Italy

> \*Correspondence: Robert J. W. Brewin robr@pml.ac.uk

### Specialty section:

This article was submitted to Ocean Observation, a section of the journal Frontiers in Marine Science

Received: 13 January 2017 Accepted: 27 March 2017 Published: 13 April 2017

### Citation:

Brewin RJW, Ciavatta S, Sathyendranath S, Jackson T, Tilstone G, Curran K, Airs RL, Cummings D, Brotas V, Organelli E, Dall'Olmo G and Raitsos DE (2017) Uncertainty in Ocean-Color Estimates of Chlorophyll for Phytoplankton Groups. Front. Mar. Sci. 4:104. doi: 10.3389/fmars.2017.00104

Over the past decade, techniques have been presented to derive the community structure of phytoplankton at synoptic scales using satellite ocean-color data. There is a growing demand from the ecosystem modeling community to use these products for model evaluation and data assimilation. Yet, from the perspective of an ecosystem modeler these products are of limited use unless: (i) the phytoplankton products provided by the remote-sensing community match those required by the ecosystem modelers; and (ii) information on per-pixel uncertainty is provided to evaluate data quality. Using a large dataset collected in the North Atlantic, we re-tune a method to estimate the chlorophyll concentration of three phytoplankton groups, partitioned according to size [pico- (<2 µm), nano- (2–20 µm) and micro-phytoplankton (>20 µm)]. The method is modified to account for the influence of sea surface temperature, also available from satellite data, on model parameters and on the partitioning of microphytoplankton into diatoms and dinoflagellates, such that the phytoplankton groups provided match those simulated in a state of the art marine ecosystem model (the European Regional Seas Ecosystem Model, ERSEM). The method is validated using another dataset, independent of the data used to parameterize the method, of more than 800 satellite and in situ match-ups. Using fuzzy-logic techniques for deriving per-pixel uncertainty, developed within the ESA Ocean Colour Climate Change Initiative (OC-CCI), the match-up dataset is used to derive the root mean square error and the bias between in situ and satellite estimates of the chlorophyll for each phytoplankton group, for 14 different optical water types (OWT). These values are then used with satellite estimates of OWTs to map uncertainty in chlorophyll on a per pixel basis for each phytoplankton group. It is envisaged these satellite products will be useful for those working on the validation of, and assimilation of data into, marine ecosystem models that simulate different phytoplankton groups.

Keywords: phytoplankton, size, function, chlorophyll, ocean-color, uncertainty

# 1. INTRODUCTION

The size structure and taxonomic composition of phytoplankton influence many processes in phytoplankton biology, marine biogeochemistry and marine ecology (Chisholm, 1992; Raven, 1998; Le Quéré et al., 2005; Marañón, 2009, 2015; Finkel et al., 2010). Photosynthesis, growth, light absorption, nutrient uptake, carbon export, and the transfer of energy through the marine food chain, are all influenced by phytoplankton community structure (Platt and Denman, 1976, 1977, 1978; Morel and Bricaud, 1981; Prieur and Sathyendranath, 1981; Probyn, 1985; Geider et al., 1986; Legendre and LeFevre, 1991; Maloney and Field, 1991; Chisholm, 1992; Sunda and Huntsman, 1997; Raven, 1998; Laws et al., 2000; Ciotti et al., 2002; Bricaud et al., 2004; Devred et al., 2006; Guidi et al., 2009; Briggs et al., 2011). In the face of considerable challenges (Shimoda and Arhonditsis, 2016), growing emphasis has been placed on the representation of biogeochemistry in ecosystem models by explicitly incorporating different phytoplankton groups as state variables, often partitioned according to their size or taxonomic composition (Aumont et al., 2003; Blackford et al., 2004; Le Quéré et al., 2005; Kishi et al., 2007; Marinov et al., 2010; Ward et al., 2012; Butenschön et al., 2016). With this aspiration comes a demand for observations on phytoplankton groups (e.g., for model validation and data assimilation) that is not being met with current in situ observations that are sparse in time and space. To address the issue of data availability, the past decade has seen many attempts to estimate phytoplankton groups using satellite remote-sensing (IOCCG, 2014), which is capable of viewing the ocean with high temporal and spatial coverage.

Current techniques to estimate phytoplankton groups using satellite data can be partitioned into three categories: spectral, abundance and ecological approaches (Nair et al., 2008; Brewin et al., 2011b; IOCCG, 2014). Spectral-based approaches seek to use the optical signatures of the phytoplankton groups directly for their detection from space. Abundance-based approaches invoke relationships between the phytoplankton groups and some index of phytoplankton abundance or biomass (e.g., chlorophyll concentration) that can be retrieved from satellites. Ecological-based approaches use ocean-color together with additional environmental data (e.g., sea surface temperature (SST), irradiance, wind) that can also be retrieved from satellite to identify ecological niches where particular phytoplankton communities may be found. Spectral-based approaches are more direct as they target known optical signatures, whereas abundance-based and ecological-based approaches are indirect, in that they use satellite remote-sensing as a means to extrapolate known relationships between the phytoplankton groups and a property that can be derived accurately from space (e.g., chlorophyll concentration, SST). Though it would appear more sensible to use a direct approach, issues with spectral-based techniques can arise when the signal-to-noise ratio in the ocean-color data is too low to detect the targeted signature (Garver et al., 1994; Wang et al., 2005), when the phytoplankton group being targeted has a similar optical signature to other groups, when the spectral signatures are not known sufficiently well, or when the spectral resolution is not adequate for detecting the target signature. In such cases, an indirect method (e.g., ecological or abundance based) would be more suitable. Future ocean-color missions will help address some of these issues through improved accuracy and spectral resolution. 
For instance, the recently launched Ocean and Land Color Instrument (OLCI) onboard ESA's Sentinel-3A satellite offers more spectral wavebands than its predecessor (MERIS), and NASA's planned Pre-Aerosol Clouds and ocean Ecosystem (PACE) mission will aim to provide hyperspectral ocean-color data, improving the potential for phytoplankton group retrievals. For further details on all of these methods, the reader is referred to the works of Nair et al. (2008), Brewin et al. (2011b), De Moraes Rudorff and Kampel (2012), IOCCG (2014), and Mouw et al. (2017). Recently, efforts have been made to combine abundance and ecological-based approaches; for instance, Brewin et al. (2015) and Ward (2015) modified the relationship between the chlorophyll concentration of the phytoplankton groups and total chlorophyll (abundance-based) according to the environmental (ecological-based) conditions (e.g., temperature or light availability).

Phytoplankton group-specific satellite products are now being used for the validation of (Ward et al., 2012; Hirata et al., 2013; Hashioka et al., 2013; Rousseaux et al., 2013; Vogt et al., 2013; Holt et al., 2014; de Mora et al., 2016; Laufkötter et al., 2016), or assimilation of data into (Xiao and Friedrichs, 2014), ecosystem models. However, there are two challenges that modelers face when undertaking such analyses (Bracher et al., 2017). Firstly, there is often a mismatch between phytoplankton products provided by the remote-sensing community and those required by the ecosystem modelers. These difficulties arise in cases where a phytoplankton group adopted by the ecosystem modeler has similar optical properties to other phytoplankton groups, meaning it may not be detected directly using spectral-based methods, or where the phytoplankton group does not co-vary in a predictable manner with variables that can be retrieved by remote-sensing, limiting abundance-based and ecological-based methods and rendering the use of satellite products difficult. Greater dialog between ecosystem modelers and the remote-sensing community is required to bridge this mismatch where feasible.

The second challenge is associating a level of uncertainty to the satellite phytoplankton group products, ideally on a per-pixel basis (per grid cell of the model). This is an essential prerequisite for both ecosystem model validation and data assimilation. If the uncertainties in the satellite products are too high they may not be useful for validation and may have little impact on a data assimilation scheme, since the target for data assimilation is to modify model simulations such that they agree with the observations within their uncertainties (e.g., Gregg et al., 2009; Ford et al., 2012; Ciavatta et al., 2014, 2016). Whereas many approaches have been proposed to derive satellite phytoplankton group products (IOCCG, 2014), few provide estimates of per-pixel uncertainty.

There are two methods commonly used to estimate uncertainty in ocean-color products: error propagation, or model-based uncertainties, and comparison of satellite estimates with in situ data (validation). Error propagation typically involves propagation of errors from input to output products, knowing the uncertainties in the input and model parameters. These techniques have been used for estimating uncertainties in chlorophyll concentration and inherent optical properties (Maritorena et al., 2010; Lee et al., 2011; Werdell et al., 2013a), and for some satellite phytoplankton group products (Kostadinov et al., 2009, 2016; Roy et al., 2013; Brewin et al., 2017). In addition to estimating per-pixel uncertainty, these techniques can be very useful for understanding the sensitivity of model parameters and model inputs on the output products (Roy et al., 2013; Kostadinov et al., 2016; Brewin et al., 2017).

In a user consultation of ocean-color products, conducted as part of the ESA Ocean Colour Climate Change Initiative (OC-CCI), there seemed to be a preference from ecosystem modelers for estimates of uncertainties based on comparison with in situ data, rather than model-based uncertainties (Sathyendranath, 2011). For most techniques, satellite phytoplankton group products have been validated with in situ data (see Table 3 of Mouw et al., 2017). However, this information is typically provided as a single statistic (e.g., root mean square error), which can be difficult to convert to a per-pixel error, considering uncertainties are likely to vary with the environmental conditions and the magnitude of the product. Furthermore, the distribution of data used in validation datasets may not be an adequate representation of the spatial and temporal variability in the region under study.

To overcome these issues, Moore et al. (2001, 2009, 2012) proposed the use of an optical classification of pixels, together with fuzzy-logic statistics, to estimate per-pixel errors in satellite ocean-color products based on comparison with in situ data. In this approach, satellite and in situ match-ups are segregated into dominant optical water types (ranging from oligotrophic to turbid waters), then error statistics are computed for each dominant optical water-type. An ocean-color spectrum (at a given pixel) is then compared with all the optical water type spectra to determine its fuzzy membership. The fuzzy membership is then used to compute the error by weighting the errors in each dominant optical water type according to the fuzzy membership. This approach can, to a certain degree, overcome issues with the distribution of data used in the validation, and account for uncertainties varying with the conditions and the magnitude of the product. It has been adopted in the ESA OC-CCI project and is used to provide per-pixel errors (root mean square error and bias) for all OC-CCI products, including: chlorophyll, diffuse attenuation coefficient, and the inherent optical properties of oceanic waters. However, this approach has not been applied to satellite phytoplankton group products.
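The weighting step described above can be sketched as follows. This is a minimal illustration under stated assumptions: the fuzzy memberships are taken as already computed (the membership function itself is not specified here), the weights are normalized by the sum of memberships, and the function name, the three water types and all numerical values are hypothetical.

```python
def per_pixel_error(memberships, class_errors):
    """Membership-weighted average of per-water-type error statistics.

    memberships  : fuzzy membership of the pixel in each optical water type
    class_errors : error statistic (e.g., RMSE or bias) for each water type
    Returns None when the pixel has no membership in any water type.
    """
    if len(memberships) != len(class_errors):
        raise ValueError("one error value is needed per optical water type")
    total = sum(memberships)
    if total <= 0:
        return None  # assumption: pixel is not represented by the classification
    weighted = sum(m * e for m, e in zip(memberships, class_errors))
    return weighted / total


# Hypothetical example with three optical water types.
class_rmse = [0.20, 0.30, 0.45]        # illustrative per-type RMSE (log10 units)
pixel_memberships = [0.1, 0.7, 0.2]    # illustrative memberships for one pixel
pixel_rmse = per_pixel_error(pixel_memberships, class_rmse)
```

Because the result is a weighted mean, a pixel dominated by one water type inherits an error close to that type's statistic, while mixed pixels fall between the contributing types.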

The Copernicus Marine Environment Monitoring Service (CMEMS) project "Toward Operational Size-class Chlorophyll Assimilation (TOSCA)" seeks to address these issues by: (i) providing remotely-sensed products on phytoplankton groups that map onto those simulated by the European Regional Seas Ecosystem Model (ERSEM; Butenschön et al., 2016), which is the ecosystem model adopted in this project; and (ii) providing uncertainty estimates for the remotely-sensed products on a per-pixel basis, based on in situ match-ups (the preferred choice for ecosystem modelers; Sathyendranath, 2011). In this paper, we re-tuned an abundance-based method (Brewin et al., 2010, 2015) to estimate the chlorophyll concentration of three phytoplankton groups, partitioned according to size, from satellite data in the North Atlantic. The abundance-based method was modified to account for the influence of SST (i.e., combining the method with an ecological approach), and to partition microphytoplankton into diatoms and dinoflagellates, so that the phytoplankton groups provided by the satellite approach match those simulated by ERSEM. Using an optical classification of pixels with fuzzy-logic statistics (Moore et al., 2001, 2009, 2012; Jackson and Sathyendranath, 2015), we present a method for deriving per-pixel uncertainty for each phytoplankton group based on a validation dataset of satellite and in situ match-ups, which is independent of the data used to parameterize the method.

# 2. METHODS

# 2.1. Study Area: The North Atlantic

The chosen study site was the North Atlantic (**Figure 1**), spanning 46° W to 13° E and 20° N to 66° N, and categorized by the CMEMS Ocean Colour Thematic Assembly Centre (OCTAC) as the Atlantic (ATL) region. This region encompasses a range of bio-optical conditions from clear, deep open-ocean waters to shallower optically-complex shelf seas. We chose this site because of two factors: (i) it is a region that has been extensively sampled over the past few decades, resulting in a relatively large number of in situ observations on phytoplankton groups when compared with other regions of the ocean; and (ii) it has been subject to many studies on marine ecosystem modeling (e.g., Holt et al., 2014). The North Atlantic is also home

FIGURE 1 | Locations of High Performance Liquid Chromatography (HPLC) and size-fractionated filtration (SFF) in situ data (<20 m depth) used in this study (CMEMS OCTAC ATL region). Background colors show pixel-by-pixel correlation coefficients (r) of monthly Sea Surface Temperature (ESA SST products) and monthly average light in the mixed-layer between 2000 and 2010 [computed using Equation 11 of Brewin et al. (2015) with a monthly climatology of mixed-layer depth (de Boyer Montégut et al., 2004), monthly photosynthetically available radiation products from NASA SeaWiFS (http://oceancolor.gsfc.nasa.gov/), and K<sub>d</sub> estimated from Morel et al. (2007) using OC-CCI monthly chlorophyll products].

to one of the largest spring phytoplankton blooms on the planet (Ducklow and Harris, 1993) and is known as a major region for the biological drawdown of seawater CO<sub>2</sub> (Takahashi et al., 2002, 2009) and primary production (Tilstone et al., 2014).

### 2.2. Statistical Tests

To compare the in situ and satellite chlorophyll concentrations, we used the root mean square error (Ψ) and bias (δ), consistent with the statistical tests adopted in the ESA OC-CCI project and used to provide per-pixel errors. The Ψ and δ values were computed according to

$$\Psi = \left[\frac{1}{N} \sum_{i=1}^{N} \left(X_i^E - X_i^M\right)^2\right]^{1/2},\tag{1}$$

and

$$\delta = \frac{1}{N} \sum_{i=1}^{N} \left( X_i^E - X_i^M \right), \tag{2}$$

where X is the variable (chlorophyll concentration) and N is the number of samples. The superscript E denotes the estimated variable (e.g., satellite estimate) and M the measured variable (e.g., in situ). Note that the unbiased root mean square error (Δ) can be computed from Ψ and δ according to Δ = (Ψ<sup>2</sup> − δ<sup>2</sup>)<sup>1/2</sup>. In addition, we also used the Pearson linear correlation coefficient (r) to see how well estimated and measured variables are correlated. All statistical tests were performed in log<sub>10</sub> space, considering that the chlorophyll concentration is approximately log-normally distributed (Campbell, 1995). Definitions for all symbols used in the paper are provided in **Table 1**.
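As an illustration, the statistics of Equations (1) and (2), together with the unbiased root mean square error, can be computed in log<sub>10</sub> space as sketched below. The function name and the example values are hypothetical, and the correlation coefficient is omitted for brevity.

```python
import math


def log10_error_stats(estimated, measured):
    """Return (rmse, bias, unbiased_rmse) of log10-transformed values.

    estimated : e.g., satellite chlorophyll estimates (all > 0)
    measured  : e.g., matching in situ chlorophyll values (all > 0)
    """
    if len(estimated) != len(measured) or len(estimated) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    diffs = [math.log10(e) - math.log10(m) for e, m in zip(estimated, measured)]
    n = len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / n)   # Equation (1)
    bias = sum(diffs) / n                              # Equation (2)
    # Unbiased RMSE; max() guards against tiny negative rounding errors.
    unbiased = math.sqrt(max(rmse ** 2 - bias ** 2, 0.0))
    return rmse, bias, unbiased


# Hypothetical match-ups (mg m^-3): one tenfold overestimate, one exact match.
rmse, bias, unbiased = log10_error_stats([1.0, 10.0], [0.1, 10.0])
```

In this toy case the log<sub>10</sub> differences are 1 and 0, so the bias is 0.5 and the unbiased root mean square error is also 0.5.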

### 2.3. Data

### 2.3.1. High Performance Liquid Chromatography (HPLC) Pigment Data

A total of 2,791 samples collected in the North Atlantic region and analyzed by High Performance Liquid Chromatography (HPLC) were used in this study (**Figure 1**), spanning 1995–2014. This dataset comprised samples from: the Atlantic Meridional Transect (AMT) cruises 1–23 (Gibb et al., 2000; Barlow et al., 2002; Aiken et al., 2009; Brewin et al., 2010; Airs and Martinez-Vicente, 2014a,b,c; Brewin et al., 2015); the GeP&CO program (Dandonneau et al., 2004); the North Atlantic bloom experiment (Werdell et al., 2003; Westberry et al., 2010); the eastern Atlantic Ocean (Brotas et al., 2013); the North Atlantic, collected by the Bedford Institute of Oceanography (Sathyendranath et al., 2001; Devred et al., 2006); the Western Channel Observatory in the English Channel (Station L4 and E1; Smyth et al., 2010); a series of UK NERC-funded research cruises (D261, D262, D264, D325, JC011, JC037, and JCR656) in the North Atlantic and North Sea (Tilstone et al., 2015); and from the NASA bio-Optical Marine Algorithm Dataset (NOMAD Version 2.0 ALPHA, Werdell and Bailey, 2005), following the removal of any AMT data so as to avoid duplication. Details of HPLC methods used can be found in the aforementioned references.

Only samples collected within the top 20 m of the water column (or within the 1st optical depth as in the case of the NASA NOMAD dataset) were used [i.e., within the surface mixed-layer depth (rarely <20 m; de Boyer Montégut et al., 2004)]. To control the quality of the pigment data, we used only HPLC data for which the total chlorophyll concentration was greater than 0.001 mg m<sup>−3</sup> (Uitz et al., 2006), and the difference between the total chlorophyll concentration and the total accessory pigments was less than 30% of the total pigment concentration (Trees et al., 2000; Aiken et al., 2009; Brewin et al., 2015).
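The selection criteria above can be expressed as a simple filter. The sketch below is a simplified illustration, not the processing code used in the study: it assumes that the total pigment concentration is the sum of total chlorophyll and total accessory pigments, that the difference is taken as an absolute value, and that a fixed 20 m depth limit applies to every sample (the optical-depth criterion used for NOMAD is not reproduced). The function name is hypothetical.

```python
def passes_pigment_qc(total_chl, accessory_pigments, depth_m):
    """Return True if an HPLC sample meets the stated quality criteria.

    total_chl          : total chlorophyll concentration (mg m^-3)
    accessory_pigments : summed accessory pigment concentration (mg m^-3)
    depth_m            : sampling depth (m)
    """
    if depth_m > 20.0:            # surface samples only (simplified to 20 m)
        return False
    if total_chl <= 0.001:        # lower limit on total chlorophyll
        return False
    # Assumption: total pigment = total chlorophyll + accessory pigments.
    total_pigment = total_chl + accessory_pigments
    return abs(total_chl - accessory_pigments) < 0.30 * total_pigment


# Hypothetical samples: (total_chl, accessory_pigments, depth_m).
samples = [(0.5, 0.4, 5.0), (0.5, 0.1, 5.0), (0.0005, 0.0004, 5.0), (0.5, 0.4, 30.0)]
kept = [s for s in samples if passes_pigment_qc(*s)]
```

Only the first hypothetical sample is retained: the second fails the pigment-difference test, the third falls below the chlorophyll limit, and the fourth is too deep.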

### **2.3.1.1. Size-fractionated chlorophyll estimates from HPLC**

The fractions of total chlorophyll for the three phytoplankton size classes (Fp, Fn, and Fm, for pico-, nano-, and microplankton, respectively) were estimated following the methods of Brewin et al. (2015), adapted from Vidussi et al. (2001), Uitz et al. (2006), Brewin et al. (2010), and Devred et al. (2011). Note, whenever we refer to microplankton, nanoplankton and picoplankton, we are referring to phytoplankton. First, the total chlorophyll concentration (C) was estimated from the weighted sum of the seven diagnostic pigments, hereafter denoted Cw, according to

$$C_w = \sum_{i=1}^{7} W_i P_i, \tag{3}$$

where the weights are denoted [**W**], and the diagnostic pigments [**P**] = {fucoxanthin; peridinin; 19′-hexanoyloxyfucoxanthin; 19′-butanoyloxyfucoxanthin; alloxanthin; total chlorophyll-b; zeaxanthin}. We computed the weights [**W**] using multi-linear regression on the 2,791 samples. Retrieved values for the weights compare reasonably with values derived globally (**Table 2**), and total chlorophyll (C) and total chlorophyll estimated from Equation (3) (Cw) were in good agreement (r = 0.99, Ψ = 0.10). Having derived Cw, the fractions of chlorophyll in each size class relative to the total chlorophyll concentration were estimated.

Following Brewin et al. (2015), the fraction of picoplankton chlorophyll concentration (Fp) was computed according to

$$F_p = \begin{cases} \frac{(-12.5C + 1)W_3P_3}{C_w} + \frac{\sum_{i=6}^7 W_i P_i}{C_w} & \text{if } C \le 0.08 \text{ mg}\,\text{m}^{-3} \\ \frac{\sum_{i=6}^7 W_i P_i}{C_w} & \text{if } C > 0.08 \text{ mg}\,\text{m}^{-3}. \end{cases} \tag{4}$$

The fraction of nanoplankton chlorophyll concentration (Fn) was estimated by first apportioning part of the fucoxanthin pigment (P1) to the nanoplankton pool, as conducted by Devred et al. (2011), such that

$$P_{1,n} = 10^{\{q_1 \log_{10}(P_3) + q_2 \log_{10}(P_4)\}},\tag{5}$$

where P<sub>3</sub> and P<sub>4</sub> refer to 19′-hexanoyloxyfucoxanthin and 19′-butanoyloxyfucoxanthin. This is to account for the fact that fucoxanthin is a precursor to 19′-hexanoyloxyfucoxanthin and 19′-butanoyloxyfucoxanthin (Devred et al., 2011). We recomputed the coefficients q<sub>1</sub> and q<sub>2</sub> using the 2,791 HPLC samples, and arrived at values of q<sub>1</sub> = 0.14 and q<sub>2</sub> = 1.35. For any sample where P<sub>1,n</sub> was higher than P<sub>1</sub>, P<sub>1,n</sub> was set equal to P<sub>1</sub>. Following Brewin et al. (2015), the fraction of

### TABLE 1 | Symbols and definitions.


### TABLE 2 | Key taxonomic groups of phytoplankton, their typical size class, their category in the ERSEM model and their diagnostic pigment.


The table also shows a comparison of the weights ([W]) computed for Equation (3) using the 2791 HPLC data samples collected in this study, with weights derived from two other studies of the global ocean.

<sup>∗</sup>Total Chlorophyll-b refers to the sum of Chlorophyll-b and divinyl chlorophyll-b.

&Micro refers to cells >20µm, Nano cells 2–20µm and Pico cells <2µm in size.

\$Bracketed values refer to the standard deviations for each coefficient.

#Phytoplankton state variables in ERSEM model.

<sup>a</sup>Diatoms can be found in the nano size class.

<sup>b</sup>Prymnesiophytes and 19′ -hexanoyloxyfucoxanthin pigment can be found in the pico size class.

<sup>c</sup>Some chlorophytes can be found in the nanoplankton size class (Latasa et al., 2004).

<sup>d</sup>Also named microplankton in ERSEM.

<sup>e</sup>Fucoxanthin can be found in the nano size class.

nanoplankton chlorophyll concentration (Fn) was then estimated according to

$$F\_n = \begin{cases} \frac{12.5 C W\_3 P\_3}{C\_w} + \frac{\sum\_{i=4}^5 W\_i P\_i + W\_1 P\_{1,n}}{C\_w} & \text{if } C \le 0.08 \text{ mg} \,\text{m}^{-3} \\\frac{\sum\_{i=3}^5 W\_i P\_i + W\_1 P\_{1,n}}{C\_w} & \text{if } C > 0.08 \text{ mg} \,\text{m}^{-3} . \end{cases} \tag{6}$$

Finally, following Devred et al. (2011) and Brewin et al. (2015), the fraction of microplankton chlorophyll concentration (Fm) was estimated as

$$F\_m = \frac{\sum\_{i=1}^2 W\_i P\_i - W\_1 P\_{1,n}}{C\_w}.\tag{7}$$

Note that F<sub>m</sub> can also be computed by simply subtracting F<sub>n</sub> and F<sub>p</sub> from one. The fractions of chlorophyll in each size class were then multiplied by the corresponding HPLC-derived total chlorophyll concentration (C) to derive the size-specific chlorophyll concentrations for each sample (Cp, Cp,n, Cn, and Cm, where the subscript "p" refers to pico-, "n" to nano- and "m" to microphytoplankton, and the subscript "p, n" refers to combined pico- and nanophytoplankton).
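As an illustration, Equations (4)–(7) can be sketched in Python. This is a minimal sketch rather than the code used in the study: the function name and argument layout are hypothetical, Cw is assumed to be the weighted pigment sum Σ W<sub>i</sub>P<sub>i</sub>, and the weights must be supplied by the caller (for example, from Table 2). The defaults for q1 and q2 are the values reported above.

```python
import math

def size_fractions(P, W, C, q1=0.14, q2=1.35):
    """Return (Fp, Fn, Fm) from diagnostic pigments, following Equations (4)-(7).

    P, W : sequences of 7 pigment concentrations and weights, ordered as
           fucoxanthin, peridinin, 19'-hex, 19'-but, alloxanthin,
           total chlorophyll-b, zeaxanthin (P1..P7).
    C    : HPLC total chlorophyll (mg m^-3).
    Assumption: Cw is the weighted pigment sum, sum(W_i * P_i).
    """
    wp = [w * p for w, p in zip(W, P)]
    Cw = sum(wp)
    # Equation (5): fucoxanthin apportioned to nanoplankton, capped at P1.
    # Assumes P3 and P4 are strictly positive so the logarithms are defined.
    P1n = 10 ** (q1 * math.log10(P[2]) + q2 * math.log10(P[3]))
    P1n = min(P1n, P[0])
    pico_core = wp[5] + wp[6]
    nano_core = wp[3] + wp[4] + W[0] * P1n
    if C <= 0.08:
        Fp = ((1.0 - 12.5 * C) * wp[2] + pico_core) / Cw  # Equation (4)
        Fn = (12.5 * C * wp[2] + nano_core) / Cw          # Equation (6)
    else:
        Fp = pico_core / Cw
        Fn = (wp[2] + nano_core) / Cw
    Fm = (wp[0] + wp[1] - W[0] * P1n) / Cw                # Equation (7)
    return Fp, Fn, Fm
```

Under the stated assumption for Cw, the three fractions returned by this sketch sum to one in both chlorophyll regimes, which matches the remark that F<sub>m</sub> can be obtained by subtraction.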

### **2.3.1.2. Partitioning the fraction of microphytoplankton chlorophyll into fractions of diatoms and dinoflagellates**

The fraction of microphytoplankton chlorophyll concentration (Fm) is estimated from two diagnostic pigments, fucoxanthin in microphytoplankton (P1,m) and peridinin (P2). It is generally assumed that fucoxanthin in microphytoplankton is the primary pigment for diatoms (Stauber and Jeffrey, 1988) and peridinin for dinoflagellates, as the majority of photosynthetic dinoflagellates contain a chloroplast with peridinin as the major carotenoid (see Table 1 and Zapata et al., 2012). Following Hirata et al. (2011), this assumption was used to partition microphytoplankton chlorophyll into the concentrations of the two groups.

The fraction of microplankton diatoms to total chlorophyll (Fdiat) and the fraction of microplankton dinoflagellates to total chlorophyll (Fdino) were computed as

$$F\_{diat} = \frac{W\_1 P\_1 - W\_1 P\_{1,n}}{C\_w},\tag{8}$$

and

$$F\_{dino} = \frac{W\_2 P\_2}{C\_w},\tag{9}$$

respectively. The chlorophyll concentrations for diatoms and dinoflagellates (Cdiat and Cdino) were then obtained by multiplying the fractions by the corresponding HPLC-derived total chlorophyll concentration (C).
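As a consistency check on Equations (7)–(9), the two group fractions add up to the microplankton fraction, so the partition conserves microplankton chlorophyll:

$$F\_{diat} + F\_{dino} = \frac{W\_1 P\_1 - W\_1 P\_{1,n} + W\_2 P\_2}{C\_w} = \frac{\sum\_{i=1}^2 W\_i P\_i - W\_1 P\_{1,n}}{C\_w} = F\_m.$$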

### 2.3.2. Size-Fractionated Filtration (SFF) Data

A total of 263 size-fractionated fluorometric chlorophyll (SFF) measurements collected previously in the North Atlantic region, spanning 1996–2015, were also used in this study (**Figure 1**). These comprised samples from: the Atlantic Meridional Transect cruises 2–23 (see Marañón et al., 2001; Serret et al., 2001; Robinson et al., 2002; Brewin et al., 2014a,b; Tilstone et al., 2017, for details); the Western Channel Observatory in the English Channel (Stations L4 and E1; see Barnes et al., 2014, for details); and the NERC shelf seas biogeochemistry programme.

In all cases, ∼200–300 ml samples were sequentially filtered through 20, 2, and 0.2µm polycarbonate filters. Following filtration, pigments were extracted by storing the filters in 90% acetone at −20°C for between 10 and 24 h. Samples were then analyzed using a Turner Designs Fluorometer, pre- and post-calibrated using pure chlorophyll-a in 90% acetone as a standard. The total chlorophyll concentration was taken as the sum of the size fractions for each sample. The concentration of chlorophyll passing through the 2µm filter was designated C<sub>p</sub> (picoplankton chlorophyll), chlorophyll retained on the 20µm filter was designated C<sub>m</sub> (microplankton chlorophyll), and chlorophyll retained on the 2µm filter, having passed through the 20µm filter, was designated C<sub>n</sub> (nanoplankton chlorophyll).

# 2.4. Merging of in situ Datasets

Systematic biases in size-fractionated chlorophyll estimated from HPLC pigments and from SFF have been observed in the Atlantic Ocean (Brewin et al., 2014a), with implications for models that estimate size-fractionated chlorophyll as a function of total chlorophyll (Brewin et al., 2014b) and models that estimate size-fractionated primary production (Brewin et al., 2017). Therefore, care needs to be taken when combining these two datasets. **Figure 2** shows a comparison of 31 concurrent and co-located data points of total chlorophyll (**Figure 2A**), picoplankton chlorophyll (**Figure 2B**), nanoplankton chlorophyll (**Figure 2C**) and microplankton chlorophyll (**Figure 2D**), from the HPLC and SFF datasets used here.

Despite there being biases in size-fractionated chlorophyll consistent with those observed by Brewin et al. (2014a) (**Figure 2**), these biases are notably smaller (e.g., for picoplankton chlorophyll δ = −0.07 compared with δ = −0.27 in Brewin et al. (2014a), see their Figure 3, and for nanoplankton δ = 0.15 compared with δ = 0.22), suggesting that, for surface waters in the North Atlantic, there is reasonable agreement between the two methods, at least for the datasets used here. Given the good agreement in **Figure 2**, the two datasets were combined into a single dataset, providing 3,054 measurements of size-fractionated chlorophyll (2,791 HPLC and 263 SFF). **Figure 3** shows a schematic diagram of how the datasets were combined and subsequently used for model parameterization and validation.

For each sample, SST data were extracted by matching each in situ sample in time (daily temporal match-up) and space (closest latitude and longitude) with daily, 1/4° resolution Optimal Interpolation Sea Surface Temperature (OISST) data (Version 2.0; Reynolds et al., 2007) acquired from the NOAA website (http://www.esrl.noaa.gov/psd/data/gridded/data.noaa.oisst.v2.highres.html).

# 2.5. Partitioning Into Parameterization and Validation Datasets

The merged dataset was matched to daily, level 3 (4 km sinusoidal projected) satellite chlorophyll and optical water type (OWT) data, from version 3.0 of the Ocean Colour Climate Change Initiative (OC-CCI, a merged MERIS, MODIS-Aqua, SeaWiFS and VIIRS product available at http://www.oceancolour.org/), between 1997 and 2015. Each in situ sample was matched with a single satellite pixel in time (daily match-up) and space (closest pixel with a distance <4 km away). Of the 3,054 samples, there

FIGURE 2 | Concurrent and co-located size-fractionated chlorophyll estimated from High Performance Liquid Chromatography (HPLC) and size-fractionated filtration (SFF) for surface waters in the North Atlantic region. (A) shows a comparison of total chlorophyll (C), (B) picoplankton chlorophyll (Cp), (C) nanoplankton chlorophyll (Cn), and (D) microplankton chlorophyll (Cm). Black line represents the 1:1 line and dotted lines represent the 1:1 line ±30% log<sub>10</sub> chlorophyll. N refers to the number of samples used to compute statistics, r refers to the Pearson linear correlation coefficient, Ψ the root mean square error (Equation 1) and δ the bias (Equation 2).

were 815 with corresponding satellite chlorophyll and optical water type (OWT) data. These 815 measurements were set aside and used for independent validation of the satellite model and for characterizing per-pixel error, leaving 2,239 measurements that were used for model development (parameterization). **Figure 3** shows a schematic diagram of how the data were partitioned into the parameterization and validation datasets.
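The match-up rule described above (same day, closest pixel within 4 km) can be sketched as follows. This is an illustrative sketch, not the processing code used in the study: the dictionary keys, the function names and the use of a spherical-Earth haversine distance are all assumptions made for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def match_sample(sample, pixels, max_km=4.0):
    """Return the closest same-day pixel within max_km of the sample, or None.

    sample : dict with keys 'date', 'lat', 'lon'
    pixels : iterable of dicts with keys 'date', 'lat', 'lon', 'chl'
    """
    best, best_d = None, max_km
    for px in pixels:
        if px["date"] != sample["date"]:
            continue  # daily temporal match-up
        d = haversine_km(sample["lat"], sample["lon"], px["lat"], px["lon"])
        if d < best_d:  # strictly closer than the current best (and < max_km)
            best, best_d = px, d
    return best
```

Samples for which `match_sample` returns `None` would stay in the parameterization set, while matched samples would form the validation set.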

The OWT data provided in version 3.0 of the OC-CCI dataset contain the per-pixel membership of 14 different optical classes, ranging from oligotrophic (e.g., OWT 1) to very turbid (OWT 14) waters. Building on the work of Moore et al. (2001, 2009, 2012), this new set of optical classes was constructed for use with OC-CCI remote sensing reflectance (Rrs) spectra (Jackson and Sathyendranath, 2015). These classes were trained using Rrs spectra from satellite data, rather than using a database of in situ observations, as conducted in Moore et al. (2009), and the number of optical water classes was increased to 14, to better cover the range of Rrs spectra observed in the global oceans, particularly the oligotrophic gyres. For further details of the training and production of the 14 OWT the reader is referred to Jackson and Sathyendranath (2015).

# 2.6. Satellite Model of Phytoplankton Groups

### 2.6.1. Three-Component Model of Brewin et al. (2010)

As a starting point, we used the three-component model of Brewin et al. (2010) to estimate the chlorophyll concentrations in three phytoplankton size classes [pico- (<2µm), nano- (2–20µm), and micro-phytoplankton (>20µm)] as a function of total chlorophyll in the study region (**Figure 1**). This approach has been successfully tuned to the global ocean (Brewin et al., 2015; Ward, 2015) as well as different oceanic regions, including: the Atlantic Ocean (North and South; Brewin et al., 2010, 2014b; Tilstone et al., 2014); the North East Atlantic (Brotas et al., 2013); the Indian Ocean (Brewin et al., 2012b); the Western Iberian coastline (Brito et al., 2015); the Mediterranean Sea (Sammartino et al., 2015); and the South China Sea (Lin et al., 2014). Estimating size-fractionated chlorophyll from satellite data (using satellite total chlorophyll as input to the three-component model) has been tested extensively with in situ data in different oceanic regions (Brewin et al., 2010, 2012b; Lin et al., 2014; Brewin et al., 2015).

The three-component model is based on two exponential functions (Sathyendranath et al., 2001), where the chlorophyll concentration of picoplankton (Cp, cells <2µm) and combined pico- and nanoplankton (Cp,n, cells <20µm) are obtained from

$$C\_{p,n} = C\_{p,n}^m [1 - \exp(-\frac{D\_{p,n}}{C\_{p,n}^m} C)],\tag{10}$$

and

$$C\_p = C\_p^m[1 - \exp(-\frac{D\_p}{C\_p^m}C)].\tag{11}$$

The parameters D<sub>p,n</sub> and D<sub>p</sub> determine the fraction of total chlorophyll in the two size classes (<20µm and <2µm, respectively) as total chlorophyll tends to zero, and C<sup>m</sup><sub>p,n</sub> and C<sup>m</sup><sub>p</sub> are the asymptotic maximum values for the two size classes (<20µm and <2µm, respectively). The chlorophyll concentrations of nano-phytoplankton (C<sub>n</sub>) and micro-phytoplankton (C<sub>m</sub>) are simply calculated as C<sub>n</sub> = C<sub>p,n</sub> − C<sub>p</sub> and C<sub>m</sub> = C − C<sub>p,n</sub>.
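The three-component model can be written in a few lines of Python. The sketch below is illustrative only: the function name is hypothetical, and the parameter values in the example are placeholders chosen for demonstration, not the fitted values of this study (which are given in Table 3).

```python
import math

def three_component(C, Cm_pn, D_pn, Cm_p, D_p):
    """Size-fractionated chlorophyll (mg m^-3) from total chlorophyll C.

    Implements Equations (10) and (11), then Cn = Cp,n - Cp and Cm = C - Cp,n.
    """
    C_pn = Cm_pn * (1.0 - math.exp(-(D_pn / Cm_pn) * C))  # Equation (10)
    C_p = Cm_p * (1.0 - math.exp(-(D_p / Cm_p) * C))      # Equation (11)
    return {"pico": C_p, "nano": C_pn - C_p, "micro": C - C_pn}

# Illustrative placeholder parameters, NOT the values fitted in this study.
params = dict(Cm_pn=0.8, D_pn=0.9, Cm_p=0.15, D_p=0.8)
for C in (0.05, 0.5, 5.0):
    print(C, three_component(C, **params))
```

By construction the three outputs sum to C. As C tends to zero the pico fraction tends to D_p, and at high C the combined pico and nano concentration saturates at Cm_pn, which is the behavior described in the text.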

A single set of model parameters was first derived by fitting Equations (10) and (11) using a standard, nonlinear least-squares fitting procedure (Levenberg-Marquardt, IDL Routine MPFITFUN, Moré, 1978; Markwardt, 2008) with relative weighting (Brewin et al., 2011a). The parameters D<sub>p,n</sub> and D<sub>p</sub> were constrained to be less than or equal to one, since size-fractionated chlorophyll cannot exceed total chlorophyll. We used the method of bootstrapping (Efron, 1979; Brewin et al., 2015) to compute a parameter distribution, and from the resulting parameter distribution, median values and 95% confidence intervals were computed (see **Table 3**). The parameters D<sub>p,n</sub> and D<sub>p</sub> were found to be significantly different from the global parameters derived in Brewin et al. (2015) (see **Table 3**). The model was found to capture the trends in the fractions (Fp, Fn, Fp,n, and Fm) and absolute concentrations


TABLE 3 | Parameter values for Equations 10 and 11 compared with global parameters derived in Brewin et al. (2015).

\$Model parameters are computed as the median of the bootstrap parameter distribution and bracketed parameter values refer to the 2.5 and 97.5% confidence intervals on the distribution.

#N = Number of samples used for model parameterization.

<sup>∗</sup>Denotes units in mg m<sup>−3</sup>.

(Cp, Cn, Cp,n, and Cm) of the size classes as a function of total chlorophyll for the North Atlantic parameterization dataset (**Figure 4**).
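The bootstrap step used to obtain median parameter values and 95% confidence intervals can be sketched generically. This is a minimal sketch under stated assumptions: the `fit` callable stands in for the nonlinear least-squares fit of Equations (10) and (11), the function name and defaults are hypothetical, and simple nearest-rank percentiles are used for the interval.

```python
import random
import statistics

def bootstrap_ci(data, fit, n_iter=1000, seed=0):
    """Median and 95% percentile interval of fit(resample) over bootstrap resamples.

    data : list of observations (e.g., (C, Cp, Cp,n) tuples)
    fit  : callable mapping a list of observations to a single parameter value
    """
    rng = random.Random(seed)
    n = len(data)
    estimates = sorted(
        fit([data[rng.randrange(n)] for _ in range(n)])  # resample with replacement
        for _ in range(n_iter)
    )
    lo = estimates[int(0.025 * (n_iter - 1))]  # nearest-rank 2.5th percentile
    hi = estimates[int(0.975 * (n_iter - 1))]  # nearest-rank 97.5th percentile
    return statistics.median(estimates), (lo, hi)
```

For a quick check, `statistics.mean` can be passed as `fit`; in practice the callable would return one fitted model parameter per resample, and the routine would be run once per parameter.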

### 2.6.2. Modification of Three-Component Model Using SST

Brewin et al. (2015) and Ward (2015) have investigated the influence of light availability and SST, respectively, on the parameterization of the three-component model. In the North Atlantic, seasonal variations in SST and the average light in the mixed-layer are highly correlated (**Figure 1**). Therefore, considering: (i) that there is, regionally, a covariation of SST with the average light in the mixed-layer (**Figure 1**); (ii) that three inputs are required to compute the average light in the mixed-layer (photosynthetically-active radiation, diffuse attenuation and mixed-layer depth), one of which is not amenable to remote sensing (mixed-layer depth); and (iii) that the maturity (operational use) and accuracy of SST retrievals are very high (Merchant et al., 2014), we chose to investigate the influence of SST on model parameters in the study area, similar to the study of Ward (2015) for a global dataset.

**Figure 4** illustrates the general inverse correlation between SST and total chlorophyll (r = −0.67 for SST and log<sub>10</sub>(C)), highlighting that higher fractions of smaller cells (lower fractions of large cells) are typically associated with higher SST. To investigate whether SST has any influence on the parameters of the three-component model, we partitioned the parameterization data into lower temperature waters (<15°C) and higher temperature waters (≥15°C), and fitted the model separately to the two datasets, which were of roughly equal size (>1,000 samples each, see **Table 3**). We observed significantly different model parameters for high and low temperature waters (see **Table 3** and **Figure 4**), suggesting a relationship between SST and model parameters. We then sorted the dataset according to SST, and conducted a running fit of the three-component model (Equations 10 and 11) as a function of SST with a bin size of 600 samples [chosen to ensure each fit had reasonable representation of observations over the entire trophic range (low to high chlorophyll)]. We used the method of bootstrapping (100 iterations) and derived median values and 95% confidence intervals on each parameter distribution (**Figure 5**).
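The running fit over SST-sorted samples can be sketched as a sliding window. The sketch below only generates the windows; the window step and the choice of the central sample's SST as the bin label are assumptions of this example (the text specifies only the bin size of 600), and the function name and `"sst"` key are hypothetical.

```python
def running_windows(samples, bin_size=600, step=1, key=lambda s: s["sst"]):
    """Yield (central_key, window) for a sliding window over key-sorted samples.

    central_key is the key of the middle sample of each window
    (assumed here as the SST label of the bin).
    """
    ordered = sorted(samples, key=key)
    for start in range(0, len(ordered) - bin_size + 1, step):
        window = ordered[start:start + bin_size]
        yield key(window[bin_size // 2]), window
```

Each yielded window would then be passed to the model fit (with bootstrapping) to obtain one set of parameters per SST bin.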

Significant relationships between all model parameters (C<sup>m</sup><sub>p,n</sub>, C<sup>m</sup><sub>p</sub>, D<sub>p,n</sub>, and D<sub>p</sub>) and SST were observed (**Figure 5**). The relationship between SST and model parameters could be represented using a logistic function, such that C<sup>m</sup><sub>p,n</sub> and C<sup>m</sup><sub>p</sub> may be expressed as

$$C\_{p,n}^{m} = 1 - \{ \frac{G\_1}{1 + \exp[-G\_2(\text{SST} - G\_3)]} + G\_4 \},\tag{12}$$

and

$$C\_p^m = 1 - \{ \frac{H\_1}{1 + \exp[-H\_2(\text{SST} - H\_3)]} + H\_4 \},\tag{13}$$

where G<sub>1</sub> and G<sub>4</sub> control the upper and lower bounds of C<sup>m</sup><sub>p,n</sub>, G<sub>2</sub> represents the slope of change in C<sup>m</sup><sub>p,n</sub> with SST, and G<sub>3</sub> is the SST mid-point of the slope between C<sup>m</sup><sub>p,n</sub> and SST. For C<sup>m</sup><sub>p</sub>, H<sub>i</sub>, where i = 1–4, is analogous to G<sub>i</sub> for C<sup>m</sup><sub>p,n</sub>. The parameters D<sub>p,n</sub> and D<sub>p</sub> were expressed as

$$D\_{p,n} = \frac{J\_1}{1 + \exp[-J\_2(\text{SST} - J\_3)]} + J\_4,\tag{14}$$

and

$$D\_p = \frac{K\_1}{1 + \exp[-K\_2(\text{SST} - K\_3)]} + K\_4,\tag{15}$$

where J<sub>1</sub> and J<sub>4</sub> control the upper and lower bounds of D<sub>p,n</sub>, J<sub>2</sub> represents the slope of change in D<sub>p,n</sub> with SST, and J<sub>3</sub> is the SST mid-point of the slope between D<sub>p,n</sub> and SST. For D<sub>p</sub>, K<sub>i</sub> is analogous to J<sub>i</sub> for D<sub>p,n</sub>. The parameters for Equations (12)–(15) were fitted using a nonlinear least-squares fitting procedure (Levenberg-Marquardt) with bootstrapping, and parameter values are provided in **Table 4**. The equations are seen to capture the relationships between parameters and SST accurately (**Figure 5** and **Table 4**).
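Equations (12)–(15) share a single logistic form, which a short Python sketch makes explicit. The function names are hypothetical, and the coefficient tuples G, H, J and K must be supplied by the caller (for example, from Table 4); no fitted values are assumed here.

```python
import math

def logistic(sst, a1, a2, a3, a4):
    """a1 / (1 + exp(-a2 * (sst - a3))) + a4, the form shared by Equations (12)-(15)."""
    return a1 / (1.0 + math.exp(-a2 * (sst - a3))) + a4

def sst_parameters(sst, G, H, J, K):
    """Return (Cm_pn, Cm_p, D_pn, D_p) for a given SST (degrees C).

    G, H, J, K : 4-tuples of logistic coefficients (indices 1-4 in the text).
    """
    Cm_pn = 1.0 - logistic(sst, *G)  # Equation (12)
    Cm_p = 1.0 - logistic(sst, *H)   # Equation (13)
    D_pn = logistic(sst, *J)         # Equation (14)
    D_p = logistic(sst, *K)          # Equation (15)
    return Cm_pn, Cm_p, D_pn, D_p
```

The four values returned for a given SST can then be substituted into Equations (10) and (11) in place of a single fixed parameter set.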

**Figure 6** shows simulations of size-fractionated chlorophyll as a function of total chlorophyll for different SST, when incorporating Equations (12)–(15) into the three-component model (Equations 10 and 11). In general, the performance for all size classes improved when using the SST-dependent parameterization, when compared with that using a single set of parameters (**Figure 7**), with a significant improvement in the correlation coefficient for Cp (Z-test, p < 0.05). Whereas modeled Cp,n, Cn, and Cp reach static asymptotes at high concentrations when using a single set of parameters (see **Figure 7**, top-row, horizontal purple dashed lines), the SST-dependent parameterization does not, and captures the variability in the size-fractionated chlorophyll at these higher concentrations.

from Table 3), overlain. The top row (a,e,i,m) and middle-bottom row (c,g,k,o) show bivariate histogram plots with the shading indicating the number of observations (N). The bottom row (d,h,l,p) and middle-top row (b,f,j,n) show the same bivariate plots but the shading represents the median sea surface temperature (SST) of the data points that lie within the bins.

### 2.6.3. Partitioning of Microphytoplankton Chlorophyll Into Diatoms and Dinoflagellates

Considering that diatoms are known to dominate the microphytoplankton community in the North Atlantic during the initiation of the spring bloom when SST is still relatively low and nutrient concentrations high (Ducklow and Harris, 1993; Sieracki et al., 1993; Savidge et al., 1995), and that dinoflagellates typically increase in late summer and early autumn (McQuatters-Gollop et al., 2007; Widdicombe et al., 2010) when SST is generally at its highest in the North Atlantic, we investigated the use of SST to partition microplankton chlorophyll (Cm) into diatoms (Cdiat) and dinoflagellates (Cdino). **Figure 8A** shows a significant relationship between the ratio of Cdino to Cm and SST (r = 0.28, p < 0.001), with the ratio increasing with increasing SST. We modeled this relationship by fitting a logistic function to the data (**Figure 8A**), such that

$$\frac{C\_{dino}}{C\_m} = \frac{1}{1 + \exp[-\alpha(\text{SST} - \beta)]},\tag{16}$$

where α = 0.10 (0.08↔0.13) and β = 32.5 (29.7↔36.1). **Figures 8B,C** show model estimates of Cdino (obtained by multiplying the modeled ratio (Equation 16) by Cm) plotted against measured Cdino, and estimates of Cdiat (obtained as Cm(1 − (Cdino/Cm))) against measured Cdiat. In general, there is good agreement between the estimates and measurements, with higher correlations and lower root mean square errors for Cdiat compared with Cdino (**Figures 8B,C**). Combining estimates of Cm using the three-component model (Equations 10–15) with estimates of the ratio of Cdino to Cm (Equation 16), Cdino and Cdiat can be estimated as a function of total chlorophyll (C) and SST.
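Equation (16) and the resulting partition can be sketched in Python, using the fitted values of α and β reported above. The function name and return order are hypothetical choices for this illustration.

```python
import math

ALPHA, BETA = 0.10, 32.5  # fitted values reported for Equation (16)

def split_micro(C_m, sst, alpha=ALPHA, beta=BETA):
    """Partition microplankton chlorophyll into (C_diat, C_dino) using SST (deg C)."""
    ratio = 1.0 / (1.0 + math.exp(-alpha * (sst - beta)))  # C_dino / C_m, Equation (16)
    C_dino = ratio * C_m
    return C_m - C_dino, C_dino
```

With these values the modeled dinoflagellate share of microplankton chlorophyll is roughly 10% at 10°C and roughly 22% at 20°C, and it would only reach 50% at SST = β = 32.5°C, so diatoms account for most of the modeled microplankton chlorophyll at the lower temperatures.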

# 2.7. Validation of the Satellite Model, Estimates of Per-Pixel Uncertainty and Application to Satellite Data

The satellite match-up dataset (not used for model parameterization) was used to validate the model by using satellite-derived total chlorophyll (OC-CCI) and SST (NOAA OISST) as inputs to Equations (10)–(15) and comparing the results with independent in situ chlorophyll concentrations for each phytoplankton group. In addition, the satellite match-ups were partitioned into 14 OWT by selecting the highest OWT membership for each sample. The root mean square error (Ψ, Equation 1) and bias (δ, Equation 2) in the satellite estimates were computed separately for each OWT and for each phytoplankton group.

We applied the model to a relatively cloud-free 8-day chlorophyll (OC-CCI) and SST (NOAA OISST) composite for the period between 17th and 24th June 2008, to illustrate its application to a satellite image. Uncertainties (Ψ and δ) in each pixel of the study area were computed by weighting the uncertainties in each OWT by their membership. For instance, Ψ at a given pixel for a hypothetical phytoplankton group would be computed as

$$\Psi = \frac{\sum\_{i=1}^{14} \Psi\_i T\_i}{\sum\_{i=1}^{14} T\_i},\tag{17}$$

where i represents each OWT and T represents the membership of each OWT.
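Equation (17) is a membership-weighted mean, which can be sketched as below. The function name is hypothetical, and returning NaN when all memberships are zero is a choice made for this sketch rather than something specified in the text.

```python
def pixel_uncertainty(stat_by_owt, membership):
    """Membership-weighted mean of a per-OWT statistic (Equation 17).

    stat_by_owt : per-class values (e.g., RMSE or bias for one phytoplankton group)
    membership  : per-class membership T_i for the pixel, same length
    """
    total = sum(membership)
    if total == 0:
        return float("nan")  # no class membership: left undefined in this sketch
    return sum(s * t for s, t in zip(stat_by_owt, membership)) / total
```

For example, a pixel with memberships 0.75 and 0.25 in two classes whose errors are 0.2 and 0.4 would be assigned an error of 0.25. The same function applies to the bias δ.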

# 3. RESULTS AND DISCUSSION

### 3.1. Satellite Validation

Considering the agreement between total satellite and in situ chlorophyll in the validation dataset (r = 0.86, Ψ = 0.29, δ = −0.01), the satellite estimates of size-fractionated chlorophyll compare well with the independent in situ data (**Figure 9**, r = 0.49 to 0.86, and Ψ = 0.30 to 0.45), in agreement with previous studies (Brewin et al., 2010, 2012b; Lin et al., 2014; Brewin et al., 2015). Although the SST-dependent parameterization (C<sup>SST</sup><sub>i</sub>) has a similar statistical performance compared with that obtained when using a single set of parameters, the SST-dependent parameterization is not constrained by static asymptotes for Cp,n, Cn, and Cp (**Figure 9** top-row, horizontal purple dashed lines) and better captures the variability in the size-fractionated chlorophyll at these higher concentrations. Correlation coefficients for picoplankton chlorophyll are higher for the SST-dependent parameterization (C<sup>SST</sup><sub>p</sub>) when compared with the single set of parameters (Cp) in both the parameterization (**Figure 7**) and validation (**Figure 9**) datasets. This finding is consistent with results from Pan et al. (2013) who highlighted the benefits of including SST when estimating zeaxanthin (diagnostic pigment for picoplankton) from satellite data.

Satellite estimates of diatom and dinoflagellate chlorophyll also compare reasonably well with the independent in situ data (**Figure 10**). Satellite estimates of diatom chlorophyll have a higher correlation coefficient (r) and lower error (Ψ) when compared with dinoflagellate chlorophyll estimates, suggesting better performance for this phytoplankton group. High errors in satellite estimates of dinoflagellate chlorophyll reflect how challenging it is to retrieve this phytoplankton group from space (Raitsos et al., 2008; Shang et al., 2014), though it is encouraging to observe significant correlations between the satellite and in situ dinoflagellate chlorophyll concentrations (r > 0.64, p < 0.001) in the validation dataset, especially when considering the lower range of chlorophyll variability in dinoflagellates (**Figure 10**).

# 3.2. Changes in Performance With Optical Water Types (OWT)

For each of the 14 OWT and for each of the phytoplankton groups, the root mean square error (Ψ), bias (δ) and number of observations (N) for match-ups in the validation dataset are



\$ Model parameters are computed as the median of the bootstrap parameter distribution and bracketed parameter values refer to the 2.5 and 97.5% confidence intervals on the distribution.

# Correlation coefficients (r) were computed using the median parameter values reported.

& p refers to the significance of each correlation (<0.001 is highly significant), computed using the correlation coefficient (r) and the number of samples (N), based on the probability that the correlation could have been produced by random data.

\*Denotes units in mg m<sup>−3</sup>.

provided in **Table 5**. The Ψ, δ, and N are also plotted in **Figure 11** for satellite estimates of total chlorophyll and chlorophyll for the four phytoplankton groups using the SST-dependent parameterization (Equations 12 to 15). The Ψ values in each OWT for total chlorophyll are consistent with those provided in version 3.0 of the OC-CCI dataset, based on a much larger global match-up dataset (∼14,500) (**Figure 11A**). The Ψ values for total chlorophyll increase from lower OWTs (characteristic of oligotrophic open-ocean waters) to higher OWTs (characteristic of more optically complex turbid coastal waters), a result that is also consistent with the original work of Moore et al. (2009), see their Table 2, and with the theoretical limitations of using empirical ocean-color chlorophyll algorithms in optically-complex waters (IOCCG, 2000). Biases (δ) in total chlorophyll are generally quite low (**Figure 11B**), consistent with version 3.0 of the OC-CCI dataset, though they do not always have the same sign, and are much higher for OWT14, probably due to very few match-ups (N = 4) in this class (**Figure 11B**).

Consistent with satellite estimates of total chlorophyll, there is a tendency for Ψ to increase from lower to higher OWTs for all the phytoplankton groups (**Table 5**, **Figures 11C,E,G,I**), particularly for smaller cells (pico- and nanoplankton) and for dinoflagellates. This is likely due to: (i) the satellite estimates of total chlorophyll, which are used as input to the phytoplankton group model, having larger errors at higher OWTs (**Figure 11A**); and (ii) possible deviations in the relationships between the phytoplankton groups and total chlorophyll in optically complex waters, when compared with typical open-ocean conditions. With the exception of diatoms, there is a slight tendency for the models to overestimate chlorophyll for the phytoplankton groups at higher OWTs (e.g., 8–14), as indexed by a positive bias (**Table 5**, **Figures 11F,H,J**).

# 3.3. Application of the Model to a Satellite Image

**Figure 12** illustrates the application of the phytoplankton group model (SST-dependent parameterization; Equations 10–15) to satellite chlorophyll (OC-CCI) and SST (NOAA OISST) composites for the period 17th to 24th June 2008. Satellite products used as inputs to the model, namely chlorophyll (**Figure 12A**), OWT membership (plotted by dominance (highest membership) in **Figure 12B**) and SST (**Figure 12C**), highlight the different biogeochemical areas in the region, with oligotrophic waters to the south (high SST, low total chlorophyll, low OWT), more productive waters to the north (lower SST, higher chlorophyll and OWT), and very productive coastal waters (variable SST, high chlorophyll and OWT). **Figures 12D,G,J,M** show estimates of chlorophyll for the four phytoplankton groups,

using a single set of parameters (Table 3).

diatoms (Cdiat), dinoflagellates (Cdino), nanoplankton (Cn) and picoplankton (Cp), respectively. Picoplankton (Cp) are the dominant group in the warm oligotrophic waters, nanoplankton (Cn) in intermediate (mesotrophic waters), and diatoms (Cdiat) in the northern productive waters and coastal regions (eutrophic waters). Dinoflagellates rarely dominate (i.e., rarely have the highest chlorophyll of the four groups), but typically have higher concentrations in coastal regions.

In addition to the concentrations, per-pixel uncertainties (Ψ and δ) are plotted for each phytoplankton group (**Figure 12**), through application of Equation (17) on a per-pixel basis, using per-pixel OWT membership provided by the OC-CCI products and statistics from **Table 5**. In general, lower Ψ is observed in the oligotrophic waters to the south of the region, with Ψ increasing toward more productive waters. Dinoflagellates have the highest Ψ in these productive waters, reflecting higher uncertainty in deriving the concentrations of this phytoplankton group (see also **Figure 10**). Lower Ψ values are seen for nano- and picoplankton, when compared with the larger size classes. Diatoms display a less variable Ψ throughout the entire region, when compared with the other three phytoplankton groups.

Biases (δ) are close to zero for all phytoplankton groups in the warm oligotrophic waters (**Figure 12**), with positive biases seen for dinoflagellates, nanoplankton and picoplankton in the more productive waters, implying a slight overestimation in chlorophyll by the satellite model in these waters. These biases can be caused by two reasons: (i) biases in model input (total chlorophyll); and (ii) biases in model parameters used for partitioning total chlorophyll into the phytoplankton groups. There were no major biases (with the exception of OWT14) in total chlorophyll (model input) in the validation dataset (**Figure 11B**). Nonetheless, it is likely that the use of alternative input chlorophyll algorithms (e.g., a semi-analytical algorithm) will impact these biases. The positive biases seen for dinoflagellates, nanoplankton and picoplankton in the more productive waters are likely caused by biases in model parameters at higher OWTs. In the future, with a larger database, modifications to model parameters according to OWT could be feasible, and would likely reduce observed biases.

As well as varying within the region as illustrated in **Figure 12**, temporal variations in chlorophyll concentration and associated per-pixel errors can be captured by application of the model to satellite data over the course of the seasons.

### 3.4. Potential Caveats in the Approach

### 3.4.1. In situ Estimates of Phytoplankton Group Chlorophyll

The performance of a model is tightly related to the quality of data used to tune it. We used estimates of phytoplankton group chlorophyll principally from HPLC. Whereas recent refinements in the use of HPLC to infer size-fractionated chlorophyll (Uitz et al., 2006; Brewin et al., 2010; Devred et al., 2011; Brewin et al., 2015) were used, diagnostic pigments determined by HPLC can be found in a variety of phytoplankton taxa and size classes, such that its use as a single in situ method may not always be dependable (Nair et al., 2008). Therefore, we combined data on size-fractionated chlorophyll estimated from HPLC with those from SFF, which encouragingly, were found to

be in reasonable agreement with each other for surface waters in the North Atlantic region (**Figure 2**). Yet, biases between the two techniques have been observed in Atlantic waters (Brewin et al., 2014a). Uncertainties in the SFF technique can arise from filter clogging, inaccurate pore sizes and cell breakage. The partitioning of microplankton chlorophyll into diatoms and dinoflagellates was based on the assumption that fucoxanthin in microphytoplankton can be attributed to diatoms and peridinin to dinoflagellates (Equations 8 and 9). Yet, there can also be fucoxanthin-containing dinoflagellates (e.g., Kryptoperidinium foliaceum) in Atlantic waters (Kempton et al., 2002), though their occurrence is generally not well known. Greater efforts to combine other sources of in situ data (e.g., flow cytometry, video imagery, optical measurements and microscopy) should help improve, and quantify uncertainty in, estimates of phytoplankton group chlorophyll in situ and, ultimately, the parameterization of satellite models.

### 3.4.2. The Satellite Phytoplankton Group Model

The conceptual framework of the Brewin et al. (2010) model has been supported by data from: phytoplankton spectral absorption measurements (Brewin et al., 2011a); spectral particle backscattering measurements (Brewin et al., 2012a); chlorophyll estimated by size-fractionated filtration (Raimbault et al., 1988; Chisholm, 1992; Riegman et al., 1993; Gin et al., 2000; Marañón et al., 2012; Brewin et al., 2014a; Ward, 2015); flow cytometry and microscopy (Brotas et al., 2013). The model has also been found to reproduce inter-annual variations in size structure consistent with theories on coupling between physical-chemical processes and ecosystem structure (Brewin et al., 2012b), and found to reproduce the typical normalized-biomass size-spectrum of phytoplankton (Brewin et al., 2014b). The model has captured relationships between size structure and total chlorophyll in a variety of contrasting regions (e.g., Lin et al., 2014; Brito et al., 2015; Sammartino et al., 2015).

Yet, as with any abundance-based method, the model does not directly detect the phytoplankton groups: it simply infers the concentrations of chlorophyll in each group based on relationships, developed using data collected in the past, with properties that can be derived accurately from space (e.g., chlorophyll concentration and sea surface temperature). The model is not expected to capture blooms that deviate from the general trends observed in the parameterization dataset (**Figures 4**,**5**). For this reason, such techniques may not be appropriate for certain applications. For instance, under a climate-change scenario, there is the possibility that the relationships between properties (e.g., total chlorophyll and group-specific chlorophyll) may change, which may not be detected using an abundance-based approach (Sathyendranath et al., submitted). For such applications, spectral-based methods are likely to be preferable.

Two versions of the re-tuned Brewin et al. (2010) model were carried forward in this study: one using a fixed set of parameters (**Table 3**); and the other where the parameters were tied with SST (**Table 4**). The Brewin et al. (2010) model with a fixed parameter set has an advantage that only four parameters are required to compute the size fractions (**Table 3**), compared with 16 that are used in the SST-dependent model (**Table 4**). A larger dataset is required to tune the SST-dependent model for regional applications, when compared with the model with a fixed parameter set. Furthermore, when considering all samples together, only a slight improvement in model performance (Ψ and δ) was achieved when using the SST-dependent model (**Figures 7**, **9**). Yet, the SST-dependent model captured variations in model parameters, such as the asymptotic maximum values for small cells (C<sup>m</sup><sub>p,n</sub> and C<sup>m</sup><sub>p</sub>), that are known to vary with changes in bottom-up (e.g., nutrients and light) and top-down (grazing) processes (Riegman et al., 1993; Brewin et al., 2014b). The fixed parameter model simply failed to capture these variations, resulting in unrealistic static asymptotes (**Figures 7**, **9** top-row, horizontal purple dashed lines).
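To make the role of the four parameters concrete, the following is a minimal sketch of the saturating-exponential form in which the three-component model is usually written (two asymptotic maxima and two initial fractions). The functional form is reproduced from memory of Brewin et al. (2010, 2015) and should be checked against those papers. The function name, argument names and the numerical values used in the usage note are illustrative assumptions, not the values of **Table 3**.

```python
import numpy as np

def size_class_chl(total_chl, cm_pn, d_pn, cm_p, d_p):
    """Partition total chlorophyll into pico-, nano- and microplankton.

    cm_pn, cm_p : asymptotic maximum chlorophyll of combined pico+nano
                  cells and of picoplankton (same units as total_chl).
    d_pn, d_p   : fraction of total chlorophyll in each of those
                  populations as total chlorophyll tends to zero.
    All four parameter values are supplied by the caller (hypothetical).
    """
    c = np.asarray(total_chl, dtype=float)
    # Small-cell chlorophyll saturates at its asymptotic maximum.
    c_pn = cm_pn * (1.0 - np.exp(-(d_pn / cm_pn) * c))
    c_p = cm_p * (1.0 - np.exp(-(d_p / cm_p) * c))
    # Nano- and microplankton follow by difference.
    return {"pico": c_p, "nano": c_pn - c_p, "micro": c - c_pn}
```

For example, `size_class_chl([0.1, 1.0, 10.0], cm_pn=1.0, d_pn=0.9, cm_p=0.1, d_p=0.7)` returns three arrays that sum to the input total. With fixed parameters the asymptotes `cm_pn` and `cm_p` are constants, which is the static behavior described above; an SST-dependent version would pass parameters that vary with temperature.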

Variations in the relationships of size structure with total chlorophyll and with SST were generally consistent with those proposed by Ward (2015), with the fractions of larger cells (e.g., microplankton) generally increasing with decreasing SST, for concentrations of total chlorophyll less than 1 mg m<sup>−3</sup>, and the fractions of small cells (picoplankton) increasing (**Figure 6**). Yet, in the Ward (2015) study these variations were typically observed at lower temperature (<5°C) than those shown in this study (<17°C). Results for small cells (picoplankton) are also relatively consistent with those proposed by Brewin et al. (2015), who used average light in the mixed layer, rather than SST, to vary model parameters, though they differ for microplankton (see Figures 4, 5 of Brewin et al., 2015). Differences between studies are possibly due to the regional tuning of the model here, when compared with the global studies of Ward (2015) and Brewin et al. (2015). There are also differences in the two approaches: whereas Ward (2015) introduces an additional term to the three-component model to account for temperature dependence, here we have let the model parameters change in response to SST variation.

Motivated by the need to provide satellite products of phytoplankton groups that match those defined in ecosystem models, particularly ERSEM (**Table 2**), we proposed a partitioning of microplankton chlorophyll (C<sub>m</sub>) into diatoms (C<sub>diat</sub>) and dinoflagellates (C<sub>dino</sub>), by modeling the ratio of C<sub>dino</sub> to C<sub>m</sub> as a function of SST (**Figure 8A**). This differs from the partitioning proposed by Hirata et al. (2011), which is based solely on total chlorophyll. We observed a significant relationship between C<sub>dino</sub>/C<sub>m</sub> and SST that was consistent with known seasonal variations of the two phytoplankton groups in the region (McQuatters-Gollop et al., 2007; Widdicombe et al., 2010). Yet, there are still significant variations surrounding this relationship (**Figure 8A**), and C<sub>dino</sub> was found to have the highest errors in the satellite model (**Figures 11**, **12**). The approach may fail to capture blooms of microplankton chlorophyll (C<sub>m</sub>) entirely dominated by dinoflagellates (**Figure 8A**), which can occur in the region (Widdicombe et al., 2010). Future improvements in C<sub>dino</sub> satellite estimates may be possible by incorporating spectral information (Shang et al., 2014) or other environmental data (Raitsos et al., 2008). Such improvements may significantly aid ecosystem models, considering the difficulties in modeling this group due to their motility and complex trophic behavior (Ciavatta et al., 2011).

### 3.4.3. Per-Pixel Uncertainties

In line with methods used in the OC-CCI project (Jackson and Sathyendranath, 2015), our satellite estimates of the chlorophyll concentration of each phytoplankton group come with per-pixel uncertainty (**Figure 12**), an essential requirement for use in many applications, such as ecosystem model validation, data assimilation and quantifying evidence of trends in a time-series. Yet, the estimates of uncertainty we provide are based on the assumption that the in situ data are the truth. As discussed in the previous section, in situ measurements of phytoplankton group chlorophyll also have their uncertainties, which are difficult to quantify (Brewin et al., 2014a). In addition, the estimates of uncertainty are based on comparisons of co-incident discrete in situ point measurements, representing volumes of sea water of the order of 5 litres or less, with 4 km satellite pixels representing a signal from ∼16 × 10<sup>10</sup> litres of water, assuming a 10 m optical depth. Additional uncertainties can occur because of vast differences in the temporal scales associated with the two types of measurements. In the future, such uncertainties may be reduced with the aid of new in situ methods capable of continuously measuring the optical and biogeochemical properties of the water (Dall'Olmo et al., 2012; Boss et al., 2013; Chase et al., 2013; Werdell et al., 2013b; Brewin et al., 2016).
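The mismatch in sampling volume quoted above can be checked with a few lines of arithmetic. The sketch below uses only the figures stated in the text (a 4 km pixel, a 10 m optical depth and a 5 litre sample); the variable names are illustrative.

```python
# Volume of water represented by one satellite pixel, in litres.
PIXEL_SIDE_M = 4_000.0     # 4 km pixel edge (from the text)
OPTICAL_DEPTH_M = 10.0     # assumed optical depth (from the text)
LITRES_PER_M3 = 1_000.0

pixel_volume_l = PIXEL_SIDE_M ** 2 * OPTICAL_DEPTH_M * LITRES_PER_M3
sample_volume_l = 5.0      # upper bound on a discrete in situ sample

# How many in situ samples would be needed to fill one pixel.
scale_ratio = pixel_volume_l / sample_volume_l

print(f"pixel volume: {pixel_volume_l:.1e} L")
print(f"pixel / sample volume ratio: {scale_ratio:.1e}")
```

The pixel volume evaluates to 1.6 × 10<sup>11</sup> litres, i.e., 16 × 10<sup>10</sup> litres, so a single in situ sample represents a few parts in 10<sup>11</sup> of the water seen by the satellite.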

By computing uncertainty statistics for each OWT, we can overcome issues with the distribution of data used in the validation. For instance, in our validation dataset, the majority of samples came from three OWTs (10, 11, and 12, see **Figure 11**), yet in the satellite image (**Figure 12B**), the majority of the region is dominated by OWTs less than 10. If one were to take a single value of any statistical metric (as provided in **Figures 9**, **10**) as representative of the uncertainty in the entire satellite dataset, it would not represent the majority of the region well. Yet, as the number of samples in each OWT varies, so does our confidence in the error statistics for each OWT. Some OWTs (e.g., 1, 2, and 14) have very few observations (**Table 5**), and consequently we have low confidence in the uncertainty estimates for these OWTs.
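The per-OWT bookkeeping described above can be sketched as follows. This is an illustration under stated assumptions, not the processing chain used in the study: the statistics are assumed to be computed on log10-transformed chlorophyll, each match-up is assigned to a single dominant OWT, the `min_samples` threshold for flagging low confidence is hypothetical, and the per-pixel mapping is assumed to be a membership-weighted average of the per-OWT values.

```python
import numpy as np

def owt_statistics(log_sat, log_insitu, owt, min_samples=10):
    """Root mean square error and bias per optical water type.

    Returns {owt: (rmse, bias, n, low_confidence)}; low_confidence is
    True when fewer than min_samples match-ups fall in that class.
    """
    log_sat = np.asarray(log_sat, dtype=float)
    log_insitu = np.asarray(log_insitu, dtype=float)
    owt = np.asarray(owt)
    stats = {}
    for k in np.unique(owt):
        diff = log_sat[owt == k] - log_insitu[owt == k]
        rmse = float(np.sqrt(np.mean(diff ** 2)))
        bias = float(np.mean(diff))
        n = int(diff.size)
        stats[int(k)] = (rmse, bias, n, n < min_samples)
    return stats

def map_uncertainty(memberships, per_owt_values):
    """Membership-weighted average of a per-OWT statistic for each pixel.

    memberships    : (n_pixels, n_owt) fuzzy memberships.
    per_owt_values : (n_owt,) statistic for each class.
    Pixels with zero total membership are returned as NaN.
    """
    m = np.asarray(memberships, dtype=float)
    v = np.asarray(per_owt_values, dtype=float)
    total = m.sum(axis=1)
    return (m @ v) / np.where(total > 0, total, np.nan)
```

Keeping the sample count alongside each statistic makes it straightforward to mask or down-weight pixels dominated by sparsely sampled classes.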


TABLE 5 | Root mean square error (Ψ) and bias (δ) for 14 OC-CCI optical water types (OWT) for the four phytoplankton groups, using the two approaches (SST-dependent with superscript SST, and

# 4. SUMMARY

We re-tuned an abundance-based model (Brewin et al., 2010, 2015) for estimating the chlorophyll concentration of three phytoplankton size classes as a function of total chlorophyll (available from satellite data) in the North Atlantic region using a large dataset of size-fractionated chlorophyll measurements. The model was modified to account for the influence of sea surface temperature (SST, also available from satellite data) on model parameters, and on the partitioning of chlorophyll in large phytoplankton (microphytoplankton) into diatoms and dinoflagellates, so that the phytoplankton groups provided matched those used in a marine ecosystem model (ERSEM). Results indicate that in the North Atlantic: (i) the relationship between size-fractionated chlorophyll and total chlorophyll changes with the environmental conditions (SST); and (ii) the ratio of dinoflagellate chlorophyll to microplankton chlorophyll increases with SST.

Application of the method to satellite estimates of total chlorophyll and SST was validated using an independent dataset of satellite and in situ match-ups. This dataset was used with information on the optical water type, based on fuzzy-logic statistics developed within the ESA OC-CCI project, to derive uncertainties in 14 different optical water types, which were then used to map uncertainties in chlorophyll on a per-pixel basis for each phytoplankton group in a satellite image. These satellite products will be useful for those evaluating the performance of the ERSEM model and assimilating chlorophyll for each phytoplankton group into ERSEM in research and operational applications. Such an approach could be extended to other ecosystem models that simulate phytoplankton functional groups in the oceans.

# AUTHOR CONTRIBUTIONS

RB synthesized the data, re-tuned and further-developed the algorithm, organized, prepared and wrote the first version of the manuscript, and prepared all figures and tables. SC, SS, TJ, EO, GD, and DR contributed to the intellectual development of the algorithms, and GT, KC, RA, DC, and VB collected and processed parts of the datasets used in the paper. All authors contributed to the final version of the manuscript.

### FUNDING

This work has been carried out as part of the Copernicus Marine Environment Monitoring Service (CMEMS) project "Toward Operational Size-class Chlorophyll Assimilation (TOSCA)." CMEMS is implemented by MERCATOR OCEAN in the framework of a delegation agreement with the European Union. This work was also supported by the UK National Centre for Earth Observation (NCEO). Additional support from the Ocean Colour Component of the Climate Change Initiative of the European Space Agency (ESA) is gratefully acknowledged. Data collection by GT was supported by NERC-UK ECOMAR (grant no: NE/C513018/1). We thank ESA for covering publication costs.

### ACKNOWLEDGMENTS

The authors would like to acknowledge all scientists and crew involved in the collection of the in situ data used in this manuscript, without which this work would not have been feasible. We owe a debt of gratitude to all those involved in data collection. AMT data were funded through the UK Natural Environment Research Council, through the UK marine research institutes' strategic research programme Oceans 2025 awarded to PML and the National Oceanography Centre. The authors would like to thank the European Space Agency (ESA) for the CCI data used, NOAA for the OISST products, and NASA for the MODIS-Aqua and SeaWiFS products used. This is a contribution to MARE - UID/MAR/04292/2013, the Ocean Colour Climate Change Initiative of ESA and contribution number 311 of the AMT programme.



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Brewin, Ciavatta, Sathyendranath, Jackson, Tilstone, Curran, Airs, Cummings, Brotas, Organelli, Dall'Olmo and Raitsos. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Impact of El Niño Variability on Oceanic Phytoplankton

Marie-Fanny Racault<sup>1,2</sup>\*, Shubha Sathyendranath<sup>1,2</sup>, Robert J. W. Brewin<sup>1,2</sup>, Dionysios E. Raitsos<sup>1,2</sup>, Thomas Jackson<sup>1</sup> and Trevor Platt<sup>1</sup>

<sup>1</sup> Plymouth Marine Laboratory, Plymouth, UK, <sup>2</sup> Plymouth Marine Laboratory, National Centre for Earth Observation, Plymouth, UK

Oceanic phytoplankton respond rapidly to a complex spectrum of climate-driven perturbations, confounding attempts to isolate the principal causes of observed changes. A dominant mode of variability in the Earth-climate system is that generated by the El Niño phenomenon. Marked variations are observed in the centroid of anomalous warming in the Equatorial Pacific under El Niño, associated with quite different alterations in environmental and biological properties. Here, using observational and reanalysis datasets, we differentiate the regional physical forcing mechanisms, and compile a global atlas of associated impacts on oceanic phytoplankton caused by two extreme types of El Niño. We find robust evidence that during Eastern Pacific (EP) and Central Pacific (CP) types of El Niño, impacts on phytoplankton can be felt everywhere, but tend to be greatest in the tropics and subtropics, encompassing up to 67% of the total affected areas, with the remaining 33% being areas located in high latitudes. Our analysis also highlights considerable and sometimes opposing regional effects. During EP El Niño, we estimate decreases of −56 TgC/y in the tropical eastern Pacific Ocean and −82 TgC/y in the western Indian Ocean, and an increase of +13 TgC/y in the eastern Indian Ocean, whereas during CP El Niño, we estimate decreases of −68 TgC/y in the tropical western Pacific Ocean and −10 TgC/y in the central Atlantic Ocean. We advocate that analysis of the dominant mechanisms forcing the biophysical interactions under El Niño variability may provide a useful guide to improve our understanding of projected changes in the marine ecosystem in a warming climate and support development of adaptation and mitigation plans.

Keywords: El Niño variability, ENSO, climate, ocean-color, ESA climate change initiative, phytoplankton

### Edited by:

Laura Lorenzoni, University of South Florida, USA

### Reviewed by:

Monique Messié, Monterey Bay Aquarium Research Institute, USA; Peter Allan Thompson, Commonwealth Scientific and Industrial Research Organisation (CSIRO), Australia

### \*Correspondence:

Marie-Fanny Racault mfrt@pml.ac.uk

### Specialty section:

This article was submitted to Ocean Observation, a section of the journal Frontiers in Marine Science

Received: 17 January 2017 Accepted: 20 April 2017 Published: 08 May 2017

### Citation:

Racault M-F, Sathyendranath S, Brewin RJW, Raitsos DE, Jackson T and Platt T (2017) Impact of El Niño Variability on Oceanic Phytoplankton. Front. Mar. Sci. 4:133. doi: 10.3389/fmars.2017.00133

# INTRODUCTION

Phytoplankton, the microscopic vegetal cells living at the surface of the oceans, yield globally and annually some fifty billion tons of organic carbon through primary production (Longhurst et al., 1995), contributing to the oceanic uptake of ∼25% of the carbon dioxide (CO2) emitted to the atmosphere every year (Le Quéré et al., 2015). The rates of primary production are not uniformly distributed across the ocean domain: the most highly productive oceanic regions are found at high-latitudes and in coastal upwelling systems. Oceanic primary producers are under the control of physical forcing on a broad spectrum of scales, and the forcing will be modified under climate change. In the latest assessment report (AR5), the Intergovernmental Panel on Climate Change (IPCC) has recognized "medium evidence" for the response of the highly productive oceanic regions to recent warming (especially since the 1970s) and "low confidence" in the understanding of how equatorial upwelling systems might change in response to El Niño variability (Hoegh-Guldberg et al., 2014).

El Niño activity is characterized by anomalous warming of Sea Surface Temperature (SST) in the tropical Pacific, linked to a perturbation of atmospheric circulation patterns known as the Southern Oscillation. This ocean-atmosphere coupling, called the El Niño Southern Oscillation (ENSO), is a dominant mode of variability in the Earth-climate system with a typical recurrence interval of 2–7 years (McPhaden et al., 2006; Cai et al., 2015). Each El Niño event is unique, exhibiting differences in surface and subsurface water temperature amplitude, duration, and spatial patterns (Capotondi et al., 2015). As an aid to classify El Niño events, the location of maximum anomalous SST warming observed during boreal winter has been used to delineate two extreme types of El Niño (Trenberth and Stepaniak, 2001; Larkin and Harrison, 2005; Ashok et al., 2007; Yu and Kao, 2007; Ashok and Yamagata, 2009; Kao and Yu, 2009; Kug et al., 2009; Lee and McPhaden, 2010; Takahashi et al., 2011; Cai et al., 2015; Capotondi et al., 2015). The Eastern-Pacific (EP) El Niño, also referred to as the "typical" or canonical El Niño, is characterized by maximum anomalous SST warming in the eastern tropical Pacific. In contrast, the Central-Pacific (CP) El Niño, variously referred to as El Niño Modoki (a Japanese word meaning pseudo), warm-pool El Niño, or dateline El Niño, is characterized by weak anomalous SST warming along the western coast of South America and maximum anomalous SST warming in the central tropical Pacific. The climatic perturbations generated by these two types of El Niño are induced through different atmospheric teleconnections: both have been associated with changes in temperature and rainfall patterns over the continental U.S. (Yu et al., 2012; Yu and Zou, 2013), in storm tracks in the Southern Hemisphere (Kao and Yu, 2009) and in cyclone trajectories in the North Atlantic (Kim et al., 2009).

Since the beginning of the 1900s, the two most extreme EP and CP El Niño events (in terms of amplitude of maximum SST anomalies in the Eastern and Central Pacific regions) have occurred within the last 20 years, in 1997/1998 and 2009/2010, respectively (Capotondi et al., 2015). The latest 2015/2016 El Niño has been reported to have SST anomalies of comparable magnitude to the 1982/1983 and 1997/1998 events, but with more limited intensity in the Eastern Pacific region (Paek et al., 2017; L'Heureux et al., in press).

Contrasting influence of the extreme El Niño events of 1997/1998 and 2009/2010 on oceanic phytoplankton has been characterized in the tropical Pacific domain (Gierach et al., 2012; Radenac et al., 2012). In this region, ENSO is recognized as the main driver of inter-annual phytoplankton variability. Tropical, as well as extra-tropical, influences of ENSO and ENSO Modoki have been demonstrated using statistical analyses based on a range of indices applied to ocean-color remote-sensing observations (Yoder and Kennelly, 2003; Behrenfeld et al., 2006; Chavez et al., 2011; Vantrepotte and Mélin, 2011; Messié and Chavez, 2012, 2013; Racault et al., 2012, 2017; Couto et al., 2013; Raitsos et al., 2015). One of the most widely-used environmental indices to characterize influence of ENSO on ocean biology is the Multivariate ENSO Index (MEI). This index is based on combined analysis of fields of sea level pressure, surface winds, SST, surface air temperature, and cloudiness for the entire Tropical Pacific domain (Wolter and Timlin, 1993). Due to this broad domain and multivariate statistical approach, the MEI encompasses the whole continuum of ENSO events (from most extreme SST anomalies located in the Central to Eastern Pacific regions), but as a result, the index does not allow us to separate effects of EP and CP El Niño variations. Separating EP and CP El Niño variations requires indices isolating the centroid of El Niño activity along the equatorial Pacific.

To date, the longitudinal position of the center of maximum SST anomalies has been delineated in the Niño1+2 (0°–10°S, 90°W–80°W), Niño3 (5°N–5°S, 150°W–90°W), Niño3.4 (5°N–5°S, 170°W–120°W) and Niño4 (5°N–5°S, 160°E–150°W) regions (Ashok et al., 2007). Based on analyses of the SST anomaly variations in these regions, a range of El Niño indices have been constructed. In the present study, the EP and CP El Niño signals are characterized using the EP and CP indices defined by Kao and Yu (2009). The EP index was calculated by first applying regression analysis of SST anomalies onto the Niño4 index (average SST anomalies over the Niño4 region) to remove the influence of the SST anomaly component associated with central Pacific warming, and then using Empirical Orthogonal Function (EOF) analysis to determine the spatial patterns and associated temporal index of EP events. Similarly, the CP index was calculated by applying regression analysis of SST anomalies onto the Niño1+2 index to remove the influence of the SST anomaly component associated with east Pacific warming, and then using EOF analysis to characterize the pattern and index of CP events (Kao and Yu, 2009; Yu et al., 2012). Using partial correlation and EOF analyses, the characterization of the canonical El Niño and El Niño Modoki signals has also been achieved based on SST anomalies from the Niño3 region and by differentiating the influence of SST anomaly variations from a combination of regions in the tropical Pacific. The specific combinations and definitions of regions have been formulated as the Trans-Niño Index, TNI (Trenberth and Stepaniak, 2001), the El Niño Modoki Index, EMI (Ashok et al., 2007), and the Improved EMI (Li et al., 2010). A comprehensive review of El Niño index definitions is presented by Capotondi et al. (2015).
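The two-step construction of the EP index (regression removal followed by EOF analysis) can be sketched as below. This is a simplified illustration rather than the Kao and Yu (2009) implementation: the function names are hypothetical, the SST anomalies are assumed to be arranged as a (time, space) matrix with land and missing points already removed, and area weighting by latitude is omitted.

```python
import numpy as np

def remove_regression(anom, index):
    """Subtract from every grid point the part linearly related to `index`.

    anom  : (n_time, n_space) anomaly matrix.
    index : (n_time,) time series, e.g., the Nino4 index.
    """
    anom = np.asarray(anom, dtype=float)
    x = np.asarray(index, dtype=float)
    x = x - x.mean()
    # Least-squares regression slope at each grid point.
    slope = x @ (anom - anom.mean(axis=0)) / (x @ x)
    return anom - np.outer(x, slope)

def leading_eof(anom):
    """Leading EOF pattern and standardized principal component via SVD."""
    centred = anom - anom.mean(axis=0)
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    pc = u[:, 0] * s[0]
    return vt[0], pc / pc.std()

def ep_index(sst_anom, nino4_index):
    """Residual SST anomalies after removing the Nino4-related signal,
    summarized by their leading EOF (pattern) and PC (index)."""
    residual = remove_regression(sst_anom, nino4_index)
    return leading_eof(residual)
```

The CP index would follow by swapping in the Niño1+2 index as the regressor. Note that the sign of an EOF and its principal component is arbitrary, so in practice the pair is usually flipped so that positive index values correspond to warming.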

The main obstacles to distinguishing the ecosystem effects associated with El Niño variability may be summarized as arising from: (i) the challenge of constructing continuous, synoptic-scale, long-term time-series of marine ecosystem state at high temporal and spatial resolutions for the global oceans (Sathyendranath and Krasemann, 2014); (ii) the diversity and complexity of the mechanisms driving the biophysical interactions in different oceanic sub-regions or provinces (Longhurst et al., 1995; Boyd et al., 2014); (iii) the difficulty of elucidating the roles of the local and remote-forcing mechanisms associated with different El Niño events (Cai et al., 2015; Capotondi et al., 2015); (iv) the issues of lag in the transmission of El Niño's influences at higher latitudes and to other basins via different teleconnection mechanisms (Ashok et al., 2007; Couto et al., 2013); and finally (v) the broad ranges of meridional position, amplitude and evolution of SST anomalies observed during different El Niño events, which militate against consensus in the choice of method to estimate indices of El Niño variability (Capotondi et al., 2015). Here, we propose an original approach that overcomes some of these obstacles based on climate-quality ocean-color products and state-of-the-art reanalysis datasets, and allows us to establish an atlas of the impact of CP and EP types of El Niño on primary producers in the global oceans. Finally, we document the associated environmental changes and identify the dominant mechanisms driving the diverse biophysical interactions involved.

# MATERIALS AND METHODS

The biological and physical datasets obtained for the analyses are summarized in **Table 1**. Based on dataset availability, two periods of study have been considered: (1) 1997–2012 (15 years) for biological and physical datasets, and (2) 1979–2014 (35 years) for physical datasets only.

## Biological Datasets

### Remotely-Sensed Chlorophyll Concentration and Associated Uncertainty Estimates

Chlorophyll is at the heart of primary production: it is the state variable used in photosynthesis-irradiance models to compute primary production, and it has a distinct optical signature, which makes it one of the easiest phytoplankton properties to measure, both by in situ and satellite methods. Ocean-color sensors on satellites provide estimates of chlorophyll concentration at high spatial and temporal resolution and at global scale. Because they provide data consistently and frequently and over long periods of time, they are suitable for computations of certain ecological indicators and for studying long-term trends in the state of the marine ecosystem (Platt and Sathyendranath, 2008; Racault et al., 2014). However, ocean-color sensors do have a finite lifespan, and differences in instrument design and algorithms make it difficult to compare data from multiple sensors. When overlapping data are available from two or more sensors, such data can be used to establish inter-sensor bias and correct for it. Recently, under the European Space Agency (ESA) Climate Change Initiative (CCI), the ocean-color project (http://www.esa-oceancolour-cci.org) has produced new, improved products, merging observations from the Sea-viewing Wide Field-of-View Sensor (SeaWiFS, 1997–2010), the Moderate Resolution Imaging Spectroradiometer (MODIS, 2002–present) and the MEdium Resolution Imaging Spectrometer (MERIS, 2002–2012) to provide a 15-year (1997–2012 OC-CCIv2) global scale, climate-quality controlled, bias-corrected, and error-characterized data record of ocean-color (Sathyendranath and Krasemann, 2014). Furthermore, implementation of the coupled ocean-atmosphere POLYMER correction algorithm (MERIS period; Steinmetz et al., 2011) has significantly increased the coverage of chlorophyll observations (Sathyendranath and Krasemann, 2014; Racault et al., 2015).

The OC-CCI v2.0 Level 3 Mapped data of chlorophyll concentration, root-mean-square-difference (RMSD) and bias estimates of monthly log-transformed (base 10) Chl, were obtained at 4 km spatial resolution, and monthly temporal resolution from the ESA CCI Ocean Color website at http://www.esa-oceancolour-cci.org. To remain coherent with the resolution of the different datasets used in the analyses (**Table 1**), the chlorophyll concentration was mapped onto a 1° × 1° regular grid by averaging all available data points within each new, larger pixel. Standard deviation of chlorophyll product (computed from bias and RMSD) was calculated by aggregating pixel values at the same spatial (1° × 1°) resolution. Then, the standard deviation of the log10Chl was converted to its untransformed value. The standard error in the reported mean value at each pixel for the period 1997–2012 was computed, and values ranging between ±1% in the tropical gyres (associated with low chlorophyll concentration) and below ±0.5% at higher latitudes (associated with high chlorophyll concentration) were observed (Supplementary Figure 1). Uncertainties in the anomalies of relative (%) changes in chlorophyll in selected box areas were calculated using the standard methods for computing propagation of errors (Topping, 1972).
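The regridding step ("averaging all available data points within each new, larger pixel") can be illustrated with a simple block average that ignores missing values. This is a sketch under stated assumptions: the function name is hypothetical, the input is a regular latitude-longitude grid whose dimensions are exact multiples of the block size, and no area weighting is applied within a block.

```python
import numpy as np

def block_average(field, factor):
    """Average a 2-D (lat, lon) field over non-overlapping
    factor x factor blocks, ignoring NaN (missing) pixels.

    Blocks that contain no valid pixel are returned as NaN.
    """
    field = np.asarray(field, dtype=float)
    n_lat, n_lon = field.shape
    if n_lat % factor or n_lon % factor:
        raise ValueError("grid dimensions must be divisible by the block size")
    # Expose each block on axes 1 and 3 so it can be reduced in one call.
    blocks = field.reshape(n_lat // factor, factor, n_lon // factor, factor)
    valid = ~np.isnan(blocks)
    total = np.where(valid, blocks, 0.0).sum(axis=(1, 3))
    count = valid.sum(axis=(1, 3))
    return np.where(count > 0, total / np.maximum(count, 1), np.nan)
```

Returning the valid-pixel count alongside the mean would be a natural extension when a standard error per coarse pixel is also required.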

### Remotely-Sensed Primary Production (PP)

Global observations of water-column PP were obtained from the Open Ocean Transboundary Water Assessment Programme (TWAP) using the algorithm of Platt and Sathyendranath (1988), with OC-CCI v2.0 Chlorophyll, SeaWiFS and MODIS spectrally-resolved light (i.e., PAR) as inputs. The model parameters (i.e., vertical structure of Chlorophyll and the photosynthesis-irradiance parameters) are assigned following the partitioning of the ocean into biogeographic provinces (Longhurst, 1998). The TWAP primary production estimates have been shown to compare consistently well with other global ocean primary production models (Longhurst et al., 1995; Antoine et al., 1996; Behrenfeld et al., 2005). The PP data have been obtained at 9 km spatial resolution, and monthly temporal resolution from https://www.oceancolour.org/thredds/catalog/TWAP-PProd/catalog.html. The PP data were regridded to 1° × 1° spatial resolution by averaging all available data points within each new, larger pixel.

## Physical Datasets

### Remotely-Sensed Photosynthetically Active Radiation (PAR)

The Level 3 Mapped data of PAR, collected during the SeaWiFS and MODIS missions (Frouin et al., 2012), were obtained at 9 km spatial resolution and monthly resolution from the NASA website at http://oceancolor.gsfc.nasa.gov/cms/. The PAR data were regridded to 1° × 1° spatial resolution by averaging all available data points within each new, larger pixel.

### Remotely-Sensed Sea Surface Temperature (SST)

The sea surface temperature SST-CCI vexp1.2 Mapped gap-filled daily blend of the Advanced Very High Resolution Radiometers and the Along-Track Scanning Radiometers data (Merchant et al., 2014) were obtained at 1° × 1° spatial resolution, and monthly temporal resolution from the ESA-CCI National Centre for Earth Observation (NCEO) portal at http://gws-access.ceda.ac.uk/public2/nceo\_uor/sst/L3S/EXP1.2/.

### TABLE 1 | Datasets obtained for the analysis.


Information about the regridding procedure and selected period of study are provided in the Materials and Methods section in the manuscript. Res, Spatial Resolution.

### Remotely-Sensed Sea Level (SL)

The sea level SL-CCI v1.1 Mapped gap-filled blend of the Topex/Poseidon, Jason-1/2 with the ERS-1/2 and Envisat missions data (Ablain et al., 2015) were obtained at 0.25° × 0.25° spatial resolution, and monthly temporal resolution from the ESA SL-CCI website at http://www.esa-sealevel-cci.org.

### Reanalysis Products of SST and Wind

ERA Interim reanalysis of monthly 10 m U wind component, 10 m V wind component, 10 m wind speed (Dee et al., 2011) were obtained on 0.75° × 0.75° global grid-box from ECMWF at http://apps.ecmwf.int/datasets/data/interim\_full\_moda/.

### Reanalysis Product of Surface Air Temperature (SAT)

NCEP/NCAR reanalysis of monthly surface air temperature (Kalnay et al., 1996) were obtained on 2.5° × 2.5° global grid-box from the National Oceanic and Atmospheric Administration/Office of Oceanic and Atmospheric Research/Earth System Research Laboratory at http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanalysis.surface.html. This dataset was chosen to be consistent with the data used by Yu et al. (2012) in their analysis of El Niño impact on U.S. winter air surface temperature.

### Reanalysis Product of Precipitation

CPC Merged Analysis of Precipitation (CMAP) interpolated data (Xie and Arkin, 1997) were obtained on 2.5° × 2.5° global grid-box at monthly temporal resolution from the National Oceanic and Atmospheric Administration/Office of Oceanic and Atmospheric Research/Earth System Research Laboratory at http://www.esrl.noaa.gov/psd/data/gridded/data.cmap.html.

### Mixed Layer Depth (MLD)

The mixed layer depth (MLD) was estimated as the shallowest depth at which a ±0.2°C change is observed compared with the temperature at 10 m depth, based on the temperature criterion of de Boyer Montégut et al. (2004). The vertical profiles of temperature were downloaded from Simple Ocean Data Assimilation (SODA) model output v2p2p4 http://iridl.ldeo.columbia.edu/SOURCES/.CARTON-GIESE/.SODA/.v2p2p4/ at monthly temporal resolution and 0.25° × 0.4° × 40-level spatial and vertical resolutions (Carton and Giese, 2008). The data were regridded to 1° × 1° spatial resolution by averaging all available data points within each new, larger pixel.
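The temperature criterion can be sketched for a single profile as follows. This is a minimal illustration, not the processing applied to the SODA output: the function name is hypothetical, the profile is assumed to be ordered by increasing depth, the 10 m reference temperature is obtained by linear interpolation, and the result is the first model level that meets the criterion (no interpolation between levels).

```python
import numpy as np

def mixed_layer_depth(depth, temp, ref_depth=10.0, threshold=0.2):
    """Shallowest depth below ref_depth at which temperature differs
    from the reference-depth temperature by at least `threshold` (deg C).

    depth : 1-D array of depths in metres, increasing downward.
    temp  : 1-D array of temperatures at those depths.
    Returns NaN if the criterion is never met within the profile.
    """
    depth = np.asarray(depth, dtype=float)
    temp = np.asarray(temp, dtype=float)
    # Reference temperature at 10 m, interpolated between model levels.
    t_ref = np.interp(ref_depth, depth, temp)
    exceeds = (depth > ref_depth) & (np.abs(temp - t_ref) >= threshold)
    if not exceeds.any():
        return float("nan")
    return float(depth[exceeds][0])
```

For a profile with depths `[5, 15, 25, 35, 46]` m and temperatures `[20, 20, 19.9, 19.7, 18]` °C, the first level that departs from the 10 m temperature by at least 0.2°C is 35 m.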

### Zonal Surface Currents

Annual average zonal surface currents data were obtained at 1° × 1° resolution for the global oceans from NOAA Ocean Surface Current Analyses Real-time (OSCAR) at http://www.esr.org/oscar\_index.html.

### Nitrate Concentration

Annual average surface nitrate concentration data were obtained at 1° × 1° resolution for the global oceans from the World Ocean Atlas Climatology (Boyer et al., 2013) at https://www.nodc.noaa.gov/OC5/woa13/.

## Climate Impact Analysis

### Climate Indices

Time-series of MEI based on principal component analysis of six atmosphere-ocean variable fields in the tropical Pacific basin, i.e., SL, SST, SAT, U and V wind components, and total cloudiness fraction of the sky (Wolter and Timlin, 1993), were obtained at http://www.esrl.noaa.gov/psd/enso/mei/table.html. Time-series of Eastern Pacific and Central Pacific El Niño indices based on a combination of regression and empirical orthogonal function analyses applied to SST data in the tropical Pacific (Kao and Yu, 2009) were obtained at http://www.ess.uci.edu/~yu/2OSC/.

### Statistical Analysis

The influences of EP and CP El Niño events may propagate across the world at different speeds through different teleconnection mechanisms (Ashok et al., 2007). To avoid implementing impact analyses on monthly anomalies, which would involve different lag-coefficients, the influence of the different types of El Niño is characterized based on annual mean anomalies. Anomalies of physical and biological variables were computed first by removing the monthly mean climatology over the period 1997–2012. Then, annual mean anomalies were calculated by averaging the monthly anomalies over the periods from June (of year t) to May (of year t + 1) (i.e., spanning two calendar years). This 12-month delineation period was chosen to follow the seasonality of ENSO activity, which generally peaks in the months of November to January (i.e., higher SST anomalies in the Equatorial Pacific, Niño 1–4 regions). Because chlorophyll concentrations can span three orders of magnitude, relative percent differences in chlorophyll were calculated as:

$$C_{\text{ra}} = \frac{C_{t} - \bar{C}}{\left(C_{t} + \bar{C}\right) / 2} \times 100$$

where C<sub>ra</sub> is the relative chlorophyll anomaly in percent, C<sub>t</sub> is the annual chlorophyll concentration in mg m<sup>−3</sup> in year t, and $\bar{C}$ is the mean of annual chlorophyll concentrations over the 15-year study period. Note that the results of the EP and CP impacts (see method below) were not sensitive to the choice of normalization function.
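The anomaly steps described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function names, the `start_month` argument, and the use of plain lists for a single grid point are hypothetical choices made here, not the authors' processing code.

```python
from statistics import mean


def monthly_anomalies(series, start_month=1):
    """Subtract the monthly mean climatology from a monthly series.

    series: monthly values in time order (one grid point).
    start_month: calendar month (1-12) of series[0]; hypothetical argument.
    """
    months = [(start_month - 1 + i) % 12 for i in range(len(series))]
    # Climatology: mean of all values that share the same calendar month.
    climatology = {
        m: mean(v for v, mm in zip(series, months) if mm == m)
        for m in set(months)
    }
    return [v - climatology[m] for v, m in zip(series, months)]


def june_to_may_means(monthly, start_month=1):
    """Average monthly values over complete June (t) to May (t + 1) windows."""
    first_june = (6 - start_month) % 12  # index of the first June in the series
    return [
        mean(monthly[i:i + 12])
        for i in range(first_june, len(monthly) - 11, 12)
    ]


def relative_anomaly_percent(c_t, c_bar):
    """Relative percent difference of annual chlorophyll c_t from the mean c_bar."""
    return (c_t - c_bar) / ((c_t + c_bar) / 2.0) * 100.0
```

Because the difference is divided by the mean of the two values, the measure is symmetric: a threefold increase and a threefold decrease relative to the mean give anomalies of equal magnitude and opposite sign.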

The climate impact analysis to identify the oceanic regions that are most sensitive to El Niño variability is based on a statistical approach initially developed in a study of El Niño impact on U.S. winter air surface temperature using EP and CP indices (Yu et al., 2012). In the present study, the global and regional influences associated with each type of El Niño are extracted by separately regressing, at each 1° × 1° grid point, the EP and CP El Niño indices with: (a) annual mean anomalies of chlorophyll concentration time-series (as a key measure of phytoplankton population), and (b) annual mean anomalies of primary production (i.e., the rate of phytoplankton growth) time-series. To identify the mechanisms driving the regionally different biological responses associated with each type of El Niño, we further applied the statistical analysis based on the EP and CP indices to annual mean anomalies of SAT, SST, SL, wind, and precipitation.
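As a rough illustration of the grid-point regression, the sketch below fits a separate ordinary least-squares slope of the anomaly time-series on each index. It assumes a simple linear fit with an intercept, which the text does not specify; the function names and the dictionary keyed by (latitude, longitude) are hypothetical.

```python
def regression_slope(index, anomaly):
    """Least-squares slope of anomaly on index (change per unit index)."""
    n = len(index)
    mean_x = sum(index) / n
    mean_y = sum(anomaly) / n
    sxx = sum((x - mean_x) ** 2 for x in index)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(index, anomaly))
    return sxy / sxx


def impact_maps(ep_index, cp_index, anomaly_grid):
    """Regress each grid-point series separately on the EP and CP indices.

    anomaly_grid: dict mapping (lat, lon) -> list of annual mean anomalies
    (hypothetical layout). Returns (lat, lon) -> (EP slope, CP slope).
    """
    return {
        cell: (regression_slope(ep_index, series),
               regression_slope(cp_index, series))
        for cell, series in anomaly_grid.items()
    }
```

Under this reading, each slope is the anomaly expected for an index value of one, which matches how the impact values are reported in the Results.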

The statistical significance of the regression coefficients was estimated using Student's t-test. The autocorrelation of the time-series was taken into account, and the effective degrees of freedom entering the significance test were determined based on the method presented in Lin and Derome (1998).
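To show how autocorrelation reduces the degrees of freedom in such a test, the sketch below uses a commonly applied lag-1 autocorrelation approximation for the effective sample size. This is a swapped-in illustration and is not claimed to be the method of Lin and Derome (1998); the function names and the cap at the nominal sample size are assumptions made here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lag1_autocorr(x):
    """Lag-1 autocorrelation, estimated by correlating the series with itself shifted by one step."""
    return pearson_r(x[:-1], x[1:])


def effective_sample_size(x, y):
    """Approximate effective sample size from the lag-1 autocorrelations.

    Uses n * (1 - r1x * r1y) / (1 + r1x * r1y), capped at n (assumption).
    """
    n = len(x)
    product = lag1_autocorr(x) * lag1_autocorr(y)
    return min(float(n), n * (1.0 - product) / (1.0 + product))


def t_statistic(x, y):
    """t statistic of the correlation and its effective degrees of freedom.

    Assumes |r| < 1 and an effective sample size above 2.
    """
    r = pearson_r(x, y)
    dof = effective_sample_size(x, y) - 2.0
    return r * math.sqrt(dof / (1.0 - r * r)), dof
```

When both series are positively autocorrelated, the effective sample size falls below the number of years, so a given correlation must be stronger to pass the same significance threshold.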

### Validation of EP and CP Impact on Interannual to Decadal Time-Scales

The analyses of impact on phytoplankton and primary production have been limited to 15 years by the availability of consistent climate-quality controlled satellite data (**Table 1**). To assess whether the results may be skewed by the specific 1997/1998 EP event during the ocean-color satellite era, and to investigate the validity of the results over multi-decadal time scales, we have estimated the impact of EP and CP El Niño (considering autocorrelation, Lin and Derome, 1998) on sea and air surface temperatures, wind, and precipitation during the period 1979–2014 (35 years) and compared the results with the analysis during the shorter period 1997–2012 (15 years). These two periods of 15 and 35 years respectively have been selected based on availability of physical and biological data products (**Table 1**). The period 1997–2012 includes one strong EP El Niño event in 1997–1998, and three CP events in 2002–2003, 2004–2005, and 2009–2010. The period 1979–2014 includes one additional strong EP El Niño event in 1982–1983, and three additional CP events in 1986–1987, 1991–1992, and 1994–1995 (**Figure 1**).

# RESULTS AND DISCUSSION

## EP and CP El Niño Impact on Oceanic Phytoplankton

The confidence level of the response patterns (**Figures 2A,B**) identified with the statistical analysis is assessed as very likely (i.e., within the 90–100% probability range; IPCC Climate Change, 2013). During EP El Niño, at the global scale, the median chlorophyll impact is found to be −6.5%, mostly driven by a large decrease of −7.5% in the tropics (3,529 pixels), and limited increases in the Northern and Southern Hemispheres of +6.6 and +5.5% respectively (505 and 643 pixels respectively). During CP El Niño, the global median chlorophyll impact is found to be −7.4%, which results from a large decrease estimated for the tropics of −8.7% (1,948 pixels), a limited decrease estimated for the Northern Hemisphere of −4.9% (342 pixels), and an increase for the Southern Hemisphere of +4.1% (500 pixels) (Supplementary Table 1 and Supplementary Figure 2). The impact values are estimated based on EP and CP indices equal to one (annual mean index values over the period 1997–2012 are presented in **Figure 1**). It is noteworthy that monthly impact values may be ∼3–4 times higher, particularly during the peak of El Niño events, such as in December 1997 when the EP El Niño index reached a value of 3.9, and in December 2009 when the CP El Niño index reached a value of 2.6 (Yu, 2016).

The regions identified as most sensitive to the EP and CP El Niño climatic perturbations compare well with the locations where significant trends in chlorophyll have been estimated previously using contemporary satellite records (Vantrepotte and Mélin, 2011; Gregg and Rousseaux, 2014; Hoegh-Guldberg et al., 2014). Furthermore, the biological and physical response patterns to each type of El Niño observed in the present work over the satellite record of 1997–2012 are consistent with contemporary case studies of specific EP and CP El Niño events in the Equatorial Pacific (Turk et al., 2011; Gierach et al., 2012; Radenac et al., 2012), the Indian Ocean (Webster et al., 1999), the continental U.S. (Yu et al., 2012; Yu and Zou, 2013), and the global oceans (Behrenfeld et al., 2001; Messié and Chavez, 2012, 2013). In addition, the response patterns observed for the physical variables have also been shown to persist over decadal timescales during the period 1979–2014 (see Section Physical Forcing Mechanisms Associated with El Niño Variability). Finally, when both EP and CP indices are equal to one, the sum of the impacts observed in response to EP and CP El Niño (Supplementary Figure 3) is shown to be approximately equal to the impact observed using the MEI. This is consistent, as the MEI encompasses effects of both EP and CP El Niño variations.

## Major Factors Influencing Phytoplankton Growth

To understand the specific influence of the climatic perturbations on phytoplankton, we must first identify the mechanisms driving the biophysical interactions at the global and regional scales. Phytoplankton growth is light-limited at high latitudes, where annual mean nitrate concentration is high, monthly means of chlorophyll and PAR show positive correlation, and monthly means of chlorophyll and MLD show negative correlation (i.e., chlorophyll increases when MLD is shallower and light availability is higher; **Figures 3A,B**). In contrast, phytoplankton growth is nutrient-limited in the tropics and subtropics, where light is plentiful all year round, annual mean nitrate concentration is low (**Figure 3C**), and monthly means of chlorophyll and MLD show positive correlation (i.e., chlorophyll increases when MLD is deeper, and nutrient-rich deep waters are mixed with nutrient-poor surface waters, increasing nutrient availability for phytoplankton growth). In addition to nutrient supply from vertical mixing, the tropics display strong zonal surface currents (**Figure 3D**), which can increase horizontal advection of nutrients and, in turn, enhance phytoplankton growth. The latter mechanism can explain the weak correlation coefficients observed between monthly means of chlorophyll and MLD in some areas of the tropics and subtropics. Finally, the negative correlation shown between monthly means of chlorophyll and MLD in the eastern Equatorial Pacific is consistent with the observed high annual mean nitrate concentration (i.e., macronutrients are not limiting; **Figure 3C**), and previously reported iron limitation (i.e., limiting trace nutrient) occurring in the region (Gordon et al., 1997; Moore et al., 2013). In some specific areas of the North and Equatorial Pacific Ocean, and the Southern Ocean, known as High Nutrient-Low Chlorophyll (HNLC) regions, low concentrations of trace nutrients (iron, manganese), present all year round, limit phytoplankton production.
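The sign-of-correlation reasoning above can be illustrated with a small sketch that labels a grid point from the correlation between monthly chlorophyll and MLD. It is a simplification under stated assumptions: the function names, the labels, and the `threshold` value are hypothetical, and a negative correlation alone cannot separate light limitation from the iron limitation reported for the eastern Equatorial Pacific.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def candidate_regime(chl, mld, threshold=0.3):
    """Label a grid point from the chlorophyll-MLD correlation sign.

    threshold is an illustrative cut-off below which the correlation is
    treated as too weak to interpret (e.g., where advection dominates).
    """
    r = pearson_r(chl, mld)
    if r <= -threshold:
        return "candidate light limitation"   # chlorophyll rises as MLD shoals
    if r >= threshold:
        return "candidate nutrient limitation"  # chlorophyll rises as MLD deepens
    return "weak correlation"
```

In practice such a label would be read together with the nitrate and PAR fields, as the text does, rather than on its own.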

The results presented in **Figure 3** are consistent with the different physical regimes and global climatological relationships previously demonstrated between open ocean satellite surface observations of chlorophyll and subsurface parameters of MLD, thermocline and nutricline depths at global scale (Wilson and Coles, 2005; Messié and Chavez, 2012; Brewin et al., 2014) and in the Equatorial Pacific (Turk et al., 2011; Gierach et al., 2012; Radenac et al., 2012; Lee et al., 2014). In coastal regions, phytoplankton production can be modified further by local supply of nutrients through coastal upwelling, riverine input (e.g., Turner et al., 2003) or atmospheric dust deposition (Abram et al., 2003; Jickells et al., 2005). In Polar regions, changes in phytoplankton production are tightly coupled to variations in sea-ice extent and timing of retreat, which can affect light and nutrient availability (Kahru et al., 2011, 2016; Arrigo and van Dijken, 2015).


## Physical Forcing Mechanisms Associated with El Niño Variability

The results of the biological and physical responses to the EP and CP types of El Niño, which are statistically significant, are presented in **Figures 2**, **4**, and with an estimate of uncertainty in the observational product in **Figure 5**. In the Equatorial Pacific Ocean, where ENSO activity is rooted, an EP El Niño event is generated when easterly trade winds weaken in the east and westerlies prevail in the west (**Figures 2D**, **4C**), pushing warmer, nutrient-poor waters to the east (along the coast of Peru and Chile), reducing nutrient availability, leading to a decrease in chlorophyll and PP in the eastern Pacific of −12 ± 5% and −56 ± 21 TgC/y (**Figures 2A,B**, **5**). In contrast, a CP El Niño event is generated when easterly trade winds in the east and westerlies in the west are enhanced (**Figures 2D**,**4C**), pushing warmer, nutrient-poor waters to the central Equatorial Pacific, reducing nutrient availability, which is associated with a decrease in chlorophyll and PP of −14 ± 5% and −68 ± 22 TgC/y (**Figures 2A,B**, **5**). In both cases, regional decreases in phytoplankton are caused by variations in horizontal and vertical advective fluxes responsible for the transport of nutrients to the surface layer, which are driven by perturbation in the wind forcing (Ashok and Yamagata, 2009; Gierach et al., 2012; Messié and Chavez, 2012, 2013; Radenac et al., 2012). Enhanced advection is also observed during CP El Niño in the tropical Atlantic Ocean as the Equatorial easterlies intensify in the east (**Figures 2D**, **4C**), bringing warmer, nutrient-poor waters to around 15◦N (Richter et al., 2012), and leading to decreases in chlorophyll and PP of −8 ± 3% and −10 ± 5 TgC/y (**Figures 2A,B**, **5**). 
In the Indian Ocean, Equatorial easterlies are found to intensify during EP El Niño, promoting horizontal advection of warmer and nutrient-poor waters to the western side of the basin (**Figure 2C**) (Webster et al., 1999), resulting in decreases in chlorophyll and PP of −11 ± 4% and −82 ± 31 TgC/y (**Figures 2A,B**, **5**), whereas on the eastern side, the observed increases in chlorophyll and PP of +7 ± 3% and +13 ± 5 TgC/y (**Figures 2A,B**, **5**) are likely to be driven by enhanced upwelling (**Figure 2C**) and atmospheric fallout from Indonesian fires and, further north in the basin, by enhanced nutrient supply from the Ganges and Brahmaputra rivers. The latter processes are consistent with the strong increase in fires (Wooster et al., 2012; Huijnen et al., 2016) and dust deposition reported off the west coast of Sumatra (Murtugudde et al., 1999; Abram et al., 2003), and the increased precipitation patterns observed over the Himalayas (**Figure 4F**). Furthermore, in some regions, these processes may not be sufficient to explain the observed variations in chlorophyll and PP, and other processes may be involved, such as atmospheric dust deposition from deserts (e.g., EP El Niño impact in the Cape Verde Sea), extent and duration of sea-ice cover in the Arctic (e.g., EP El Niño impact in the Bering and Labrador Seas), and iron limitation in HNLC regions (e.g., EP and CP El Niño impact in the Pacific sector of the Southern Ocean). Further information would be required to validate these forcing mechanisms.

The estimation of EP impact relies heavily on the El Niño event of 1997–1998, which was the single, important EP event that occurred within the relatively short time span of 15 years for which we have the OC-CCI data. To evaluate the impact that this single event had on our results, the correlation analyses have been rerun without the 1997–1998 event. In this case, the influence of EP El Niño on phytoplankton chlorophyll concentration remained significant in the Eastern Pacific Ocean and Western Indian Ocean, but not in the Eastern Indian Ocean region (**Figure 5**), indicating further that other regional climate oscillations are important drivers at basin scale (see discussion in Section Implications for Climate Impact Research). In addition, the analyses of the EP and CP impact on physical processes are further validated on interannual to decadal timescales in **Figure 4**. The regression coefficients of the EP and CP impact estimated for the two periods 1979–2014 and 1997–2012 show similar spatial patterns and frequency distributions for the physical variables studied. This indicates that the results presented in **Figure 2** (period 1997–2012) are not skewed by the 1997–1998 EP El Niño event, and that the impact patterns are stable over multi-decadal time scales (35 years), at least for the physical variables. Since phytoplankton dynamics are at the mercy of these physical conditions, we postulate that the inference may also hold for the biological variables studied here (e.g., **Figures 2**, **5**).

## Implications for Climate Impact Research

Phytoplankton have a high turn-over rate, responding to changes in their environment at scales ranging from seconds to days, and illustrating well the first-level biological response to environmental changes. At the same time, because of decadal-scale variabilities in the physical forcing fields, it is generally understood that multi-decadal, uninterrupted data are needed to evaluate the impact of climate change on marine ecosystems. Such data are only rarely available from limited in situ time series stations (mostly coastal). Furthermore, satellite ocean-color sensors have provided barely two decades of uninterrupted data that can be used for climate research (Sathyendranath and Krasemann, 2014). In this context, El Niño variability, together with other large-scale inter-annual variations, provides an important vehicle to study how phytoplankton in the ocean (and hence the organisms at higher trophic levels) respond to climate variability and identify the driving processes. In turn, monitoring and analysis of long-term changes in these driving processes would help us to improve understanding of projected impact of long-term climate changes on the marine ecosystem.

In the present study, we have addressed potential issues related to the collection of continuous ocean-color time-series and processing of climate-quality products, to the study of biophysical interactions, El Niño remote-forcing mechanisms and their propagation, and the diversity of El Niño events by: (i) using the longest, error-characterized, bias-corrected, climate-quality controlled, global-scale merged satellite ocean-color data product from the ESA Ocean-Color Climate Change Initiative project; (ii) analyzing in synergy the satellite ocean-color data record and reanalysis datasets to identify the dominant mechanisms driving the biophysical interactions; (iii) characterizing local and remote influences of El Niño types on key driving variables of SST, Sea Level, wind, and precipitation; (iv) analyzing the annual mean signal centered around the peak timing of El Niño activity in boreal winter; and (v) selecting EP and CP indices, which are computed to enhance differences in SST anomalies from the two most eastern and central Pacific Niño1+2 and Niño 4 regions respectively.

Our work highlights the importance of maintaining a long time series of consistent ocean-color products, to be able to evaluate the impact of climate variability on the biological fields. For example, our results on the impact of EP events could be improved when additional EP events can be incorporated into the analyses, such that the results would no longer be so heavily dependent on a single EP event, as was the case here. More data from longer time series are also essential to explore non-linearities in biological responses, which could not be investigated here because of limited data availability.

Our analysis shows that the modification of global oceanic phytoplankton under climate change cannot be forecast with respect to changes in a single ocean property. Rather, a range of environmental properties may be involved (e.g., advection in three dimensions, wind, riverine input, atmospheric dust deposition, stratification) whose intensity may vary on a regional basis. The statistical approach to study El Niño impact applied here has permitted us to characterize a complex mosaic of biological responses, illustrating that different forcing dominates in different regions. The biophysical processes driving phytoplankton production are summarized in **Figure 6** in the form of an atlas of EP and CP El Niño impact. The influence of CP and EP El Niño events can be felt in the global oceans, although the affected regions are predominantly located in the tropics and subtropics, encompassing 66–67% of the total areas affected, and the remaining 33–34% are areas located in high latitudes. In the tropics and subtropics, 35–39% of the total affected areas showed a decrease in PP associated with reduced nutrient availability during CP and EP El Niño respectively, whereas in higher latitudes 19–20% of the affected areas showed an increase in PP associated with reduced light limitation (**Figure 3**). Even though the percentages of total affected areas are relatively similar between CP and EP El Niño events, the regional differentiation is marked: impacts may be of opposing sign (e.g., along the coast of Peru and Chile, the Benguela upwelling, the Great Barrier Reef), or a region may be affected during an EP event but not during a CP event (e.g., in the tropical eastern and western Pacific). 
Several process-orientated studies have further highlighted the important role played by horizontal processes (together with vertical processes) in the supply of nutrients in the surface layer, and specifically demonstrated significant impacts on winter new primary production in the North Pacific transition zone (Ayers and Lozier, 2010), interannual variations of chlorophyll concentration in the Equatorial Pacific (Gierach et al., 2012; Messié and Chavez, 2013; Dave and Lozier, 2015) and the Red Sea (Raitsos et al., 2015), and decadal variations in phytoplankton abundance in the North Atlantic Subpolar Gyre (Martinez et al., 2015). Thus, both the development of statistical methods to study climate impact, and the assessment of the future evolution of regional physical forcing processes may help us to understand phytoplankton responses to climate change and improve confidence in our projection of future ecosystem state (Bopp et al., 2013; Boyd et al., 2014). The first assessment based on a biogeochemical and ecosystem model output of chlorophyll response to the EP and CP types of El Niño was shown to compare well with remotely-sensed observations in the Equatorial Pacific (Lee et al., 2014). However, the response to El Niño variability projected from biogeochemical and ecosystem models is yet to be investigated at the global scale.

Notwithstanding the dominant influence of El Niño on global climate patterns, other driving factors may enhance or weaken the observed biophysical impact. Examining links between El Niño and inter-annual and decadal climate oscillations (Di Lorenzo et al., 2010; Izumo et al., 2010) may provide further insights toward improving projection of environmental properties and associated phytoplankton responses to climate forcing at global and regional scales. The regional variations associated with El Niño may be superimposed on long-term warming trends (Bopp et al., 2013; IPCC Climate Change, 2013; Boyd et al., 2014; Kumar et al., 2016) and regional-scale oscillations at sub-seasonal and seasonal scales associated with other large-scale climate modes of variability, such as the Atlantic Multidecadal Oscillation (AMO) and Pacific Decadal Oscillation (PDO; Martinez et al., 2009), the North Pacific Gyre Oscillation (NPGO; Di Lorenzo et al., 2008, 2010; Messié and Chavez, 2013), the monsoon and Indian Ocean Dipole (IOD; Saji et al., 1999; Ashok et al., 2007; Izumo et al., 2010; Brewin et al., 2012; Currie et al., 2013). As a result, the regional climate response is not a simple function of the strength and centroid location of an El Niño event. Further, the regional patterns observed using EP and CP El Niño indices may be sufficient to explain only a fraction of all the regional variations on a year-to-year basis (except perhaps where El Niño is likely to dominate the variability of the system such as in the Equatorial Pacific region). For instance, in the Indian Ocean, the ENSO and IOD indices can account for ∼30% and 12% respectively of the regional variations in SST (Saji et al., 1999), and years of co-occurrence of positive IOD and El Niño events may provide positive feedbacks to the SST (Kumar et al., 2016). Therefore, some apparent differences will show between the observed impact of El Niño on biological and physical variables, and the corresponding anomalies.

## Implications for the Oceanic Ecosystem and Carbon Cycle

Phytoplankton are at the base of the food chain and transfer energy to higher trophic levels. This transfer of energy has a knock-on effect on fisheries and dependent human societies, especially in highly productive and coastal upwelling regions, as well as coral reef ecosystems. The larvae of many marine species graze on phytoplankton during this most vulnerable stage of their lives. Hence, changes in phytoplankton population associated with climate variability may propagate rapidly up the marine food chain and profoundly alter the functioning of marine ecosystems (Platt et al., 2003; Edwards and Richardson, 2004; Lo-Yat et al., 2011). In addition, changes in environmental conditions associated with EP and CP El Niño events have been shown to impact the mesozooplankton community with variable time lags in the northern California Current, which in turn can affect top-down control on phytoplankton, and disrupt the pelagic food chain (Fisher et al., 2015). Following EP and CP El Niño events, quite different impacts on commercially important fisheries have been reported, for anchovy catches in the Humboldt Current Large Marine Ecosystem (Jackson et al., 2011) and tuna catches in the Indian Ocean (Kumar et al., 2014). In coral reef ecosystems, changes in phytoplankton population and mass bleaching following an El Niño event can critically affect fisheries, recreation, and tourism services (Hoegh-Guldberg, 1999; Abram et al., 2003; Lo-Yat et al., 2011). Recent analysis in the Andaman Sea, southeast Bay of Bengal, has further demonstrated that differences in both intensity and timing of SST warming associated with EP and CP El Niño events can determine the extent of mass coral bleaching (Lix et al., 2016). In this context, regional differentiation of the impact of each type of El Niño event (**Figure 6**) may provide important information to delineate and establish protected coral reef and fishing areas to facilitate their recovery.

dashed contour) PP may be further controlled by other mechanisms, such as sea-ice melting, atmospheric dust deposition and availability of trace nutrients. The contour delineation of the influence of EP and CP El Niño is generated based on information displayed in Figure 2 and Supplementary Figure 4.

The oceanic carbon sink is part of a very active, natural cycle, in which phytoplankton in the surface layer of the ocean fix, by photosynthesis, dissolved CO<sub>2</sub> in the water into organic matter, some of which subsequently sinks below the mixed layer. Through the associated decrease in the partial pressure of CO<sub>2</sub> in the surface ocean, phytoplankton contribute to the drawdown of dissolved CO<sub>2</sub> from the ocean surface layer (Hauck et al., 2015), which in turn helps to modulate the increase in anthropogenic atmospheric CO<sub>2</sub>. The estimated El Niño-driven changes in PP at the regional scale can be considerable, reaching values of −57 ± 21 and −68 ± 22 TgC/y in the Eastern and Central Equatorial Pacific Ocean during EP and CP types of El Niño respectively (**Figure 5**). However, to provide a more complete picture of the influence on the carbon cycle, further investigations are required to quantify the impact of El Niño on carbon export and associated changes in air-sea CO<sub>2</sub> fluxes. The buffering action of the ocean in the carbon cycle is non-linear: it varies with the water temperature (solubility pump), alkalinity (carbonate pump), biological productivity and demineralization (biological pump). The impact on environmental and ecosystem properties must therefore be evaluated at the appropriate scale to allow investigation of the underlying mechanisms driving the variability in the ocean carbon cycle.

As the frequency of extreme El Niño events and the relative frequency of occurrence of CP-El Niño/EP-El Niño are projected to increase under climate warming (Yeh et al., 2009; Lee and McPhaden, 2010; Cai et al., 2015), it is essential to refine our regional assessment of climate impact associated with El Niño variability. The atlas of impact of CP and EP types of El Niño on oceanic phytoplankton (**Figure 6**) can be used for societal benefit. It provides key climate impact information that can allow us to better inform fisheries management on possible risks and opportunities associated with El Niño events, and to support mitigation and adaptation plans for local fisheries-dependent societies more effectively. The atlas information can also provide an observational basis to test model predictions of the impact of climate change on the marine ecosystem. Finally, from a biogeochemical perspective, such insights on El Niño variability impact are needed to improve our understanding of the buffering capacity of the oceanic carbon cycle under climate change.

# AUTHOR CONTRIBUTIONS

MFR designed and implemented the research. MFR, SS, RB, and TJ provided materials and analysis tools. MFR, SS, RB, TJ, DR, and TP discussed the results and contributed to the writing of the manuscript.

# FUNDING

MFR is funded through a European Space Agency Living Planet Fellowship Grant Ref Number (CCI-LPF-EOPS-MM-16-0078). SS, TJ, and TP are funded through ESA Ocean Color Climate Change Initiative program. SS, MFR, RB, and DR are funded through the NERC's UK National Centre for Earth Observation.

# ACKNOWLEDGMENTS

This work is a contribution to the European Space Agency Ocean Color Climate Change Initiative (ESA OC-CCI), the European Space Agency Living Planet Fellowship program (CLIMARECOS), and to the NERC National Centre for Earth Observation (NCEO). The authors thank the ESA CCI teams for providing OC-CCI chlorophyll data, SL-CCI sea level data, and NCEO-ESA-SST-CCI sea surface temperature data. The authors further acknowledge TWAP for providing primary production data; NASA for providing SeaWiFS and MODIS PAR data; NOAA for providing NCEP/NCAR air surface temperature and precipitation reanalysis data; ECMWF for providing ERA Interim wind and sea surface temperature reanalysis data; and SODA for providing ocean temperature reanalysis data. The authors thank James Dingle for technical support with the OC-CCI data processing, Stéphane Saux-Picart for help in processing the mixed layer depth, Sang-Wook Yeh for discussion about El Niño phenomenon, and Eleni Papathanasopoulou for discussion about socio-economic impacts. The authors wish to acknowledge use of the Ferret program for analysis and graphics in this paper—Ferret is a product of NOAA's Pacific Marine Environmental Laboratory (information is available at http://ferret.pmel.noaa.gov/Ferret/).

The authors would also like to acknowledge the Nature Methods online web tool BoxPlotR (http://boxplot.tyerslab.com/), which was used to generate the boxplots, and Dimitrios Kleftogiannis for information about boxplot analysis tools. We acknowledge the two reviewers for providing constructive comments on our manuscript.

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fmars.2017.00133/full#supplementary-material


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Racault, Sathyendranath, Brewin, Raitsos, Jackson and Platt. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# The OLCI Neural Network Swarm (ONNS): A Bio-Geo-Optical Algorithm for Open Ocean and Coastal Waters

Martin Hieronymi<sup>1</sup>\*, Dagmar Müller<sup>1†</sup> and Roland Doerffer<sup>1,2</sup>

<sup>1</sup> Department of Remote Sensing, Institute of Coastal Research, Helmholtz-Zentrum Geesthacht, Geesthacht, Germany, <sup>2</sup> Brockmann Consult GmbH, Geesthacht, Germany

The processing scheme of a novel in-water algorithm for the retrieval of ocean color products from Sentinel-3 OLCI is introduced. The algorithm consists of several blended neural networks that are specialized for 13 different optical water classes. These comprise clearest natural waters but also waters reaching the frontiers of marine optical remote sensing, namely extreme absorbing, or scattering waters. Considered chlorophyll concentrations reach up to 200 mg m<sup>−3</sup>, non-algae particle concentrations up to 1,500 g m<sup>−3</sup>, and the absorption coefficient of colored dissolved organic matter at 440 nm is up to 20 m<sup>−1</sup>. The algorithm generates different concentrations of water constituents, inherent and apparent optical properties, and a color index. In addition, all products are delivered with an uncertainty estimate. A baseline validation of the products is provided for various water types. We conclude that the algorithm is suitable for the remote sensing estimation of water properties and constituents of most natural waters.

Keywords: ocean color, remote sensing, Sentinel-3, OLCI, extreme Case-2 waters, neural network, fuzzy logic classification

## INTRODUCTION

The Sentinel-3 Ocean and Land Colour Instrument (OLCI) was developed by the European Space Agency as part of the Copernicus Earth observation program (Donlon et al., 2012). The first of a series of satellites, Sentinel-3A, was launched early in 2016. Mission objectives include measuring the ocean reflectance (color) as well as monitoring sea-water quality and pollution. OLCI is based on the heritage of the Medium Resolution Imaging Spectrometer (MERIS) on board ENVISAT (mission between 2002 and 2012), but with six additional spectral bands. OLCI operates in full resolution mode with a spatial resolution of approximately 300 m and a swath width of 1,270 km. Thus, the instrument images wide sea areas including details of coastal waters, e.g., estuaries, intertidal mudflats, and lagoons, but also inland waters. The challenge is to extract broadly reliable ocean color products such as chlorophyll concentration, Chl, from such wide-scale satellite observations, which cover the high natural variability of optical water properties.

The spectral water-leaving reflectance or remote sensing reflectance, Rrs, is characterized by absorption and scattering properties of four main components: sea-water, phytoplankton (together with small organisms), colored dissolved organic matter (CDOM), and inorganic particulate material (Mobley, 1994). In addition, wind-dependent air bubbles and boundary conditions may influence the color signal. The composition of water constituents varies considerably, both temporally and regionally. In the open ocean, inherent optical properties (IOPs) of water are determined primarily by phytoplankton and related CDOM and detritus degradation products.

### Edited by:

Tiit Kutser, University of Tartu, Estonia

### Reviewed by:

Tim Moore, University of New Hampshire, USA Jenni Attila, Finnish Environment Institute, Finland Mark Matthews, CyanoLakes, South Africa

> \*Correspondence: Martin Hieronymi martin.hieronymi@hzg.de

† Present Address: Dagmar Müller, Brockmann Consult GmbH, Geesthacht, Germany

### Specialty section:

This article was submitted to Ocean Observation, a section of the journal Frontiers in Marine Science

Received: 15 December 2016 Accepted: 26 April 2017 Published: 11 May 2017

### Citation:

Hieronymi M, Müller D and Doerffer R (2017) The OLCI Neural Network Swarm (ONNS): A Bio-Geo-Optical Algorithm for Open Ocean and Coastal Waters. Front. Mar. Sci. 4:140. doi: 10.3389/fmars.2017.00140


In accordance with the classical (and not unambiguous) bipartite differentiation, these are the so-called "Case-1" (C1) waters and all other water types correspond to "Case-2" (Morel and Prieur, 1977; Mobley et al., 2004). Coastal and inland waters can be significantly influenced by other constituents whose concentrations do not covary with the phytoplankton concentration, e.g., due to CDOM and mineral runoff from adjacent land areas or resuspension of bottom material in shallow waters. In extreme cases, concentrations of CDOM or inorganic particles can be exceptionally high; those are defined as (Case-2) extremely absorbing (C2AX) and extremely scattering (C2SX) waters, respectively (Hieronymi et al., 2016). Absorbing waters are characterized by very low marine reflectance and a shift of the Rrs maximum toward the red spectral range. Typically, the CDOM absorption at 440 nm is >1 m<sup>−1</sup> in C2AX waters. There are "black lakes," e.g., many boreal lakes, where the reflectance is negligible in almost the entire visible part of the spectrum (VIS: 400–700 nm); a signal from chlorophyll is, if at all, only detectable in the near infrared (Kutser et al., 2016; NIR in the sensor response division scheme: 700–1,000 nm). Observations from remote sensing face similar challenges for extremely turbid C2SX waters, because non-algae particles mask optical properties of algae particles over large parts of the visible spectrum. In general, however, the water appears much brighter; the water-leaving reflectance spectrum still has significant amplitudes in the NIR (Ruddick et al., 2006) and measurably non-zero reflectance at the last OLCI band at 1,020 nm (Knaeps et al., 2012). Typically, the concentration of inorganic suspended matter, ISM, is >100 g m<sup>−3</sup> in C2SX waters (Hieronymi et al., 2016). An overview of the water type subclassification used for differentiation in this work is provided in **Table 1**.

Great variability of IOPs causes ambiguity and therefore a significant degree of uncertainty in the interpretation of the remote sensing signal. We have to deal with a nonlinear and multivariate problem, and the ocean color algorithm must be designed accordingly. The capability of bio-(geo)-optical algorithms varies strongly on global, regional, and very small scales, and algorithms generally face more difficulties in Case-2 waters (e.g., Blondeau-Patissier et al., 2004; Darecki and Stramski, 2004; Gregg and Casey, 2004; Reinart and Kutser, 2006; Attila et al., 2013; Beltrán-Abaunza et al., 2014; Harvey et al., 2015). Indeed, it is a challenge to bridge the different scales with a high degree of reliability of the ocean color products. Moreover, marine atmospheric correction (AC), which is necessary to derive Rrs at the sea surface from satellite imagery and thus provides the input for in-water algorithms, is a complex task with additional uncertainties, in particular for extreme waters.

An artificial neural network (NN) is an appropriate regression technique to parametrize the inverse relationship between optical properties and reflectances. It has been shown in recent years that NNs produce reasonable approximations of ocean color products from optically complex (Case-2) waters. NNs have been applied to different satellite sensors in order to derive concentrations of water constituents, inherent and apparent optical properties (IOPs and AOPs), and photosynthetically available radiation (PAR), or to discriminate algae species (Gross et al., 1999; Schiller and Doerffer, 1999; D'Alimonte and Zibordi, 2003; Zhang et al., 2003; Tanaka et al., 2004; Schiller, 2006; Bricaud et al., 2007; Schroeder et al., 2007; Ioannou et al., 2011; Jamet et al., 2012; Chen et al., 2014; Hieronymi et al., 2015; D'Alimonte et al., 2016). Due to their speed, NN-based ocean color algorithms are deployed for operational and near-real time satellite observations, e.g., the MERIS Case-2 water algorithm (Doerffer and Schiller, 2007) and C2RCC (Brockmann et al., 2016).

In Case-1 (C1), CDOM is related to chlorophyll concentration with arbitrary parameters X and Y.

The objective of this study is to introduce a new in-water processing scheme designed for OLCI ocean color observations called OLCI Neural Network Swarm (ONNS). The distinctive feature of this algorithm is its wide range of applicability in terms of optical water properties ranging from oligotrophic ocean waters to extremely turbid (scattering) or dark (absorbing) waters. The specific goals of the study are: (1) to reference the fundamental processing scheme, (2) to provide the scientific background, (3) to introduce the derived ocean color products, and (4) to evaluate the basic suitability of the algorithm for oceanic and coastal waters, i.e., C1, C2A, C2AX, C2S, and C2SX waters (**Table 1**).

### ONNS BASIS AND ALGORITHM DESCRIPTION

ONNS is an in-water processor, which retrieves ocean color (OC) products from Sentinel-3 OLCI satellite scenes. Inputs to the algorithm are normalized remote sensing reflectances (just above the sea surface). Atmospheric correction is not part of the in-water processing scheme, and thus ONNS fully relies on proper atmospheric correction (see Section Retrieval Accuracy). The processor logic is illustrated in **Figure 1** and documented in the following.

### Neural Network Algorithm

As is the case for all ocean color algorithms, NNs are valid for a certain range of constituents and their concentrations, and some parameters may be deduced more accurately than others, e.g., retrieval of suspended matter is usually the least critical, whereas CDOM retrievals are the most challenging (Odermatt et al., 2012; Brewin et al., 2015). Our approach is to blend various NN algorithms, each optimized for a specific scope. This swarm of neural networks therefore covers the largest possible variability of water properties including oligotrophic and extreme waters (**Table 1**).

### Neural Network Data Basis

The basis for NN training is knowledge of the relationship between water constituents, i.e., their optical activity, and the spectral remote sensing reflectance, Rrs. The latter is defined as the ratio between water-leaving (upwelling) radiance, Lw, and downwelling irradiance, Ed, both just above the water surface. For training and validation (test) purposes, a large (>10<sup>5</sup>) dataset has been simulated using the commercial radiative transfer software Hydrolight (version 5.2; Sequoia Scientific, USA; Mobley, 1994). Hydrolight is a forward model to compute Rrs and many other light field-related quantities from optical specifications of the water body, such as specific absorption and scattering properties. Considered concentration ranges are defined in **Table 1**. The distributions, ranges, and covariances of optical parameters in the model are estimated from different in situ datasets: (1) primarily our data from the North and Baltic Sea (HZG), (2) OC-CCI (ESA, worldwide; Valente et al., 2016), (3) HELCOM (Baltic Sea 1997–2013; ICES, 2011), and (4) NOMAD (NASA, worldwide; Werdell and Bailey, 2005). The simulations cover the spectral range from 380 to 1,100 nm in 2.5 nm steps (hyperspectral over full VIS and NIR). Resulting reflectances and AOPs refer to solar irradiation from the zenith direction and a nadir viewing angle, i.e., they are fully normalized. Many standard settings of Hydrolight are utilized (Mobley, 1994; Mobley and Sundman, 2013); specific inputs are defined in the following.

The total absorption contains fractions of absorption of pure water, phytoplankton pigments, minerals (also inorganic detritus or non-algae particles), and colored dissolved organic matter (CDOM, also referred to as yellow substance or gelbstoff). The same distinction is made for scattering, except that no scattering is attributed to CDOM. The absorption and scattering coefficients of pure water depend on temperature, salinity, and wavelength (data from WOPP v2 by Röttgers et al., 2016).

Phytoplankton absorption is determined by the composition and concentration of pigments, e.g., chlorophyll-a, Chl, which is generally used to quantify the marine biomass concentration. Because pigment composition differs, different algae species have unique absorption spectra. Xi et al. (2015) showed the impact of chlorophyll-specific absorption spectra on Rrs. They identified five fundamental absorption shapes from which an inversion of algae species from remote sensing reflectance is possible. **Figure 2A** illustrates the basic chlorophyll-specific absorption spectra, a<sup>∗</sup><sub>p</sub>, normalized at 440 nm, that are utilized in this work (from Xi et al., 2015). Mixtures of these spectra represent the variability of spectral shapes that are found in measured data. It has been decided to combine two types of spectra, whereby one component dominates the signal with 80%. The globally most common spectral shape is labeled "brown group"; it is very similar to the standard absorption spectrum used in Hydrolight and summarizes Heterokontophyta, Dinophyta, Haptophyta, and others that have a similar spectral shape. The "green group" includes Chlorophyta. Cyanobacteria are separated into blue (e.g., Aphanothece clathrata) and red (e.g., Synechococcus red) species. The two spectra for cyanobacteria are derived from in situ absorption measurements in the Baltic Sea. The three other spectra are taken from cultures. The phytoplankton (particle) absorption, a<sub>p</sub>, is related to the spectral chlorophyll-specific absorption and the chlorophyll concentration, a<sub>p</sub>(λ) = a<sup>∗</sup><sub>p</sub>(λ) Chl. The natural variability of phytoplankton absorption is very high (e.g., Bricaud et al., 2004) and is included in the simulations (**Figure 2B**). This must be kept in mind when assessing Chl retrieval performance.
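The relation between chlorophyll-specific absorption, the 80/20 mixing of two spectral shapes, and phytoplankton absorption can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the mixing is assumed to be a linear combination per wavelength, and the three-band spectra are made-up placeholders, not values from Xi et al. (2015).

```python
def mix_specific_absorption(dominant, secondary, dominant_share=0.8):
    """Linearly blend two chlorophyll-specific absorption spectra.

    Assumption: the 80% dominance described in the text is applied as a
    per-wavelength linear weight; both spectra share one wavelength grid.
    """
    if len(dominant) != len(secondary):
        raise ValueError("spectra must share the same wavelength grid")
    return [dominant_share * d + (1.0 - dominant_share) * s
            for d, s in zip(dominant, secondary)]


def phytoplankton_absorption(a_star, chl):
    """a_p(lambda) = a*_p(lambda) * Chl for every wavelength of a_star."""
    return [a * chl for a in a_star]


# Placeholder spectral shapes (illustrative numbers only).
brown = [1.00, 0.60, 0.20]
green = [1.00, 0.80, 0.40]

a_star_mix = mix_specific_absorption(brown, green)
a_p = phytoplankton_absorption(a_star_mix, chl=2.0)
```

Swapping the `dominant` and `secondary` arguments gives the complementary mixture in which the other group contributes 80% of the signal.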

The shape of CDOM absorption is nearly exponential. Exponential functions have been used for C1 water simulations (**Table 1**). Here, CDOM absorption coefficients, a<sub>cdom</sub>, and exponential slopes are varied strongly in order to represent the natural variability (e.g., Valente et al., 2016). In addition, the present work uses modeled absorption spectra that are fitted to spectral measurements (by Rüdiger Röttgers, HZG). Based on this, further CDOM spectra are extrapolated toward ultra-extreme absorption with a<sub>cdom</sub>(440) = 20 m<sup>−1</sup>. The exponential slope of these spectra (between 300 and 400 nm) is approximately 0.014 nm<sup>−1</sup>.
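A purely exponential CDOM spectrum referenced to 440 nm can be written as a<sub>cdom</sub>(λ) = a<sub>cdom</sub>(440) exp[−S (λ − 440)]. The sketch below evaluates this form. It is an idealization: the fitted spectra used in this work are only nearly exponential, and applying the slope S = 0.014 nm<sup>−1</sup> (quoted above for 300 to 400 nm) at all wavelengths is an assumption made for illustration. The function name is hypothetical.

```python
import math


def cdom_absorption(wavelength_nm, a_cdom_440, slope_per_nm=0.014, ref_nm=440.0):
    """Idealized exponential CDOM absorption in m^-1.

    Assumption: one constant spectral slope is valid over the whole
    wavelength range, which real CDOM spectra only approximate.
    """
    return a_cdom_440 * math.exp(-slope_per_nm * (wavelength_nm - ref_nm))


# Upper end of the simulated range: a_cdom(440) = 20 m^-1.
a_400 = cdom_absorption(400.0, 20.0)
a_440 = cdom_absorption(440.0, 20.0)
a_560 = cdom_absorption(560.0, 20.0)
```

With these settings the absorption at 400 nm exceeds the reference value at 440 nm, and it decays steadily toward the green and red bands.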

Fournier-Forand volume scattering functions have been applied for algae and non-algae particles (see Mobley and Sundman, 2013). The particle backscatter fraction, which actually correlates poorly with Chl and mineral (ISM) concentrations, is needed for the selection of appropriate phase functions. The corresponding formula of Twardowski et al. (2001) has been used for Chl-bearing particles. For inorganic particles, the mineral backscattering-ISM relationship of Zhang et al. (2010) has been utilized. The spectral mass-specific scattering coefficients are approximated by exponential functions, following the natural variability shown in measurements of organic-dominated and mineral-dominated waters (Woźniak et al., 2010).

The atmospheric and surface boundary conditions in Hydrolight are set constant, i.e., usage of the semi-empirical sky radiance model, assuming dry air with a marine aerosol type and a moderate wind speed of 5 m s<sup>−1</sup>. The refractive index of water (as is the case for absorption and scattering) is a function of water temperature (0–30 °C) and salinity (0–35 PSU). The water is (virtually) infinitely deep. Effects of light polarization are not taken into account in Hydrolight.

All simulations have been carried out with and without inelastic scattering, i.e., Raman scattering, CDOM and Chl fluorescence, but without internal sources, i.e., no bioluminescence. In the end, data without inelastic scattering have been used for ONNS development. This is unproblematic in the selected setup with the 11 OLCI bands. Seen over the visible spectral range, differences mostly play no role, except for extreme absorbing waters, where high CDOM fluorescence is present. During algae bloom events, very high chlorophyll fluorescence peaks can be observed in nature (e.g., Fawcett et al., 2006), but modeling a certain quantum yield efficiency holds in itself great uncertainties (see Section ONNS Design). However, the simulations have been compared with observations and we generally have found a good agreement (Hieronymi et al., 2016). But we also have found some discrepancies partly related to plausible measuring uncertainties and possibly due to model simplifications.

### NN Training

One part of the simulated dataset is put aside for later quasi-independent test purposes (see Section ONNS Application to Validation Data). The rest of the Rrs data is optically classified (Section Out-of-Scope Test) and grouped together. The scopes of concentrations together with median values are given in **Table 2**.

The usable wavebands of the in-water algorithm are determined by the atmospheric correction at OLCI bands. We selected 11 (out of 21) OLCI wavebands for NN input (bands 1–8, 12, 16, and 17, i.e., at 400, 412.5, 442.5, 490, 510, 560, 620, 665, 755, 777.5, and 865 nm). The instrument's band widths vary and, to be precise, the centers of bands 12 and 16 are actually at 753.75 and 778.75 nm, respectively (Donlon et al., 2012). In contrast to other NN algorithms (e.g., Doerffer and Schiller, 2007), sun-viewing geometry is not an input to the present NNs; instead, the input reflectances are normalized.

The selected output parameters are mostly common ocean color products (e.g., Nechad et al., 2015; Valente et al., 2016):


The 12 parameters are results of three independent sets of NNs: one computes concentrations, one gives IOPs at 440 nm, and the last provides different IOPs and AOPs (see Appendix **Table A1**). Concentrations can be directly derived with NNs or, alternatively, they can be estimated using IOPs, e.g., Chl from ap(440) or ISM from bm(440) (Doerffer and Schiller, 2007). The latter approach allows better adaptation of empirical relationships by means of in situ match-up data (which in the case of OLCI are not yet available). All absorption and scattering contributions are retrieved at the reference wavelength 440 nm (pure water IOPs are known). Thereby, it is possible to estimate the total absorption, total scattering, and total attenuation coefficients. Mineral particles usually have lower absorption than CDOM, but the spectral shapes of both are very similar. Following this reasoning, semi-analytical models designed to retrieve IOPs from satellite data often combine absorption by detritus (in this work only the inorganic fraction) and gelbstoff (all water constituents which pass a filter pore size of 0.2 µm, which is often synonymous with CDOM). This absorption coefficient often corresponds to 412 nm. A similar idea holds true for the backscattering parameter. It is usually the backscattering coefficient of all marine particles together that is measured in the field (at 510 nm). The diffuse attenuation coefficients describe the attenuation of irradiance as a function of depth in water. They can be used to compute the depth of the euphotic zone. The Forel-Ule color scale was used for natural water classification long before the satellite era. In open ocean regions, the FU number is closely related to Chl concentration. Thus, the index can support ocean color trend analysis in the pre-satellite age and afterwards (Wernand et al., 2013; van der Woerd and Wernand, 2015). The color scale visualizes the color of the water body above a white Secchi disk that is held at half Secchi depth.

TABLE 2 | Chlorophyll, CDOM, and inorganic suspended matter concentrations for the 13 optical water types.

The composition of an optical water type with reference to the sub-classification in Table 1 is additionally shown.
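As an illustration of how total coefficients at 440 nm follow from the individual contributions, the sketch below sums the components described earlier (pure water, phytoplankton, minerals, and CDOM for absorption; no CDOM term for scattering) and forms the attenuation as absorption plus scattering. The function and argument names are hypothetical, and the input values are arbitrary placeholders rather than retrieved products.

```python
def total_iops(a_w, a_p, a_m, a_cdom, b_w, b_p, b_m):
    """Return total absorption, scattering, and attenuation (all in m^-1).

    a_*: absorption by water, phytoplankton, minerals, and CDOM.
    b_*: scattering by water, phytoplankton, and minerals
         (no scattering is attributed to CDOM).
    """
    a_tot = a_w + a_p + a_m + a_cdom
    b_tot = b_w + b_p + b_m
    c_tot = a_tot + b_tot  # beam attenuation = absorption + scattering
    return a_tot, b_tot, c_tot


# Arbitrary placeholder inputs at 440 nm.
a_tot, b_tot, c_tot = total_iops(
    a_w=0.006, a_p=0.05, a_m=0.02, a_cdom=0.1,
    b_w=0.004, b_p=0.3, b_m=0.5,
)
```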

A subsequent set of neural nets serves to evaluate the divergence of final OC products from the original training basis, i.e., Hydrolight simulations. The results are part of an uncertainty estimate (see Section Uncertainty Analysis).

The actual NN training procedure is described in Schiller and Doerffer (1999). The utilized multilayer feedforward backpropagation neural net program is documented in Schiller (2000). The code was embedded in a program to test many NN architectures, i.e., varying numbers of hidden layers and neurons, and to optimize the learning process.

### NN Scoring and Selection

Several hundred nets per water class and task, with widely differing architectures, have been produced. Afterwards, a ranking system has been applied in order to determine the optimal nets without over-training. In principle, statistical parameters such as root-mean-square error and goodness of fit are transformed into relative scores, which evaluate the quality of individual nets (Müller et al., 2015a). The best performing neural network architectures per water class are specified in **Table A1**. Inputs and outputs for the NNs are log<sub>10</sub>(X + 0.001), where X stands for Rrs or an ocean color product. The only exception is the Forel-Ule number, which is an integer between 1 and 21 and is not log-transformed. The logarithmic form of input/output yields a distribution of values that is closer to a uniform distribution within the range of input data and therefore a better approximation of outputs. The addition of 0.001 allows zero values to be used as input.
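The log<sub>10</sub>(X + 0.001) scaling and its inverse can be sketched in a few lines. The helper names below are hypothetical; only the transform itself is taken from the text.

```python
import math

OFFSET = 0.001  # offset stated in the text; lets X = 0 be transformed


def to_nn_space(x):
    """Forward transform applied to NN inputs and outputs: log10(X + 0.001)."""
    return math.log10(x + OFFSET)


def from_nn_space(y):
    """Inverse transform that maps an NN output back to physical units."""
    return 10.0 ** y - OFFSET


zero_in_nn_space = to_nn_space(0.0)          # finite, close to -3
chl_round_trip = from_nn_space(to_nn_space(12.5))
```

The Forel-Ule number would bypass both helpers, since it is the one product that is not log-transformed.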

### Fuzzy Logic Classification

Optical water type, OWT, classification based on remote sensing reflectance spectra has been developed to overcome the simplifications of Case-1 and Case-2 waters (Moore et al., 2001). It can bridge the gap between regionalized optical models and the global scale by combining several models according to their respective membership to a certain water type (Moore et al., 2009, 2012, 2014).

The classification is based on the simulated Rrs spectra (at 11 OLCI wavebands), but it is subsequently applied to atmospherically corrected satellite data. Negative reflectances can sometimes occur after AC, while the spectral shape is still realistic. In order to avoid conflict with negative reflectances, the spectra are therefore transformed by log<sub>10</sub>(Rrs + 1) (note that Rrs is treated differently during classification and the NN application). Before the clustering, these transformed spectra are normalized by their brightness (sum of log-transformed reflectances), so that the classification is based on the shape of the spectrum alone. As the goal is to derive representative spectra, which have different spectral shapes and in particular a different spectral maximum, the sample from the simulation database does not take into account the frequency of natural occurrence of spectra. Spectra with their maximum at 510, 620, and 777.5 nm come in two distinctive shapes and there is no spectrum with a maximum at 865 nm, so that 13 classes are selected during the agglomerative clustering. The 13 OWT classes are described by their class mean and standard deviation per wavelength of the brightness-scaled Rrs, which are subsequently used for the classification of spectra. The mean reflectance spectra of the 13 OWT classes are plotted in **Figure 3**.
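The two pre-processing steps described above, the log<sub>10</sub>(Rrs + 1) transform followed by brightness normalization, might be implemented along the following lines. The function name is hypothetical and the three-band spectrum is a made-up example (the classification itself uses 11 bands).

```python
import math


def shape_spectrum(rrs):
    """Return the brightness-normalized, log-transformed spectrum.

    Step 1: log10(Rrs + 1), which tolerates slightly negative Rrs.
    Step 2: division by the brightness, i.e., the sum of the
            log-transformed reflectances, so that only the shape remains.
    """
    logged = [math.log10(r + 1.0) for r in rrs]
    brightness = sum(logged)
    if brightness == 0.0:
        raise ValueError("brightness is zero; spectrum cannot be normalized")
    return [v / brightness for v in logged]


# Made-up spectrum with one slightly negative band, as can occur after AC.
shape = shape_spectrum([0.010, 0.020, -0.001])
```

By construction, the entries of the returned shape spectrum sum to one.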

The five water type categories (**Table 1**) are defined by combinations of concentrations and thresholds. The water classes are designed to represent spectra, which have their maximum in the spectral shape at different wavelengths, independent of their brightness. Combining the water classes and the concentration categories is a test, which spectral shapes can be found in certain concentration ranges (**Table 2**).

Fuzzy set theory allows an element to have membership to one or more OWT classes (Moore et al., 2001). The weight of a class (membership function) is altered to allow for graded memberships, i.e., 0 ≤ w<sub>i</sub> ≤ 1, and to express partial class membership to the ith class. For constrained fuzzy sets, the sum of all 13 weights equals 1. However, the class membership had to be above a minimum threshold, which was set at 0.0001. The membership to a class (weight) is calculated by determining the Mahalanobis distance between the given spectrum and the class mean, using the covariance matrix of the respective class. Reconstruction of Rrs spectra by means of the fuzzy classification inversion yields mostly satisfying results. However, Rrs inversion from different atmospheric corrections reveals expected uncertainties in the violet-blue, which can occur if the satellite-acquired spectrum provided by the AC is distinctly different from the modeled Rrs (which is the basis of the classification).
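A minimal sketch of a distance-based fuzzy membership is given below. It is not the ONNS implementation: the covariance matrix is simplified to a diagonal one (per-band standard deviations), the conversion from distance to weight uses exp(−d²/2) as a stand-in for the actual membership function, and the threshold of 0.0001 is assumed to act on the weights before normalization. All names are hypothetical.

```python
import math


def mahalanobis_sq(x, mean, std):
    """Squared Mahalanobis distance for a diagonal covariance matrix."""
    return sum(((xi - mi) / si) ** 2 for xi, mi, si in zip(x, mean, std))


def memberships(x, class_means, class_stds, threshold=1e-4):
    """Return normalized class weights, or None if unclassifiable.

    Assumptions: exp(-d^2 / 2) maps distance to a raw weight, weights
    below the threshold are set to zero, and the rest are scaled so
    that they sum to 1 (constrained fuzzy set).
    """
    raw = [math.exp(-0.5 * mahalanobis_sq(x, m, s))
           for m, s in zip(class_means, class_stds)]
    raw = [w if w >= threshold else 0.0 for w in raw]
    total = sum(raw)
    if total == 0.0:
        return None  # no class reaches the minimum membership
    return [w / total for w in raw]


# Two made-up classes on a two-band shape spectrum.
means = [[0.5, 0.5], [0.2, 0.8]]
stds = [[0.1, 0.1], [0.1, 0.1]]
weights = memberships([0.5, 0.5], means, stds)
unclassifiable = memberships([5.0, 5.0], means, stds)
```

A spectrum that coincides with a class mean receives almost all of the weight for that class, whereas a spectrum far from every class mean returns `None` and would correspond to a flagged pixel.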

Within the ONNS framework (**Figure 1**), the fuzzy logic classification scheme is used to assess the atmospherically corrected Rrs and to determine the corresponding class memberships. The final blended retrieval for each pixel and each ocean color product is a weighted sum of the retrievals of all class-specific NNs (Appendix **Table A1**).
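The blending step can be illustrated as a membership-weighted sum over the class-specific retrievals. In this sketch the blending is assumed to act on the product values in linear units (whether ONNS blends in linear or logarithmic space is not specified here), and the function name and numbers are hypothetical.

```python
def blend(weights, class_retrievals):
    """Membership-weighted sum of class-specific NN retrievals.

    The weights are re-normalized defensively, so the result is also
    valid if they do not sum exactly to 1.
    """
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("no class has a positive membership")
    return sum(w * r for w, r in zip(weights, class_retrievals)) / total


# Three classes; the third has zero membership and does not contribute.
chl_blended = blend([0.75, 0.25, 0.0], [1.0, 3.0, 100.0])
```

Classes with zero membership drop out of the sum, so an implausible retrieval from a net outside its scope does not affect the blended product.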

### Out-of-Scope Test

Well-constructed NNs have good interpolation properties but produce unpredictable output when forced to extrapolate (Doerffer and Schiller, 2000). Therefore, measures have to be taken to recognize NN input that was not foreseen in the NN training phase and is thus out of scope of the algorithm. Regarding simulated data, the fuzzy classification is well-constructed; maximum (or high) membership of a water class usually correlates well with the scopes of the corresponding NNs. However, it may happen that the classification yields a broad distribution of weights or that all memberships are so low that the spectrum is not classifiable. In the latter case, the satellite image pixel is flagged. Irrespective of the memberships, a quality measure is applied that evaluates the deviation of the NN input from the NN training range. The out-of-range parameter, OOR, is zero if the input is within the range but increases with increasing deviation. The assessment treats the input reflectances spectrally differently; wavebands in the green spectral range have the highest weights. The varying signal-to-noise specifications of OLCI are one argument for this (Donlon et al., 2012). Uncertainties in the fluorescence quantum yield efficiency of phytoplankton are another argument. Furthermore, we observe higher uncertainties in violet and blue wavebands, which are generally shown in atmospheric correction validations (Müller et al., 2015a), but also arise in in situ determinations of Rrs due to the variable surface reflectance factor (Hieronymi, 2016; Zibordi, 2016). The allowance for OOR > 0 is one of the fine-tuning techniques to gain better spatial homogeneity of an OLCI scene and to adapt the algorithm to in situ observations.
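One possible form of such an out-of-range measure is sketched below. The exact OOR definition used in ONNS is not given in this section, so the formula (a band-weighted sum of the relative excess beyond the training range), the function name, and the weights are all assumptions chosen only to reproduce the described behavior: zero inside the range and growing with the deviation.

```python
def out_of_range(x, lower, upper, band_weights):
    """Hypothetical OOR: weighted relative excess beyond the training range.

    x:            NN input per band (e.g., log-transformed Rrs).
    lower, upper: training-range limits per band.
    band_weights: larger values for bands that should count more
                  (the text gives green bands the highest weights).
    """
    oor = 0.0
    for xi, lo, hi, w in zip(x, lower, upper, band_weights):
        span = hi - lo
        if xi < lo:
            oor += w * (lo - xi) / span
        elif xi > hi:
            oor += w * (xi - hi) / span
    return oor


lower, upper = [0.0, 0.0], [1.0, 1.0]
weights_per_band = [1.0, 2.0]  # made-up: second band counts twice

oor_inside = out_of_range([0.5, 0.5], lower, upper, weights_per_band)
oor_band1 = out_of_range([1.5, 0.5], lower, upper, weights_per_band)
oor_band2 = out_of_range([0.5, 1.5], lower, upper, weights_per_band)
```

The same excursion is penalized more strongly in the band with the higher weight, which mirrors the spectrally differentiated assessment described above.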

### Uncertainty Analysis

The determination of uncertainties of OC products is similar to the procedure applied in the C2RCC algorithm (Brockmann et al., 2016). All NNs per water class were reapplied to their training datasets to estimate the OC products. The uncertainty nets compare the estimated value, X<sub>E</sub>, with the initial training value, X<sub>T</sub>. The uncertainty per product is given as approximation (percent) error:

$$\varepsilon = 100 \, \frac{X_E - X_T}{X_T}. \tag{1}$$

As is the case for all OC products, the final approximation error is a weighted sum of the retrievals of all class-specific uncertainty NNs.
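Equation (1) translates directly into code. The sketch below uses a hypothetical function name; the sign convention follows the equation, so positive values indicate overestimation and negative values underestimation of the training value.

```python
def percent_error(x_estimated, x_training):
    """Approximation error of Eq. (1): 100 * (X_E - X_T) / X_T."""
    if x_training == 0:
        raise ValueError("X_T must be non-zero")
    return 100.0 * (x_estimated - x_training) / x_training


over = percent_error(1.1, 1.0)    # estimate above the training value
under = percent_error(0.5, 1.0)   # estimate below the training value
```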

### Test Data

Remote sensing reflectance data at OLCI wavelengths in conjunction with bio-geo-optical properties of the top water layer are used to evaluate the capacity of ONNS. For this purpose, different statistical parameters have been utilized. The degree of deviation is presented by the absolute root-mean-square error, RMSE. The Bias shows the average difference and is a measure for systematic over- or underestimation. Furthermore, the correlation coefficient, r, is calculated.
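For reference, the three statistics can be computed as in the following sketch. It assumes that the measures are evaluated on the values in linear units and that r is the Pearson correlation coefficient; the function names and the small example are hypothetical.

```python
import math


def rmse(est, ref):
    """Absolute root-mean-square error between estimates and references."""
    return math.sqrt(sum((e - r) ** 2 for e, r in zip(est, ref)) / len(ref))


def bias(est, ref):
    """Mean difference; negative values indicate underestimation."""
    return sum(e - r for e, r in zip(est, ref)) / len(ref)


def pearson_r(est, ref):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(ref)
    mean_e, mean_r = sum(est) / n, sum(ref) / n
    cov = sum((e - mean_e) * (r - mean_r) for e, r in zip(est, ref))
    var_e = sum((e - mean_e) ** 2 for e in est)
    var_r = sum((r - mean_r) ** 2 for r in ref)
    if var_e == 0.0 or var_r == 0.0:
        raise ValueError("correlation is undefined for constant data")
    return cov / math.sqrt(var_e * var_r)


estimated = [1.0, 2.0, 3.0]
reference = [1.0, 2.0, 4.0]
stats = (rmse(estimated, reference),
         bias(estimated, reference),
         pearson_r(estimated, reference))
```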

### Simulated Data

Hydrolight-simulated data have been used to develop the classification scheme and to train neural nets. From the same data source (>10<sup>5</sup> spectra), a quasi-independent set of 23,445 reflectance spectra is used for testing and validation; these particular data are not used for ONNS development. The test data contain all water types from **Table 1** in approximately equal shares.

### Simulated CCRR Data

A second synthetic dataset has been used to evaluate the performance of ONNS for the retrieval of water quality parameters. The "CoastColour Round Robin" (CCRR) dataset by Nechad et al. (2015) compiles inputs and results from 5,000 Hydrolight simulations. Atmospheric boundary conditions and simulation setup are comparable with the simulations described above. The data used refer to a sun zenith angle of 0°. Remote sensing reflectance at the 11 required OLCI bands is interpolated from hyperspectral water-leaving reflectances between 350 and 900 nm with 5 nm steps. Corresponding Chl and ISM concentrations as well as CDOM absorption at 443 nm are tested with the ONNS retrieval (note that the ONNS CDOM absorption coefficient refers to 440 nm).
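The interpolation from the 5 nm grid to the 11 OLCI band centers can be sketched as follows. Linear interpolation between neighboring grid points is an assumption (the interpolation method is not specified here), spectral response functions are ignored, and the input spectrum is a made-up linear ramp used only to make the result easy to check.

```python
from bisect import bisect_right

# Nominal centers of the 11 OLCI bands used by ONNS (nm).
OLCI_BANDS_NM = [400, 412.5, 442.5, 490, 510, 560, 620, 665, 755, 777.5, 865]


def interp_linear(x_grid, y_grid, x):
    """Linearly interpolate y at x on an ascending grid (no extrapolation)."""
    if not x_grid[0] <= x <= x_grid[-1]:
        raise ValueError("x lies outside the grid")
    i = bisect_right(x_grid, x) - 1
    if i >= len(x_grid) - 1:
        return y_grid[-1]  # x equals the last grid point
    x0, x1 = x_grid[i], x_grid[i + 1]
    t = (x - x0) / (x1 - x0)
    return y_grid[i] + t * (y_grid[i + 1] - y_grid[i])


# Hyperspectral grid from 350 to 900 nm in 5 nm steps.
grid_nm = [350 + 5 * i for i in range(111)]
rrs_hyper = [0.001 * w for w in grid_nm]  # made-up linear test spectrum

rrs_olci = [interp_linear(grid_nm, rrs_hyper, b) for b in OLCI_BANDS_NM]
```

Band centers that coincide with a grid point (e.g., 400 nm) are returned unchanged, whereas 412.5, 442.5, and 777.5 nm fall halfway between two grid points.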

### In situ Data

Complete in situ datasets for the evaluation of OLCI-specific algorithms like ONNS are not freely available. The accessible data of CCRR (Nechad et al., 2015) and OC-CCI (Valente et al., 2016), which compile data from several sources (e.g., MOBY, BOUSSOLE, AERONET-OC, NoMAD, MERMAID), lack several Rrs bands. The ONNS algorithm needs only 11 out of 21 OLCI bands, but coinstantaneous data at 400, 755, 777.5, and 865 nm are not available. However, many of these sub-datasets are actually measured hyperspectrally. Ramses radiometers (TriOS optical sensors, Germany), for example, which are deployed during our in situ campaigns, measure between 320 and 950 nm. Extracted multi-spectral reflectances together with Chl, ISM, and CDOM data are included in the CCRR in situ dataset; the corresponding measurement protocols are described in Nechad et al. (2015). Our 48 data were collected between 2005 and 2006 (but not in winter) onboard a ferry from Cuxhaven to the island of Helgoland in the German Bight (see **Figure 5**). Radiometric measurements were conducted under optimal sun-viewing angles (e.g., Zibordi, 2016), but strictly speaking, ONNS requires angle-normalized Rrs (with the sun in zenith) as input. Nonetheless, these data are used to test ONNS as well.

### Sentinel-3 OLCI Scene

One Sentinel-3 OLCI scene is shown with permission to illustrate the qualitative and spatial application of ONNS (**Figure 5**). The tripartite scene was captured on 20 July 2016 between 9:30 and 9:36 UTC and shows large parts of the North and Baltic Sea. Thus, the scene images many different water types including different algae blooms. The satellite image indicates transparent cirrus clouds over the German Bight and Gulf of Finland, broken clouds over the Skagerrak and Kattegat, and cloud shadows. In comparison with MERIS, OLCI's view is slightly tilted in order to reduce the impact of sun glint, which is somewhat visible at the right edge of the image. Level-1 data of the first OLCI reprocessing are utilized for this work (IPF-OL-1-EO version 06.06). Atmospheric correction of the scene is provided by the C2RCC algorithm ("Case-2 Regional CoastColour," version 0.15, Brockmann et al., 2016). An additional cloud mask was applied using the provided path radiance and viewing angles. At the time of the satellite data acquisition, we measured Rrs in the German Bight for Sentinel-3 validation purposes (**Figure 5**), but these data are not used in this work.

# RESULTS

## ONNS Application to Validation Data

The classification of simulated test data reveals that a maximum of four classes contribute to the inversion of Rrs spectra (the classes have non-zero weights). In principle, all water types (clear to extremely turbid) can be assigned properly. The classification failed on <3% of validation data; of those, 56% are absorbing waters (C2A, C2AX) and approximately 70% have high Chl concentrations (Chl > 10 mg m<sup>−3</sup>). The classification of the in situ and simulated CCRR data yields no plausible results in approximately 10% of cases. The classifiable 4,512 CCRR spectra exhibit maximum memberships in OWTs 1 (9%), 2 (0.5%), 4 (0.16%), 5 (55.2%), 6 (14.9%), 9 (10.6%), 10 (6.9%), 11 (1.7%), 12 (0.3%), and 13 (0.5%). Thus, a high percentage of these data correspond to the Case-1 or moderately to strongly scattering waters (**Tables 1, 2**). The 43 in situ data points, which are captured in coastal waters of the German Bight (**Figure 5**), have maximum memberships in OWT 1 (10.4%) and 5 (89.6%).

Examples of the retrieval capabilities of ONNS in comparison with validation data are illustrated in **Figure 4**. Estimates of the concentration of Chl, ISM, and CDOM are shown for different water types, namely Case-1, extreme absorbing, and extreme scattering waters (**Table 1**). In addition, ONNS retrieval tests are shown for simulated data from the CCRR dataset and our in situ data. The colors characterize the estimated uncertainty in terms of the percent error. Green marks the generic ±5% uncertainty target for satellite ocean color products (defined for oligotrophic and mesotrophic Case-1 waters), orange and red colors signify an overestimation of the retrieved value in comparison with the expected (trained) value, and blue stands for an underestimation. The uncertainty can be high in ambiguous cases with significant masking effects (in extreme waters) or if the NN data basis already provides high (natural) variability, as for example for Chl concentration (compare **Figures 4A,D,G** with **Figure 2B**). Despite high Chl variability, the uncertainty target can be achieved for all magnitudes of concentrations (varying over five orders of magnitude), but with different occurrence in the water types: approximately 30% in C1, 10% in C2AX, and 5% in C2SX waters. An acceptance level of ±50% can be achieved in >97% of cases for C1, >80% in C2AX, and >70% in C2SX, respectively. In all cases, mean and median percentage errors are slightly negative, i.e., the ONNS Chl retrieval shows a tendency to underestimate the expected values. Even if the test value is overestimated by ONNS, the uncertainty estimate may point to underestimation. This may be due to the blending of NNs from different OWT classes with distinctly different ranges of concentrations. In contrast to the simulated validation data, the ONNS Chl retrieval of CCRR and in situ data yields stronger deviations from the one-to-one line (**Figures 4J,M**).

FIGURE 4 | [...] absorption at 440 nm (right) in comparison with simulated and in situ validation data. (A–C): Case-1 data from database, (D–F): Case-2 extreme absorbing waters, (G–I): Case-2 extreme scattering waters, (J–L): simulated data from CoastColour Round Robin (Nechad et al., 2015), (M–O): HZG in situ data. Colors indicate the retrieved uncertainty.

With regard to ISM, the retrieval performance is less skilled if the optical signal of minerals is weak due to low mineral concentrations, as is the case in oligotrophic waters (**Figure 4B**). The 5% uncertainty target is reached within approximately 20% of all cases in C1 waters, 27% in C2AX, and >87% in C2SX. Thus, the more non-algae particles are present, the better ONNS performs. A similar trend can be observed for CDOM retrieval (**Figures 4C,F,I**). Lowest concentrations vanish in the noise, whereas high concentrations can be retrieved accurately. Approximately 60% target-retrievals can be achieved in C1 and >94% in extreme absorbing waters. **Figure 4I** illustrates the difficulty of separating the absorption signals of CDOM and minerals; only 6% of estimates fall in the target uncertainty range. In comparison to the Chl retrieval, ISM and CDOM retrievals of CCRR and in situ data show better agreement (**Figures 4K,L,N,O**).
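The percent-error statistics quoted above (share of retrievals within the ±5% target or the ±50% acceptance level, and the sign of the mean and median error) can be reproduced with a few lines of Python. This is a minimal sketch, not the evaluation code used for the paper; the function names and the four Chl values are hypothetical and serve only as illustration.

```python
from statistics import mean, median


def percent_errors(retrieved, expected):
    """Signed percent error of each retrieval relative to its expected value."""
    return [100.0 * (r - e) / e for r, e in zip(retrieved, expected)]


def fraction_within(errors, tolerance):
    """Share of cases (in percent) whose absolute percent error is <= tolerance."""
    hits = sum(1 for err in errors if abs(err) <= tolerance)
    return 100.0 * hits / len(errors)


# Hypothetical Chl values in mg m^-3 (illustrative only, not data from the paper).
expected = [0.1, 1.0, 5.0, 20.0]
retrieved = [0.098, 0.8, 5.2, 12.0]

errors = percent_errors(retrieved, expected)
share_target = fraction_within(errors, 5.0)       # +/-5% uncertainty target
share_acceptance = fraction_within(errors, 50.0)  # +/-50% acceptance level

print(share_target, share_acceptance)
print(mean(errors), median(errors))  # negative values indicate underestimation
```

A negative mean or median corresponds to the tendency for underestimation noted for the Chl retrieval.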

The NN-estimated uncertainties and corresponding color distributions in **Figure 4** reflect the comparative statistics that are tabulated for all water types (**Table 3**). Additional statistics of all OC products are listed in **Table A2**. With reference to the simulated test data and seen over all water types, the smallest differences between estimated and test data occur for the Forel-Ule number and both "mixed" IOPs, adg(412) and bbp(510). In comparison, larger deviations occur for low-concentration mineral-related values. We found weak water type-independent underestimation for the direct phytoplankton-related quantities [Chl, ap(440), and bp(440)] and for FU, Kd(490), Ku(490), adg(412), and bbp(510). But again, the largest biases are observed for the retrieval of non-algae properties in clear oceanic (almost mineral-free) C1 waters. The correlation coefficient reveals a strong linear relationship for all cases except CDOM in extreme scattering waters, where the relation is weak (CDOM retrieval correlation is >0.95 for C2S and the other cases). The statistical values of the comparison with independent CCRR and in situ data paint a somewhat different picture with generally lower correlation coefficients (**Table 3**). Both datasets include turbid Case-2 waters that are predominantly characterized by one optical water type, i.e., OWT 5 (see **Table 2**). Most of the other water types are not independently evaluated.

### ONNS Application to OLCI Scene

Application of the new ONNS algorithm to the satellite image is illustrated in **Figure 6**. Again, up to four optical water type classes are needed for the inversion. In this particular scene, water classes 3, 7, and 8 have no contribution to the products (all rather extreme turbid cases, see **Table 2**); all other classes give spatially dependent contributions (**Figure 6A**). Comparison with the mean shapes of the Rrs (**Figure 3**) meets regional expectations. In the western part of the Baltic Sea, including the Western Gotland Basin and the Bothnian Sea, the spectra show the strongest resemblance to classes 6, 9, and 10. Spectra of the Eastern Gotland Basin and Gulf of Finland fall mostly into class 5, and the lagoons behind the Bay of Gdansk have some spectra with the shape of class 1. In contrast, we have found maximum memberships of classes 9, 10, and 11 in the clear open North Sea and Norwegian Sea and classes 1, 2, 4, 5, and 9 along the German and Dutch coasts. In some clear water cases (OWT 9, 10, and 11), the out-of-range warning flag for input spectra is raised; these cases are mostly in spatial conjunction with transparent cirrus clouds (**Figure 5**). The Forel-Ule number that is estimated with ONNS provides an intuitive color impression and reconfirms expected geographic characteristics of the sea areas (**Figure 6B**).

TABLE 3 | Statistics of ONNS retrievals vs. test data.

Datasets marked with C1, C2A, C2S, C2AX, and C2SX refer to simulated data that are not used for NN training (numbers of points for comparison are 5392, 4699, 4049, 4526, and 4082 respectively). The independent Hydrolight-simulated CoastColour Round Robin (CCRR) dataset contains 4512 data points. 43 match-ups form the basis for the in situ data comparison. The corresponding plots in Figure 4 are shown in log form; the statistical values here are not in log form.

Concentrations of Chl, ISM, and CDOM together with their accompanying uncertainty estimates are shown in **Figure 7**. Only valid sea pixels are shown; land areas and clouds are masked out. However, in spatial vicinity to clouds and coasts, apparently wrong assessments of OC products are possible; here, the predictions are overestimating the true values for the most part. Some areas are very shallow, e.g., the Curonian and Vistula Lagoons, and therefore, bottom reflections cannot be ruled out. This again would lead to possible overestimation of (particle) concentrations. All in all, the ranges of derived concentrations are reasonable (the colors on the left side of **Figure 7** correspond to the respective units). Previous match-up analyses showed that most of the measured Baltic Chl values range between ∼1 and 10 mg m<sup>−3</sup> with somewhat smaller values in the Skagerrak-Kattegat region in comparison with the Central Baltic Sea (Pitarch et al., 2016). The ONNS-retrieved Chl values are in this range and reflect the geographic expectations (**Figure 7A**). But from the filamentous patterns it can be assumed that a cyanobacteria bloom including floating vegetation has developed in the Gotland Basin. In surface blooms, the concentrations are typically much higher, but the estimated biomass concentration seems too low. On the other hand, in some of these cases with visible algae structures, the out-of-range warning flag is raised (mostly OWT 9). Unfortunately, no in situ validation data of this OLCI image are available. The spatial distribution of ISM yields partly implausible features (**Figure 7B**). Commonly, significant concentrations of non-algae particles are not expected in the open sea. The concentration ranges, however, fit observations by Berthon and Zibordi (2010).
The regional distribution of CDOM, including the east-west gradient and high concentrations in the northern Baltic and Gulf of Finland, is plausible (e.g., Kowalczuk, 1999; Berthon and Zibordi, 2010; Ylöstalo et al., 2016). The corresponding uncertainty maps partly mirror the boundaries of the dominant water classes. Again, the high Chl uncertainties reflect the likewise high modeled variability (**Figures 2B, 4, 7D**).

ONNS application to contemporaneous Rrs measurements in the German Bight yields plausible results. All measured spectra exhibit maximum membership in OWT 5, the same as derived from the OLCI image for the transect (**Figure 6A**) and from 90% of the in situ data from the same area. The results are of the same magnitude as our previous measurements in this area. ONNS estimates Chl along the transect between 1.7 and 4.5 mg m<sup>−3</sup>, ISM from 0.7 to 3.4 g m<sup>−3</sup>, and CDOM absorption (at 440 nm) between 0.38 and 0.68 m<sup>−1</sup>. Due to tides and hydrologic changes of the Elbe river plume, ISM can be higher than 10 g m<sup>−3</sup> near the coast.

# DISCUSSION

### Retrieval Accuracy

In general, the retrieval statistics of ONNS (**Tables A1**, **A2**, **Figure 4**) display the general problems of OC algorithms in the various water types (e.g., Blondeau-Patissier et al., 2004; Gregg and Casey, 2004; IOCCG, 2010; Odermatt et al., 2012; Brewin et al., 2015; Pitarch et al., 2016). All the more, one has to critically assess the quality of the Rrs input spectra, which is influenced by two factors: the sensor calibration (over which we have no control) and the atmospheric correction (AC). A careful atmospheric correction is of systematic importance for the success of the in-water algorithm, in particular for extreme Case-2 waters. Most of the light arriving at the satellite has been scattered by the atmosphere or reflected at the sea surface. The atmospheric path radiance is typically >85% of the total signal in C1 waters, >60% in C2SX, and >94% in C2AX waters (IOCCG, 2010). Existing AC processors address the various modeling aspects quite differently, e.g., the treatment of subvisible cirrus clouds or aerosol properties, and therefore, have strengths and weaknesses for specific water types (Müller et al., 2015a). In view of the new algorithm ONNS, which relies on normalized reflectances, angle-dependent AC processes such as "smile correction" and sun glint handling are important as well to ensure spatial homogeneity of satellite data. OLCI's viewing direction is slightly shifted in comparison with MERIS in order to reduce sun glint contaminated areas. However, some AC processors incorporate sun glint contributions in their reflectance models and derive normalized Rrs in this condition. These AC yield much larger coverage of data (Müller et al., 2015b), but areas with high glint should nevertheless be considered cautiously. One of the AC processors is Polymer (Steinmetz et al., 2011), which reveals good performance in comparison with MERIS matchups (Müller et al., 2015a). Another processor is C2RCC (Case-2 Regional CoastColour; Brockmann et al., 2016), which is an evolution of the precursors "Case-2 Regional," "ForwardNN," and the "MERIS Case-2 water" algorithm (Doerffer and Schiller, 2007). C2RCC is available through ESA's Sentinel toolbox SNAP and it is used in the Sentinel-3 OLCI ground segment processor of ESA for generating Case-2 water products. Both Case-2 AC algorithms, Polymer and C2RCC, provide usable normalized Rrs at OLCI bands and give comparable memberships of OWT classes. Some differences between C2RCC and Polymer-derived Rrs are visible, mainly in the shape of the reflectance spectrum in the violet-blue spectral range. The sensor calibration especially for shorter wavelengths is subject to current investigations. Future versions of AC algorithms may incorporate the specific sensor properties. For this reason, we have to keep in mind that the results of ONNS to a certain degree rely on the applied atmospheric correction and data reprocessing version (subject to ongoing research).

(contains modified Copernicus Sentinel data [2016] processed by ESA/EUMETSAT/HZG). The boundaries of individual scenes are marked with dashed lines. The picture detail shows the route with reflectance measurements in the German Bight.

One of the most important ocean color quantities is chlorophyll concentration. It is our general impression that ONNS delivers Chl in the expected orders of magnitude. Future tests must show the suitability of ONNS in comparison with other algorithms, globally and for the specific region (e.g., Blondeau-Patissier et al., 2004; Darecki and Stramski, 2004; Gregg and Casey, 2004; Attila et al., 2013). However, the Baltic Sea for example is known for intense cyanobacteria blooms with small-scale patches and extremely high biomass conditions (Chl > 200 mg m<sup>−3</sup>) partly associated with surface scums and floating algae. Under these conditions, results from different satellite sensors are very variable, values of Chl may exceed processing limits and atmospheric correction often fails (Reinart and Kutser, 2006). Optical properties of floating (also air bubble containing) material can be distinctly different from the data basis assumed in this work, e.g., higher backscattering and also higher reflectance in the NIR. Consequently, Rrs resembles dry vegetation rather than water (e.g., Kutser, 2004; Matthews et al., 2012), or, in terms of the defined optical water types, looks like (extreme) scattering waters. This means, as a corollary, that high biomass (Chl) is rather interpreted as non-algae particles (ISM). The out-of-range warning is raised for some of the affected areas. But here, it can be useful to raise an additional flag for surface scum conditions as suggested by Matthews et al. (2012). With regard to the possible misinterpretation of algae vs. non-algae particles, we must concede a potential weakness of the bio-geo-optical model assumptions underlying the simulated data basis. For example, the data include high variability of scattering properties but do not take scattering properties of different species into account (only chlorophyll-specific absorption), although these are likely to differ between species (e.g., Harmel et al., 2016).

The selected OLCI scene is a good example for phytoplankton diversity. It is not well visible in **Figure 5**, but different algae blooms occur (none of them are confirmed). Very likely, a cyanobacteria bloom occurred in the Gotland Basin of the Baltic Sea. The bright water in the top left of the image along the Norwegian coast points to the occurrence of blooming coccolithophores. Moreover, west of the island Sylt in the German Bight fingerlike structures related to enhanced biomass are recognizable. The fact that we have to deal with different species within predominant water types increases the uncertainties. Chlorophyll-specific variability is included in the database for ONNS (**Figure 2A**). This and the high natural variability of phytoplankton absorption vs. Chl concentration are reflected in the uncertainty estimates of ONNS. On this basis, future developments of ONNS may be directed into optical differentiation of diversity with corresponding traceability of uncertainties (Bracher et al., 2017; Mouw et al., 2017).

It is a frequent practice to derive inherent optical properties from ocean color and from this create an empirical relationship to observed concentrations (e.g., Doerffer and Schiller, 2007). Besides the directly retrieved concentrations, ONNS provides a number of IOPs and AOPs. The nets which derive concentrations (**Table A1**, second column) must balance their estimates indirectly by means of the relationship between absorption and scattering properties. NNs that retrieve IOPs or AOPs can rather focus on either spectral reflectance reduction or enhancement, i.e., absorption or scattering. This is demonstrated by the statistical analysis shown in **Tables 3**, **A2**. The correlation coefficients of estimated IOPs and AOPs are generally very high, even if we have to deal with cases of significant pigment absorption masking due to the influence of sediments. Once the spectrum is properly classified, the anticipated values are significantly restricted, which leads to high correlation. The derived sets of IOPs and AOPs form closure in a similar manner as the Hydrolight simulations, e.g., if we compare the sum of absorption and scattering coefficients at 440 nm (which is the attenuation coefficient) with the diffuse attenuation coefficient of downwelling irradiance, Kd(490), the correlation coefficient yields 0.856 for the test data and 0.841 for the ONNS-retrieved values, both seen over all water types. Due to the high input variability of phytoplankton absorption vs. Chl concentration, we expect higher uncertainties related to the biomass concentration than related to other quantities as for example CDOM absorption (**Figures 4**, **7**). The exploitation of various IOPs, e.g., absorption coefficient adg(412) or backscattering coefficient bbp(510), may lead to more accurate and regionalized OC products.
Additionally, certain absorption or scattering properties may help identifying oceanographic features such as water masses or (sub-) mesoscale eddies and frontal systems (structures are best visible in total particulate backscattering). ONNS already is a regionally employable algorithm that delivers plausible outputs, and is thus in line with new multi-water type ocean color algorithms (e.g., D'Alimonte et al., 2014; Moore et al., 2014).
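The closure comparison described above (attenuation coefficient as the sum of absorption and scattering, correlated against Kd(490)) can be sketched as follows. The helper names and the sample values are hypothetical; which absorption and scattering terms enter the sum is assumed here to be the total coefficients at 440 nm.

```python
from math import sqrt


def beam_attenuation(absorption, scattering):
    """Attenuation coefficient c = a + b for each sample (all in m^-1)."""
    return [a + b for a, b in zip(absorption, scattering)]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_yy = sum((yi - mean_y) ** 2 for yi in y)
    return s_xy / sqrt(s_xx * s_yy)


# Hypothetical samples (m^-1), illustrative only.
a_440 = [0.05, 0.3, 1.2, 4.0]
b_440 = [0.2, 1.0, 6.0, 30.0]
kd_490 = [0.04, 0.25, 0.9, 3.5]

c_440 = beam_attenuation(a_440, b_440)
print(c_440)
print(pearson_r(c_440, kd_490))
```

Running the same comparison once on the test data and once on the retrieved values gives the two correlation coefficients that are contrasted in the text.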

### ONNS Design

The present version of the bio-geo-optical processing scheme applies 11 of the 21 OLCI bands (namely at 400, 412.5, 442.5, 490, 510, 560, 620, 665, 755, 777.5, and 865 nm). Wavebands that are affected by phytoplankton fluorescence (at 673.75, 681.25, and to only a minor degree at 708.75 nm) are not utilized. In principle, the inclusion of these bands could help the classification and Chl retrieval capacity, in particular for highly eutrophic waters. Admission of the three additional bands, also in the combined form of a fluorescence line height, slightly increases the accuracy of the Chl retrieval with respect to the simulated dataset, where inelastic scattering features with the standard settings of Hydrolight are included. But we have to keep in mind that fluorescence (quantum yield efficiency) is subject to strong fluctuations and potential false assessment; it has diurnal variability, depends on nutrient and light availability and algae species (e.g., Greene et al., 1994). The retrieval accuracy slightly decreases if the present fluorescence line height mismatches the expected range from the training dataset. Our tests show that this is less of a problem in case of in situ measured remote sensing reflectance, but deviations can be higher in case of atmospherically corrected satellite data (tested with C2RCC and Polymer). A proper atmospheric correction for these bands is difficult to achieve. For this reason, the Chl fluorescence bands are not used in the present version of the ONNS algorithm.
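For orientation, a fluorescence line height is commonly computed as the height of the signal in the fluorescence band above a straight baseline drawn between two flanking bands. The sketch below follows that common definition; the choice of 665 and 708.75 nm as baseline bands and 681.25 nm as peak band is an assumption made for illustration and is not stated in the text, as are the function name and the sample reflectances.

```python
def fluorescence_line_height(r_left, r_peak, r_right,
                             wl_left=665.0, wl_peak=681.25, wl_right=708.75):
    """Signal at the peak band minus a linear baseline between two flanking bands.

    Band positions are assumed defaults (nm), not values confirmed by the paper.
    """
    weight = (wl_peak - wl_left) / (wl_right - wl_left)
    baseline = r_left + (r_right - r_left) * weight
    return r_peak - baseline


# Hypothetical reflectances (sr^-1) at the three bands, illustrative only.
flh = fluorescence_line_height(r_left=0.002, r_peak=0.003, r_right=0.001)
print(flh)
```

Because the result is a small difference of three band values, any band-specific error left over from the atmospheric correction propagates directly into it, which is consistent with the caution expressed above.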

The main purpose of other OLCI NIR bands is atmospheric correction, e.g., due to oxygen and water vapor absorption and optical features of aerosols. Thus, satellite-derived Rrs is not provided for all of the NIR bands (Steinmetz et al., 2011). In contrast, many radiometers that are deployed for in situ Rrs determination measure hyper-spectrally in the VIS and NIR range (e.g., Ramses sensors). Hence, 20 OLCI bands (or more bands) could be theoretically used for a bio-geo-optical algorithm. The last OLCI band at 1,020 nm is not covered by many radiometers. For this reason and because of the little information gain in most waters, the 1,020 nm band was also not selected for ONNS input. However, we must bear in mind that available spectral bands in compiled in situ datasets (e.g., Nechad et al., 2015; Valente et al., 2016) are limited too, making meaningful validation difficult.

The new algorithm deploys Rrs that are angle-normalized, i.e., the sun is at zenith and the viewing direction is perpendicular. All sun and viewing angle-related effects must be eliminated by the atmospheric correction prior to ONNS application. The approach simplifies for example comparisons of different satellite sensors. The first step of the processing scheme is to transform the input Rrs into brightness-scaled reflectances. The advantage of this approach is that the classification is less sensitive to the amplitude of Rrs spectra, which can be shifted by various scattering processes, e.g., due to wind-dependent micro-bubbles in water (white scatterer), marine particle aggregation, particle size, or just under-estimation of the measured total scattering (e.g., McKee et al., 2013).
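The amplitude insensitivity of brightness-scaled reflectances can be illustrated with a short sketch. It assumes that scaling means dividing each band value by the sum over all input bands; the exact scaling used by ONNS is defined in the methods description and may differ. The function name and the spectrum are hypothetical.

```python
def brightness_scale(rrs):
    """Divide a spectrum by its sum over bands (assumed definition of brightness)."""
    total = sum(rrs)
    if total <= 0:
        raise ValueError("spectrum must have a positive sum")
    return [value / total for value in rrs]


# Hypothetical Rrs (sr^-1) at the 11 ONNS input bands, illustrative only.
spectrum = [0.004, 0.0045, 0.005, 0.006, 0.0055, 0.004,
            0.0015, 0.001, 0.0002, 0.0002, 0.0001]
brighter = [3.0 * value for value in spectrum]  # same shape, three times the amplitude

scaled = brightness_scale(spectrum)
scaled_brighter = brightness_scale(brighter)
print(max(abs(a - b) for a, b in zip(scaled, scaled_brighter)))
```

Both spectra map to the same scaled shape, so a classifier operating on the scaled values responds to spectral shape rather than overall brightness.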

### OWT Classification

The classification of synthetic validation data from the same data source shows generally good performance for most of the water types. Occasionally, in <3% of the validation data, the fuzzy classification yields no plausible memberships of the classes and thus no ONNS-retrieval values. Classification of very weak remote sensing reflectance signals, for example, is still challenging but mostly possible. The reason is that, e.g., in CDOM-rich lakes, the reflectance is near zero in almost the entire VIS, but nevertheless, significant phytoplankton biomass can be present (e.g., Kutser et al., 2016).

Fuzzy logic classification of the in situ and simulated CCRR validation data yields no significant memberships in approximately 10% of cases, i.e., in 5 and 488 cases respectively. Hence, the spectra were not considered plausible and the final blended retrieval delivers no results. One possible explanation is that spectral shapes of Rrs appear that do not occur in the database with 10<sup>5</sup> spectra. Moore et al. (2001) propose a minimum threshold for class memberships, which was arbitrarily set at 10<sup>−4</sup>. If this threshold is lowered to 10<sup>−5</sup>, the non-plausible cases reduce to <1% of both datasets. Further tests are needed in the framework of an all-water-type-embracing validation.
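The effect of the membership threshold on the blended retrieval can be sketched as below. This is a simplified illustration rather than the ONNS implementation: the function name, the plain membership-weighted mean, and the example memberships and per-class estimates are assumptions. The limit of four classes follows the statement that up to four optical water type classes are needed for the inversion.

```python
def blend(memberships, class_estimates, threshold=1e-4, max_classes=4):
    """Membership-weighted mean of per-class estimates.

    Returns None when no class membership reaches the threshold, i.e., the
    spectrum is treated as not plausible and no result is delivered.
    """
    kept = sorted(
        ((m, owt) for owt, m in memberships.items() if m >= threshold),
        reverse=True,
    )[:max_classes]
    if not kept:
        return None
    total = sum(m for m, _ in kept)
    return sum(m * class_estimates[owt] for m, owt in kept) / total


# Hypothetical memberships and per-class Chl estimates (mg m^-3).
memberships = {5: 0.6, 4: 0.2, 9: 5e-5}
estimates = {5: 2.0, 4: 4.0, 9: 0.5}

print(blend(memberships, estimates))             # class 9 is below 1e-4 and is ignored
print(blend({9: 5e-5}, estimates))               # no class passes: None
print(blend({9: 5e-5}, estimates, 1e-5))         # lowered threshold: a result is delivered

# Share of non-classified spectra quoted above, pooled over both datasets
# (5 of 43 in situ match-ups and 488 of 4512 CCRR spectra).
rejected_share = 100.0 * (5 + 488) / (43 + 4512)
print(round(rejected_share, 1))
```

The last line evaluates to roughly 10.8%, in line with the "approximately 10%" stated in the text.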

Applied to a satellite scene, all marine spectra are classifiable, but water classification can be problematic and spatially heterogeneous in association with cloud and adjacency effects. Nonetheless, the optical water type classification of the scene basically yields geographically expected results. Three of the water classes (OWT 3, 7, and 8) never gained significant weights. Therefore, they were not used for blending. Those cases include extreme absorbing or scattering cases with very high biomass, e.g., like the mentioned "black lakes" (Kutser et al., 2016) or the Gulf of Finland (Ylöstalo et al., 2016), which is mostly flagged out due to clouds in the example scene. In global terms, they occur only very locally. However, the three cases each represent a spectral Rrs with maxima in one of the three selected NIR bands (**Figure 3**); therefore, they have an essential function in the fuzzy logic classification scheme.

### Application to Radiometric In situ Data

The OC processor ONNS can be applied to in situ measurements as well. In this case, an atmospheric correction is not needed. Remote sensing reflectance can be determined from above- or in-water radiometric measurements. Nechad et al. (2015) assembled reflectance measurements that were obtained in five different manners. In case of above-water measurements for example, downwelling irradiance, sky radiance, and upwelling radiance are measured under specific sun-viewing angle conditions. From this, Rrs is determined using a surface reflectance factor, which depends on sun-viewing geometry and wind speed and which is usually between 2 and 5% for optimal viewing angles (Hieronymi, 2016; Zibordi, 2016). The surface reflectance factor determines the shape of Rrs mainly in the violet-blue range. Thus, there are some uncertainties for the application of the classification scheme and the overall processor. Optimally, the measured Rrs is adapted to the input criteria for ONNS, i.e., fully normalized, a moderate wind speed, no micro-bubbles in water, etc. In our tests, the fuzzy logic classification scheme is applicable and yields usable inputs for ONNS. Furthermore, ONNS retrieves OC products in the expected orders of magnitude. However, this cannot hide the fact that the retrieval statistic for the comparison with in situ data could be better (**Table 3**). Thus, more and all-water-type-embracing validation is needed.
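The above-water determination of Rrs mentioned here is commonly written as the upwelling radiance minus the surface-reflected sky radiance, divided by the downwelling irradiance. The following sketch applies that standard relation per band; the function name, the default reflectance factor of 0.028 (a frequently used value inside the 2 to 5% range given above), and the sample measurements are assumptions for illustration.

```python
def rrs_above_water(upwelling, sky, irradiance, rho=0.028):
    """Rrs = (Lu - rho * Lsky) / Ed for each band.

    upwelling, sky: radiances; irradiance: downwelling irradiance;
    rho: surface reflectance factor (assumed default, depends on geometry and wind).
    """
    return [(lu - rho * ls) / ed for lu, ls, ed in zip(upwelling, sky, irradiance)]


# Hypothetical single-band measurement in consistent radiometric units.
rrs = rrs_above_water(upwelling=[1.0], sky=[10.0], irradiance=[100.0], rho=0.03)
print(rrs)
```

Since sky radiance is strongest at short wavelengths, the choice of `rho` changes the result most in the violet-blue range, which matches the sensitivity described in the text.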

### Outlook

Proper validation of ONNS products using in situ data and OLCI match-ups will be a future task. Every OWT class must be validated (and possibly readjusted) independently, knowing that it is the balance between the water constituents (phytoplankton, minerals, and CDOM), represented in the training data, that decides on the quality of the OC products (D'Alimonte et al., 2016). Furthermore, higher validation uncertainties must be expected in extreme Case-2 waters and heterogeneous waters in coastal areas or during algae blooms (e.g., Kutser, 2004; Pahlevan et al., 2016). These in situ validation uncertainties must be incorporated into the delivered uncertainty products. The aim of this paper is to provide the scientific background description of the processor together with a baseline validation of the present ONNS version (v0.4). Once the processor has been validated, it is planned to make it freely accessible via ESA's Sentinel toolbox SNAP. After that, the algorithm will be compared with other bio-geo-optical algorithms.

In principle, ONNS can provide results in near-real time. The computational time depends on the (in our case high) number of neurons of the NNs and a swarm of 4 × 13 NNs obviously takes more time. However, single NNs are fast and the processing can occur in parallel. Thus, OC products can be disseminated in near real time mode, which usually comprises the time up to one day after satellite acquisition.

## CONCLUSIONS

This study presents a novel in-water algorithm for the retrieval of ocean color remote sensing products from atmospherically corrected OLCI-like satellite imagery or in situ radiometric measurements. The algorithm consists of several specialized neural networks with task-optimized architectures (OLCI Neural Network Swarm). The products contain concentrations of water constituents (Chl and ISM), inherent and apparent optical properties [acdom(440), ap(440), am(440), adg(412), bp(440), bm(440), bbp(510), Kd(490), and Ku(490)], and a sea color index (FU). In addition, all products are delivered with an uncertainty estimate that describes the deviation of the product from the original data basis. The algorithm makes use of a comprehensive fuzzy logic classification scheme. Thirteen optical water type classes have been identified based on Hydrolight simulated and brightness-scaled remote sensing reflectances at 11 OLCI bands (400, 412.5, 442.5, 490, 510, 560, 620, 665, 755, 777.5, and 865 nm). The corresponding water types range from clearest sea waters to extreme Case-2 waters (**Table 1**). This includes chlorophyll concentrations up to 200 mg m<sup>−3</sup>, non-algae particle concentrations up to 1,500 g m<sup>−3</sup>, and an absorption coefficient of colored dissolved organic matter up to 20 m<sup>−1</sup> at 440 nm. A baseline validation of ONNS products for the various water types is provided, showing principal strengths and weaknesses of the algorithm. With simulated test data the algorithm performs generally well within the wide range of optical properties of the water. Additional tests have been conducted using simulated data from the independent CCRR database and a few in situ data; both datasets contain mostly turbid Case-2 waters, which are classified in few optical water type classes. As might be expected, these comparisons revealed somewhat worse correlation but are overall encouraging, for example regarding ISM and CDOM retrieval.
An appropriate full validation for all OWT classes and all provided ocean color products is still to be done. Conclusions on the performance using OLCI Earth observation data can be drawn after thorough validation against field measurements or other bio-geo-optical algorithms. The shown example demonstrates that ONNS-estimated ocean color products are mostly within the range of observed concentrations (e.g., Kowalczuk, 1999; Berthon and Zibordi, 2010; Pitarch et al., 2016; Ylöstalo et al., 2016). From our present point of view, we conclude that the new ONNS in-water algorithm is suited for the remote sensing estimation of water properties and constituents of most natural waters.

## AUTHOR CONTRIBUTIONS

MH and DM developed the concept of the processor ONNS with consultancy of RD. MH prepared the synthetic data basis, trained the neural networks, and wrote the paper. DM developed the water type classification, processed data, and contributed text modules. RD helped with the processor development and fed the discussion.

## FUNDING

This work is a contribution to the European Space Agency (ESA) funded Ocean Colour Climate Change Initiative (OC-CCI: AO-1/6207/09/I-LG), Case-2 Extreme Water project (C2X: 4000113691/15/I-LG), and Living Planet Fellowship Programme (LowSun-OC: 4000112803/15/I-SBo).

### ACKNOWLEDGMENTS

This paper is an outcome of the CLEO Workshop "Colour and Light in the Ocean from Earth Observation," held in Frascati, Italy in September 2016. The authors would like to acknowledge data processing and discussions with colleagues Rüdiger Röttgers, Hajo Krasemann, Kerstin Heymann, and Wolfgang Schönfeld. In addition, the authors thank the C2X project and science support team for valuable comments throughout the algorithm development, in particular Carsten Brockmann, Kerstin Stelzer, Ana Ruescas, François Steinmetz, Kevin Ruddick, Bouchra Nechad, Gavin Tilstone, Stefan Simis, and Peter Regner. We thank ESA/EUMETSAT/EU Copernicus for providing Sentinel-3 data and for permission to use them. Finally, the detailed comments of three reviewers and of the guest associate editor Tiit Kutser are highly appreciated.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Hieronymi, Müller and Doerffer. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# APPENDIX

TABLE A1 | Numbers of neurons from selected neural network architectures (input, hidden, and output layers) for all 13 optical water type classes.


Three sets of NNs deliver selected concentrations, IOPs, AOPs, and the Forel-Ule color code. A fourth set of NNs (right column) estimates the uncertainties of the NN outputs. Inputs and outputs for the NNs are log10(X + 0.001), where X stands for Rrs or an ocean color product (this does not apply to FU).
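The input and output scaling named in this note, log10(X + 0.001), and its inverse can be written as a pair of small helpers. The function names are hypothetical; only the transform itself is taken from the note.

```python
from math import log10

OFFSET = 0.001  # additive constant from the table note; keeps log10 finite at X = 0


def to_nn_space(x):
    """Forward transform applied to Rrs or an ocean color product."""
    return log10(x + OFFSET)


def from_nn_space(y):
    """Inverse transform that maps an NN output back to physical units."""
    return 10.0 ** y - OFFSET


chl = 3.2  # hypothetical Chl value in mg m^-3
encoded = to_nn_space(chl)
print(encoded, from_nn_space(encoded))
```

The offset bounds the transformed value from below at log10(0.001) = −3, so zero-valued inputs remain usable.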



Statistics are based on N<sub>C1</sub> = 5392, N<sub>C2A</sub> = 4699, N<sub>C2S</sub> = 4049, N<sub>C2AX</sub> = 4526, and N<sub>C2SX</sub> = 4082.

# Bio-Optical Properties of Two Neigboring Coastal Regions of Tropical Northern Australia: The Van Diemen Gulf and Darwin Harbour

David Blondeau-Patissier<sup>1,2,3\*</sup>, Thomas Schroeder<sup>3</sup>, Lesley A. Clementson<sup>4</sup>, Vittorio E. Brando<sup>5</sup>, Diane Purcell<sup>1,6</sup>, Phillip Ford<sup>7</sup>, David K. Williams<sup>6</sup>, David Doxaran<sup>8</sup>, Janet Anstee<sup>7</sup>, Nandika Thapar<sup>9</sup> and Miguel Tovar-Valencia<sup>2</sup>

<sup>1</sup> North Australia Marine Research Alliance (NAMRA), Darwin, NT, Australia, <sup>2</sup> Research Institute for the Environment and Livelihoods (RIEL), Charles Darwin University, Darwin, NT, Australia, <sup>3</sup> Oceans & Atmosphere, Commonwealth Scientific and Industrial Research Organization (CSIRO), Brisbane, QLD, Australia, <sup>4</sup> Oceans & Atmosphere, Commonwealth Scientific and Industrial Research Organization (CSIRO), Hobart, TAS, Australia, <sup>5</sup> Institute of Atmospheric Sciences and Climate, National Research Council (CNR), Rome, Italy, <sup>6</sup> Australian Institute of Marine Science (AIMS), Darwin, NT, Australia, <sup>7</sup> Oceans & Atmosphere, Commonwealth Scientific and Industrial Research Organization (CSIRO), Canberra, ACT, Australia, <sup>8</sup> Laboratoire d'Océanographie de Villefranche-sur-Mer (LOV), Centre national de la recherche scientifique (CNRS), Villefranche-sur-mer, France, <sup>9</sup> Agriculture, Commonwealth Scientific and Industrial Research Organization (CSIRO), Canberra, ACT, Australia

### Edited by:

Tiit Kutser, Estonian Marine Institute, University of Tartu, Estonia

### Reviewed by:

Oliver Zielinski, University of Oldenburg, Germany James Acker, NASA, USA

### \*Correspondence:

David Blondeau-Patissier david.blondeau-patissier@csiro.au

### Specialty section:

This article was submitted to Ocean Observation, a section of the journal Frontiers in Marine Science

Received: 01 December 2016 Accepted: 10 April 2017 Published: 15 May 2017

### Citation:

Blondeau-Patissier D, Schroeder T, Clementson LA, Brando VE, Purcell D, Ford P, Williams DK, Doxaran D, Anstee J, Thapar N and Tovar-Valencia M (2017) Bio-Optical Properties of Two Neigboring Coastal Regions of Tropical Northern Australia: The Van Diemen Gulf and Darwin Harbour. Front. Mar. Sci. 4:114. doi: 10.3389/fmars.2017.00114 This study focuses on the seasonal and spatial characterization of inherent optical properties and biogeochemical concentrations in the Van Diemen Gulf and Darwin Harbour, two neighboring tropical coastal environments of Northern Australia that exhibit shallow depths (∼20 m), large (>3 m) semi-diurnal tides, and a monsoonal climate. To gain insight in the functioning of these optically complex coastal ecosystems, a total of 23 physical, biogeochemical, and optical parameters were sampled at 63 stations during three field campaigns covering the 2012 wet and dry seasons, and the 2013 dry season. The total light absorption budget in the Van Diemen Gulf was dominated by nonalgal particles (aNAP; >45%) during the dry season (May–October) and colored dissolved organic matter (aCDOM; 60%) during the wet season (November–April). The combined absorption by aNAP and aCDOM generally exceeded ∼80% of the total absorption budget from 400 to 620 nm, with phytoplankton, aPhy, accounting for <20%. In Darwin Harbour, where only the dry season conditions were sampled, the total absorption budget was dominated by an equivalent contribution of aCDOM, aNAP, and phytoplankton. The major processes explaining the seasonal variability observed in the Van Diemen Gulf are resuspension from seasonal south-easterly trade winds in combination with the tidal energy and shallow bathymetry during the dry season months, and mostly terrestrial river runoff during the monsoon which discharge terrestrial CDOM from the surrounding wetlands. 
Due to light-limited conditions all year round, the particulate scattering coefficient [bp(555)] contributed significantly (90%) to the beam attenuation coefficient c(555), thus strongly limiting phytoplankton growth (Chlorophyll a ∼1 mg.m<sup>−3</sup>). Spatially, the Van Diemen Gulf had higher total suspended solids and nutrient concentrations than Darwin Harbour, with dissolved organic carbon and aCDOM subject to photobleaching during the dry season. Key bio-optical relationships derived from this comprehensive set of parameters, the first ever to be collected in this tropical coastal environment, were successfully used for the region-specific, seasonal parameterization of an ocean color algorithm. Challenges related to the parameterization, and the use, of ocean color remote sensing algorithms for these optically complex waters are discussed.

Keywords: coastal waters, tropical waters, Northern Australia, optical properties, water quality, seasonal variability

### INTRODUCTION

Tropical coastal systems represent <1% of the world's ocean volume, yet they host some of the most productive and diverse ecosystems on Earth (Jennerjahn, 2012; Bowen et al., 2013). Although recent studies suggest that tropical marine regions will experience less drastic environmental changes than temperate regions in the future (Acevedo-Trejos et al., 2014), they remain largely understudied. This is particularly true in the southern hemisphere, where very few datasets capture the seasonal variations in bio-optical properties. The literature available on the tropical coastal waters of Far North Australia, referred to as the North Marine Region (NMR, **Figure 1**), is limited to the marine phytoplankton productivity of this region (Hallegraeff and Jeffrey, 1984; Ilahude and Mardanis, 1990), its geology (Woodroffe et al., 1993), and the physical processes that influence its oceanography (Condie, 2011; Li et al., 2014). Across the NMR there is a need to develop better coastal ecosystem-based management strategies, which require quantitative assessment of the processes controlling the coastal marine environment. Combined bio-optical and biogeochemical measurements are key to quantifying the ecosystem response to physical and chemical drivers. However, the remote location of the NMR makes the acquisition of in situ data by conventional sampling methods difficult (**Figure 1**), so ocean color and sea surface temperature (SST) satellite datasets are ideal for studying the spatio-temporal patterns of biogeochemical processes in such remote areas. Recent studies using satellite-derived subsurface light attenuation [Kd(λ)], Chlorophyll-a (Chl-a) and total suspended solids (TSS) concentrations have shown the distinct seasonal cycles of those parameters in the NMR (Schroeder et al., 2009; Blondeau-Patissier et al., 2011, 2014). 
However, standard ocean color algorithms were used in those studies, which likely limited the accuracy of the quantitative retrievals of the above-mentioned products given the optical complexity of the North Australian shelf waters (IOCCG, 2000). Recommendations for future applications include the implementation of ocean color algorithms regionally tuned to the NMR, parameterized with in situ optical measurements collected in this region. The success of such an approach has been previously shown in other coastal regions (e.g., Brando et al., 2012; Tilstone et al., 2012; Roy et al., 2013). The optical properties and concentrations presented in this study are the first to have been collected in the region, and may be used as a baseline in any future bio-optical assessments guiding environmental management.

Located within the NMR and bounded to the east by the Gulf of Carpentaria, the Van Diemen Gulf (VDG) is a semi-enclosed bay (∼16,000 km<sup>2</sup>) with two narrow openings (∼25–30 km wide), one to the north into the Arafura Sea and a second to the west into the Beagle Gulf (**Figure 1A**). The VDG, our first study region, is a dynamic marine environment characterized by shallow depths (<20 m) and strong tidal forcing. Six major river catchments surround the VDG: to the east are the Wildman River and the West, South (10,000 km<sup>2</sup>), and East Alligator Rivers; to the west are the Mary (8,000 km<sup>2</sup>) and Adelaide Rivers (638 km<sup>2</sup>; **Figure 1A**). While the western catchments have been actively used for agricultural purposes, mainly cattle grazing, the eastern catchments are primarily used for conservation (indigenous lands and national parks; CSIRO, 2009). Our second study region, Darwin Harbour (DH) (**Figure 1B**), is a semi-enclosed, shallow macrotidal estuarine system (∼3,000 km<sup>2</sup>) connected to the VDG via the Clarence Strait (e.g., Williams et al., 2006; Andutta et al., 2014). Home to more than half of the Northern Territory's population (∼200,000), Darwin Harbour is an isolated coastal region, with the closest other cities located >1,000 km to the east (Cairns) or west (Broome) (**Figure 1**). While there are no major ocean currents in the NMR, tidal currents play a significant role in water movement in this region (Condie, 2011). The monsoonal climate of the NMR is characterized by a wet season that extends from November to April, during which more than three-quarters of the yearly rainfall occurs<sup>1</sup> (e.g., Story et al., 1969). Monsoonal rainfall (∼1,700 mm·yr<sup>−1</sup>)<sup>1</sup> generates large quantities of freshwater that enter the coastal waters of the VDG via the surrounding catchments (**Figure 1**). 
Monsoonal winds are mostly northerly or north-westerly, with episodic cyclones (e.g., Acker et al., 2009; Lyon, 2010), while south-easterly trade winds predominate during the dry season (May–October). Rainfall, wind speed, and wind direction vary dramatically between the wet and dry seasons, so seasonal differences in water column mixing, turbidity, salinity, wave patterns, and wind-driven surface currents are to be expected. Overall, a limited amount of light penetrates the water column of our study regions all year round, which affects phytoplankton biomass and productivity. In the neighboring Gulf of Carpentaria, Burford et al. (2012c) found that sediment resuspension from the tidal energy was the most important physical process limiting light penetration, and consequently phytoplankton primary production.

Surface inherent optical properties (IOPs) and biogeochemical concentrations, as well as nutrients for some stations, were sampled in the VDG and DH shelf waters during one wet season (March 2012) and two dry seasons (September 2012, 2013; **Tables 1A,B**). The specific objectives of this study are (1) to characterize the spatial and seasonal bio-optical variability of DH and VDG in order to understand the functioning of these complex coastal ecosystems by identifying the major controlling processes, and (2) to examine the relationships that can be derived between parameters in order to provide recommendations for their use in biogeochemical modeling and in the parameterization of satellite ocean color algorithms specific to this dynamic coastal region of the NMR.

<sup>1</sup>Bureau of Meteorology (BOM), 73-year statistics (1941–2014), http://www.bom.gov.au, Climate statistics for Australian locations: Darwin Airport.

### METHODS

### Characteristics of the Sampling Effort

TABLE 1 | (A) Details of the seasonal field sampling, (B) the variables measured, and (C) notations: abbreviations, symbols, and units of all the parameters referred to in this study.

The dataset presented in this study comprises 23 metadata, biogeochemical and optical parameters (**Table 1**) sampled at 63 stations over three field campaigns in the VDG and DH (**Figure 1**). Overall, the sampling effort corresponds to 13 days of field sampling; the distance from Darwin to the furthest sampling site in the VDG is 125 nautical miles, equivalent to >12 h of travel time for a vessel equipped for water sampling. Two research campaigns were undertaken in the VDG during (1) the wet season of 2012 (29–31 March 2012; N = 16) and (2) the dry season of 2013 (10–15 September 2013; N = 22; **Figure 1A**; **Table 1A**). DH was also sampled on two occasions, but only during the dry season: first, an intensive field campaign that solely focused on DH was undertaken over 3 days at the end of the 2012 dry season (24, 26 September and 8 October 2012; N = 15), and second, several stations (N = 10) were sampled in DH at the end of the September 2013 dry season field campaign when returning from the VDG (**Figure 1**; **Table 1A**). At the time of their sampling, the bottom depths of all stations ranged from 5 to 60 m (median: 21 m; N = 63). Water sampling was carried out 3–6 times a day to cover various tidal conditions.

The tidal regime in DH and VDG is semi-diurnal. The tides in DH are macro-tidal, with a mean spring tidal range of 6 m that can reach up to 8 m during spring tides, which is large compared to the mean depth (<20 m; **Figure 1B**). The tides are meso-tidal in the VDG, with a mean spring tidal range of 3 m and some tidal amplification up to 6 m in the south east of the Gulf, near the mouths of the Alligator Rivers (**Figure 1**). As a result, the tidal phase lag between DH and VDG can be between 1.5 and 2 h, with tides in the VDG occurring before those in DH. The tide tables and tide charts that are publicly available are from the Darwin City tide gauge only, and thus are not applicable to the entire study region. As our field sampling occurred across an area that covered both VDG and DH, the tidal properties at each station were computed for the sampling dates, times and locations using a two-dimensional, depth-averaged, finite element, hydrodynamic Resources Modeling Associates (RMA) numerical model (Williams, 2009). The tidal range, the vertical difference between succeeding (or preceding) high and low tides depending on the time of sampling, was computed for each station.

The wet season of 2012 was characterized by above-average rainfall (1,661 mm, compared with a long-term 1941–2014 mean of 1,282 mm), with March 2012 (570 mm) receiving 80% more rain than the long-term monthly average. Similarly, the September 2012 rainfall (21 mm) exceeded the long-term average by 35%, while September 2013 was exceptionally dry (0.2 mm) (source: BOM<sup>1</sup>).

### Optical Measurements

At each station, vertical profiles of temperature, salinity and density (WET Labs Water Quality Monitor, WQM) as well as light absorption, beam attenuation (WET Labs ac-s with a 10-cm path-length), and light backscattering (WET Labs ECO BB-9) coefficients were measured (**Tables 1A,B**). The BB-9 backscattering meter measures the light backscattered at an angle of 124° with a fixed gain at nine wavelengths, which were set at 412, 440, 488, 510, 532, 595, 650, 676, and 715 nm. The ac-s measures the light absorption [a(λ)] and beam attenuation [c(λ)] at multiple wavelengths (N = 82) from 401.6 to 739.7 nm. All instruments were deployed together in an optics cage to allow for simultaneous collection of the measurements. Corrections for temperature and salinity effects on water optical properties were applied to the ac-s data using the WQM measurements (Pegau et al., 1997). The incomplete recovery of the scattered light in the ac-s absorption tube was corrected using the proportional method described in Zaneveld et al. (1994). The BB-9 backscattering dataset was corrected for light loss due to absorption over the path length at each angle and wavelength using light absorption and scattering values from the ac-s (Boss et al., 2004). All vertical profiles were binned at 0.2 m depth intervals. In addition, the water transparency was visually estimated by the same observer at all stations by lowering a black and white disk (Secchi disk). The Secchi depth reported in this study is the depth at which the disk is no longer visible.

Assuming that dissolved organic matter in water has a negligible effect on scattering, the particulate scattering coefficient bp(λ) was derived by subtracting the scattering by pure water, bw(λ), from the difference between the total beam attenuation c(λ) and the total light absorption (particulate + dissolved), aTot(λ) [i.e., bp(λ) = (c(λ) − aTot(λ)) − bw(λ)]. Similarly, the particulate backscattering coefficient, bbp(λ), was obtained by subtracting the backscattering component of pure water, bbw(λ), from the total backscattering coefficient, bb(λ). The spectral slope of the backscattering coefficient, γ, was computed by fitting a power law (Whitmire et al., 2007).
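The subtraction and the power-law fit described above can be sketched in a few lines of Python. This is only an illustration: the function names, the reference wavelength of 555 nm, the form bbp(λ) = bbp(555)·(λ/555)^(−γ), and the log-log least-squares fit are assumptions made here, not the processing code used in the study, and the input spectrum is synthetic.

```python
import math

def particulate_scattering(c, a_tot, b_w):
    """Return b_p = (c - a_tot) - b_w; all inputs in m^-1 at one wavelength."""
    return (c - a_tot) - b_w

def particulate_backscattering(b_b, b_bw):
    """Return b_bp = b_b - b_bw; inputs in m^-1 at one wavelength."""
    return b_b - b_bw

def fit_power_law_slope(wavelengths_nm, bbp, ref_nm=555.0):
    """Fit bbp(lam) = bbp_ref * (lam / ref_nm) ** (-gamma).

    Ordinary least squares on log-transformed values (an assumed,
    simplified fitting choice). Returns (bbp_ref, gamma).
    """
    xs = [math.log(w / ref_nm) for w in wavelengths_nm]
    ys = [math.log(v) for v in bbp]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope

# Synthetic spectrum at the nine BB-9 wavelengths (hypothetical values).
bb9_wavelengths = [412, 440, 488, 510, 532, 595, 650, 676, 715]
synthetic_bbp = [0.10 * (w / 555.0) ** -0.6 for w in bb9_wavelengths]
bbp_555, gamma = fit_power_law_slope(bb9_wavelengths, synthetic_bbp)
```

Fitting in log-log space keeps the example dependency-free; a non-linear fit on the untransformed values would weight the wavelengths differently and may give slightly different slopes on noisy data.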

For the determination of the particulate absorption coefficient, the sum of the phytoplankton and the non-algal particle (NAP) components [aP(λ) = aphy(λ) + aNAP(λ)], a volume of 0.4–1 L of surface seawater was filtered through 25 mm Whatman GF/F glass-fiber filters. Filters were stored flat in liquid nitrogen until analysis. Optical densities were measured over the 250–800 nm spectral range in 0.9 nm increments, using a Cintra 404 UV/VIS dual beam spectrophotometer equipped with an integrating sphere. Pigmented material was extracted from the sample filter using the method of Kishino et al. (1985) to determine the optical density of the non-algal matter, ODNAP(λ). The optical density due to phytoplankton was obtained by difference [ODphy(λ) = ODP(λ) − ODNAP(λ)]. The pathlength amplification effect due to the filter was corrected using the algorithm of Mitchell (1990). Finally, the absorption coefficients of phytoplankton [aphy(λ)] and non-algal particles [aNAP(λ)], and the spectral slope of the latter, SNAP, were computed as described in Clementson et al. (2004).

All phytoplankton absorption spectra were plotted and quality controlled to avoid the inclusion of those contaminated by nonalgal material.

### Biogeochemical Measurements

For all measurements, water samples were collected at the surface (≤2 m), using either Niskin bottles (via a Rosette system) or by lowering a clean, polyethylene bucket from the side of the ship.

### CDOM, POC, and DOC Concentrations

Water samples were filtered through a Whatman Anodisc membrane (0.22 µm; 47 mm) for CDOM and dissolved organic carbon (DOC) analysis. Filters were pre-rinsed with Milli-Q water prior to filtration. To track possible contamination of the glass filtration unit by the filters, the initial filtrate (100 ml) of Milli-Q water was discarded, and the subsequent filtrate of Milli-Q water was stored as a blank for the first and last samples collected during the campaign. The final filtrate was transferred to a SCHOTT glass bottle pre-rinsed with the same filtered seawater; the same was done for the DOC samples, which were then preserved with 0.5 ml of 50% H<sub>3</sub>PO<sub>4</sub>. The samples were stored at 4°C. The analysis of DOC and particulate organic carbon (POC), the latter derived by subtracting DOC (water filtered through 0.22 µm) from the value for the unfiltered water samples, is further described in the method section of Maciejewska and Pempkowiak (2014b). DOC and POC were not measured for the 15 stations sampled in Darwin Harbour in the 2012 dry season, and POC (N = 29) was only collected during the 2013 dry season field campaign. The CDOM absorbance of each filtrate, after equilibrating to room temperature, was measured from 250 to 800 nm in a 10 cm pathlength quartz cell using a Cintra 404 UV/VIS spectrophotometer, with fresh Milli-Q water (Millipore) as a reference. The CDOM absorption coefficient (m<sup>−1</sup>) was calculated using the equation aCDOM(λ) = 2.3 OD(λ)/l, where l is the cell path length in meters. Finally, an exponential function was fitted to the CDOM spectra over the wavelength range 350–680 nm, from which the slope, SCDOM, was derived as described in Clementson et al. (2004).
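As a rough illustration of the two CDOM steps above (conversion of optical density to an absorption coefficient, then an exponential fit over 350–680 nm), the following Python sketch may help. The function names, the 440 nm reference wavelength, and the log-linear least-squares fit are assumptions made for this example; the fitting procedure of Clementson et al. (2004) may differ in detail, and the spectrum is synthetic.

```python
import math

def cdom_absorption(optical_density, path_length_m=0.10):
    """a_CDOM = 2.3 * OD / l, with l the cell path length in meters."""
    return 2.3 * optical_density / path_length_m

def fit_cdom_slope(wavelengths_nm, a_cdom, ref_nm=440.0,
                   lo_nm=350.0, hi_nm=680.0):
    """Fit a(lam) = a_ref * exp(-S * (lam - ref_nm)) within [lo_nm, hi_nm].

    Uses a least-squares line through ln(a) versus wavelength, which is
    an assumed simplification. Returns (a_ref, S) with S in nm^-1.
    """
    pts = [(w - ref_nm, math.log(a))
           for w, a in zip(wavelengths_nm, a_cdom)
           if lo_nm <= w <= hi_nm and a > 0]
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    slope = sxy / sxx
    a_ref = math.exp(mean_y - slope * mean_x)
    return a_ref, -slope

# Synthetic spectrum (hypothetical values): a(440) = 0.15 m^-1, S = 0.014 nm^-1.
wavelengths = list(range(350, 681, 10))
spectrum = [0.15 * math.exp(-0.014 * (w - 440.0)) for w in wavelengths]
a_440, s_cdom = fit_cdom_slope(wavelengths, spectrum)
```

On real spectra, values at long wavelengths approach the instrument noise floor, so the log transform can be sensitive to near-zero or negative absorbances; the `a > 0` filter above only guards against the logarithm failing.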

During the dry season sampling of September 2013, a second set of CDOM samples (N = 26) was analyzed by fluorescence Excitation-Emission Matrix spectroscopy (EEM). EEM is often used to trace photochemical and microbial reactions associated with fluorescent dissolved organic matter (FDOM). In this study, it was used to assess CDOM sources (terrestrial or marine). Water samples were filtered through 0.45-µm syringe filters into SCHOTT bottles pre-washed with HCl. The samples were stored at 4°C while at sea. They were then stored at −20°C once in the laboratory and were allowed to reach room temperature before analysis. Each sample was analyzed using a Horiba Jobin Yvon Aqualog Excitation-Emission spectro-fluorometer (240–600 nm) and a Starna 1-cm quartz cell. The EEM spectra were recorded under standard instrumental conditions and were subsequently corrected for internal absorbance effects. Both the first and second order Raman and Rayleigh lines were removed, and the intensity was expressed in Quinine Sulfate Units (QSUs) using the appropriate instrument normalization for the integration time (Watson and Zielinski, 2013). The spectral slope ratio, Sr, was computed separately from the same data by least squares fitting of the log-transformed raw absorbance in the ranges 275–295 and 350–400 nm (Helms et al., 2008).
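The slope ratio can be illustrated with a short Python sketch that fits a straight line to the log-transformed absorbance in each band and divides the two slopes. The function names are hypothetical, the band limits follow Helms et al. (2008), and the piecewise-exponential spectrum is synthetic; this is a sketch of the general approach rather than the exact procedure used in the study.

```python
import math

def log_linear_slope(wavelengths_nm, absorbance, lo_nm, hi_nm):
    """Slope S (nm^-1) of ln(absorbance) versus wavelength in [lo_nm, hi_nm]."""
    pts = [(w, math.log(a))
           for w, a in zip(wavelengths_nm, absorbance)
           if lo_nm <= w <= hi_nm and a > 0]
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    return -(sxy / sxx)

def slope_ratio(wavelengths_nm, absorbance,
                short_band=(275.0, 295.0), long_band=(350.0, 400.0)):
    """Sr = S(short band) / S(long band)."""
    s_short = log_linear_slope(wavelengths_nm, absorbance, *short_band)
    s_long = log_linear_slope(wavelengths_nm, absorbance, *long_band)
    return s_short / s_long

def synthetic_absorbance(w):
    """Hypothetical spectrum: steeper decay below 320 nm than above it."""
    if w < 320:
        return math.exp(-0.02 * (w - 250))
    return math.exp(-0.02 * 70) * math.exp(-0.01 * (w - 320))

wavelengths = list(range(250, 451))
absorbance = [synthetic_absorbance(w) for w in wavelengths]
sr = slope_ratio(wavelengths, absorbance)
```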

### Total Suspended Solids

For the determination of TSS concentrations, water samples (0.50–3.50 L) were filtered through pre-weighed Whatman GF/F glass microfiber filters (0.7 µm; 47 mm) pre-combusted at 450°C. A blank filter with filtered seawater was used as a reference. Each filter was then rinsed with ∼50 ml of distilled water to remove any salt and dried to constant weight at 65°C to determine the TSS (see Figure 1 of Neukermans et al., 2012). The filters were then placed in a furnace at 450°C for 3 h, allowed to cool, and weighed to determine the amount of inorganic material remaining on the filter. To quantify the variability between TSS samples, triplicates were taken at each station. Overall, the relative standard deviation between all triplicates was 8% (N = 57), varying the least for the DH dry season samples (2%) and the most for the VDG dry and wet season samples (13%). Laboratory-estimated particulate inorganic (PIM) and organic (POM) matter fractions of TSS were only available for the 2012 wet season samples and were extracted as described in Oubelkheir et al. (2014).
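The gravimetric arithmetic behind these numbers is simple enough to sketch in Python. The function names, the optional blank correction, and all masses below are hypothetical and only illustrate how a TSS concentration, an inorganic fraction, and the relative standard deviation of a triplicate could be computed.

```python
from statistics import mean, stdev

def tss_mg_per_l(clean_filter_mg, dried_filter_mg, volume_l, blank_gain_mg=0.0):
    """TSS (mg/L) from the mass gained by the filter after drying.

    blank_gain_mg is an assumed correction taken from the blank filter.
    """
    return (dried_filter_mg - clean_filter_mg - blank_gain_mg) / volume_l

def inorganic_fraction(clean_filter_mg, dried_filter_mg, ashed_filter_mg):
    """Fraction of the retained mass that remains after combustion."""
    return (ashed_filter_mg - clean_filter_mg) / (dried_filter_mg - clean_filter_mg)

def relative_std_percent(replicates):
    """Relative standard deviation (sample SD / mean), in percent."""
    return 100.0 * stdev(replicates) / mean(replicates)

# Hypothetical triplicate from one station (masses in mg, volume in L).
triplicate = [
    tss_mg_per_l(120.0, 147.0, 0.5),
    tss_mg_per_l(121.0, 146.0, 0.5),
    tss_mg_per_l(119.5, 148.5, 0.5),
]
rsd = relative_std_percent(triplicate)
```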

### Determination of Phytoplankton Pigment Concentrations and Cell Counts

For phytoplankton pigment concentration and composition, water volumes of 0.10–1.20 L were filtered through Whatman GF/F glass microfiber filters (0.7 µm; 25 mm) and stored in liquid nitrogen until analysis. Phytoplankton pigments, analyzed by High-Performance Liquid Chromatography (HPLC) following Clementson (2013), were grouped into five categories: (1) Chl-a (Chlorophyll-a, divinyl-Chl-a, chlorophyllide-a and pheopigments); (2) Chl-b (Chl-b and divinyl Chl-b); (3) Chl-c (Chl-c1, -c2); (4) photosynthetic carotenoids (PSC; fucoxanthin, peridinin, 19′-HF and 19′-BF); and (5) photoprotective carotenoids (PPC; zeaxanthin, diadinoxanthin, alloxanthin, lutein, α- and β-carotene). The relative contribution of picophytoplankton (<2 µm), nanophytoplankton (2–20 µm), and microphytoplankton (>20 µm) was estimated for each sample using diagnostic pigments as described in Uitz et al. (2006), a method refined from Vidussi et al. (2001).
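The diagnostic-pigment approach can be sketched as a weighted sum of seven marker pigments, split into three size classes. In the Python example below, the weights are those commonly quoted from Uitz et al. (2006) and should be checked against that paper before any use; the dictionary keys, function name, and pigment concentrations are hypothetical.

```python
# Weights commonly quoted from Uitz et al. (2006); verify before use.
WEIGHTS = {
    "fuco": 1.41,      # fucoxanthin
    "perid": 1.41,     # peridinin
    "hex_fuco": 1.27,  # 19'-hexanoyloxyfucoxanthin (19'-HF)
    "but_fuco": 0.35,  # 19'-butanoyloxyfucoxanthin (19'-BF)
    "allo": 0.60,      # alloxanthin
    "tchlb": 1.01,     # total chlorophyll-b
    "zea": 0.86,       # zeaxanthin
}

SIZE_CLASSES = {
    "micro": ("fuco", "perid"),
    "nano": ("hex_fuco", "but_fuco", "allo"),
    "pico": ("tchlb", "zea"),
}

def size_fractions(pigments_mg_m3):
    """Return the micro-, nano-, and picophytoplankton fractions (sum = 1)."""
    weighted = {name: WEIGHTS[name] * pigments_mg_m3.get(name, 0.0)
                for name in WEIGHTS}
    total = sum(weighted.values())
    if total <= 0:
        raise ValueError("no diagnostic pigments present")
    return {cls: sum(weighted[p] for p in members) / total
            for cls, members in SIZE_CLASSES.items()}

# Hypothetical pigment concentrations (mg m^-3) for one sample.
sample = {"fuco": 0.20, "perid": 0.02, "hex_fuco": 0.05, "but_fuco": 0.01,
          "allo": 0.02, "tchlb": 0.04, "zea": 0.10}
fractions = size_fractions(sample)
```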

Surface water samples for microscopic phytoplankton cell counts were collected in 100-ml darkened bottles and preserved with paraformaldehyde for a final concentration of ∼1% paraformaldehyde (2013 dry season campaign only). Using the Utermöhl technique, sub-samples were settled in 100-ml settling chambers for 24 h and examined using an inverted microscope (Nikon Eclipse Ti-S) following the methodology outlined by Hasle (1978). Large and numerically rare taxa were counted during full examination of the settling chamber (×100), while small and numerically dominant taxa were counted on 1–2 transects of the chamber (×400), or from cumulative counts of 5–10 fields of view. Diatoms and dinoflagellates were identified to the genus or species level, based on Hallegraeff et al. (2010). For picophytoplankton cell counts, samples were collected in 2.5 ml cryovials, preserved in 1% paraformaldehyde, kept in liquid nitrogen while at sea and stored at −80°C until analysis. An Accuri C6 Flow Cytometer was used, following the procedure described in Zubkov et al. (2007). Cell viability was assessed through microscope counts, where cells that were broken or lysed were considered to be non-viable. When cell numbers were compared (i.e., between the phytoplankton community >2 µm and picoplankton <2 µm), phytoplankton cell counts amounted to <0.001% of the total (picoplankton + phytoplankton).

### Specific Inherent Optical Properties

Mass-specific inherent optical properties (SIOPs), the absorption and backscattering coefficients normalized to their respective concentrations, are necessary for the parameterization of (semi-)analytical bio-optical models for complex coastal waters (e.g., Brando et al., 2012; Tilstone et al., 2012; Le et al., 2015). However, SIOP datasets are seldom available for most of the world's coastal ocean. To further characterize our optical dataset and to help the development of ocean color algorithms for this sub-region of the NMR, we computed the specific absorption coefficient of phytoplankton, aphy∗(λ), obtained by normalizing aphy(λ) to Chl-a, as well as the mass-specific non-algal particulate absorption, aNAP∗(λ), and backscattering, bbp∗(λ), coefficients, obtained by normalizing aNAP and bbp to TSS concentrations. We acknowledge that the algal contribution to TSS may not have been negligible, during the wet season in particular.
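The normalization itself is a division, but the units are easy to get wrong, so a minimal Python sketch may be useful. The function names and all numerical values are hypothetical; the only facts relied on are that 1 mg L⁻¹ equals 1 g m⁻³, so that aNAP/TSS is in m² g⁻¹, and that aphy/Chl-a (with Chl-a in mg m⁻³) is in m² mg⁻¹.

```python
def specific_coefficient(coefficient_per_m, concentration):
    """Normalize an IOP (m^-1) by a concentration.

    With TSS in mg/L (= g m^-3) the result is in m^2 g^-1;
    with Chl-a in mg m^-3 the result is in m^2 mg^-1.
    """
    if concentration <= 0:
        raise ValueError("concentration must be positive")
    return coefficient_per_m / concentration

def specific_spectrum(spectrum_per_m, concentration):
    """Apply the same normalization to a {wavelength_nm: coefficient} mapping."""
    return {wl: specific_coefficient(value, concentration)
            for wl, value in spectrum_per_m.items()}

# Hypothetical station values.
a_nap = {440: 0.50, 555: 0.20}   # m^-1
tss = 50.0                       # mg/L
a_nap_star = specific_spectrum(a_nap, tss)   # m^2 g^-1
a_phy_star_440 = specific_coefficient(0.03, 1.0)  # m^2 mg^-1 for Chl-a = 1 mg m^-3
```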

### Nutrient Concentrations

Water samples for nutrients were collected at the surface (0.5 m; 2013 dry season campaign only) and filtered through pre-combusted (500°C) 25 mm Whatman GF/F glass fiber filters. The samples were frozen and stored at −20°C prior to analysis. The nutrient species nitrite + nitrate (NO<sub>2</sub><sup>−</sup> + NO<sub>3</sub><sup>−</sup>), ammonium (NH<sub>4</sub><sup>+</sup>), and phosphate (PO<sub>4</sub><sup>3−</sup>) were analyzed using a segmented flow analysis system following Ryle et al. (1981).

### Statistics

The effects of season (i.e., dry vs. wet season) and location (DH vs. VDG) on the measured variables were tested using a one-way analysis of variance (ANOVA). Prior to analysis, the normality of the distributions was verified using the Shapiro–Wilk test (p > 0.05) and the homogeneity of variance was tested with Bartlett's test. If none of the transformations used led to normally distributed data, the non-parametric Mann–Whitney–Wilcoxon test was applied.
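For readers less familiar with the test, the F statistic of a one-way ANOVA can be computed by hand as the ratio of between-group to within-group mean squares. The dependency-free Python sketch below shows only that step, with hypothetical data; the p-value, the normality and variance checks, and the non-parametric fallback would normally come from a statistics package and are not reproduced here.

```python
def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical example: one variable sampled in two seasons.
dry = [1.0, 2.0, 3.0]
wet = [2.0, 3.0, 4.0]
f_stat, df_b, df_w = one_way_anova_f(dry, wet)
```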

### RESULTS

### Metadata: Salinity, Secchi Depths, and Tides

The range of a selected set of metadata, concentrations and IOPs for the dry and wet seasons is shown in **Table 2**. Most (13) of the variables showed significant seasonal variations due to the monsoon (**Tables 2, 3**). Surface waters were found to be significantly cooler (median dry/wet: 28.35/29.83°C; p < 0.001; N = 63) and more saline (median dry/wet: 34.64/29.43 PSU; p < 0.005; N = 63) during the dry season (**Table 2**). Surface salinity varied by 33% during the wet season (N = 16), with stations featuring differences of up to 4 PSU between the surface and the bottom of the cast (**Table 2**). In comparison, dry season stations showed little variation (median: 0.1 PSU; N = 47).

TABLE 2 | Range of surface (<2 m) values in metadata, concentrations and IOPs in the Van Diemen Gulf and Darwin Harbour during the 2012–2013 dry and wet season field sampling.

The number of samples, N, varies due to quality control. The dry season includes samples from both the Van Diemen Gulf and Darwin Harbour; the wet season includes samples from the Van Diemen Gulf only. SD, standard deviation.

Significant differences in seasonal and regional Secchi depths were found (**Table 3**). Most of the Secchi depths sampled ranged from 1.0 to 3.0 m (67%), with 45% of the stations having a Secchi depth <2 m. The clearest waters were sampled in DH (Secchi depth >3 m), in the straits (>4 m), and in the Beagle Gulf (4 m), while most (N = 17) of the VDG dry stations had Secchi depths <2 m (**Figure 1**). Amongst the metadata, the Secchi depth was the parameter most correlated, albeit mildly, with the tidal range (R<sup>2</sup> = 0.23, N = 56) and temperature (R<sup>2</sup> = 0.22, N = 50), but not with salinity (R<sup>2</sup> < 0.1, N = 58). Secchi depths were strongly correlated with aNAP (R<sup>2</sup> = 0.73, N = 56) and aTot (R<sup>2</sup> = 0.73, N = 46), while there was a mild co-variation with TSS (R<sup>2</sup> = 0.34, N = 57). Temperature was moderately correlated with TSS (R<sup>2</sup> = 0.47, N = 47). Tidal range was moderately correlated with aNAP slopes (R<sup>2</sup> = 0.45, N = 50), and only weakly with the concentrations.

### Biogeochemical Concentrations and Inherent Optical Properties

The overall average Chl-a concentration sampled during these three campaigns was 1.0 mg.m<sup>−3</sup> (N = 56). The highest Chl-a concentrations measured were 2.8 mg.m<sup>−3</sup> during the dry season and 2.3 mg.m<sup>−3</sup> during the wet season, both sampled at the same VDG station located <15 km from the mouth of the Mary River (**Figure 1A**). Although no significant difference was found for Chl-a between seasons or locations (**Table 3**), higher Chl-a concentrations (>1.3 mg.m<sup>−3</sup>) were found along the VDG coast and within DH (with a gradual decrease from ∼1.63 mg.m<sup>−3</sup> in the inner Harbour to <1 mg.m<sup>−3</sup> in the outer Harbour), while lower Chl-a concentrations were found in the middle of the VDG (0.6–1.25 mg.m<sup>−3</sup>) and at stations sampled in the Beagle Gulf and Dundas Strait (<0.6 mg.m<sup>−3</sup>; **Figures 1**, **2A**; **Table 2**). TSS was found to be significantly higher for stations sampled during the dry season and located in the VDG (54.2 ± 40.0 mg.L<sup>−1</sup>; N = 20) in comparison to stations sampled in DH (15.6 ± 9.7 mg.L<sup>−1</sup>; N = 24) or during the wet season (6.0 ± 6.6 mg.L<sup>−1</sup>; N = 13; **Tables 2, 3**). Higher TSS concentrations were typically found along the coast, while lower TSS concentrations were found in the inner Harbour (**Figure 2B**). There was no interdependence between TSS and Chl-a (R<sup>2</sup> < 0.1; N = 55; **Figure 3A**; **Table 5**), while TSS concentrations seemed to increase with increasing tidal range, albeit with no significant relationship (R<sup>2</sup> < 0.1; p > 0.1, N = 51; **Figure 3B**). The wet season TSS samples (VDG wet) were found to be mostly (75%) composed of inorganic material (N = 13).

TABLE 3 | One-way ANOVA results for selected concentrations, IOPs and metadata. The factor "Season" refers to dry vs. wet seasons (Van Diemen Gulf stations only), and "Location" refers to Van Diemen Gulf vs. Darwin Harbour (during the dry season only).

If only the p-value is indicated, a non-parametric test was applied to the dataset. \*\*\*p < 0.01; \*\*p < 0.05; NS, not significant (p > 0.05).

Organic carbon is divided into a particulate (POC) and a dissolved (DOC) fraction, both playing major roles in the ocean carbon cycle (Bauer et al., 2013), with the majority (∼95%) of the ocean organic carbon being composed of DOC (e.g., Hansell and Carlson, 2001). The concentration of organic carbon varies according to the distance from land, with open ocean waters having less organic carbon than those in coastal regions. For instance, POC was found to be on the order of ∼0.1 mg.L<sup>−1</sup> in Pacific Ocean waters (Claustre et al., 1999; Fabiano et al., 1999), 0.3 mg.L<sup>−1</sup> in shallow shelf waters of the Northwest Atlantic (Bauer et al., 2002), 1.4 mg.L<sup>−1</sup> in the Baltic Sea coastal waters (Maciejewska and Pempkowiak, 2014b), and up to 1.8 mg.L<sup>−1</sup> in Chesapeake Bay (Fisher et al., 1998). For this study, POC (only available for the dry season samples) was found to be as high as 2.23 mg.L<sup>−1</sup>, with a mean concentration of 1.42 ± 0.26 mg.L<sup>−1</sup> (N = 19) in the VDG and 1.33 ± 0.06 mg.L<sup>−1</sup> (N = 10) in DH (**Table 2**). No significant difference in POC between the two locations was found (**Table 3**). In open ocean waters, Morel (1988) proposed the empirical relationship POC = 90 Chl-a<sup>0.57</sup> (R<sup>2</sup> = 0.68; N = 409), but in coastal waters, such robust relationships between POC and Chl-a may not be observed because of the higher non-phytoplankton contribution (e.g., Sathyendranath et al., 2009). For our dataset, after converting POC from mg.L<sup>−1</sup> to mg.m<sup>−3</sup>, we obtained POC = 1,444 Chl-a<sup>0.19</sup> (R<sup>2</sup> = 0.40; N = 29; p < 0.005; **Figure 3C**; **Tables 2**, **5**). TSS samples were also examined in relation to their POC content to assess their relative fraction of organic material, but this was not possible due to a poor covariation of POC with TSS (R<sup>2</sup> = 0.14; p < 0.05; N = 29) (**Figure 3D**).
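To make the contrast between the two POC and Chl-a power laws concrete, the short Python sketch below evaluates both at the same Chl-a concentration. The function names are hypothetical; the coefficients are the ones quoted in the text, with POC and Chl-a both in mg m⁻³.

```python
def poc_open_ocean(chl_mg_m3):
    """Morel (1988) open-ocean relationship: POC = 90 * Chl-a^0.57 (mg m^-3)."""
    return 90.0 * chl_mg_m3 ** 0.57

def poc_regional(chl_mg_m3):
    """Relationship reported for this dataset: POC = 1,444 * Chl-a^0.19 (mg m^-3)."""
    return 1444.0 * chl_mg_m3 ** 0.19

def mg_per_l_to_mg_per_m3(value_mg_l):
    """1 m^3 = 1,000 L, so mg/L values are multiplied by 1,000."""
    return value_mg_l * 1000.0

# At the overall mean Chl-a of about 1 mg m^-3:
open_ocean_poc = poc_open_ocean(1.0)
regional_poc = poc_regional(1.0)
mean_vdg_poc = mg_per_l_to_mg_per_m3(1.42)
```

At Chl-a = 1 mg m⁻³ the regional fit returns its coefficient, 1,444 mg m⁻³, which is close to the mean VDG value of 1.42 mg L⁻¹ (1,420 mg m⁻³) and more than an order of magnitude above the open-ocean estimate of 90 mg m⁻³. The low exponent (0.19) also means that POC in the regional fit changes little with Chl-a, in line with the high non-phytoplankton contribution discussed in the text.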

Understanding the dynamics of DOC in coastal systems is key to accurately assessing the important role coastal regions play in the global carbon cycle (Fichot and Benner, 2014; Fichot et al., 2014). While POC is the sum of the masses of all organic particles, largely composed of phytoplankton and organic detritus (e.g., fecal pellets; Romero-Ibarra and Silverberg, 2011), DOC is directly related to micro-organism activities ranging from photosynthesis to viral lysis (Agustí and Duarte, 2013). The use of CDOM as a direct and reliable proxy for DOC in coastal waters has been demonstrated in previous studies. For instance, Fichot and Benner (2011) used multiple linear regressions to successfully (R<sup>2</sup> > 0.80) retrieve DOC from aCDOM(275) and SCDOM in the Beaufort Sea and the Northern Gulf of Mexico. More recently, Vantrepotte et al. (2015) retrieved DOC from aCDOM(412) using linear regressions and a dataset of contrasting coastal waters, comprising samples from the English Channel (R<sup>2</sup> = 0.72), Vietnam (R<sup>2</sup> = 0.81), and French Guiana (R<sup>2</sup> = 0.78). For our dataset, there was no correlation between DOC and aCDOM(400) (R<sup>2</sup> ∼ 0.1; p > 0.1; N = 55; **Figure 4C**). DOC was found to differ significantly between both seasons and locations (**Tables 2, 3**). It was higher during the wet season (1.39 ± 0.29 mg.L<sup>−1</sup>; N = 13; **Table 2**) and varied spatially from lower concentrations in the embayment of the VDG (∼0.8–1.0 mg.L<sup>−1</sup>), increasing toward Clarence Strait (>1.0 mg.L<sup>−1</sup>) and reaching up to 2.35 mg.L<sup>−1</sup> in DH (**Figure 1**). Overall, DOC in DH (1.34 ± 0.37 mg.L<sup>−1</sup>; N = 10) was found to be much lower than that observed by Burford et al. (2012a) in Buffalo Creek (2.9 ± 1.8 mg.L<sup>−1</sup>), 20 km north of Darwin. 
This difference can be partly explained by the seasonal effect: samples from DH were from the dry season, while the Buffalo Creek DOC values of Burford et al. (2012a) were sampled during the wet season (December 2008). No correlation was found between DOC and salinity (R<sup>2</sup> = 0.10; N = 38) (**Figure 4D**).

The absorption of CDOM at 440 nm was not found to be statistically significantly different between seasons (**Table 3**). Yet aCDOM(440) was characterized by higher values during the wet season (median dry/wet: 0.11/0.18 m−<sup>1</sup> ; **Table 2**), and stations located in the VDG had slightly lower aCDOM (440) values (0.11 ± 0.09 m−<sup>1</sup> ; N = 21) than those in DH (0.14 ± 0.03 m−<sup>1</sup> ; N = 10; **Figure 2C**). Overall, aCDOM(440) was found to weakly increase with Chl-a (R <sup>2</sup> = 0.20, N = 37; **Figure 4A**). Also a proxy for salinity, aCDOM can be used as a marker of freshwater influence during flooding conditions (e.g., Schroeder et al., 2012), but for our dataset, there was a very weak relationship between salinity and aCDOM (**Figure 4B**): albeit with a poor relationship, CDOM was found to increase with salinity during the dry season (R <sup>2</sup> = 0.14; N = 26; **Figure 4B**), while it decreased with salinity during the wet season (R <sup>2</sup> = 0.34; N = 11) possibly due to its terrestrial source (thus in higher concentration near the rivers). Its spectral slope, SCDOM, was not seasonally or spatially uniform (0.008–0.017; N = 38) but statistically, no seasonal or spatial difference was found (**Tables 2**, **3**; **Figure 5A**). SCDOM was within the set range described by Blough and Del Vecchio (2002). The spectral slope of NAP did not change between seasons but unlike SCDOM, SNAP was found to differ spatially (p < 0.001; N = 45) with the dry season samples located in the VDG (median: 0.014 m−<sup>1</sup> ; N = 20) having steeper slopes than those from DH (0.012 m−<sup>1</sup> ; N = 25) (**Figure 5B**). From the EEM, we found Sr to be between 1.5 in DH and 3.5 in the VDG (**Figure 13A**). The slope ratios are in accordance with coastal environments, as sampled in our study, where Sr ranged from wetlands (0.69) to oceanic (9.02) (Helms et al., 2008). 
In addition, aCDOM(350) was found to decrease exponentially with S275–295 (**Figure 13B**), as per Figure 6 of Fichot and Benner (2012), thus indicating possible effects of photobleaching for the VDG dry season samples.

FIGURE 2 | Spatial distribution of measured (A) Chl-a, (B) TSS, (C) aCDOM(440), and (D) b̃bp(555) during the dry (gray) and wet (black) seasons. Symbol sizes are representative of the concentration range sampled during a specific season.
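The spectral slopes and slope ratio discussed above can be illustrated with a minimal sketch. It assumes the usual exponential model a(λ) = a(λ0)·exp(−S(λ − λ0)), a log-linear least-squares fit (the fitting method actually used in this study is not stated here), and the Helms et al. (2008) definition Sr = S275–295 / S350–400. The function names and the synthetic spectrum are hypothetical and serve only as an illustration.

```python
import math


def spectral_slope(wavelengths_nm, absorption):
    """Return S (nm^-1) from a least-squares fit of ln(a) against wavelength.

    Assumes a(lambda) = a(lambda0) * exp(-S * (lambda - lambda0)), so the
    slope of ln(a) versus lambda equals -S.
    """
    logs = [math.log(a) for a in absorption]
    n = len(wavelengths_nm)
    mean_x = sum(wavelengths_nm) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in wavelengths_nm)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(wavelengths_nm, logs))
    return -sxy / sxx


def band_slope(wavelengths_nm, absorption, lower_nm, upper_nm):
    """Spectral slope restricted to lower_nm <= lambda <= upper_nm."""
    pairs = [(w, a) for w, a in zip(wavelengths_nm, absorption)
             if lower_nm <= w <= upper_nm]
    return spectral_slope([w for w, _ in pairs], [a for _, a in pairs])


def slope_ratio(wavelengths_nm, absorption):
    """Sr = S(275-295) / S(350-400), the definition assumed in this sketch."""
    return (band_slope(wavelengths_nm, absorption, 275, 295)
            / band_slope(wavelengths_nm, absorption, 350, 400))


# Synthetic (not measured) spectrum with a single slope of 0.015 nm^-1.
wavelengths = list(range(270, 451, 5))
a_cdom = [0.5 * math.exp(-0.015 * (w - 350)) for w in wavelengths]

s_full = spectral_slope(wavelengths, a_cdom)
sr = slope_ratio(wavelengths, a_cdom)
print(f"S = {s_full:.4f} nm^-1, Sr = {sr:.2f}")
```

For a spectrum with one constant slope the ratio is 1 by construction; values above 1, as reported for the VDG, indicate a steeper slope in the 275–295 nm band than in the 350–400 nm band.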

The magnitude of the particulate backscattering coefficient varied broadly [0.01 < bbp(555) < 0.73 m<sup>−1</sup>; N = 40; **Table 2**]. Its associated spectral slopes (0.2 < γ < 1.9 m<sup>−1</sup>) were well within the range reported by previous studies (Snyder et al., 2008; **Figure 5C**), but the bbp(555) slopes did not feature any seasonal or spatial variations (γ ∼ −0.6 m<sup>−1</sup>; **Table 3**). As expected, bbp(555) was strongly correlated with bp(555) (R<sup>2</sup> = 0.84; N = 37; p < 0.001; **Table 5**), and there was also a significant correlation between bbp(555) and TSS (R<sup>2</sup> = 0.53; p < 0.001; N = 36; **Figures 7A,B**; **Table 5**). Overall, the mass-specific non-algal particulate absorption coefficient at 440 nm, aNAP∗(440), varied between 0.002 and 0.035 m<sup>2</sup>.g<sup>−1</sup>, with a mean value of 0.025 m<sup>2</sup>.g<sup>−1</sup> during the wet season decreasing to 0.010 m<sup>2</sup>.g<sup>−1</sup> during the dry season. The wet season average is comparable to that of other coastal waters around the world, such as French Guiana (i.e., 0.023 m<sup>2</sup>.g<sup>−1</sup>; Loisel et al., 2009), but it is much lower than that of the North Sea (0.033 m<sup>2</sup>.g<sup>−1</sup>; Babin et al., 2003a), for instance. The extremely low dry season average confirms that the dry season samples are largely dominated by inorganic particles, which is in accordance with our hypothesis of a NAP-dominated system at that time of the year. A strong and significant correlation between aNAP∗(555) and bbp∗(555) (R<sup>2</sup> = 0.77, p < 0.001, N = 36) is shown in **Figure 7C**. Such a strong relationship between the two parameters was previously reported in the Great Barrier Reef, albeit at a different wavelength [aNAP∗(440); R<sup>2</sup> = 0.82; see Figure 13 of Blondeau-Patissier et al. (2009)], and in the Gulf of Mexico (D'Sa et al., 2007).

Temporal and spatial differences were found in the beam attenuation coefficient, c(555), which was significantly higher during the dry season (p < 0.05; N = 33), particularly in the Gulf (VDGdry: 3.33 ± 1.42 m<sup>−1</sup>; p = 0.01; N = 11), compared with the wet season (VDGwet: 1.61 ± 0.89 m<sup>−1</sup>; N = 15) and DH (1.91 ± 1.08 m<sup>−1</sup>; N = 22; **Figure 6**; **Table 2**). In the VDG, independent of the season, the total absorption at 555 nm, aTot(555), was a negligible portion of c(555) (4 ± 1%; N = 26), while in DH it played a more important role (15 ± 8%; N = 22). The beam attenuation also displayed a strong relationship with bbp(555) (R<sup>2</sup> = 0.80; p < 0.001; N = 36; **Figure 6A**). While Secchi depths were moderately correlated with c(555) (R<sup>2</sup> = 0.43; N = 47; p < 0.005; **Figure 6C**), c(555) was not correlated with the particulate backscattering-to-scattering ratio b̃bp(555) (N = 26; R<sup>2</sup> = 0.20; p > 0.1; **Figure 6B**). This latter parameter, b̃bp, has been linked to the composition of the particle assemblage (e.g., Loisel et al., 2007), from the particle size distribution to its refractive index (e.g., Twardowski et al., 2001; Boss et al., 2004). For our dataset, b̃bp(555) surface values varied three-fold (0.02–0.07; N = 36; **Figure 7D**; **Table 2**). We found a significant difference in b̃bp(555) between locations and seasons (**Tables 2, 3**). Although its mean value of 0.03 was above the range found by Whitmire et al. (2007) [i.e., 0.01 < b̃bp(555) < 0.02] over various coastal, oceanic, and freshwater environments of the US, it was in accordance with the measurements of McKee et al. (2009) in the shallow, macro-tidal estuary of the Bristol Channel, where b̃bp(532) was found to reach up to 0.07. Albeit at a different, shorter wavelength, the bulk of their b̃bp measurements had a bi-modal distribution at 0.01 and 0.03. Spatially, b̃bp(555) was found to differ (p < 0.05; N = 23), increasing from DH (0.027 ± 0.005; N = 12) to the VDG (0.033 ± 0.009; N = 11; **Figures 2D**, **7D**). The scattering-to-attenuation ratio, b/c(555), averaged 0.91 ± 0.08 (N = 48) (**Figure 9A**), with the scattering coefficient, b(555), contributing >90% (N = 26) of the beam attenuation coefficient in the VDG independently of the seasons (**Figures 6**, **9D**). In DH, however, this contribution decreased to 79% during the dry season sampling in 2012 (N = 14) but was equivalent to that of the VDG (>90%) during the dry season sampling in 2013 (N = 8).
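The ratios quoted in this paragraph follow from two simple relations: the beam attenuation is the sum of absorption and scattering (c = a + b), and the particulate backscattering-to-scattering ratio is bbp/bp. The sketch below is illustrative only; the function names are hypothetical, and the input values are round numbers chosen to resemble the VDG dry season magnitudes rather than measured data.

```python
def attenuation_budget(a_tot, c):
    """Split beam attenuation c (m^-1) into absorption and scattering.

    Uses c = a + b, so b = c - a. Returns b and the fractional
    contributions a/c and b/c.
    """
    if c <= 0 or not 0 <= a_tot <= c:
        raise ValueError("expected 0 <= a_tot <= c and c > 0")
    b = c - a_tot
    return {"b": b, "a_over_c": a_tot / c, "b_over_c": b / c}


def backscattering_ratio(bbp, bp):
    """Particulate backscattering-to-scattering ratio bbp / bp (unitless)."""
    if bp <= 0:
        raise ValueError("bp must be positive")
    return bbp / bp


# Hypothetical inputs (m^-1) at 555 nm, for illustration only.
budget = attenuation_budget(a_tot=0.13, c=3.33)
ratio = backscattering_ratio(bbp=0.10, bp=3.20)

print(f"a/c = {budget['a_over_c']:.1%}, b/c = {budget['b_over_c']:.1%}")
print(f"backscattering ratio = {ratio:.3f}")
```

With these inputs absorption accounts for about 4% of c, which is the order of magnitude reported above for the VDG.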

The relative contributions of phytoplankton, NAP and CDOM absorptions to the total absorption budget for our samples (N = 49) are displayed in **Figure 8**. Overall, the waters of this region were largely (60%) dominated by CDOM during the wet season and mostly (>45%) by NAP during the dry season, with phytoplankton contributing very little to the total absorption at 440 nm (**Figure 8D**). The combined absorption by NAP and CDOM generally exceeded 70% of the total absorption from 400 to 620 nm. The second highest combination was CDOM and phytoplankton (>60%), while NAP and phytoplankton contributed 50–60% for the same wavelengths. The median contribution of phytoplankton at 440 nm was 18% overall (**Figure 8D**), increasing to >55% at 665 nm. Darwin Harbour showed a mixed CDOM and NAP assemblage.
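The relative contributions used in an absorption budget (and plotted in a ternary diagram) are each component divided by the sum of the three. The sketch below assumes that pure-water absorption is excluded from the total, which is the common convention for such budgets but is an assumption here; the function name and the input values are hypothetical.

```python
def absorption_budget(a_phy, a_nap, a_cdom):
    """Fractional contribution of each component to a_phy + a_nap + a_cdom.

    All inputs are absorption coefficients (m^-1) at the same wavelength.
    Pure-water absorption is deliberately left out (assumption).
    """
    total = a_phy + a_nap + a_cdom
    if total <= 0:
        raise ValueError("total absorption must be positive")
    return {
        "phy": a_phy / total,
        "nap": a_nap / total,
        "cdom": a_cdom / total,
    }


# Hypothetical CDOM-dominated sample at 440 nm (values in m^-1).
fractions = absorption_budget(a_phy=0.03, a_nap=0.05, a_cdom=0.12)
for name, value in fractions.items():
    print(f"{name}: {value:.0%}")
```

The three fractions always sum to 1, which is what allows each sample to be placed as a single point in a ternary plot.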

The particulate absorption coefficient, aP(440), was found to be similarly correlated with both TSS (R<sup>2</sup> = 0.30; p < 0.001; N = 51; **Figure 9B**) and Chl-a (R<sup>2</sup> = 0.28; p < 0.001; N = 50; not shown). A significant seasonal and local (VDG) variation was also found for aNAP/aP(440) (p < 0.05; N = 50) (**Figure 9C**; **Table 3**).

The quality of the optical closure between ac-s and filter pad total absorptions [aTot(λ)] was assessed at the selected wavelengths of 412, 440, 510, 532, and 676 nm for the VDG dry and wet season field campaign measurements. There was a very good agreement (R<sup>2</sup> > 0.9) between the two methods in the blue and green spectral regions [i.e., aTot(ac-s) = 0.96 × aTot(Filters); R<sup>2</sup> = 0.99; N = 26 at 412 nm and aTot(ac-s) = 0.79 × aTot(Filters); R<sup>2</sup> = 0.97; N = 26 at 532 nm], but the optical closure largely degraded in the near-infrared [i.e., aTot(ac-s) = 0.31 × aTot(Filters); R<sup>2</sup> < 0.1; N = 26 at 676 nm]. The absolute relative errors between the two measurements increased from 3% in the blue to 16% in the green (532 nm) to 85% in the NIR. The correction for residual scattering in the reflecting tube of an ac-s (or ac-9) varies spectrally (Röttgers et al., 2013), whereas it is considered wavelength-independent in the proportional correction method (as selected for this study). This may lead to large discrepancies (Leymarie et al., 2010; Pitarch et al., 2016). Although we acknowledge that the choice of the proportional correction method significantly affected our ac-s estimates in the NIR, no ac-s or bb-9 data beyond 555 nm were used in this study.
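The closure statistics quoted above (a proportionality factor between the two methods and an absolute relative error) can be reproduced in principle with the sketch below. It assumes a least-squares fit forced through the origin, since the reported relationships have no intercept, and it takes the filter pad value as the reference for the relative error; both choices are assumptions, as are the function name and the sample values.

```python
def closure_stats(filter_pad, acs):
    """Compare paired absorption measurements (m^-1) at one wavelength.

    Returns (slope, mare_percent) where:
      slope is the zero-intercept least-squares factor in acs = slope * filter_pad;
      mare_percent is the mean absolute relative error, in percent,
      with the filter pad value as reference (assumption).
    """
    if len(filter_pad) != len(acs) or not filter_pad:
        raise ValueError("inputs must be non-empty and of equal length")
    slope = (sum(x * y for x, y in zip(filter_pad, acs))
             / sum(x * x for x in filter_pad))
    mare = 100.0 * sum(abs(y - x) / x
                       for x, y in zip(filter_pad, acs)) / len(acs)
    return slope, mare


# Hypothetical paired values in which the ac-s reads 20% low throughout.
a_filters = [0.10, 0.20, 0.40]
a_acs = [0.08, 0.16, 0.32]
slope, mare = closure_stats(a_filters, a_acs)
print(f"slope = {slope:.2f}, mean absolute relative error = {mare:.0f}%")
```

A slope close to 1 combined with a small relative error indicates good closure; a slope well below 1, as reported at 676 nm, points to a systematic underestimate by one method.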

### Phytoplankton: Absorption Coefficients, Pigments, Cell Counts, and Nutrients

Phytoplankton cell sizes in DH were mostly (>50%) dominated by picoplankton (<2 µm). In the VDG, the dry season samples were dominated (>60%) by microplankton (>20 µm), and the wet season samples were a mixed population assemblage (**Figure 10**). The two major carotenoid pigments, fucoxanthin and zeaxanthin, are diagnostic pigments for diatoms and cyanobacteria, respectively (**Figures 11E,F**). Overall, fucoxanthin was the most abundant pigment (60% of the 57 stations) and was mostly present in the VDG dry season samples (30%) (**Table 3**), while zeaxanthin characterized the DH samples (26%). As expected, fucoxanthin strongly co-varied with Chl-a (N = 57; R<sup>2</sup> = 0.70; p < 0.001; **Figure 11C**). From the phytoplankton cell counts (2013 dry season campaign only), we found that the dominance of diatoms increased spatially, from DH to the Gulf. For the VDG dry season samples, fucoxanthin/Chl-a ratios were typically one order of magnitude higher than those of zeaxanthin, confirming that diatoms were the most abundant phytoplankton group (>70%; N = 36).

In the VDG, Chl-c, an accessory pigment in diatoms, was found to be more abundant than Chl-b (53%), a pigment generally associated with e.g., picophytoplankton Prochlorococcus which was mostly present in DH (72%). The pigment ratio Chl-b/Chl-a was low in the waters along Melville Island and higher in Darwin Harbour, while Chl-c/Chl-a increased from the VDG coast to Melville Island and was higher in Darwin Harbour but lower in the VDG wet samples. The VDG dry season dataset was characterized by higher PSC (**Figures 11A–D**), while the VDG wet and DH samples were higher in PPC. The PPC/PSC ratio provides a photo-physiologic index for phytoplankton cells. Environmental stresses, such as high light or low nutrient availability, are usually associated with higher PPC, thus resulting in higher PPC/PSC ratios. A low PPC/PSC ratio would in turn be associated with phytoplankton cells receiving low light levels in high-nutrient surface waters. However, the PPC/PSC ratio also varies between phytoplankton taxonomic groups, as phytoplankton cells exhibit a variety of tolerances and adaptations to light and nutrient exposure. Overall, the VDG samples had lower PPC/PSC ratios (∼0.68) during the dry season than during the wet season (∼1.50), which is due to limited light penetration from high turbidity levels. In contrast, the PPC/PSC ratios were generally high (∼1.55) for the Darwin Harbour samples, likely due to a better light penetration through the water column in this system at the time of our sampling (**Figure 6**).



The cyanobacterium Trichodesmium sp. inhabits tropical waters and, in low wind stress conditions, produces large surface blooms that can be monitored from space (McKinna, 2014). Blooms of Trichodesmium sp. are known to occur in the region, although the dynamic coastal environment of the VDG may not favor its optimum growth because of water column mixing. Trichodesmium sp. patches were seen at five stations over 3 days during the September 2013 field campaign, and high counts (>200 units/L) of cyanobacterial cells were found at one station in the eastern embayment of the Gulf, supporting the previous remote sensing observations of Blondeau-Patissier et al. (2014) that these cyanobacterial blooms occur regularly between August and October in the VDG-DH region.

The absorption coefficient of phytoplankton, aphy(440), was found to decrease from the coast to the center of the Gulf and was higher in the inner Harbour, but there was no significant difference in aphy(440) between the dry and wet season samples. There was a significant correlation between aphy(440) and Chl-a (R<sup>2</sup> = 0.55; p < 0.001; N = 50; **Figure 12B**; **Table 5**), and no significant difference was found between the relationship derived from our dataset and that of Bricaud et al. (2010) (p > 0.1) or Bricaud et al. (2004) (p > 0.1). There was no significant seasonal variation in aphy(440/676) (**Figure 12A**; **Table 3**), and no significant difference was found between the model derived for this dataset and that of Bricaud et al. (1995) (N = 50; p > 0.05), implying that any of these models can be used in the VDG and/or DH (**Figure 12**). A significant difference (p < 0.005) was, however, found for aphy∗(440) between locations (**Figure 12C**; **Table 3**), reflecting the difference in phytoplankton composition between the VDG and DH.

Nutrients, in particular nitrate and ammonium, have the greatest potential to limit phytoplankton growth in coastal marine systems (Malerba et al., 2015). For our study regions, nutrient concentrations sampled during the 2013 dry season were much higher in the VDG than in DH: mean ammonium and phosphate concentrations were almost twice those recorded in DH, while nitrate was three times greater in the VDG (**Figure 14**; **Table 4**). DH is a nitrogen-depleted environment (Wolanski et al., 2006), and nitrate showed an increasing concentration gradient from the inner Harbour (∼0.06 µmol.L<sup>−1</sup>; N = 5) to the outer Harbour (∼0.1 µmol.L<sup>−1</sup>; N = 6; **Figures 1B, 14**), consistent with the principle that oceanic waters coming into the harbor are richer in nutrient content.

# Summary of Results

Poor correlation (R<sup>2</sup> < 0.2) was found between DOC and aCDOM(400) or salinity, as well as between salinity and aCDOM or Secchi depths, and between POC and TSS with Chl-a. A mild correlation (0.2 < R<sup>2</sup> < 0.4) was found between Secchi depths and tidal range or temperature or TSS, between aP(440) and TSS and Chl-a, and also between aCDOM(440) and Chl-a. A moderate correlation (0.4 < R<sup>2</sup> < 0.6) was found between TSS and temperature, and between the tidal range and aNAP slopes. A moderate to strong correlation (0.6 < R<sup>2</sup> < 1.0) was found between Secchi depths and aNAP(440) and aTot(440). A very good optical closure was found between ac-s measurements and the filter pads for the total absorption coefficient [aTot(412–555); R<sup>2</sup> = 0.9]. The backscattering bbp(555) and scattering bp(555) coefficients were highly correlated (R<sup>2</sup> > 0.8). DOC was found to be significantly different between seasons and locations. While CDOM was found to be higher during the wet season, TSS concentrations were found to be higher during the dry season in the VDG, especially for stations along the coast. DH (dry season) was composed of a mixed assemblage of CDOM and NAP. No data were available for the wet season months in DH. Chl-a and the backscattering coefficient bbp(555) did not show any statistical difference between seasons or locations, in contrast to c(555) and the backscattering ratio, which varied seasonally and spatially.

FIGURE 6 | Beam attenuation coefficient c(555) as a function of (A) backscattering bbp(555); (B) the backscattering-to-scattering ratio b̃bp(555); and (C) Secchi depth.
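The correlation classes used in this summary amount to a simple binning of R<sup>2</sup>. The sketch below encodes those bins; because the text gives strict inequalities, the handling of values that fall exactly on a boundary (assigned here to the higher class) is an assumption, and the function name is hypothetical.

```python
def correlation_class(r_squared):
    """Map a coefficient of determination to the classes used in the summary.

    Bins: poor (< 0.2), mild (0.2-0.4), moderate (0.4-0.6),
    moderate to strong (0.6-1.0). Boundary values go to the higher
    class, which is an assumption of this sketch.
    """
    if not 0.0 <= r_squared <= 1.0:
        raise ValueError("R^2 must lie between 0 and 1")
    if r_squared < 0.2:
        return "poor"
    if r_squared < 0.4:
        return "mild"
    if r_squared < 0.6:
        return "moderate"
    return "moderate to strong"


# R^2 values quoted in the Results, used here only as examples.
examples = {
    "DOC vs. salinity": 0.10,
    "aP(440) vs. TSS": 0.30,
    "bbp(555) vs. TSS": 0.53,
    "bbp(555) vs. bp(555)": 0.84,
}
for pair, r2 in examples.items():
    print(f"{pair}: R^2 = {r2:.2f} -> {correlation_class(r2)}")
```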

# DISCUSSION

# A Complex Environment

### Effect of Local and Seasonal Forcing

The major processes controlling the optical properties in DH and the VDG are the wind speed and direction during the dry season, and the tides year round. In addition, the river discharges in this region are controlled by seasonal rainfall and thus add to the complexity of this coastal system by increasing land-sourced CDOM delivery to the coastal waters. The coastal waters of Northern Australia have a tidal range that is amongst the largest in the world for a coastline facing an open ocean. For our dataset, the tidal range varied from 1 to 6 m. Stations located in DH (dry season only; N = 24) were mostly sampled during neap tides, with a mean tidal range of 2.05 ± 1.04 m. In the VDG, however, the tidal ranges were 20% higher at the time of our dry season sampling (4.05 ± 1.15 m; N = 16) than during the wet season sampling (3.30 ± 0.78 m; N = 12), partly explaining the higher concentrations in TSS, from resuspension, found during the dry season. The strong tidal currents, in combination with the complex, shallow bathymetry, caused the high spatial variability observed in most of the surface measurements from site to site (e.g., **Figure 2**) as a result of localized, small-scale up- and downwelling processes. Previous modeling studies have shown that Darwin Harbour's hydrodynamics are driven mainly by tides, with the wind and seasonal river inputs playing somewhat smaller roles (Li et al., 2012). In particular, Li et al. (2014) reported that the dynamics of TSS in DH vary with the spring-neap tidal cycle, with the whole water column being well-mixed during spring tides. For our study, the DH stations were sampled during the clearest conditions (neap tides) and thus our DH dataset may not be the most representative of this environment. However, it provides an interesting comparison with the VDG that is further discussed in the Section Challenges for Remote Sensing of Water Quality in the NMR.

ternary plot of their relative absorption at 440 nm (D).

In addition to tidal movement, the prevailing south-easterly winds at this time of the year are another physical forcing that explains the higher TSS concentrations in the VDG during the dry season. These trade winds, when acting in phase with the tidal currents, significantly enhance resuspension.

While the recorded wind speeds were very low during the wet season field campaign (<2 m.s<sup>−1</sup>; W-SW), winds of up to 10 m.s<sup>−1</sup> occurred during the dry season sampling. Webster and Ford (2010) found that wind-induced waves in combination with tidal currents contributed to higher concentrations of sediment in Keppel Bay, a shallow embayment adjacent to the Fitzroy estuary in sub-tropical Queensland. In the North Sea, Hommersom et al. (2009) also found that TSS showed large short-term spatial variability due to the tidal energy, in combination with winds. These studies align with our findings that the combination of tidal energy, shallow depths, and strong winds is the main physical process controlling the increased TSS concentrations found during the dry season in the VDG.

The Alligator Rivers are directly connected to the VDG (**Figure 1A**), and their freshwater flows are substantial during the first 4 months of the year due to the highly seasonal rainfall. Their discharges, loaded with terrestrial material, are trapped within the coastal boundary layer, a body of turbid inshore water. Very little mixing occurs between the turbid coastal boundary layer and the rest of the Gulf. Hence, we sampled much higher concentrations of sediment material along the coast than in the middle of the Gulf (**Figure 2**). In addition, the entire coastal area between DH and the Cobourg Peninsula (**Figure 1A**) is dominated by mangroves and tidal flats that are also known to play a key role in sediment redistribution (Li et al., 2014). An explanation for the surprisingly low contribution of non-algal particles and TSS found for the wet season samples in the VDG is that the river runoff is largely filtered through most of the Kakadu National Park wetlands (20,000 km<sup>2</sup>) prior to reaching the coast (**Figure 1A**; e.g., Pusey et al., 2015). The seasonal variability in river runoff also directly results in a significant difference in salinity between seasons. The large amount of runoff reaching the coastal waters at the end of the wet season (∼April), which coincided with the timing of our VDG 2012 wet season field campaign (**Table 1A**), explains the lower salinity observed during this seasonal sampling. Terrestrial runoff plays a role in CDOM characterization by changing its slopes and concentrations. Higher aCDOM(440) values were observed during the wet season but, surprisingly, they were not significantly different from those of the dry season. Nor was a relationship found between aCDOM and salinity, but it has to be stressed that the CDOM dataset for the wet season was limited (N = 10), with only four stations having salinities lower than 28 PSU. Further seasonal sampling, in particular during the wet season and in DH, is required.

Our results showed that there were, statistically, no spatial (VDG vs. DH) or seasonal (dry vs. wet) differences in Chl-a concentrations. Within the VDG, however, there was an inshore-offshore Chl-a gradient: stations close to the Alligator Rivers and Mary River featured Chl-a from 1.3 up to 2.8 mg.m<sup>−3</sup>, steadily decreasing to 0.5 mg.m<sup>−3</sup> in the Dundas Strait (**Figures 1**, **2A**). This distribution, which was not necessarily reflected in the other properties (besides possibly CDOM; i.e., **Figure 2**), is associated with the nutrient stock being concentrated mostly along the coastline and in the eastern embayment of the Gulf (VDG) due to the presence of the boundary layer that allows phytoplankton to grow (**Figure 14**). In DH, a similar mechanism occurs whereby the upper harbor is characterized by a longer water residence time (exceeding 20 days; Williams et al., 2006) in comparison to the inner section (**Figure 14**). It takes up to 22 days for the nutrients located in the upper section of the harbor to reach the sea (DHAC, 2010).

Chl-a is an indicator of phytoplankton biomass, and light availability undeniably plays a central role in phytoplankton production in this study region: while phytoplankton production has been found to be highest in the wet season (e.g., Blondeau-Patissier et al., 2014), it is limited during the dry season, possibly because of the high turbidity caused by water column mixing, generated by both the shallow bathymetry and the strong dry season winds, which limits light penetration. This is reflected by the lower PPC/PSC ratios and lower Chl-a concentrations found in the center of the VDG during the dry season (**Figure 2A**). Wet season river discharges release large amounts of nutrients and, together with the release of CDOM, are likely to fuel primary productivity (e.g., Burford et al., 2011).

Previous studies on subtropical estuarine systems of the South East Australian coast have shown the pronounced effect of short-timescale variability in dissolved and particulate matter (particle size, composition) due to the spring-neap tidal cycle (Oubelkheir et al., 2006), stressing the need to take into account the tidal phase in the sampling strategy. Shi et al. (2011) demonstrated that the magnitude of the effect of the spring-neap tidal cycle on the variations of satellite-derived Kd(490) and TSS was comparable to the seasonal effect observed in their coastal region of SE Asia. The tidal range in their study region was comparable (∼3 m) to that observed during our study of the VDG. Oubelkheir et al. (2014) also emphasized the role of short-term processes, such as wind stress and tides, as key drivers of the dissolved and particulate material in the shallow and dynamic subtropical environment of Moreton Bay (SE Queensland). In our study, the sampling strategy was primarily to ensure our stations had a large spatial distribution rather than to sample the small-scale temporal variability due to the tides; thus, the tidal phase was not taken into account. This should be addressed in future field campaigns in this region, as the spring-neap tidal cycles affect both the properties sampled in the field and satellite-derived ocean color products.

ratios; (B) non-photosynthetic carotenoids/Chl-a ratios; (C) Chl-b to Chl-a ratios; and (D) Chl-c to Chl-a ratios. Fucoxanthin and zeaxanthin as a function of Chl-a are shown in (E,F), respectively.

### A System Driven by CDOM and NAP

Characteristic of coastal waters, most optical properties and some of the biogeochemical concentrations sampled during these three field campaigns cover a large range of variability (**Table 2**). While aNAP(440) (coefficient of variation, CV = 252%), aTot(555) (CV = 159%), and bbp(555) (CV = 167%) were the IOPs that varied most across seasons and locations, aphy(440/676) (CV = 15%) and Chl-a (CV = 45%) were amongst the parameters that varied the least. Our results highlight that the optical properties of our coastal system (the VDG in particular) were mostly driven by NAP during the dry season and CDOM during the wet season, with little influence of phytoplankton on the total absorption budget at 440 nm. Sources of NAP include phytoplankton by-products and non-algal detritus, and we concluded that NAP in the VDG did not originate from phytoplankton but rather from sediment. This is supported by the weak correlation of NAP with both Chl-a (R<sup>2</sup> = 0.33; N = 55) and aphy (R<sup>2</sup> < 0.10; N = 52), while being significantly correlated with TSS (R<sup>2</sup> ∼ 0.50; N = 55). The large (∼84%) combined contribution of aCDOM(440) and aNAP(440) masked the contribution of phytoplankton, which at ∼16% was below the contribution of aphy(440) to the total absorption budget (20–60%) obtained in European coastal waters by Babin et al. (2003b).
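The coefficient of variation used above to rank the variability of the parameters is the standard deviation expressed as a percentage of the mean. The sketch below uses the sample standard deviation (n − 1 denominator); whether the study used the sample or population form is not stated, so that choice is an assumption, and the input values are hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """Return the CV in percent: 100 * sample standard deviation / mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined for a zero mean")
    return 100.0 * statistics.stdev(values) / mean


# Hypothetical, evenly spread sample.
cv_even = coefficient_of_variation([1.0, 2.0, 3.0])

# Hypothetical, strongly skewed sample: one large value among small ones,
# as can happen for a coefficient such as aNAP(440) near a turbid coast.
cv_skewed = coefficient_of_variation([0.01, 0.02, 0.03, 0.50])

print(f"even sample CV = {cv_even:.0f}%, skewed sample CV = {cv_skewed:.0f}%")
```

A CV above 100%, as reported for aNAP(440), aTot(555), and bbp(555), simply means that the spread exceeds the mean, which is typical of skewed distributions with a few very high values.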

For our study, it was found that only ∼10% of the phytoplankton cells counted during the 2013 dry season field campaign were viable, thus indicating that the productivity must be due to a contribution from the microbial population in this system. The UV spectroscopic characteristics of the DOC sampled in the VDG during the dry season of 2013 (low absorbance at 350 nm) are consistent with organic matter that has been subjected to bacterial degradation (**Figure 13**). Our results therefore suggest that the DOC of the dry season samples may have a different chemical character from the DOC delivered during the wet season. The size of the DOC fraction in the samples remained stable, indicating possible shifts in microbial community structure, which would be expected in a tropical environment. Analysis of the FDOM from the EEM also suggested that during the dry season, the material was composed mostly (∼70%) of aromatic organic carbon (Weishaar et al., 2003) with a relatively constant spectral slope (0.029 ± 0.003), therefore implying that only a small fraction of the DOC was derived from terrestrial sources (Fichot and Benner, 2012). The underlying assumption that CDOM properties can be inferred from DOC concentrations (Shanmugam, 2011; Hestir et al., 2015) was, however, not verified in this dataset. Other recent studies also reported a complete lack of relationship between DOC and CDOM (e.g., Nelson and Siegel, 2013) because the controlling factors for DOC and CDOM were different (Yamashita et al., 2013).

## Regional Differences during the Dry Season

The waters of the VDG were found to have a higher scattering component than the waters of DH, contributing largely to the higher beam attenuation, during the dry season in particular. This result is consistent with the higher TSS concentrations found in the VDG (dry) in comparison to DH (e.g., **Figure 2B**). The two environments were sampled over different tidal conditions, as the sampling in DH only occurred during neap tides. The fetch length, which is much larger in the VDG, and the seasonal forcing most likely explain the significant differences between the two locations. The VDG is ∼130 km wide from the shore to the tip of the Cobourg Peninsula (**Figure 1**), thus offering much more ocean surface for the wind to generate sea surface roughness and resuspension. The harbor, in comparison, is 15 km wide and mostly protected from the wind, and its entrance is the section most influenced by incoming waves from the ocean.

There were distinct differences in phytoplankton pigments, composition and cell counts between the samples collected in the VDG and those collected in DH (**Figure 1**). From cell counts and identification during the 2013 dry season field campaign, diatoms were present at 23 of the 33 stations sampled in the VDG. This microphytoplankton dominance was also observed by Burford et al. (1995) in the neighboring Gulf of Carpentaria, suggesting possible similarities between the two gulfs; picophytoplankton, however, was predominantly present in DH. It can be inferred that, at the time of the 2013 field sampling, the higher number of cyanobacteria cells indicates a marine influence in DH. Phytoplankton cell counts were low for most samples (∼250 cells/L) overall, possibly due to the low productivity at this time of year (e.g., Blondeau-Patissier et al., 2014). In contrast, a greater diversity in phytoplankton cells was found in the Gulf. This is explained by the higher levels of nutrients available in the VDG, while DH's primary production is limited, mostly by nitrogen (Burford et al., 2008; **Figure 14**; **Table 4**). Differences in nutrient concentrations between the two locations were expected because the oceanic and VDG waters are known to be richer in nutrients than those of DH (Wolanski et al., 2006). Nitrate and phosphate concentrations are relatively low in the VDG (**Table 4**) but silicate is generally present, explaining the large dominance of diatoms in the VDG (Condie and Dunn, 2006).

CDOM was not found to differ between the two locations, yet it had an inshore-offshore pattern (**Figure 2C**), with an evident decrease from the inner (0.17 m<sup>−1</sup>) to the outer (0.08 m<sup>−1</sup>) harbor for DH. We hypothesize that during the wet season, the larger amount of CDOM delivered from river runoff would compare differently to that of DH, as the amount discharged into the Gulf (e.g., East Alligator River discharge: 7 GL/yr) is much larger than the amount of CDOM that would be discharged in DH (e.g., Elizabeth River discharge: 3 GL/yr). The relative contributions of the three absorption components, namely aphy(λ), aNAP(λ), and aCDOM(λ), were investigated in this study and showed that DH was mostly characterized by approximately equal contributions of aNAP(λ) and aCDOM(λ), while aNAP(λ) was predominant in the VDG. The role of SNAP is poorly understood, but this parameter is often found to be stable across marine environments (Matsuoka et al., 2011). This was not the case in this study, as SNAP(λ) was significantly different between the two regions, with significantly higher slopes in the VDG. This difference is likely related to the particle assemblage characterizing the two regions, also reflected in b̃bp(555). In contrast, the dominance of organic material during the wet season months is consistent with an enhanced phytoplankton productivity, as reported in Blondeau-Patissier et al. (2014). In the Gulf of Carpentaria, Burford and Rothlisberg (1999) also reported higher integrated primary production (∼955 mg.C.m<sup>−2</sup>.day<sup>−1</sup>) during the wet season months, therefore supporting this finding. Both bp(555) (R<sup>2</sup> = 0.97; N = 48) and bbp(555) (R<sup>2</sup> = 0.80; N = 36) were found to strongly co-vary with c(555) in DH and the VDG, with the high b/c ratio (>90%) emphasizing that standard ocean color algorithms would likely fail in these waters, hence supporting the necessity to use a regional algorithm with an (S)IOP-based parameterization for the derivation of accurate water quality products in the VDG and DH. **Table 5** provides a list of selected relationships between key (S)IOPs and concentrations.

TABLE 4 | Nutrient budget for the two regions (in µmol.L<sup>−1</sup>) as sampled during the 2013 dry season field campaign.

# Challenges for Remote Sensing of Water Quality in the NMR

The effect of tidal currents on the spatial variability of suspended sediments can easily be observed from space using satellite observations. **Figure 15** shows the two MODIS-Aqua (NASA) images covering the study region at a 250-m resolution during the 2013 dry season field campaign. Tidal currents are stronger during spring tides, resulting in a more contrasted spatial distribution of suspended sediments than during neap tides. This highlights yet another challenge in these highly dynamic environments, namely the difference in scale: while the satellite integrates over a larger area, the ground measurements represent point observations, so their direct comparability with the satellite observations may be questioned.

A previous study on the dynamics of phytoplankton blooms in the VDG from the MERIS mission (Blondeau-Patissier et al., 2014) found that an increase in TSS occurred predominantly during the dry season. This observation is supported by the findings of the present work (**Tables 2, 3**). MERIS estimates of dry season TSS concentrations from the 2014 satellite study were much lower (4–10 mg.L<sup>−1</sup>) than the results from these field samples. A total of 10 stations (with associated biogeochemical measurements) from the dry (N = 4) and wet (N = 6) season field trips in the VDG were located within the water mass cluster selected for the satellite study (see Figure 5B of Blondeau-Patissier et al., 2014). For these station locations, in situ TSS was found to increase from an overall average of 4 mg.L<sup>−1</sup> during the wet season to 30 mg.L<sup>−1</sup> during the dry season. This ∼10-fold increase in TSS concentrations between seasons was observed in the MERIS satellite study, but it is important to recall that only a few (N = 4) dry season stations in the VDG were used for this comparison. Conversely, in situ Chl-a concentrations weakly increased from 0.6 to ∼0.75 mg.m<sup>−3</sup>. We can only interpret this comparison with caution for at least two reasons: first, because of the limited in situ dataset used for this exercise, and second, because of the seasonal dominance of CDOM and NAP in this system, which will inevitably hamper accurate satellite retrievals, of Chl-a in particular. This discrepancy also highlights the need to parameterize a region-specific remote sensing algorithm for the NMR. A seasonal parameterization, based on the wet and dry season in situ optical observations presented in this study, has been applied to MODIS-Aqua imagery of the VDG (Schroeder et al., 2015).

TABLE 5 | Summary of selected relationships.

The number of samples, N, varies due to quality control.

In recent years, ocean color remote sensing has provided a powerful means for studying ocean biogeochemistry and ecosystems over large spatial scales (Gardner et al., 2006; Tang, 2011; Swirgon and Stramska, 2015). Both the dissolved and particulate fractions of organic carbon can affect light penetration, and thus optical properties may be used as proxies for DOC and POC (Pan et al., 2014). However, satellite retrievals of these parameters may be a major challenge in the NMR because of their poor correlation with any of the variables measured. It is known that the use of optical proxies for satellite retrieval of POC is not straightforward because of the highly variable relationships between parameters (e.g., Cetinić et al., 2012). In the VDG, the maximum POC concentrations were found to be much higher than in other coastal systems, such as Chesapeake Bay (Fisher et al., 1998). For this study, POC was found to be more strongly correlated with Chl-a (R<sup>2</sup> = 0.40; N = 29) than with TSS (R<sup>2</sup> = 0.14; N = 29), probably due to the composition of both the POC and the phytoplankton community (Zhu et al., 2006; Wang et al., 2009; Maciejewska and Pempkowiak, 2014a,b).

FIGURE 15 | Large spatial variability of suspended sediment depending on the tidal energy: MODIS-Aqua 250-m resolution images during the dry season 2013 field campaign. (A) 12 September 2013 (spring tide) and (B) 14 September 2013 (neap tide).

Of particular interest from a coastal ecosystem-based management point of view is the development of ecological indicators from ocean color remote sensing, such as the seasonal cycle of phytoplankton biomass, the spatial distribution of phytoplankton types, or the delineation of ecological provinces (Platt and Sathyendranath, 2008). These require accurate in situ bio-optical measurements of phytoplankton absorption and chlorophyll to be used for remote sensing algorithm parameterization and validation. Long-term (>10 years) satellite time series are now commonly used to assess trends in the productivity of coastal and ocean regions, and it is always stressed that well-calibrated ocean color sensors and algorithms are paramount to these estimates (e.g., Signorini et al., 2015). First and foremost, sea-truth measurements must be performed as closely as possible, both spatially and temporally, to the satellite observations if they are to be used for the validation of remote sensing algorithms. In highly dynamic coastal environments such as those of this study, it is recommended that in situ measurements be used only from samples collected within less than ±30 min of the satellite overpass to minimize bias (Martinez-Vicente et al., 2003). Independent of the match-up, the choice of the satellite algorithm, as well as its parameterization, is also an important factor. Aurin and Dierssen (2012) tested the performance of four semi-analytical algorithms at retrieving optical and biogeochemical properties in the complex waters of Long Island Sound (North Atlantic), a coastal system very similar to our region because of its comparably high proportion of CDOM and NAP and its low (<20%) phytoplankton contribution to the total absorption. The quasi-analytical algorithm (QAA; e.g., Lee et al., 2010) was found to perform better [in comparison to the three other models tested, namely, a linear matrix inversion (LMI)-type algorithm, GSM and C99] because both S<sub>NAP</sub> and S<sub>CDOM</sub> were relatively stable in space and time. This is not the case for S<sub>NAP</sub> in our study.
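The ±30 min match-up criterion recommended above can be illustrated with a minimal Python sketch. The function names, the record layout, and the example times are hypothetical and serve only to illustrate the selection rule; they are not taken from the study.

```python
from datetime import datetime, timedelta


def is_matchup(sample_time, overpass_time, window_min=30):
    """True if the sample was collected within less than +/- window_min of the overpass."""
    return abs(sample_time - overpass_time) < timedelta(minutes=window_min)


def select_matchups(samples, overpass_time, window_min=30):
    """Keep only the in situ records that satisfy the time window."""
    return [s for s in samples if is_matchup(s["time"], overpass_time, window_min)]


# Hypothetical overpass and station times (illustrative values only).
overpass = datetime(2013, 9, 12, 4, 30)
samples = [
    {"station": "A", "time": datetime(2013, 9, 12, 4, 10)},  # 20 min before overpass
    {"station": "B", "time": datetime(2013, 9, 12, 5, 15)},  # 45 min after overpass
]
matchups = select_matchups(samples, overpass)
```

The window is left as a parameter because a suitable tolerance is likely to depend on how quickly the water masses change at a given site.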

# CONCLUDING REMARKS

The continental shelf of the NMR is the widest on the Australian continent; it is also one of the most pristine coastal environments worldwide (Morrison and Delaney, 1996; Halpern et al., 2008). Currently, the Northern Territory has no specific coastal management legislation or coastal climate change policy in place. Catchments of this region and the coastal marine environment are highly connected, yet the VDG lacks a coastal environmental monitoring program extending beyond the boundary of the well-managed Kakadu National Park. Therefore, long-term monitoring of key bio-physical coastal water quality parameters of the VDG, from in situ and satellite observations, would benefit regional environmental management. To date, however, the lack of in situ optical datasets in the NMR has limited the derivation of regional ocean color satellite algorithms for this region.

The dataset presented in this study fills this gap and is the first collected in the tropical, optically complex coastal waters of VDG and DH. Results from this study generally show that these two regions are separate coastal environments with different optical characteristics. The VDG was found to be mostly dominated by CDOM during the wet season and NAP during the dry season, while DH has a mixed absorption budget. The phytoplankton populations are also different, with the VDG being characterized by bigger phytoplankton cells (diatoms) in comparison to DH, due primarily to differences in nutrient stocks (**Table 4**) and light availability. The strong, south-easterly winds and the tidal energy are a combination that increases water turbidity in the Gulf during the dry season, thus limiting light penetration and hence phytoplankton growth. This study did not allow for a detailed assessment of the seasonal effect on optical properties and concentrations in DH. Additional field observations during the wet season would be necessary to seasonally characterize this environment.

From a remote sensing point of view, algorithm developments should focus on the synergistic use of new geostationary satellites such as the recently launched (October 2014) Advanced Himawari Imager (AHI) onboard Himawari-8 (Japan Meteorological Agency, JMA) in combination with polar-orbiting sensors. Geostationary instruments offer far greater temporal imaging resolution—up to 10 min intervals using Himawari-8—and have therefore the potential to resolve the bio-optical variability due to semi-diurnal tidal cycles. The performance of a seasonally parameterized MODIS-Aqua algorithm at retrieving CDOM, NAP, and Chl concentrations in the VDG is evaluated in Schroeder et al. (2015). This regional algorithm is based on the aLMI approach of Brando et al. (2012) and the dataset presented in this study was used for its parametrization.

# AUTHOR CONTRIBUTIONS

DB and TS conceived the idea of the study, collected the samples, designed, organized, and participated in the 2012 wet and 2013 dry season campaigns; DB designed and collected the samples of the 2012 dry season campaign, designed and wrote the manuscript, created all the plots, and performed all the numerical data analysis; TS provided **Figure 15**; LC helped organize the logistics for all the field sampling, expertly analyzed the datasets of spectrophotometric absorptions and bio-geo concentrations in the laboratory, and provided comments on the manuscript; TS and VB provided valuable comments on all versions of the manuscript; DP contributed to the design of the 2013 dry season campaign, collected and analyzed (a) the nutrient samples and (b) the phytoplankton cell counts, and contributed to the interpretation of the results related to phytoplankton and nutrient budgets; PF analyzed the EEM dataset and provided very valuable comments on the manuscript; DW provided the tidal heights modeled for each station and contributed to some of the interpretation of the results from his expert knowledge of the region; DD provided guidance on the IOP analysis and valuable comments on earlier versions of this manuscript; JA, NT, and MT participated in, and helped organize the logistics of, at least one of the three field campaigns, and expertly deployed the complex optical instrumentation necessary to collect the dataset used in this paper. All co-authors have approved the manuscript to be published.

# FUNDING

This research was funded by the North Australia Marine Research Alliance (NAMRA) and the Australian Government's National Environmental Research Program (NERP). VB was supported by the European Union (FP7-People Co-funding of Regional, National, and International Programmes, GA no. 600407) and the CNR RITMARE Flagship Project. In-kind support was provided by the CSIRO and the Australian Institute of Marine Science (AIMS).

### ACKNOWLEDGMENTS

The authors would like to thank Dr. Arnold G. Dekker (CSIRO), for valuable comments on the manuscript prior to its submission, Dr. A. Bricaud (LOV, Villefranche-sur-mer) for assistance with the selection of phytoplankton spectra suitable for this analysis, Dr. Ian Leiper (Charles Darwin University) for help with ArcGIS and some of the field sampling in Darwin Harbour, and Ms. Julie Haines (CSIRO Adelaide, Waite Campus) for the analysis of the DOC and POC samples. We also would like to thank Dr. Edward Butler (AIMS, Darwin Office) for facilitating our participation in the dry season field campaign aboard RV Solander. We thank the captain (Mr. Christopher Davis), cruise leader (Mr. Marcus Stowar) and the entire crew of the RV Solander (AIMS), as well as Mr. Steve Compain (Arafura Bluewaters Charters), for providing great working conditions and route flexibility during our field sampling. We also wish to thank Dr. Edward Butler (AIMS), Dr. Simon Townsend (AHU) and the technical services at CSIRO, Charles Darwin University, AIMS (Darwin office) and the Aquatic Health Unit (Northern Territory Government) for making their laboratory facilities available to us during this research: in particular Mrs. Rebecca Edwards and Mrs. Heidi Franklin (CSIRO), Mrs. Kirsty McAllister (AIMS), Mrs. Ellie Hayward, Mr. Matthew Gray (Fieldwork Support, CDU), and Mrs. Julia Fortune, Mr. Matthew Majid (Fieldwork Support, AHU). A 1 month visit to the Laboratoire d'Océanographie de Villefranche (LOV, CNRS/UPMC) for collaborative work between the primary author and DD has helped us progress in the data processing required for this study. Travels from Australia to France were supported by a travel grant for early career researchers (Scientific Mobility Program 2013; French Embassy, Canberra, Australia), a proposal written by both DB and DD. We acknowledge the use of Rapid Response imagery from the Land, Atmosphere Near real-time Capability for EOS (LANCE) system operated by the NASA/GSFC/Earth Science Data and Information System (ESDIS) with funding provided by NASA/HQ. Finally, the authors are grateful to the two external reviewers, and the Specialty Chief Editor of Frontiers in Marine Science for providing constructive comments on the manuscript.

### REFERENCES

of the Environment 2011. Independent Report to the Australian Government Minister for Sustainability, Environment, Water, Population and Communities, DSEWPaC, Canberra, ACT, 410–411.


the Gulf of Carpentaria, Australia. Mar. Ecol. Prog. Ser. 118, 255–266. doi: 10.3354/meps118255


starvation length and ammonium inhibition on nitrate uptake. Ecol. Modell. 317, 30–40. doi: 10.1016/j.ecolmodel.2015.08.024


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Blondeau-Patissier, Schroeder, Clementson, Brando, Purcell, Ford, Williams, Doxaran, Anstee, Thapar and Tovar-Valencia. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Shellfish Aquaculture from Space: Potential of Sentinel2 to Monitor Tide-Driven Changes in Turbidity, Chlorophyll Concentration and Oyster Physiological Response at the Scale of an Oyster Farm

### Pierre Gernez <sup>1</sup> \*, David Doxaran<sup>2</sup> and Laurent Barillé<sup>1</sup>

<sup>1</sup> Mer Molécules Santé (MMS EA 2160), Université de Nantes, Nantes, France, <sup>2</sup> Laboratoire d'Océanographie de Villefranche (UMR 7093), Centre National de la Recherche Scientifique, UPMC, Villefranche sur mer, France

Edited by: Tiit Kutser,

University of Tartu, Estonia

### Reviewed by:

Emmanuel Devred, Fisheries and Oceans Canada, Canada

Zhubin Zheng, Gannan Normal University, China

> \*Correspondence: Pierre Gernez pierre.gernez@univ-nantes.fr

### Specialty section:

This article was submitted to Ocean Observation, a section of the journal Frontiers in Marine Science

Received: 15 January 2017 Accepted: 25 April 2017 Published: 16 May 2017

### Citation:

Gernez P, Doxaran D and Barillé L (2017) Shellfish Aquaculture from Space: Potential of Sentinel2 to Monitor Tide-Driven Changes in Turbidity, Chlorophyll Concentration and Oyster Physiological Response at the Scale of an Oyster Farm. Front. Mar. Sci. 4:137. doi: 10.3389/fmars.2017.00137

The algorithms of Novoa et al. (2017) and Gons et al. (2005) were recalibrated and applied to Sentinel2 data to retrieve suspended particulate matter (SPM) and chlorophyll a (chl a) concentration in the environmentally and economically important intertidal zones. Sentinel2-derived chl a and SPM concentration distributions were analyzed at the scale of an oyster farm over a variety of tidal conditions. Sentinel2 imagery was then coupled with ecophysiological modeling to analyze the influence of tide-driven chl a and SPM dynamics on oyster clearance and chl consumption rates. Within the studied oyster farming site (Bourgneuf Bay along the French Atlantic coast), chl consumption rate mirrored the changes in chl a concentration during neap tides, whereas oyster clearance and chl consumption rates were both negatively impacted by high SPM concentration during spring tides.

Keywords: Sentinel2, ocean color, chlorophyll, turbidity, oyster, aquaculture, microphytobenthos, mudflat

# INTRODUCTION

One of the most striking features of the intertidal zone is the formation of microphytobenthos (MPB) biofilms at the sediment surface during those low tides that occur in daylight (MacIntyre et al., 1996; Paterson et al., 1998; Jesus et al., 2009). In many mudflats, MPB biofilms are visible from space, and they have been studied using airborne and satellite remote sensing (Méléder et al., 2003; van der Wal et al., 2010; Kazemipour et al., 2012; Brito et al., 2013). Although the main ecological functions of MPB are carried out when it is organized in the form of biofilms, benthic microalgae can also be resuspended into the water column together with other sedimentary particles throughout the tidal cycle (Koh et al., 2006; Ubertini et al., 2012). This can result in significant enrichment of nearshore waters with a high concentration of chlorophyll a (chl a) that becomes available food for suspension feeders such as the Pacific oyster Crassostrea gigas and other commercially and ecologically important bivalves (Kang et al., 2006; Choy et al., 2009). In coastal zones, despite the high contribution of tidal flats to primary production (Underwood and Kromkamp, 1999), the spatial distribution and temporal dynamics of chl a concentration in intertidal waters have been little studied using ocean color remote sensing so far.

In the optically complex and very diverse coastal zone, separating the contribution of chl a from other colored constituents [namely particulate inorganic matter (PIM), and colored dissolved organic matter (CDOM)] in the water column is notoriously difficult due to the rapidly changing concentrations of CDOM and PIM coming from sediment resuspension, river plume, and land runoff (Blondeau-Patissier et al., 2014). In turbid tidal flats and adjacent coastal areas, the main challenge arises from the difficulty of detecting chl a against the high load of suspended particulate matter (SPM). In estuarine and nearshore waters, algorithms based on the analysis of the chl a absorption band around 675 nm, in the red to near-infrared (NIR) spectral region, were demonstrated to generally outperform other methods (Le et al., 2013).

Due to its spectral characteristics (namely the red and NIR spectral bands at 665 and 705 nm), we hypothesize that the MultiSpectral Instrument (MSI) onboard Sentinel2 has the potential to quantify chl a concentration in turbid waters, provided that these waters are exposed to resuspension of benthic microalgae. Besides its relevant spectral characteristics, Sentinel2 also offers the advantage of high spatial resolution (20 m), making it possible to observe narrow bays and estuaries where shellfish farms are usually located. The first objective of the present study is therefore to analyze the potential of Sentinel2 for shellfish aquaculture monitoring, and more specifically to test the retrieval of SPM and chl a concentration in a turbid oyster farming ecosystem. The second objective is to analyze the tide-driven influence of SPM and chl a variability on oyster ecological response at the scale of an oyster farm. For that purpose, and building on previous studies (Gernez et al., 2014; Thomas et al., 2016), Earth Observation (EO) and shellfish physiological modeling were coupled in order to remotely quantify the influence of rapidly changing environmental conditions on oyster clearance and chl consumption rates.

### MATERIALS AND METHODS

### Study Site

Bourgneuf Bay is a macrotidal bay along the French Atlantic coast, mostly composed of mudflats and widely used for shellfish aquaculture (annual oyster yield was 5,330 tons in 2010; Dessinges et al., 2012). The present study focuses on a shellfish farming site located at the northern limit of the oyster aquaculture zone (**Figures 1**, **2**). Due to tidal resuspension, SPM concentration seldom decreases below 50 g m<sup>−3</sup> and regularly exceeds 500 g m<sup>−3</sup>. Because excessive SPM concentration impairs oyster clearance rate and other physiological functions (Barillé et al., 1997), oysters grown at this farming site are negatively impacted by high SPM concentration (Gernez et al., 2014). Daily mean chl a concentration was reported to vary between 4 and 14 mg m<sup>−3</sup> (Dutertre et al., 2009), and monthly means between 5 and 30 mg m<sup>−3</sup> were previously reported at the study site (Barillé-Boyer et al., 1997).

### In situ Data

Field data were acquired during two bio-optical cruises in Bourgneuf Bay from 08 to 12 April 2013 and from 12 to 13 April 2016 within the framework of the ANR GIGASSAT and FP7 HIGHROC projects, respectively. During both cruises, water sampling and radiometric measurements were performed following the same protocol. Sampling stations were located nearshore, mostly within the intertidal zone and in the vicinity of farming sites (**Figure 1**). Sampling took place at different times of the tidal cycle in order to acquire reflectance spectra over a wide range of SPM and chl a concentrations. The same flat-bottomed barge was used during both cruises. This kind of vessel makes it possible to navigate throughout the shallow intertidal waters, even during low tide. Some stations were visited when the water depth was as low as 0.5 m. The bottom was never visible from above the surface, even at the shallowest station, due to the extremely high turbidity.

### Radiometric Data

Above-water radiometric measurements were conducted following standard protocols (Mueller et al., 2000) to determine the spectral water-leaving radiance reflectance (also commonly referred to as the marine reflectance), ρ<sub>w</sub>(λ), defined as:

$$\rho\_{\text{w}}(\lambda) = \pi [L\_{\text{u}}(\lambda) - \rho\_{\text{sky}}L\_{\text{sky}}(\lambda)] / E\_d(\lambda) \tag{1}$$

where L<sub>u</sub>(λ) is the upwelling radiance from the water and air-sea interface measured at a zenith angle of about 37°, L<sub>sky</sub>(λ) is the sky radiance, ρ<sub>sky</sub> is the air-water radiance reflection coefficient, E<sub>d</sub>(λ) is the above-water downwelling irradiance, and λ is the wavelength. The barge was oriented away from the sun to avoid shadowing effects. Radiance sensors were pointed at an azimuth angle between 90 and 135° relative to the sun. The radiometric data were acquired simultaneously during about 5 min of stable sky conditions using three TriOS radiometers, two measuring the radiance signal and one measuring the downwelling irradiance.

A thorough quality control was performed and only clear sky data were retained. Wave height was <0.5 m and wind speed was <5.0 m s<sup>−1</sup> during both cruises. The ρ<sub>sky</sub> coefficient was taken as 0.02 following Austin (1974). The TriOS data were averaged over the time span of the measurement, smoothed with a 10 nm moving window, truncated to the 400 to 900 nm range (**Figure 3A**), and then spectrally resampled to the bands of the MSI onboard Sentinel2 using the spectral response function provided by the European Space Agency (**Figure 3B**).
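As an illustration of Equation (1) and of the spectral resampling step, the following Python sketch computes the marine reflectance from the three radiometric quantities and then weights a hyperspectral spectrum by a sensor spectral response function (SRF). The function names are hypothetical, and the band-averaging assumes a uniformly spaced wavelength grid, which is a simplification that may differ from the authors' processing.

```python
import numpy as np

RHO_SKY = 0.02  # air-water reflection coefficient used in the text (Austin, 1974)


def marine_reflectance(L_u, L_sky, E_d, rho_sky=RHO_SKY):
    """Equation (1): rho_w = pi * (L_u - rho_sky * L_sky) / E_d."""
    L_u = np.asarray(L_u, dtype=float)
    L_sky = np.asarray(L_sky, dtype=float)
    E_d = np.asarray(E_d, dtype=float)
    return np.pi * (L_u - rho_sky * L_sky) / E_d


def band_average(rho_w, srf):
    """SRF-weighted mean of a spectrum sampled on a uniform wavelength grid.

    Assumption: rho_w and srf share the same, evenly spaced wavelengths,
    so a simple weighted sum approximates the spectral integral.
    """
    rho_w = np.asarray(rho_w, dtype=float)
    srf = np.asarray(srf, dtype=float)
    return float(np.sum(srf * rho_w) / np.sum(srf))
```

For a given MSI band, `srf` would be the response of that band interpolated onto the radiometer wavelengths; the SRF values themselves have to be taken from the data provider and are not reproduced here.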

### Seawater Samples

Seawater samples were collected just below the surface concomitantly with radiometric measurements. Seawater samples were stored in 1 l bottles until they were filtered in the laboratory in the evening. The turbidity, T (in Formazin Nephelometric Unit, FNU), of each water sample was determined in triplicate using a 2100Q portable turbidimeter (Hach Company, Loveland, CO, USA) in order to optimize the volume of filtered seawater as in Neukermans et al. (2012). SPM, defined as the dry mass of particles per unit volume of seawater, was then determined using a standard gravimetric technique. Measured volumes of seawater (between 10 and 200 ml depending on the turbidity of the sample) were filtered through 25 mm diameter preweighed Whatman GF/F glass-fiber filters. At the end of filtration, sample filters were rinsed with deionized water to remove sea salt. The filters were frozen and shipped to the Laboratoire d'Océanographie de Villefranche (LOV). The dry mass of particles collected on the filter was then measured with an MT5 microbalance (Mettler-Toledo Intl. Inc.) with a resolution of 0.001 mg. A significant relationship between SPM and T was obtained (p < 0.01).

Depending on the turbidity, between 10 and 300 ml of seawater was also filtered through 25 mm GF/F filters for high performance liquid chromatography (HPLC) pigment analysis. The filters were frozen in liquid nitrogen and shipped for analysis at LOV, where HPLC analysis was performed according to Ras et al. (2008). The total chlorophyll a (chl a) concentration was computed as the sum of the "true" chlorophyll a, divinylchlorophyll a, and chlorophyllide a.

# Bio-Optical Algorithms

### Chlorophyll a Algorithm

Several algorithms are available for Sentinel2/MSI (Beck et al., 2016; Toming et al., 2016) to retrieve chl a concentration from ρ<sub>w</sub>(λ) in coastal waters. For our study site, an intercomparison exercise based on in situ measurements demonstrated that the chlorophyll-retrieval algorithm of Gons et al. (2005) provided the most satisfactory results (see Supplementary Information for more details). This algorithm was originally developed for the Medium Resolution Imaging Spectrometer (MERIS) using the bands at 665, 705, and 775 nm (Gons, 1999; Gons et al., 2002, 2005). It was applied here to Sentinel2/MSI using bands B4 (665 nm), B5 (705 nm), and B7 (783 nm). Due to the shift from 775 to 783 nm, a recalibration was performed to adapt the algorithm to Sentinel2/MSI. The chlorophyll retrieval is done in three steps. First, the backscattering coefficient (b<sub>b</sub>) is estimated from ρ<sub>w</sub> at 783 nm:

$$b\_b(783) = 1.56\,\rho\_w(783) / [0.082 - 0.6\,\rho\_w(783)] \tag{2a}$$

Note that Equation (2a) is specific to Sentinel2/MSI, and replaces the original equation for MERIS:

$$b\_b(775) = 1.61\,\rho\_w(775) / [0.082 - 0.6\,\rho\_w(775)] \tag{2b}$$

Second, the phytoplankton absorption at 665 nm is retrieved from a NIR/red band ratio:

$$a\_{\text{phy}}(665) = (0.70 + b\_b)\,\rho\_w(705)/\rho\_w(665) - 0.40 - b\_b^{p} \tag{3}$$

where p is a unitless tuning parameter. Third, chl a concentration is computed by dividing a<sub>phy</sub>(665) by the chlorophyll-specific absorption coefficient at 665 nm, a<sup>∗</sup><sub>phy</sub>(665):

$$[\text{chl } a] \;=\; a\_{\text{phy}} \text{(665)} / a\_{\text{phy}}^\* \text{(665)} \tag{4}$$

It is assumed in Equation (3) that b<sub>b</sub>(λ) is spectrally neutral between 665 and 783 nm, and that at 665 nm the absorption by chlorophyll a and by pure seawater is much higher than the absorption by mineral particles and CDOM.

The parameters a<sup>∗</sup><sub>phy</sub>(665) and p were initially estimated using a large dataset of field measurements from diverse inland, estuarine and coastal waters (Gons, 1999; Gons et al., 2002, 2005). For our study site in Bourgneuf Bay, a<sup>∗</sup><sub>phy</sub>(665) was recalibrated to 0.133 m<sup>2</sup> (mg chl a)<sup>−1</sup> [standard error 0.002 m<sup>2</sup> (mg chl a)<sup>−1</sup>] and p to 1.02. The recalibration was done using a fitting procedure based on a root mean square error minimization (**Figure 4**).
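The three retrieval steps of Equations (2a), (3), and (4) can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation (the study used R); the function name is hypothetical, and the default values of `a_star` and `p` are simply the recalibrated values quoted in the text.

```python
import numpy as np


def chl_a_gons_msi(rho_665, rho_705, rho_783, a_star=0.133, p=1.02):
    """Chl a (mg m^-3) from MSI marine reflectance, following Eqs. (2a), (3), (4).

    a_star: chlorophyll-specific absorption at 665 nm, m^2 (mg chl a)^-1,
            default taken as printed in the text for Bourgneuf Bay.
    p:      unitless tuning parameter, default as quoted in the text.
    """
    rho_665 = np.asarray(rho_665, dtype=float)
    rho_705 = np.asarray(rho_705, dtype=float)
    rho_783 = np.asarray(rho_783, dtype=float)

    # Step 1, Eq. (2a): backscattering coefficient from the 783 nm band.
    b_b = 1.56 * rho_783 / (0.082 - 0.6 * rho_783)

    # Step 2, Eq. (3): phytoplankton absorption from the NIR/red band ratio.
    a_phy = (0.70 + b_b) * rho_705 / rho_665 - 0.40 - b_b ** p

    # Step 3, Eq. (4): convert absorption to concentration.
    return a_phy / a_star
```

The function accepts scalars or arrays, so it can be applied directly to whole reflectance rasters. Pixels with a negative ρ<sub>w</sub>(783) yield NaN through the power term; how such pixels should be masked is left open here.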

FIGURE 4 | (A) Linear regression between the measured and simulated chlorophyll a concentration, obtained from in situ measurements. The thick line shows the fit between 4 and 52.5 mg m−<sup>3</sup> . The fit was used to calibrate the Gons et al. (2005) algorithm. (B) Linear regression between the measured and simulated suspended particulate matter concentration, obtained from in situ measurements. The thick line shows the fit between 10 and 700 g m−<sup>3</sup> . The fit was used to calibrate the Novoa et al. (2017) algorithm.

### SPM Algorithm

The SPM concentration was computed using a multi-conditional algorithm previously developed for Bourgneuf Bay and the Loire estuary (Novoa et al., 2017). This algorithm has been validated for the Operational Land Imager (OLI) onboard Landsat8 (Novoa et al., 2017). A spectral recalibration has been performed here so that the algorithm could be applied to Sentinel2/MSI using bands B4 (665 nm) and B8A (865 nm). The algorithm is based on a switching method that automatically selects the most relevant SPM vs. ρ<sub>w</sub> relationship to avoid saturation effects at high turbidity. The final SPM concentration is computed as a dynamic combination of SPM retrievals in the red and NIR bands:

$$\text{[SPM]} = \alpha \text{[SPM]}\_{\text{red}} + \beta \text{[SPM]}\_{\text{NIR}} \tag{5}$$

where [SPM]<sub>red</sub>, [SPM]<sub>NIR</sub>, α, and β are defined as:

$$[\text{SPM}]\_{\text{red}} = 297 \rho\_{\text{w}} \text{(665) } / \left[ 1 - \rho\_{\text{w}} \text{(665)} / 0.1238 \right] \tag{6}$$

$$[\text{SPM}]\_{\text{NIR}} = 4302 \rho\_{\text{w}} \text{(865) } / \left[ 1 - \rho\_{\text{w}} \text{(865)} / 0.2115 \right] \tag{7}$$

$$\alpha = \log[0.090/\rho\_{\text{w}}(665)] / \log(0.090/0.046) \tag{8}$$

$$\beta = \log[\rho\_{\text{w}}(665)/0.046] / \log(0.090/0.046) \tag{9}$$

Due to the shift from 655 nm (Landsat8/OLI) to 665 nm (Sentinel2/MSI), the coefficients used in Equation (6) were recalibrated for Sentinel2/MSI using a fitting procedure based on a root mean square error minimization (**Figure 4**). The coefficients used in Equations (7–9) are the initial values computed by Novoa et al. (2017).
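Equations (5) to (9) can be sketched in Python as below. Note that α + β = 1 for any ρ<sub>w</sub>(665), with α = 1 at 0.046 and α = 0 at 0.090. The clipping of the weights to the interval [0, 1] outside these two bounds is an assumption made here to represent the switching behavior (red-only retrieval below the lower bound, NIR-only above the upper bound); it is not stated explicitly in the equations. Function names are hypothetical.

```python
import numpy as np

RHO_LOW, RHO_HIGH = 0.046, 0.090  # switching bounds on rho_w(665), Eqs. (8)-(9)


def spm_red(rho_665):
    """Eq. (6): SPM (g m^-3) from the red band."""
    return 297.0 * rho_665 / (1.0 - rho_665 / 0.1238)


def spm_nir(rho_865):
    """Eq. (7): SPM (g m^-3) from the NIR band."""
    return 4302.0 * rho_865 / (1.0 - rho_865 / 0.2115)


def blending_weights(rho_665):
    """Eqs. (8)-(9); weights are clipped to [0, 1] (assumption, see lead-in)."""
    alpha = np.log(RHO_HIGH / rho_665) / np.log(RHO_HIGH / RHO_LOW)
    alpha = np.clip(alpha, 0.0, 1.0)
    return alpha, 1.0 - alpha


def spm_msi(rho_665, rho_865):
    """Eq. (5): weighted combination of the red and NIR retrievals."""
    rho_665 = np.asarray(rho_665, dtype=float)
    rho_865 = np.asarray(rho_865, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        alpha, beta = blending_weights(rho_665)
        # Ignore a retrieval entirely where its weight is zero, so that the
        # singularities of Eqs. (6) and (7) cannot propagate as NaN or inf.
        red = np.where(alpha > 0, spm_red(rho_665), 0.0)
        nir = np.where(beta > 0, spm_nir(rho_865), 0.0)
    return alpha * red + beta * nir
```

The sketch assumes positive reflectance values; non-positive ρ<sub>w</sub>(665) produces NaN through the logarithm and would need to be masked beforehand.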

### Satellite Data and Processing

### Atmospheric Correction

Ortho-rectified, geo-located, and radiometrically calibrated top-of-atmosphere (TOA) reflectance Sentinel2 images were downloaded in the SAFE format from the US Geological Survey web portal (https://earthexplorer.usgs.gov). A single scene can contain multiple granules (sub-tiles), but the USGS web portal makes it possible to directly download Sentinel2 data at granule level, thus reducing downloading and processing time. Sentinel2 TOA data were processed using the ACOLITE software (http://odnature.naturalsciences.be/remsem/software-and-data/acolite) to derive the water-leaving radiance. This software offers two options for the atmospheric correction (AC): (i) the NIR algorithm, based on the assumption of spatial homogeneity of the red/NIR ratio for aerosol and marine reflectance (Ruddick et al., 2000; Vanhellemont and Ruddick, 2014), using Sentinel2 spectral bands at 665 and 865 nm, and (ii) the SWIR algorithm, based on the assumption of zero water-leaving reflectance in the shortwave infrared (SWIR), using Sentinel2 spectral bands at 1,610 and 2,190 nm (Vanhellemont and Ruddick, 2015, 2016). ACOLITE establishes a per-tile aerosol type (or epsilon) as the ratio between the Rayleigh corrected reflectance in the two aerosol correction bands, for pixels where the marine reflectance can be assumed to be zero (i.e., where ρ<sub>w</sub>(665 nm) < 0.005, as defined by Vanhellemont and Ruddick, 2014). The epsilon is then used to extrapolate the observed aerosol reflectance to the NIR and visible bands. For the SWIR algorithm, ACOLITE also provides a choice for aerosol correction using a fixed epsilon over the region of interest (ROI), or a per pixel variable epsilon.

As the NIR AC option is not adapted to turbid waters (Vanhellemont and Ruddick, 2015), we used the SWIR AC option with a fixed epsilon over the ROI, as recommended by several authors (Van der Zande et al., 2016; Novoa et al., 2017; Tristan Harmel, personal communication). The ROI was taken as Bourgneuf Bay and the Loire estuary (i.e., longitude from −2.35 to −1.95°E, and latitude from 46.85 to 47.35°N). The atmospheric correction is then performed in two steps: (i) a Rayleigh correction for scattering by air molecules using a look-up table generated using 6SV (Vermote et al., 2006), and (ii) an aerosol correction based on the assumption of black water reflectance in the SWIR bands due to the extremely high pure-water absorption, and an exponential spectrum for multiple scattering aerosol reflectance. Due to the low signal in the SWIR wavelengths, a spatial smoothing filter was applied to these bands (Vanhellemont and Ruddick, 2016).

The final output of ACOLITE software is the ρw(λ) data in Network Common Data Form (NetCDF). SPM and chl a concentrations were then computed from ρw(λ) using the R project for statistical computing (R Development Core Team, 2008).

### Selection, Clustering, and Sorting of Satellite Data

In intertidal waters, the spatio-temporal distribution of in-water suspended constituents is mainly driven by tidal dynamics. A total of 12 clear sky images was selected in order to observe the oyster farming site over a variety of seasonal, hydrological, and tidal conditions (**Table 1**). The time difference between satellite observation and low tide varied from <1 h to more than 5 h, thus providing a set of images acquired from low to high tide. The water height at the nearest reference harbor varied between 0.93 and 4.43 m, and the oyster farming site was observed over a variety of tidal configurations, from almost full emersion to complete submersion.

During the selected days of satellite acquisition the tidal range varied from 2.75 to 6 m, encompassing neap and spring tides (**Table 1**). The dataset was then divided into two subsets according to the tidal amplitude, so that images were clustered either into neap tide (tidal amplitude <4 m) or into spring tide (tidal amplitude >4 m).
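The clustering rule can be sketched as follows, together with the ordering of each cluster relative to low tide that is used to build the composite tidal cycles. The `Scene` record, its field names, and the sign convention for the time offset are hypothetical. The sketch orders scenes by time offset only, whereas the study also considered water height. The text does not say how an amplitude of exactly 4 m is handled, so the sketch raises an error in that case.

```python
from dataclasses import dataclass

SPRING_NEAP_THRESHOLD_M = 4.0  # tidal amplitude threshold stated in the text


@dataclass
class Scene:
    name: str                   # hypothetical scene identifier
    tidal_range_m: float        # tidal range on the acquisition day
    hours_from_low_tide: float  # assumed sign: negative during ebb, positive during flow


def tide_cluster(tidal_range_m):
    """Return 'neap' or 'spring' according to the 4 m threshold."""
    if tidal_range_m < SPRING_NEAP_THRESHOLD_M:
        return "neap"
    if tidal_range_m > SPRING_NEAP_THRESHOLD_M:
        return "spring"
    # The text leaves the case of exactly 4 m undefined.
    raise ValueError("tidal range equal to the threshold is not covered by the rule")


def composite_cycles(scenes):
    """Group scenes by cluster and order each group from ebb to flow tide."""
    cycles = {"neap": [], "spring": []}
    for scene in scenes:
        cycles[tide_cluster(scene.tidal_range_m)].append(scene)
    for group in cycles.values():
        group.sort(key=lambda s: s.hours_from_low_tide)
    return cycles
```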

Irrespective of their acquisition date, Sentinel2 data were tidally sorted from ebb tide to flow tide according to the time difference between satellite observation and low tide, and to the water height at the time of acquisition. Sentinel2-derived SPM and chl a concentration maps were thus clustered in 2 composite tidal cycles, either representative of neap or spring tide. In order to investigate the influence of changes in SPM and chl a concentration on oysters, several physiological functions were directly retrieved from satellite data, as described below.

### TABLE 1 | Sentinel2 data used in the present study.

Tide information was taken from the Service hydrographique et océanographique de la Marine (SHOM) web portal using Pornic (France) as reference harbor (http://maree.shom.fr/). All times are UT. Data acquired during neap and spring tides were used in Figures 5, 6, 9, and in Figures 7, 8, 10, respectively.

### Simulating Oyster Physiology from Space

Oyster clearance rate was computed from SPM concentration as in Barillé et al. (1997) using a non-linear response function (see also Figure 3 in Gernez et al., 2014). Briefly, oyster clearance rate is constant and equal to 4.8 L h<sup>−1</sup> when SPM concentration is lower than 60 g m<sup>−3</sup>. Above this threshold the clearance rate is negatively impacted by the high turbidity. It decreases linearly between 60 and ∼200 g m<sup>−3</sup>, and collapses exponentially above ∼200 g m<sup>−3</sup>. This latter point corresponds to the saturation of the oyster gills (Barillé et al., 1997). The chlorophyll consumption rate, defined as the biomass of chl a consumed per hour, is then computed as the product of the chl a concentration and the clearance rate (Barillé et al., 1997):

$$\text{CONS} = \text{[chl } a\text{]} \cdot \text{CR} \tag{10}$$

where CR and CONS are, respectively, the oyster clearance and chlorophyll consumption rates.
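A minimal sketch of this response function and of Equation (10) is given below. The constant rate (4.8 L h<sup>−1</sup>) and the two SPM thresholds are taken from the text. The clearance rate reached at gill saturation and the exponential decay constant are hypothetical placeholders, not the coefficients of Barillé et al. (1997), and the function names are illustrative only.

```python
import numpy as np

CR_MAX = 4.8            # L h^-1, constant clearance rate at low SPM (from the text)
SPM_ONSET = 60.0        # g m^-3, start of the linear decrease (from the text)
SPM_SATURATION = 200.0  # g m^-3, approximate gill saturation (from the text)

# Hypothetical shape parameters, NOT the values of Barille et al. (1997).
CR_AT_SATURATION = 2.0  # L h^-1 reached at SPM_SATURATION
DECAY_PER_G_M3 = 0.02   # exponential decay constant above SPM_SATURATION


def clearance_rate(spm):
    """Piecewise clearance rate (L h^-1) for scalar or array SPM (g m^-3)."""
    spm = np.asarray(spm, dtype=float)
    slope = (CR_MAX - CR_AT_SATURATION) / (SPM_SATURATION - SPM_ONSET)
    linear = CR_MAX - slope * (spm - SPM_ONSET)
    collapse = CR_AT_SATURATION * np.exp(-DECAY_PER_G_M3 * (spm - SPM_SATURATION))
    return np.where(spm < SPM_ONSET, CR_MAX,
                    np.where(spm <= SPM_SATURATION, linear, collapse))


def chl_consumption(chl, spm):
    """Eq. 10: CONS = [chl a] * CR.

    With chl a in mg m^-3 (equivalent to ug L^-1) and CR in L h^-1,
    the result is in ug chl a per hour.
    """
    return np.asarray(chl, dtype=float) * clearance_rate(spm)
```

Because both functions accept arrays, they can be applied directly to SPM and chl a maps, pixel by pixel.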

The clearance and chl consumption rates were simulated for each pixel of the Sentinel2 images using satellite-derived SPM and chl a data. In order to analyze the influence of the tide-driven chl a and SPM dynamics on oyster physiological response, composite tidal cycles of the clearance and chl consumption rates were also computed for the neap tide and spring tide clusters.

### RESULTS

### In situ Reflectance Spectra

In situ hyperspectral marine reflectance spectra show some typical characteristics of coastal turbid waters (**Figure 3A**). From 400 to 580 nm ρw(λ) is relatively insensitive to changes in SPM and chl a concentration, preventing the use of a blue to green ratio algorithm to derive chl a concentration. From 600 to 900 nm the changes in the magnitude and spectral composition of the marine reflectance are mainly driven by variation in SPM concentration, notably as a result of particulate scattering.

In the red and NIR spectral region, the ρw spectra also display several features associated with the presence of pigment-bearing particles. An inflection in the reflectance slope is visible around 632 nm, attributable to the absorption by both chl a and chl c, a pigment association specific to diatoms (Méléder et al., 2005). The most striking feature is however the trough at 675 nm associated with chl a absorption, and the resulting reflectance shift between 675 and 700 nm, generally referred to as the NIR/red edge (Gons et al., 2002). Significant chl a concentration was confirmed by the analysis of HPLC data. This chl a most likely originates from the tide-driven resuspension of benthic microalgae together with surface sediments.

Though a significant loss of information results from the downscaling of the TriOS hyperspectral reflectance to S2/MSI spectral resolution, a NIR/red edge between 665 and 705 nm is still noticeable on several S2-simulated ρw spectra (**Figure 3B**), making it possible to apply the Gons et al. (2005) algorithm in Bourgneuf Bay's intertidal waters.

### In situ Calibration of the SPM and Chl a Algorithms

The SPM and chl a in situ data acquired concomitantly with the ρw(λ) were used to calibrate the bio-optical algorithms using a fitting procedure (**Figure 4**). In situ SPM concentration ranged from 10.92 to 700.83 g m<sup>−3</sup>, with a mean of 146.53 g m<sup>−3</sup>. Over this range, a significant relationship was obtained between simulated and measured SPM concentration (p-value < 10<sup>−5</sup>, correlation coefficient of 0.96, a slope of 0.93 and an intercept of 0.32). For the SPM retrieval, the root mean square error was 56.26 g m<sup>−3</sup>.
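The statistics reported for this comparison (slope, intercept, correlation coefficient, and root mean square error) can be computed as in the sketch below. It assumes an ordinary least-squares regression of retrieved on measured values, which the text does not specify, and the function name is illustrative.

```python
import numpy as np


def calibration_stats(measured, retrieved):
    """Slope, intercept, Pearson r and RMSE of retrieved vs. measured values.

    Assumption: ordinary least squares with the measured values as predictor.
    """
    measured = np.asarray(measured, dtype=float)
    retrieved = np.asarray(retrieved, dtype=float)
    slope, intercept = np.polyfit(measured, retrieved, 1)
    r = np.corrcoef(measured, retrieved)[0, 1]
    rmse = np.sqrt(np.mean((retrieved - measured) ** 2))
    return {"slope": float(slope), "intercept": float(intercept),
            "r": float(r), "rmse": float(rmse)}
```

The RMSE is computed here against the 1:1 line rather than against the fitted regression line, so that it reflects the retrieval error directly.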

The range of chl a concentration was from 3.97 to 52.51 mg m<sup>−3</sup>, with a mean of 18.48 mg m<sup>−3</sup>. In the initial algorithm of Gons et al. (2005), the chlorophyll-retrieval parameters were originally set to p = 1.05 and a<sup>∗</sup><sub>phy</sub>(665) = 0.014 m<sup>2</sup> (mg Chl a)<sup>−1</sup> using observations performed in a variety of inland, estuarine and coastal waters over a range of chl a concentration from 1 to 181 mg m<sup>−3</sup> (Gons, 1999; Gons et al., 2002). For the present study p was recalibrated to 1.02. A mean a<sup>∗</sup><sub>phy</sub>(665) of 0.013 m<sup>2</sup> (mg Chl a)<sup>−1</sup> was obtained, a value consistent with previously reported specific absorption coefficients for the Saint Laurent Estuary (Bricaud et al., 1995), and for the Baltic and North seas (Babin et al., 2003). The comparison between the marine reflectance- and HPLC-derived chl a concentration was satisfactory (**Figure 4A**). The obtained linear regression shows a significant correlation coefficient of 0.93 (p-value < 10<sup>−5</sup>), a slope of 1.02 and an intercept of −0.07. For the chl retrieval, the root mean square error was 3.05 mg m<sup>−3</sup>.
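To make the role of the two recalibrated parameters concrete, the sketch below assumes the red/NIR band-ratio form commonly attributed to the Gons algorithm, in which phytoplankton absorption at 665 nm is derived from the 705/665 nm reflectance ratio and a backscattering coefficient. The pure-water absorption constants, the treatment of the backscattering coefficient as a precomputed input, and the function name are assumptions. Only p = 1.02 and the specific absorption of 0.013 m<sup>2</sup> (mg Chl a)<sup>−1</sup> come from the text.

```python
# Assumed pure-water absorption coefficients (m^-1) near the two bands.
A_W_665 = 0.40
A_W_705 = 0.70


def gons_chl(rho_665, rho_705, bb, p=1.02, a_star_phy=0.013):
    """Chl a (mg m^-3) from the red/NIR reflectance ratio.

    bb is the backscattering coefficient (m^-1), assumed to be derived
    beforehand from a NIR band; p and a_star_phy default to the values
    recalibrated in the text.
    """
    ratio = rho_705 / rho_665
    a_phy_665 = ratio * (A_W_705 + bb) - A_W_665 - bb ** p
    return a_phy_665 / a_star_phy
```

In this form a lower specific absorption yields a higher chl a estimate for the same reflectance ratio, which is why the recalibration of a<sup>∗</sup><sub>phy</sub>(665) matters for the retrieved concentrations.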

### Chl a Concentration within the Oyster Farm

The high spatial resolution (20 m) and spectral characteristics of Sentinel2 made it possible to quantify the distribution of SPM and chl a concentration at the scale of an oyster farm (**Figures 5**–**8**). Three main features characterized SPM and chl a spatial distribution in the shellfish farming site. First, both SPM and chl a concentrations are generally high, often exceeding 200 g m<sup>−3</sup> and 10 mg m<sup>−3</sup>, respectively. Second, their spatial structure depends on the bathymetry. SPM and chl a concentrations generally increase coastward, and the changes observed in their spatial distribution are more or less parallel to the isobaths (see for example **Figure 6C** where the change in [chl a] from <5 to >10 mg m<sup>−3</sup> occurred around the 1 m isobath, and **Figure 7D** where [SPM] displayed a clear gradient coastward). Third, the tidal cycle is a significant driver of SPM and chl a dynamics. Due to the tidal resuspension of benthic microalgae together with the other particles of the sedimentary surface, the chl a distribution was generally associated with SPM, a common feature of intertidal mudflats (Koh et al., 2006). Spatial fronts of highest chl a concentration generally move seaward during ebb tide (**Figures 6A–C**, **8A–C**), and coastward during flow tide (**Figures 6D–F**, **8D–F**).

SPM and chl a concentrations exhibited similar spatial patterns during neap and spring tides, but the amplitude of the changes in their concentrations varied markedly between neaps and springs. For example, within the oyster farming zone SPM concentration varied from <10 to 300 g m<sup>−3</sup> during neap tides and from 50 to >1,000 g m<sup>−3</sup> during spring tides. Chl a concentration varied from <5 to 25 mg m<sup>−3</sup> during neap tides and from 10 to 40 mg m<sup>−3</sup> during spring tides.

The tide-driven variability was confirmed by the analysis of SPM and chl a concentration at the three selected fixed locations (black circled crosses in **Figures 5**–**8**). Both SPM and chl a concentration increased during ebb tide, reaching their maximum value during low tide and the start of the flow tide, and eventually decreasing during the end of the flow tide (**Figures 9A,C**, **10A,C**). The temporal correlation between SPM and chl a concentration is attributable to the tide-driven resuspension of surface sediments, which contain both mineral particles and benthic microalgae. As expected, the amplitude of the tide-driven changes in SPM and chl a was higher during spring than during neap tides.

The temporal changes in clearance and chl consumption rates were then analyzed in order to quantify the influence of the tide-driven SPM and chl a dynamics on the oyster physiological responses.

### Influence of SPM and Chl a Variation on Oyster Physiology

During neap tides the simulated chl consumption rate mirrored the changes observed in chl a concentration (**Figures 9C,D**). This is attributable to the limited negative impact of SPM concentration on the clearance rate (**Figures 9A,B**). Generally SPM concentration remained below 100 g m<sup>−3</sup> throughout the composite tidal cycle, except during flow tide, when SPM concentration increased up to 175 g m<sup>−3</sup> due to the erosion of surface sediments by tidal currents. The clearance rate mostly fluctuated between 4 and 5 L h<sup>−1</sup>, and the decrease which occurred just after low tide was too small to counterbalance the increase in chl consumption.

During spring tides the simulated oyster physiological functions were affected in a more complex way by SPM and chl a tidal variability (**Figure 10**). First, a significant part of the intertidal zone rapidly became emerged, and after mid-ebb the oysters could no longer filter seawater nor consume particles (see solid lines in **Figure 10**). In the waters just outside the intertidal zone, SPM concentration rapidly exceeded 200 g m<sup>−3</sup> due to the strong tidal currents occurring from mid-ebb to the end of flow tide (dashed and dotted lines in **Figure 10A**). Such high SPM concentrations are known to saturate the oyster gills (Barillé et al., 1997), and this resulted in the dramatic collapse of the clearance rate from mid-ebb to the end of flow tide (**Figure 10B**). Meanwhile, the positive effects of the tide-driven increase in chl a concentration were rapidly counterbalanced by the collapse of the clearance rate, and the chl consumption rate dropped to zero from mid-ebb to the end of the flow (**Figure 10D**).

In summary, during neap tides, the tidal cycle in SPM concentration has little negative impact on oyster physiological response, and the low-tide increase in chl a concentration translates directly into an increase in chl consumption for the farmed oysters. As most of the intertidal zone remains submerged during neap tides, the altitudinal location of the oyster farms has little influence on the tidal cycle of the clearance and chl consumption rates (**Figure 9**). During spring tides, on the contrary, a significant fraction of the intertidal zone is emerged. In the waters adjacent to the intertidal zone and in the intertidal areas still under water, the high SPM concentration negatively impacts both clearance and chl consumption rates, whatever the chl a concentration, thus further limiting oyster physiological activity during a significant fraction of the tidal cycle (**Figure 10**).

# DISCUSSION

### Chl a Algorithms in Turbid Coastal Waters

The Gons et al. (2005) algorithm, recalibrated to Sentinel2/MSI, was fitted to in situ measurements (**Figure 4**) before being applied to Bourgneuf Bay. The fit was satisfactory, as demonstrated by the consistency of the value of a<sup>∗</sup><sub>phy</sub>(665) with the literature (Bricaud et al., 1995; Gons et al., 2002; Babin et al., 2003). Other algorithms could have been used. Besides the Gons et al. (2005) method, various approaches were previously developed for turbid and/or eutrophic coastal and inland waters, including the 2- and 3-band models (Dall'Olmo et al., 2005; Gitelson et al., 2008; Le et al., 2013), the fluorescence line height (FLH, Gower et al., 1999), the maximum chlorophyll index (MCI, Gower et al., 2005), and the 705 nm peak height (Toming et al., 2016). These algorithms are all based on a NIR/red edge either associated with chl a absorption at 675 nm or with sun-induced chl a fluorescence. An obvious limitation of these NIR/red algorithms is the lack of sensitivity in waters where the reflectance trough associated with chl a absorption around 675 nm is hardly pronounced (see for example the lowest ρw(λ) spectrum in **Figure 3**). In less eutrophic waters (i.e., chl a concentration smaller than ∼4 mg m<sup>−3</sup>), the use of blue-green wavelengths would be more relevant to retrieve chl a concentration.

A recent study demonstrated that the 2- and 3-band models (Dall'Olmo et al., 2005) worked well with simulated Sentinel2/MSI-like imagery (Beck et al., 2016). In our algorithm inter-comparison (see Supplementary Information), the best-performing method for our study site was the Gons et al. (2005) algorithm. Besides its good performance, another advantage of the Gons et al. (2005) algorithm is that the p and a<sup>∗</sup><sub>phy</sub>(665) parameters seem to be relatively stable over a variety of coastal waters, including the numerous inland, estuarine, and coastal sites initially sampled by Gons (1999), and the intertidal waters of Bourgneuf Bay. Additional field data are however needed to assess the geographical robustness of a common set of parameters, as well as its seasonal stability. While the range of chl a concentration retrieved from Sentinel2 data in our study site was consistent with previous field measurements (Barillé-Boyer et al., 1997), the accuracy of the chl a concentration maps is more difficult to appraise quantitatively due to the lack of validation data. More in situ and match-up data are needed to improve the method.

### Advantages and Limitations of Sentinel2 for Aquaculture Applications

The retrieval of chl a concentration using a NIR/red algorithm is relevant in eutrophic waters, but at lower chl a concentration the accuracy would probably decrease. It is then generally advised to switch toward shorter wavelengths in the blue-green parts of the visible spectrum. As far as Sentinel2 is concerned, the lack of a spectral band at around 412 nm will certainly limit the performance of chl a inversion methods, as it has long been demonstrated that such a spectral band improves the deconvolution of chl a, PIM, and CDOM absorption (Carder et al., 1999). For sensors equipped with a spectral band at 412 nm such as MERIS, the ocean color 5 (OC5) algorithm (Gohin et al., 2002) has been recommended in recent intercomparison studies of the North West European (Tilstone et al., 2017) and Vietnamese (Loisel et al., 2017) coastal waters. As already indicated by Vanhellemont and Ruddick (2016), several other limitations specific to Sentinel2 can also arise, due to its relatively wide bands, low signal-to-noise ratio, and lack of vicarious calibration.

Despite these issues, Sentinel2 offers four main advantages for the remote-sensing of shellfish farming ecosystems. First, its high spatial resolution (20 m) made it possible to observe aquaculture sites located nearshore, in narrow bays and estuaries (Gernez et al., 2014), and to analyze within-farm spatial variability. Second, its relatively small revisit time (which is now 5 days since the launch of Sentinel-2B) increases the probability of acquiring cloud-free data over a given site. Sentinel2 acquisition frequency also limits subsampling and observation biases for the study of rapidly varying environments. There is no doubt that the Sentinel2 time-series will strengthen observation robustness and statistical descriptors of the very dynamic and changing coastal waters. Third, its SWIR spectral band facilitates atmospheric correction over turbid waters (Vanhellemont and Ruddick, 2015). Fourth, its spectral resolution in the red and NIR spectral regions made it possible to apply a variety of chlorophyll inversion algorithms (see previous section). Altogether, these characteristics represent a significant improvement for the remote sensing of turbid oyster farming ecosystems, and more generally for coastal zone observation.

### Shellfish Ecology from Space?

The combination of EO and shellfish physiological models opens new perspectives for aquaculture management, shellfish farming ecosystems studies (Gernez et al., 2014), and more broadly for a better understanding of the coastal ocean response to global changes. For example, the poleward extent of the Pacific oyster (a well-known invasive species, Herbert et al., 2016) along the European coasts has been recently quantitatively analyzed using an original coupling of EO with mechanistic physiological oyster modeling (Thomas et al., 2016). In another recent study the EO time-series archive has been used with climatic, biological and energetics models to better understand predicted changes in growth, reproduction and mortality risk for commercially and ecologically important bivalves in the Mediterranean Sea (Montalto et al., 2016).

Concurrently with the increase of EO aquaculture applications, the development of improved satellite products should not be neglected. The detection of phytoplankton species causing harmful algal blooms (HABs) is a major concern for fisheries and shellfish farming management (Sourisseau et al., 2016), and recent algorithm developments have proved useful to provide early warnings (Davidson et al., 2009) or statistical estimation of HAB-related risks (Kurekin et al., 2014). Enhanced characterization of the composition of the particulate assemblage could also be used to improve satellite-derived aquaculture products. For example, as oysters have the ability to preferentially select organic rather than mineral particles before ingestion (Barillé et al., 1997; Dutertre et al., 2009), estimation of the organic fraction of the particulate assemblage (Woźniak et al., 2010) could be used to better constrain shellfish physiological models.

## CONCLUSION

In summary, it has been demonstrated that Sentinel2/MSI has the potential to map chlorophyll a and SPM concentration in turbid, chlorophyll-rich, intertidal waters. The high spatial resolution of Sentinel2 (20 m) made it possible to analyze SPM and chl a distribution at the scale of an oyster farm, thus opening new opportunities for aquaculture applications. The influence of the tidal dynamics on SPM and chl a concentration was highlighted, and its influence on oyster physiological response was analyzed in the shellfish farm and adjacent nearshore waters. During neap tides oysters were little influenced by the high turbidity, whereas during spring tides their clearance and chl consumption rates were significantly impacted by the extremely high SPM concentration during a significant fraction of the tidal cycle. This study confirms the potential of EO for marine spatial planning (Ouellette and Getinet, 2016), and offers a generic framework where the combination of high resolution satellite remote sensing with bivalve ecophysiological models makes it possible to explore the response of cultivated suspension feeders to environmental conditions in many coastal areas, and to optimize site selection for shellfish farming.

## AUTHOR CONTRIBUTIONS

All authors contributed to work design, data acquisition, and data interpretation. PG processed the data and wrote the manuscript. All authors gave their approval to the final version of the manuscript.

### FUNDING

This work has been supported by the "Programme National de Télédétection Spatiale" (PNTS, http://www.insu.cnrs.fr/pnts) in the frame of the TURBO project (grant n° PNTS-2015-07), by the French Research National Agency (ANR) in the frame of the GIGASSAT project (grant n° ANR-12-AGRO-0001, http://www.gigassat.org/), by the European Union's Research and Innovation FP7 program in the frame of the HIGHROC project (grant n° 606797, http://www.highroc.eu/), and by the Tools for Assessment and Planning of Aquaculture Sustainability (TAPAS) project, a Horizon 2020 Research and Innovation Action funded by the European Commission (Grant agreement No: 678396, http://tapas-h2020.eu/).

### REFERENCES


### ACKNOWLEDGMENTS

The European Space Agency is acknowledged for the production and distribution of Sentinel2 data. The US Geological Survey is thanked for maintaining the Earth Explorer web portal (https://earthexplorer.usgs.gov). Morgane Larnicol and Stéfani Novoa are thanked for their participation in field work. The authors thank Quinten Vanhellemont and Kevin Ruddick for making the ACOLITE software freely available. The organizers of the CLEO workshop and special issue are thanked for their efforts to federate the ocean color scientific community. Emmanuel Devred and Zhubin Zheng are thanked for their comments.

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fmars.2017.00137/full#supplementary-material



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Gernez, Doxaran and Barillé. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Regional Empirical Algorithms for an Improved Identification of Phytoplankton Functional Types and Size Classes in the Mediterranean Sea Using Satellite Data

Annalisa Di Cicco<sup>1</sup>\*, Michela Sammartino<sup>1</sup>, Salvatore Marullo<sup>1,2</sup> and Rosalia Santoleri<sup>1</sup>

<sup>1</sup> Institute of Atmospheric Sciences and Climate, National Research Council (CNR), Rome, Italy, <sup>2</sup> National Agency for New Technologies, Energy and Sustainable Economic Development, Frascati, Italy

### Edited by:

Shubha Sathyendranath, Plymouth Marine Laboratory, UK

### Reviewed by:

Emmanuel Devred, Fisheries and Oceans Canada, Canada Jochen Wollschläger, University of Oldenburg, Germany

\*Correspondence:

Annalisa Di Cicco annalisa.dicicco@artov.isac.cnr.it

### Specialty section:

This article was submitted to Ocean Observation, a section of the journal Frontiers in Marine Science

Received: 16 January 2017 Accepted: 18 April 2017 Published: 19 May 2017

### Citation:

Di Cicco A, Sammartino M, Marullo S and Santoleri R (2017) Regional Empirical Algorithms for an Improved Identification of Phytoplankton Functional Types and Size Classes in the Mediterranean Sea Using Satellite Data. Front. Mar. Sci. 4:126. doi: 10.3389/fmars.2017.00126

Regional relationships to estimate the main Phytoplankton Functional Types (PFTs) and Size Classes (PSCs) from satellite data are presented. Following the abundance-based approach and selecting the Total Chlorophyll a (TChla) as descriptor of the trophic status of the environment, empirical relations between the TChla concentration and seven accessory pigments, markers of the main algal groups, have been developed for the Mediterranean Sea. Using only in-situ data acquired in this basin, a preliminary regional diagnostic pigment analysis was first conducted to evaluate the specific pigment ratios featuring the phytoplankton assemblage that occurs in the Mediterranean Sea. Secondly, the new regional PFT and PSC algorithms have been calibrated and validated on the in-situ dataset. The statistical analysis showed a very good predictive power for all the new regional models. A quantitative comparison with global abundance-based models applied to our validation dataset showed that the regionalization improves the uncertainty and the spread by about one order of magnitude for all the classes (e.g., in the nano class, where the mean bias error improves from −0.056 to 0.001 mg m<sup>−3</sup>). These results highlighted that a regionalization of the PSC and PFT estimates is required, to take into account the peculiar bio-optical properties of the Mediterranean Sea. Finally, the new regional equations have been applied to the Mediterranean TChla satellite (1998–2015) time series to estimate annual and monthly PFT and PSC climatology. The analysis of the climatological maps, relative to the phytoplankton assemblage distribution patterns, reveals that all three size classes reach their maxima in the higher nutrient areas, with absolute values >3 mg m<sup>−3</sup> of TChla for micro-, and about 1.6 and 0.4 mg m<sup>−3</sup> for nano- and pico-phytoplankton, respectively. Moreover, the nano component shows intermediate percentage values in the whole basin, ranging from 30 to 40% of the TChla in the western basin, up to 45% in the more productive areas. In terms of chlorophyll concentration, in the coastal areas we find the predominance of the Diatoms and Haptophytes, while in the ultra-oligotrophic waters Prokaryotes predominate over the other groups, constituting the principal component of the pico-phytoplankton.

Keywords: PFTs, PSCs, Mediterranean Sea, regional algorithms, ocean color

# INTRODUCTION

Phytoplankton have a key role in the biogeochemistry of the Earth, with a predominant position in several ecological processes such as climate regulation, food webs, fossil fuel formation, and correlated economic human activities (Falkowski et al., 2003). The biogeochemical functions performed by the phytoplankton assemblage are closely linked to its composition. Key groups of organisms with their specific biogeochemical metabolism mediate the elemental fluxes in the biosphere (Falkowski et al., 2003; Le Quéré et al., 2005). The composition and succession of various phytoplankton taxa in the community are also a mirror of the ecological status of the marine environment (Devlin et al., 2007). Within this context, scientific interest in understanding the structure of the phytoplankton assemblage has been surging in recent years. The methods for the identification of these organisms have strongly evolved, moving from single cell counting and taxonomic identification based on traditional microscopic techniques to more recent approaches based on remote sensing (IOCCG, 2014).

Proper identification of the Linnaean taxonomic species that compose a natural phytoplankton assemblage requires the use of multiple combined techniques. Classical optical microscopy remains one of the best approaches for the identification of the largest phytoplankton cells, but its exclusive use meant that the smallest fraction of the phytoplankton was ignored for years. This fraction is instead detectable through specific techniques such as flow cytometry, chemotaxonomy, epifluorescence microscopy, size fractionation, and determination of chlorophyll a content with High Performance Liquid Chromatography, HPLC (Siokou-Frangou et al., 2010). A species-specific identification is also very time-consuming and requires deep taxonomic expertise (Reynolds, 2006). Nowadays, the systematic classification of phytoplankton at the level of phyla and of certain classes is well-established, with the agreement of microscopists and biochemists (Reynolds, 2006). For several years, one of the most useful techniques for algal classification at these taxonomic levels has been HPLC. Liquid chromatography allows the separation, and the resulting identification and quantification, of the main algal pigments, some of which are considered markers for specific phytoplankton groups (see **Table 1**). The number of phytoplankton species is far smaller than that of terrestrial plants, but with a greater phylogenetic diversity, strictly related to the principal ecological functions (Falkowski and Raven, 1997; Falkowski et al., 2003). Phylogenetic studies on the evolution of oxygenic phytoplankton suggest three main recognizable lineages. The first, in the prokaryotic empire, consists of all the Cyanobacteria. The other two, within the eukaryotic algae, are the "green lineage," characterized by chlorophyll b as secondary pigment and by a small quantity of several carotenoids (Phyla Chlorophyta and Euglenophyta), and the "red lineage," which includes the Rhodophyta, pigmented with phycobiliproteins, and a number of other algal groups characterized by chlorophyll c and a wide variety of carotenoids. These groups include Cryptophyta, Heterokontophyta, Haptophyta, and probably those Dinophyta pigmented with peridinin (Delwiche, 1999; Falkowski et al., 2003; Reynolds, 2006).

TABLE 1 | Diagnostic Pigments (DP) and their taxonomic meaning in microalgal divisions or classes (Jeffrey and Vesk, 1997; Prezelin et al., 2000; Vidussi et al., 2001; Wright and Jeffrey, 2006; Ras et al., 2008; Brunet and Mangoni, 2010).

\*Sieburth et al. (1978) Classification: Micro (>20 µm), Nano (2–20 µm), Pico (<2 µm)

<sup>a</sup>DP present in lower concentration or in some types only (Jeffrey and Vesk, 1997); \*PSCs: grouping of the main taxa into size classes selected for this work (see text).

In order to better understand ecological systems and monitor the ecological status of the marine environment, the main target is to identify the structures and processes that can explain ecosystem dynamics, linking descriptors of state to descriptors of change. Recent work on community structure and functioning aims to identify species-independent "functional traits" that can act as non-taxonomic "descriptors of community." Two of the most relevant taxonomy-free descriptors are the body size class and the functional group (Basset et al., 2004; Mouillot et al., 2006).

The definition of "functional group" is open to different interpretations, clustering phytoplankton on the basis of various ecological roles and specialized requirements. This term groups species with similar "morphological and physiological traits and ecologies" (Reynolds et al., 2002): a functional group is composed of different species that, starting from the same resource or ecological component, perform a common ecological function (Blondel, 2003). On the basis of their biogeochemical metabolism or, further, of the "resource" shared by the organisms, the main taxonomic phytoplankton groups can be assembled into four specific "functional groups" (Blondel, 2003; Falkowski et al., 2003; Litchman et al., 2007; IOCCG, 2014): nitrogen fixers (this ability is unique to the Prokaryotes), calcifiers (including the taxonomical class of Haptophyceae, generally known as coccolithophores), silicifiers (represented by the class of Bacillariophyceae, typically known as diatoms, followed by some chrysophytes, silicoflagellates, and xanthophytes, which are not very widespread in the Mediterranean Sea), and Dimethylsulfoniopropionate (DMSP) producers (referring to some marine phytoplankton organisms belonging primarily to the group of Dinoflagellates, followed by Haptophytes).

The other important "taxonomic-free" descriptor is the "size." A great number of single-organism and community characteristics depend, in a known manner, on individual dimension. The "metabolic theory" of Brown et al. (2004) closely links the performance of "individuals" in terms of metabolism and energy transfer efficiency to the ecology of "population, community, and ecosystems." There is a flow of energy and matter between the various ecological systems at different hierarchical scales, depending on environmental and individual characteristics that regulate the metabolism of the single organism and, consequently, the features of each hierarchical level. According to this theory, body size, together with temperature and stoichiometry, is one of three key factors that affect individual metabolism and, consequently, the community ecology.

Although size measurements may also be affected by uncertainties, especially at the ecological "individual" level, morphometric or "body size" descriptors offer important advantages over taxonomic ones: cell size is simpler to measure in a quantitative and reproducible way, and it avoids the long processing times and extensive experience required for taxonomic identification (Basset et al., 2004; Mouillot et al., 2006). In aquatic ecosystems, the role of individual size as a phytoplankton community descriptor is based on the relationship between size and pigment content, different taxa or growth stages within the same taxon, photosynthetic efficiency, bio-optical phytoplankton properties, and water column dynamics (Chisholm, 1992; Raven, 1998; Organelli et al., 2007). Raven (1998), in his important work "The twelfth Tansley Lecture. Small is beautiful: the picophytoplankton", summarizes the influence of phytoplankton cell size on photosynthetic activity and on its role in biogeochemical cycling and biodiversity. Size affects, above all, the maximum specific growth rate, photon acquisition, nutrient solute and water fluxes across the plasmalemma, and the loss of cells from the euphotic layer (Chisholm, 1992; Raven, 1998). Depending on the ecological hierarchical level under investigation, several specific morphometric descriptors can be identified. At the individual level, these are bio-volume, surface area, or surface-volume ratio. For populations and guilds, instead, we can consider body size-abundance distributions, body size spectra, or biomass size fractions (Vadrucci et al., 2007). In the present work we take into account the biomass fractions of three Phytoplankton Size Classes (PSCs) related to the Sieburth et al. (1978) classification, micro- (>20 µm), nano- (2–20 µm), and pico- (<2 µm) phytoplankton, and the main Phytoplankton Functional Types (PFTs).
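As a small illustration of the size thresholds quoted above, the following Python sketch assigns a cell to a Sieburth et al. (1978) size class from its equivalent diameter. The function name `size_class` and the treatment of cells lying exactly on the 2 µm and 20 µm boundaries (assigned here to nano) are assumptions made for the example. In this work the size classes are inferred from diagnostic pigments, not from measured cell sizes.

```python
def size_class(diameter_um: float) -> str:
    """Return the size class (pico, nano, micro) for a cell diameter in micrometres.

    Thresholds follow the ranges quoted in the text: pico < 2 um,
    nano 2-20 um, micro > 20 um. Boundary handling is an assumption.
    """
    if diameter_um <= 0:
        raise ValueError("cell diameter must be positive")
    if diameter_um < 2.0:
        return "pico"
    if diameter_um <= 20.0:
        return "nano"
    return "micro"
```

For instance, `size_class(0.8)` falls in the pico class and `size_class(35.0)` in the micro class.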

In recent years, several physical, biological, and ecological models have been proposed to estimate PSCs and PFTs from remote sensing data. Satellite technologies provide a powerful tool for synoptic observation of the ecological state of the marine ecosystem at daily and global scales.

The most important current approaches used to detect dominant phytoplankton groups are designed for global application and, following Brewin (2011) and IOCCG (2014), fall into three categories: spectral-response approaches, which exploit the specific optical signature of the different algal groups as derived from both Apparent (AOPs) and Inherent (IOPs) Optical Properties (e.g., Ciotti et al., 2002; Sathyendranath et al., 2004; Alvain et al., 2005, 2008; Ciotti and Bricaud, 2006; Kostadinov et al., 2009; Pan et al., 2010, 2011; Roy et al., 2013; Navarro et al., 2014); abundance-based approaches, which rely on the well-known relation between phytoplankton types, cell size, and the trophic status of the environment (Chisholm, 1992; e.g., Devred et al., 2006; Uitz et al., 2006; Hirata et al., 2008, 2011; Brewin et al., 2010, 2011); and ecological approaches, in which additional ecological and physical information supports the ocean color data (e.g., Raitsos et al., 2008).

Very few applications of these approaches address the Mediterranean Sea, whose water column has unique optical properties, with "oligotrophic waters less blue (30%) and greener (15%) than the global ocean" (Volpe et al., 2007). The great interest in the Mediterranean Sea arises from its peculiarities as a quasi-enclosed sea whose dimensions, morphology, dynamics, and external forcing make it a "miniature model" for a better comprehension of the complex processes of the global ocean, from mesoscale to basin scale (Lacombe et al., 1981; Robinson and Golnaraghi, 1995; Siokou-Frangou et al., 2010). Only recently, Navarro et al. (2014) exploited the PHYSAT method of Alvain et al. (2005), based on an empirical correlation between normalized water-leaving radiances (nLw, AOPs) and diagnostic pigments of a global HPLC dataset, and regionalized it for the Mediterranean Sea. The new PHYSAT-Med has been validated mainly for nanoeukaryotes, Prochlorococcus, Synechococcus, and diatoms and provides the dominant phytoplankton group for each satellite pixel. Furthermore, Sammartino et al. (2015) exploited the capability of a global empirical model (Brewin et al., 2011), based solely on chlorophyll a data, to describe the phytoplankton size biomass distribution in the Mediterranean Sea.

In this work, with the intent of investigating the composition of the phytoplankton assemblage and its variability, we first analyze the relationship between chlorophyll a content and the diagnostic pigment composition of the phytoplankton assemblage in the Mediterranean Sea. Afterwards, following the global abundance-based approach and selecting the Total Chlorophyll a (TChla) as descriptor of the trophic status of the environment, we identify Mediterranean empirical relations between the concentration of TChla and seven accessory pigments considered diagnostic for the main algal groups (**Table 1**). This allows us to develop new regional algorithms for satellite biomass estimates of PFTs and size classes and to assess their accuracy with respect to global models. Finally, we apply these new regional algorithms to the 1998–2015 TChla satellite time series to compute Mediterranean PFT and PSC climatologies.

The paper is organized as follows: the second section presents the in-situ and remote sensing "Data and Methods" selected for this work and describes the diagnostic pigment analysis performed on the Mediterranean pigment dataset; in the "Results" section, we present and validate the new Mediterranean regional algorithms for the identification of PFTs and PSCs and compare them with the results obtained by applying two global models; finally, the fourth section presents the "Discussion and Conclusions."

# DATA AND METHODS

# In-situ Pigment Data and Quality Assurance

Diagnostic pigment data for the determination of the in-situ PFTs and PSCs come from a Mediterranean subset of the SeaWiFS Bio-optical Archive and Storage System (SeaBASS) HPLC pigment in-situ dataset (Werdell and Bailey, 2005). Data were collected during different cruises and periodic monitoring activities at a fixed mooring. In more detail, this subset consists of data from the Prosope cruise (September–October 1999), the Boussole mooring (with sampling activities nearly every month from 2001 to 2006 and only in July in 2008), and the Boum cruise (July 2008). It consists of 1,454 sets of pigments, including stations sampled in case 1 waters and in various trophic conditions. **Figure 1** shows the location of all the SeaBASS Mediterranean in-situ measurements. We used all the in-situ data acquired in the first 50 m of the water column. Since these field samples were collected by several teams and were analyzed in different laboratories using a variety of HPLC instruments and protocols, we performed a quality assurance analysis to build a coherent combination of the data sets. First, pigment data were visually checked in order to identify and remove suspected low-quality values (due to instrumental or clearly stochastic errors). Then, we applied the Aiken et al. (2009) method to remove the outliers, according to Trees et al. (2000), who identified a strong log-linearity between TChla and accessory pigments.

The data outside the 95% confidence interval were eliminated. Following Hirata et al. (2011) and Brewin et al. (2010), we applied a 5-point moving average to the raw data, sorted by increasing values of TChla, to maximize the signal-to-noise ratio and highlight the main trend of the data. The quality control reduces the useful measurements to 1,379, with values ranging from 0.02 to more than 5 mg m−<sup>3</sup> (well-representative of the Mediterranean chlorophyll a variability). Although the in-situ dataset was predominantly collected in the western and central Mediterranean Sea, while the eastern Mediterranean is less sampled, it includes a significant share of samples (38% of the total) that fall in the oligotrophic chlorophyll a range typical of the eastern basin; our dataset can therefore be considered representative of the trophic regimes of the entire Mediterranean.
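The two numerical steps of this quality assurance (outlier removal on a log-linear relationship, then a 5-point moving average on data sorted by TChla) can be sketched in Python as follows. This is only an illustrative sketch, not the authors' processing code. Several details are assumptions of the example: the function names, the use of ±1.96 standard deviations of the log-residuals as the "95% interval", the dropping of the first and last two points of the moving average, and the choice to smooth both variables. All concentrations are assumed to be strictly positive.

```python
import math
from statistics import mean, stdev


def loglinear_fit(tchla, pigment):
    """Ordinary least squares fit of log10(pigment) = a + b * log10(tchla)."""
    lx = [math.log10(c) for c in tchla]
    ly = [math.log10(p) for p in pigment]
    mx, my = mean(lx), mean(ly)
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    slope = sxy / sxx
    return my - slope * mx, slope


def remove_outliers(tchla, pigment, z=1.96):
    """Keep (tchla, pigment) pairs whose log-residual lies within z standard deviations."""
    a, b = loglinear_fit(tchla, pigment)
    resid = [math.log10(p) - (a + b * math.log10(c)) for c, p in zip(tchla, pigment)]
    sd = stdev(resid)
    return [(c, p) for c, p, r in zip(tchla, pigment, resid) if abs(r) <= z * sd]


def moving_average(values, window=5):
    """Centred moving average; the first and last window // 2 points are dropped."""
    half = window // 2
    return [mean(values[i - half:i + half + 1]) for i in range(half, len(values) - half)]


def smooth_sorted(pairs, window=5):
    """Sort pairs by increasing TChla, then smooth both variables with the same window."""
    pairs = sorted(pairs)
    return list(zip(moving_average([c for c, _ in pairs], window),
                    moving_average([p for _, p in pairs], window)))
```

In practice, `remove_outliers` would be applied to TChla paired with each accessory pigment in turn, and `smooth_sorted` to the retained pairs.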

# Determination of PFTs and PSCs from Pigment Composition: Diagnostic Pigment Analysis (DPA)

Information about the composition of the phytoplankton assemblage in terms of "types" and "size classes" was obtained from the analysis of the cell pigment content of in-situ samples, exploiting the diagnostic properties of some marker pigments.

Following Vidussi et al. (2001), and in accordance with previous works on chemotaxonomy (Wright and Jeffrey, 1987; Gieskes et al., 1988; Everitt et al., 1990; Williams and Claustre, 1991; Claustre, 1994; Jeffrey and Vesk, 1997), we take into account seven diagnostic pigments (DPs), able to detect the main phytoplankton types and to outline the size structure of the whole assemblage in the Mediterranean Sea. Some of these pigments are unambiguous markers; others typify a principal group (with a minor contribution of some other classes, see **Table 1**). In Vidussi et al. (2001) the identification of the PSCs is founded on the relation between taxonomic groups and their most common dimensions in the Mediterranean Sea. Although the phytoplankton grouping method based on the auxiliary pigments does not reflect phytoplankton size as exactly as the one based on size-fractionated chlorophyll (which is lacking for the Mediterranean Sea), several investigations of the typical offshore composition of the Mediterranean phytoplankton community have proven the validity of this approach (Vidussi et al., 2001; Siokou-Frangou et al., 2010). Therefore, it must be taken into account that, on the basis of the Sieburth et al. (1978) size classification, "micro" consists of Diatoms and Dinoflagellates in general, "nano" includes Cryptophytes, Haptophytes, and some classes of Heterokontophytes, and "pico" refers to Cyanobacteria, green flagellates, and Prochlorophytes (**Table 1**). In this work we also applied the linear adjustment of Brewin et al. (2010) for the assignment of 19′-hexanoyloxyfucoxanthin (primarily a marker of the Haptophytes), which is more attributable to the pico size class than to nanophytoplankton in ultra-oligotrophic waters (Hirata et al., 2008; Ras et al., 2008).

For the quantification of each type, a now well-established method is to estimate the contribution of the different phytoplankton groups to the TChla of the whole assemblage on the basis of the ratio of each marker pigment to the TChla (Gieskes and Kraay, 1983; Gieskes et al., 1988; Barlow et al., 1993). Following this approach, Uitz et al. (2006) carried out a multiple regression analysis between the concentrations of TChla and the seven diagnostic pigments suggested by Vidussi et al. (2001), providing the best estimates of the "Total Chlorophyll a–Diagnostic Pigments" ratios (TChla/DPs) for a global data set. Applying this method, Di Cicco (2014) recently found a regional TChla–DPs relationship, based on Mediterranean data only, to evaluate the different pigment ratios of the phytoplankton assemblage that occur in this basin (Sammartino et al., 2015).

In this work, we revised this regional relationship, defining new coefficients according to the new quality assurance applied to the SeaBASS data. The analysis is carried out on the 1,379 individual samples for which TChla and all seven selected biomarker pigments were available at the same time. It is important to underline that, in accordance with Hooker et al. (2012), we defined TChla as the sum of Chlorophyll a with its allomers and epimers, Divinyl-Chlorophyll a, and Chlorophyllide a (see **Table 1**).

**Table 2** presents the best estimates resulting from the multiple regression analysis for the determination of the seven Mediterranean TChla/DPs ratios. The coefficients for each DP are shown with their standard deviation and significance level. The regression is highly significant, with a determination coefficient (r<sup>2</sup>) between the SeaBASS in-situ TChla and the estimated TChla (TChla\*, **Table 3**) equal to 0.99, and p < 0.001 (based on the t-test).

The final estimation formulas used for the in-situ quantification of each PFT and PSC fraction are schematically presented in **Table 3** (each group is expressed as a fraction of TChla\*).
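The general structure of the diagnostic pigment analysis can be sketched in Python as follows: TChla\* is the weighted sum of the seven diagnostic pigments, and each size class is the share of that sum carried by its marker pigments. This is a hedged sketch, not the formulas of Table 3. The weights below are the global coefficients reported by Uitz et al. (2006), used only as placeholders, since the Mediterranean values of Table 2 are not reproduced here. The pigment keys and their assignment to size classes follow the Vidussi et al. (2001) scheme as commonly stated and are assumptions of this example, and the Brewin et al. (2010) adjustment of 19′-hexanoyloxyfucoxanthin is deliberately omitted.

```python
# Placeholder weights: global coefficients of Uitz et al. (2006),
# NOT the Mediterranean coefficients of Table 2.
WEIGHTS = {
    "fuco": 1.41,   # fucoxanthin
    "perid": 1.41,  # peridinin
    "hex": 1.27,    # 19'-hexanoyloxyfucoxanthin
    "but": 0.35,    # 19'-butanoyloxyfucoxanthin
    "allo": 0.60,   # alloxanthin
    "tchlb": 1.01,  # total chlorophyll b
    "zea": 0.86,    # zeaxanthin
}

# Assumed marker pigments per size class (no ultra-oligotrophic adjustment).
SIZE_CLASSES = {
    "micro": ("fuco", "perid"),
    "nano": ("hex", "but", "allo"),
    "pico": ("tchlb", "zea"),
}


def dpa_fractions(pigments, weights=WEIGHTS):
    """Return (TChla*, size-class fractions) from pigment concentrations in mg m-3."""
    weighted = {name: weights[name] * pigments[name] for name in weights}
    tchla_star = sum(weighted.values())
    if tchla_star <= 0:
        raise ValueError("weighted diagnostic pigment sum must be positive")
    fractions = {
        size: sum(weighted[name] for name in names) / tchla_star
        for size, names in SIZE_CLASSES.items()
    }
    return tchla_star, fractions
```

By construction the three fractions sum to 1, which mirrors the statement that each group is expressed as a fraction of TChla\*.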

## PSC and PFT Model Development

The in-situ dataset of PSC and PFT fractions resulting from the DPA was randomly divided into two independent subsets, the first used for model calibration (70% of the total data) and the remaining 30% for validation. The co-variability found between the accessory pigments linked to each fraction and the TChla allows the use of the latter as an index of the phytoplankton assemblage structure (Chisholm, 1992; Hirata et al., 2011). For each PFT and PSC group, the relative in-situ fractions were regressed against the corresponding log10-transformed in-situ TChla concentrations (**Figure 2**), considering the log-normal distribution of this pigment (Campbell, 1995). We used an ordinary least squares fit to define the functional forms most appropriate to represent the Mediterranean data distribution. Different functional forms were tested against our calibration dataset, from linear equations to more complex polynomial or exponential functions, including the functions adopted by global PFT and PSC models. This allowed us to select the most appropriate functional forms, i.e., those that best minimize the residuals between the estimates and the observations. This results in six empirical relationships obtained by regression, while the other three are derived by difference to maintain the mass balance. To obtain the TChla concentration related to each PFT and PSC group, it is sufficient to multiply the fraction by the in-situ TChla.

TABLE 3 | PSCs and PFTs used in this work with their in-situ estimation formulas (fraction of TChla\*, ranging from 0 to 1) resulting from the Diagnostic Pigment Analysis.

<sup>a</sup>Linear adjustment of Brewin et al. (2010) for the assignment of 19′-hexanoyloxyfucoxanthin to the pico size class in ultra-oligotrophic waters.

<sup>b</sup>The contribution of But-Fuco is so low in the Mediterranean data that Haptophytes can be considered the only component of the Nanoflagellates.
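The calibration procedure described in this section can be sketched in Python as follows: a random 70/30 split, an ordinary least squares polynomial fit of each fraction against log10(TChla), and one class obtained by difference so that the fractions sum to 1. This is a minimal sketch under stated assumptions, not the authors' code. The helper names are hypothetical, the polynomial fit is solved through the normal equations for self-containment, and the choice of pico as the size class derived by difference is an assumption made for illustration only.

```python
import math
import random


def polyfit(x, y, degree):
    """Least-squares coefficients c[0] + c[1]*x + ... (normal equations, Gaussian elimination)."""
    n = degree + 1
    a = [[sum(v ** (i + j) for v in x) for j in range(n)] for i in range(n)]
    b = [sum(yv * xv ** i for xv, yv in zip(x, y)) for i in range(n)]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            f = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= f * a[col][k]
            b[row] -= f * b[col]
    coeffs = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][k] * coeffs[k] for k in range(i + 1, n))
        coeffs[i] = (b[i] - s) / a[i][i]
    return coeffs


def polyval(coeffs, x):
    """Evaluate c[0] + c[1]*x + c[2]*x**2 + ..."""
    return sum(c * x ** i for i, c in enumerate(coeffs))


def split_calibration_validation(samples, calibration_share=0.7, seed=0):
    """Randomly split samples into calibration and validation subsets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(calibration_share * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def size_fractions(tchla, micro_coeffs, nano_coeffs):
    """Micro and nano from fitted polynomials in log10(TChla); pico by difference (assumption)."""
    x = math.log10(tchla)
    micro = polyval(micro_coeffs, x)
    nano = polyval(nano_coeffs, x)
    return {"micro": micro, "nano": nano, "pico": 1.0 - micro - nano}
```

Multiplying each returned fraction by the TChla value then gives the class-specific concentration in mg m−3, as stated in the text.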

## Satellite Data and Processing

For the PSC and PFT determination from remote sensing we used the TChla Mediterranean reprocessed product available from the Copernicus Marine Environment Monitoring Service (CMEMS, see the OCEANCOLOUR\_MED\_CHL\_L3\_REP\_OBSERVATIONS\_009\_073 product). These data were produced by the CMEMS Ocean Color Thematic Assembling Centre (OCTAC) using the ESA OC-CCI (European Space Agency Ocean Color Climate Change Initiative) processor. MERIS, MODIS-Aqua, and SeaWiFS observations were merged into a single dataset by applying a series of state-of-the-art algorithms, from the atmospheric correction to the band-shift correction schemes (for a comprehensive overview of the ESA-CCI products see http://www.esa-cci.org). The Remote Sensing Reflectance (Rrs) spectrum is used as input to compute surface TChla (nominal resolution of 1 km) via regional ocean color algorithms. The specific product used in this work, specialized for the Mediterranean Sea, is a merged Case 1–Case 2 product that takes into account the different optical properties of offshore and inshore waters. Two different regional algorithms were applied to the reflectance: the MedOC4 algorithm (Volpe et al., 2007) for case 1 waters, developed by the Group for Satellite Oceanography (GOS-ISAC) of the Italian National Research Council (CNR), and the AD4 algorithm (D'Alimonte and Zibordi, 2003), specialized for case 2 waters. The identification of the two water types is performed by taking into account the whole light spectrum, from blue to NIR bands, of both water types as obtained from in-situ data (D'Alimonte et al., 2003). For waters with intermediate features, a weighted average of the two algorithms was applied, based on the distance between the actual reflectance spectrum and the two reference reflectance spectra for case 1 and case 2 waters, respectively.
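The idea of blending the two chlorophyll estimates for intermediate waters can be illustrated with the short Python sketch below. The text only states that the weights depend on the distance between the observed spectrum and the two reference spectra. The Euclidean distance and the specific weighting formula used here (each algorithm weighted by the relative distance to the other reference) are assumptions of this example and may differ from the operational CMEMS processing. The function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two reflectance spectra of equal length."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def blend_chl(rrs, ref_case1, ref_case2, chl_case1, chl_case2):
    """Distance-weighted average of the case 1 and case 2 chlorophyll estimates.

    Assumed weighting: the closer the spectrum is to a reference,
    the larger the weight of the corresponding algorithm.
    """
    d1 = euclidean(rrs, ref_case1)
    d2 = euclidean(rrs, ref_case2)
    if d1 + d2 == 0:
        # Degenerate case: identical reference spectra matching the observation.
        return 0.5 * (chl_case1 + chl_case2)
    w1 = d2 / (d1 + d2)
    return w1 * chl_case1 + (1.0 - w1) * chl_case2
```

With this weighting, a spectrum identical to the case 1 reference returns the case 1 estimate unchanged, and a spectrum equidistant from both references returns the plain mean.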

For more details on the processing adopted by the data producers and on the product quality assessment see Volpe et al. (2012) and http://marine.copernicus.eu/documents/QUID/CMEMS-OC-QUID-009-038to045-071-073-078-079-095-096.pdf.

In this work, we used 18 years (from 1998 to 2015) of daily TChla to compute Mediterranean daily PFT and PSC maps using the new regional algorithms described in Section "Empirical Algorithms for the Identification of the PFTs and PSCs: Calibration and Validation." Daily fields were then used to build up the Mediterranean PFT and PSC climatology. Taking into account the applicability range of our models (0.02–5.52 mg m−<sup>3</sup>), in our processing we considered as "good values" only the satellite TChla data falling within this range and masked those outside it.
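The masking and averaging steps can be sketched in Python as follows. Each daily map is represented as a flat list of pixel values, with `None` marking missing or masked pixels. The `fraction_model` argument is a hypothetical callable that returns the class fraction for a given log10(TChla); it stands in for one of the regional equations of Table 4, which are not reproduced here. Averaging the daily class concentrations pixel by pixel is an assumption about how the climatology is assembled.

```python
import math

TCHLA_MIN, TCHLA_MAX = 0.02, 5.52  # applicability range of the regional models (mg m-3)


def class_concentration(tchla, fraction_model):
    """Class-specific TChla (mg m-3), or None if the pixel is missing or out of range."""
    if tchla is None or math.isnan(tchla) or not (TCHLA_MIN <= tchla <= TCHLA_MAX):
        return None
    return fraction_model(math.log10(tchla)) * tchla


def climatology(daily_maps):
    """Per-pixel mean over all days, ignoring masked (None) values."""
    result = []
    for pixel_series in zip(*daily_maps):
        valid = [v for v in pixel_series if v is not None]
        result.append(sum(valid) / len(valid) if valid else None)
    return result
```

A pixel that is masked on every day stays `None` in the climatology instead of being filled.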

in-situ PSC/PFT fraction obtained from equations in Table 3 (966 data) and the red line indicates the best fitting curve obtained from the calibration (see equations in Table 4). (A) Micro-Cal, (B) Nano-Cal, (C) Pico-Cal, (D) Diatoms-Cal, (E) Cryptophytes-Cal, (F) Green algae & Prochlorophytes-Cal, (G) Dinophytes-Cal, (H) Haptophytes-Cal, and (I) Prokaryotes-Cal.

# RESULTS

# Empirical Algorithms for the Identification of the PFTs and PSCs: Calibration and Validation

The new regional algorithms, with their mathematical equations and the resulting regression coefficients, are shown in **Table 4**. Most of the considered phytoplankton groups are well described by simple polynomial functions (cubic for micro, Diatoms, Cryptophytes, and Prokaryotes and quadratic for nano), except for the class of "Green algae & Prochlorophytes," which is better represented by a different equation following the approach of Hirata et al. (2011, see **Table 4**). These functions are applicable over a TChla range from 0.02 to 5.52 mg m−<sup>3</sup>.

**Figure 2** shows the results of the algorithm calibration. The microphytoplankton function (**Figure 2A**) increases monotonically with increasing TChla, ranging from 8 to 63% of the TChla concentration. The pico equation (**Figure 2C**), instead, shows the opposite behavior, with minimum and maximum values at maximum and minimum TChla concentrations, respectively, ranging between 5 and 81% of TChla. The nano function (**Figure 2B**) presents an intermediate trend, ranging from 12 to 48% of TChla, with a maximum at about 0.57 mg m−<sup>3</sup> of TChla. The micro component consists almost entirely of the Diatom group contribution, represented by a cubic function (**Figure 2D**) similar to the micro one, which also increases monotonically with TChla. The contribution of the Dinophytes (**Figure 2G**) to the micro component and to the TChla concentration is very low, with a small range of variation between 1 and 6% of TChla, indicating that diatoms are the major constituent of the micro-phytoplankton in the Mediterranean Sea. The Prokaryotes curve (**Figure 2I**) decreases monotonically from 55 to 3% as TChla increases. At lower chlorophyll concentrations, Prokaryotes represent the main component of the pico group (with the contribution of small Haptophytes in ultra-oligotrophic waters). As chlorophyll increases, the non-monotonic signal of the "Green algae & Prochlorophytes" (**Figure 2F**) grows up to its maximum value (about 13%) at a TChla concentration of about 0.5 mg m−<sup>3</sup>. For higher values of TChla the function decreases with a weaker slope, contributing to the pico group more than the Prokaryotes do. The Cryptophytes (**Figure 2E**) also co-vary with TChla, increasing with this pigment from a minimum of 1% to a maximum of 22% at higher TChla values. Last but not least in terms of relative contribution to the TChla, the Coccolithophores curve (**Figure 2H**) presents a small range of variation (35–40%) for almost the entire range of chlorophyll, decreasing to a minimum value of about 10% at the maximum TChla concentrations (about 5 mg m−<sup>3</sup>).

TABLE 4 | Regional algorithms developed to estimate the PSCs and PFTs in the Mediterranean Sea (as fraction of TChla, ranging from 0 to 1). For each dimensional and functional group, the equation and its relative coefficients are given.

The results of applying the Mediterranean algorithms (**Table 4**) to the validation dataset are shown in **Figures 3**, **4** (right panels), for the PSCs and the PFTs, respectively. The scatter plots of the TChla estimated for each class by the algorithms against the observed TChla fractions clearly show the goodness of the fits for all the considered groups. The data points are uniformly distributed around the 1:1 line with a very narrow scatter.

A more quantitative evaluation of the performance of the proposed algorithms comes from the computation of the root mean squared error (RMSE) and other statistical parameters (see **Table 5** for the relative reference equations) with respect to the original PSC and PFT in-situ data (**Table 6**, calibration; **Table 7**, validation). A hindcast evaluation of the algorithm performance was also carried out (i.e., the same calibration data were used for fitting and testing). Furthermore, the error of the new regional algorithms is compared to the error associated with global abundance-based models by applying them to the same validation dataset (**Table 7**). In particular, we used the empirical global relationships of Hirata et al. (2011), the only abundance-based ones that also address the PFTs, and the Brewin et al. (2010) models, developed only for the PSCs, applying the coefficients recalibrated in Brewin et al. (2011).
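For readers who want to reproduce this kind of evaluation, the following Python sketch computes the five statistics named in the text (r, MBE, RMSE, RPD, APD) from paired in-situ and estimated concentrations. The equations of Table 5 are not reproduced here, so the definitions below are the commonly used ones and should be read as assumptions: MBE and RMSE are taken on the differences (estimated minus observed), while RPD and APD are the mean signed and mean absolute differences relative to the observed value, in percent. The function name `validation_stats` is hypothetical.

```python
import math


def validation_stats(observed, estimated):
    """Return r, MBE, RMSE, RPD (%) and APD (%) for paired observed/estimated values.

    Observed values must be non-zero because RPD and APD divide by them.
    """
    n = len(observed)
    diffs = [e - o for o, e in zip(observed, estimated)]
    mbe = sum(diffs) / n
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    rpd = 100.0 * sum(d / o for d, o in zip(diffs, observed)) / n
    apd = 100.0 * sum(abs(d) / o for d, o in zip(diffs, observed)) / n
    # Pearson correlation coefficient between observations and estimates.
    mean_o = sum(observed) / n
    mean_e = sum(estimated) / n
    cov = sum((o - mean_o) * (e - mean_e) for o, e in zip(observed, estimated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    r = cov / math.sqrt(var_o * var_e)
    return {"r": r, "MBE": mbe, "RMSE": rmse, "RPD": rpd, "APD": apd}
```

Under these definitions a negative MBE or RPD indicates underestimation by the model, which is consistent with how the signs are discussed in the text.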

The new regional algorithms show good performance for most of the groups taken into account (**Table 6**, calibration; **Table 7**, validation). The results obtained by applying the algorithms to the validation dataset are consistent with the hindcast evaluation. The Pearson correlation coefficient, which gives an estimate of the covariance between the models and the in-situ validation data, shows high correlation, with values ranging from 0.75 to 0.99 for both PSCs and PFTs, excluding the Dinophytes (r = 0.60), probably also because this group is derived by difference. All groups show very low values of mean bias error (MBE), ranging from −0.002 to 0.003 mg m−<sup>3</sup> for the validation dataset. The RMSE, which gives a measure of the spread of the estimated values around the in-situ observed ones, ranges from 0.018 mg m−<sup>3</sup> for the "Green algae & Prochlorophytes" to 0.068 mg m−<sup>3</sup> for the Diatoms among the PFTs, and from 0.042 mg m−<sup>3</sup> for the pico- to 0.070 mg m−<sup>3</sup> for the microphytoplankton among the PSCs.

The comparison of the scatter plots obtained by applying the PSC regional models and the global models (**Figure 3**) shows that the Mediterranean algorithms perform better than the global ones for all three groups. This evidence is confirmed by the statistical analysis (**Tables 6, 7**). Although the Brewin et al. (2011) models applied to the Mediterranean data show high values of the correlation coefficient (0.9 for micro- and nanophytoplankton), the statistical results highlight that the regionalization reduces the uncertainty (MBE) and the spread (RMSE) by about one order of magnitude for all the size classes. For example, for the micro-phytoplankton the MBE decreases from 0.068 to 0 when the regional algorithm is applied. In particular, the Brewin algorithm slightly overestimates the micro component (**Figure 3A**) over the entire dynamic range of concentration. The behavior of this global model is exactly the opposite for the nano class, always underestimating the observed values (**Figure 3C**) and resulting in an MBE of −0.056. As a consequence, the main trend is an overestimation of the pico-phytoplankton component for concentrations lower than 0.1 mg m−<sup>3</sup> and an underestimation at greater values (**Figure 3E**).

**Figure 4** shows that applying the global models of Hirata et al. (2011) to the Mediterranean data would underestimate Prokaryotes (**Figure 4M**), Haptophytes (**Figure 4G**), and Diatoms (**Figure 4A**). This underestimation results in a mean relative percentage difference (RPD) of −20, −19, and −29%, respectively. The "Green algae & Prochlorophytes" (**Figure 4I**), instead, are overestimated (RPD = 116%). The predictive power for the Dinophytes (**Figure 4C**) is negligible, as in the global validation of the model (see Hirata et al., 2011 for more details), with r = 0.26 (**Table 7**). It must be taken into account that, unlike in this work, the development of the models of Diatoms and Haptophytes in Hirata et al. (2011) is based on in-situ Fuco and Hex-fuco data to which a background correction was applied. The Fuco signal in oligotrophic waters (<0.25 mg m−<sup>3</sup>) is assumed to be due to smaller Haptophytes rather than Diatoms. This correction is significant only at lower TChla concentrations. This means that, applying this global model in this TChla range, the estimates of Diatoms could be slightly improved for the Mediterranean Sea but, at the same time, the estimates of the Haptophytes would get worse. As for the PSCs, for the PFTs (**Figure 4**) the regionalization also reduces the bias by about one order of magnitude for all the types (**Tables 6, 7**). The preliminary analysis of r, MBE, and RMSE showed a very good predictive power for all the new regional models. The best performance seems to be associated with the algorithms of Cryptophytes and "Green algae & Prochlorophytes" for the PFTs, followed by Haptophytes and Diatoms, and with the nano model for the PSC group. The study of the RPD and the mean absolute percentage difference (APD) complemented this statistical information, also taking into account the different dynamic range of the TChla concentration represented by each class. Weighing the uncertainty against the dynamic range of the observed concentration values, the statistics confirmed the goodness of the fits for all the phytoplankton groups (**Table 7**) and showed the best predictive power for the algorithms that estimate the nano (RPD = 3% and APD = 12%) and Haptophytes (RPD = 4% and APD = 13%) components, for PSCs and PFTs, respectively. These considerations are also confirmed by the validation results shown in **Figures 3**, **4**.

FIGURE 4 | Comparison between the validation of the new PFT regional algorithms (Val, right panel: B,D,F,H,L,N) vs. the global PFT model of Hirata et al. (2011) (Hirata, left panel: A,C,E,G,I,M): in-situ (x-axis) vs. estimated (y-axis) PFT TChla concentrations. (A) Diatoms-Hirata, (B) Diatoms-Val, (C) Dinophytes-Hirata, (D) Dinophytes-Val, (E) Cryptophytes-Hirata, (F) Cryptophytes-Val, (G) Haptophytes-Hirata, (H) Haptophytes-Val, (I) Green algae & Prochlorophytes-Hirata, (L) Green algae & Prochlorophytes-Val, (M) Prokaryotes-Hirata, (N) Prokaryotes-Val. For the statistics, see Table 7.

TABLE 5 | Mathematical equations used to compute the statistical parameters.

TABLE 6 | Statistical results of the new regional algorithms (Med) applied to the calibration dataset (70% of the entire subset = 966 data).

The statistics are computed on TChla concentration values (mg m−<sup>3</sup>). MBE and RMSE are expressed in mg m−<sup>3</sup>, while r, RPD (%), and APD (%) are dimensionless.

# Application of the New Regional Algorithms to the Daily Mediterranean Reprocessed TChla CASE1–2 Time Series: PSC and PFT Climatology (1998–2015)

The new regional algorithms (**Table 4**) are applied to an 18-year time series of TChla satellite estimates (see Section "Data and Methods") to compute PSCs and PFTs. **Figures 5**, **6** show the annual PSC and PFT climatology (1998–2015), respectively.


TABLE 7 | Statistical results of the new regional algorithms (Med) validation compared with the statistics resulting by applying the global models (Brewin et al., 2011; Hirata et al., 2011) on the same validation dataset (30% of the entire subset = 413 data).

The statistic is computed on TChla concentration values (mg m−<sup>3</sup> ). MBE and RMSE are expressed in mg m−<sup>3</sup> , while r, RPD (%) and APD (%) are dimensionless.

In the left panel, each map shows the fractions of the TChla represented by each phytoplankton component. For each pixel the percentage maps give fraction values relative to the chlorophyll concentration (**Figure 5**, top panel), whose distribution is typically characterized by a West–East decreasing gradient in the Mediterranean Sea (Siokou-Frangou et al., 2010; Estrada and Vaqué, 2014). In the right panel, instead, the maps show the relevance of each class in terms of TChla estimates (mg m−<sup>3</sup> ).

All three size classes reach their maximum absolute values, >3 mg m−<sup>3</sup> of TChla for micro (**Figure 5B**), and about 1.6 and 0.4 mg m−<sup>3</sup> for nano (**Figure 5D**) and pico (**Figure 5F**), respectively, in the more productive zones of the basin (see **Figure 5**, top panel). In the eastern basin these areas are: the North Adriatic Sea and, in general, the whole Adriatic coast (due to the great nutrient supply from the Po river); the south-eastern area of the Levantine basin influenced by the outflow of the Nile river; the Northern Aegean Sea; and the Gulf of Gabès (probably only an area of very shallow water). In the western basin, these more productive regions are the Gulf of Lion, the eastern coast of Spain, and the Tyrrhenian Sea coast. Very high values (greatest for the nano class) are also evident in the north-western Alborán Sea and along the Algerian and Tunisian coasts up to the Sicily channel, in the Liguro-Provençal Basin and part of the Catalan Basin, and in the cyclonic area of the North Tyrrhenian Sea. As expected, the co-variability between the accessory pigments linked to each fraction and the TChla is highlighted by the percentage maps (**Figure 5**, left panel). In the more oligotrophic eastern Mediterranean Sea, where the TChla climatology shows the lowest absolute concentrations (**Figure 5**, top panel), the relatively dominant component is the pico class (**Figure 5E**). Here, the TChla concentration of the pico-phytoplankton (**Figure 5F**) is about five times that of the micro class (**Figure 5B**) and twice that of the nano class (**Figure 5D**). Conversely, in the regions where the TChla reaches higher values, the percentage contributions of the micro and nano components increase (**Figures 5A,C**, respectively). Generally, the nano component shows intermediate values throughout the basin, ranging from 30 to 40% of the TChla in the western basin and rising to 45% in highly productive areas (**Figure 5C**).

Moving to the PFTs, the abundance of each class in terms of TChla concentration (**Figure 6**, right panel) reflects, as for the size classes, the gradient of this pigment (**Figure 5**, top panel), showing higher values for all groups in the western basin and in the already mentioned highly productive zones of the entire basin. In the climatological analysis, Prokaryotes constitute the principal component of the pico-phytoplankton in almost all areas, both in terms of percentage (**Figure 6M**) and concentration (**Figure 6N**). They are the absolute dominant group in the oligotrophic and ultra-oligotrophic waters of the eastern basin, but also, in the western basin, in the southern Tyrrhenian Sea and in some areas of the Algero-Provençal basin (**Figure 6M**). In the Levantine basin the second group in terms of TChla concentration is the Haptophytes (**Figure 6H**). They represent the dominant class within the nano-phytoplankton in the whole Mediterranean Sea and constitute the main group characterizing the case 1 waters of the western basin (**Figure 6G**). Diatoms (**Figures 6A,B**) dominate the micro-phytoplankton and can be considered the third group in terms of TChla concentration in the open sea, followed by the Green algae (**Figure 6L**), Cryptophytes (**Figure 6F**) and, finally, Dinoflagellates (**Figure 6D**). In coastal areas, by contrast, Diatoms dominate, reaching values of about 3 mg m−<sup>3</sup>. This is evident in the North Adriatic Sea and, in general, along the entire Adriatic coast, in the south-eastern area of the Levantine basin influenced by the outflow of the Nile River, and in the Gulf of Gabès (**Figure 6B**). In general, in terms of chlorophyll concentration, Diatoms and Haptophytes predominate in the coastal areas, followed by the remaining classes (**Figure 6**, right panel).

Last but not least in terms of biological importance, the phytoplankton distribution in the Alborán Sea and along the Algerian-Tunisian coasts is characterized by the dominance of the Haptophytes, followed by Diatoms, and then Prokaryotes, Green algae, Cryptophytes, and lastly Dinoflagellates.

(right panel: B,D,F). PSCs are retrieved by applying the new regional algorithms (see Table 4) to the daily TChla time series (Mediterranean reprocessed product produced by the CMEMS-OCTAC). (A,B) Micro-phytoplankton, (C,D) Nano-phytoplankton, (E,F) Pico-phytoplankton.

In addition, **Figure 7** shows the monthly climatology of each group, averaged over the whole Mediterranean Sea. At basin scale, the PSC component most representative of the TChla appears to be the nano-phytoplankton (top panel), especially in the bloom periods typical of the mid-latitudes. Nano is followed by pico, and the component with the lowest contribution to the TChla is the micro one. Only in summer does pico-phytoplankton dominate the TChla concentration, exceeding the other two classes. In the same season, the micro component reaches its minimum values. The monthly mean PFT climatology (**Figure 7**, bottom panel) confirms the predominance of the Haptophytes within the nano-phytoplankton in the whole Mediterranean Sea, as highlighted in the previous PFT map analysis (**Figure 6H**). This is also the predominant group throughout the year. The Haptophytes are followed by Diatoms and Prokaryotes, which represent the main components of the micro- and the pico-phytoplankton, respectively. In more detail, the contribution of the Diatoms to the TChla concentration is greater than that of the Prokaryotes only in early spring. The two classes show similar concentrations in late autumn and winter while, in the remaining part of the year, Prokaryotes dominate. Cryptophytes and "Green algae & Prochlorophytes" always show a similar contribution, even if the concentration of the latter is slightly greater in late winter to early spring. These two functional groups represent the smallest fractions within the nano- and pico- size classes, respectively. Finally, Dinophytes constitute the class with the lowest TChla concentration throughout the year.

## DISCUSSION AND CONCLUSIONS

The Mediterranean Sea is characterized by peculiar optical properties that make its color different from that of the global ocean (Volpe et al., 2007). In addition to an abundant aerosol dominated by continental anthropogenic pollution (Moulin et al., 1997) and the presence of Saharan dust in the water column (Claustre et al., 2002), one of the main reasons for its color seems to be a phytoplankton assemblage structure typical of this basin (Volpe et al., 2007). This is also confirmed by pigment ratios that differ from those of the global ocean (Sammartino et al., 2015). This implies the need for regional algorithms that take into account all these peculiar characteristics. In recent years, several specialized algorithms have been proposed for the detection of the chlorophyll a concentration (e.g., Volpe et al., 2007; Santoleri et al., 2008). In contrast, PFT and PSC regional algorithms do not exist, except for the recent work of Navarro et al. (2014), who adapted the previous version of the PHYSAT method of Alvain et al. (2005, 2008), providing regional estimates of dominant PFT groups. In our work, for the first time, new regional algorithms have been advanced to identify, together, the contribution of each PSC and PFT group to the satellite estimates of TChla concentration. This different approach, based on the close link between the abundance of each group and the trophic status of the environment (Margalef, 1967, 1978; Brewin, 2011), provides a new kind of information, complementary to the results of the PHYSAT-Med.

Our assessment of the uncertainty associated with the newly developed regional algorithms and with the most used global models based on the same approach highlights and confirms that a regionalization of the PSC and PFT satellite algorithms is required. As shown by our validation results (Section Empirical Algorithms for the Identification of the PFTs and PSCs: Calibration and Validation), the use of Mediterranean PSC and PFT algorithms eliminated the bias between observations and estimates and reduced the RMSE by an order of magnitude with respect to the global models.

Even if the uneven distribution of the in-situ observations between the western-central Mediterranean Sea and the eastern basin could imply that the new formulations are more appropriate for the western basin, we are confident that the derived parameterizations can also be applied in the eastern Mediterranean Sea without introducing a significant bias in satellite estimates. In fact, the in-situ dataset used for the algorithm calibration includes the typical values of chlorophyll a observed in the oligotrophic waters of the eastern Mediterranean Sea (ranging from 0.02 to 0.14 mg m<sup>−3</sup>). The samples that fall in this chlorophyll range represent 38% of the calibration data, 18% of which were acquired in the eastern Mediterranean Sea. This implies that the oligotrophic condition is well represented in our dataset. As a further assessment, we made a preliminary evaluation of the new parameterizations limited to the eastern Mediterranean Sea, using all available in-situ observations in our dataset. This results in a bias (from −0.001 to 0.001 depending on the PFT/PSC parameterization) and an RMS (from 0.002 to 0.004 depending on the PFT/PSC parameterization) comparable with the bias obtained in the western Mediterranean for the same TChla range and with the values resulting from the algorithm validation (see **Table 7**). Even if this result cannot be considered conclusive, since it was obtained with a limited number of in-situ observations, the statistical results seem to indicate that our parameterization should not introduce any significant bias in satellite-derived estimates.
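The two statistics quoted above can be illustrated with a minimal sketch. The arrays below are hypothetical values chosen only for illustration, and computing the statistics directly on concentrations (rather than on log-transformed values) is an assumption, not a description of the validation procedure used in this work.

```python
import numpy as np

def bias_and_rmse(estimated, observed):
    """Return (bias, RMSE) of estimates against observations."""
    estimated = np.asarray(estimated, dtype=float)
    observed = np.asarray(observed, dtype=float)
    diff = estimated - observed
    bias = diff.mean()                  # mean signed difference
    rmse = np.sqrt((diff ** 2).mean())  # root mean square difference
    return bias, rmse

# Hypothetical concentrations in mg m^-3 (illustration only, not data from this study)
in_situ = np.array([0.020, 0.050, 0.080, 0.140])
satellite = np.array([0.022, 0.048, 0.083, 0.137])
bias, rmse = bias_and_rmse(satellite, in_situ)
print(f"bias = {bias:.4f}, RMSE = {rmse:.4f}")
```

A bias close to zero together with a small RMSE indicates estimates that are neither systematically high nor low and that scatter little around the observations.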

The phytoplankton assemblage distribution patterns resulting from the application of our new algorithms to the Mediterranean multi-sensor reprocessed dataset (1998–2015) are consistent with previous knowledge, both in terms of distribution and phytoplankton ecology (Siokou-Frangou et al., 2010; Uitz et al., 2012; Estrada and Vaqué, 2014; Navarro et al., 2014). Pico-phytoplankton, with Prokaryotes as the main component, is widespread throughout the whole basin and always dominant in oligotrophic and stratified waters (see Section Results), in agreement with the observations of Siokou-Frangou et al. (2010), according to which the pico component constitutes more than 50% of the total biomass in these conditions (Estrada and Vaqué, 2014). Furthermore, our results show that the pico class reaches its maximum value (about 0.4 mg m<sup>−3</sup>) in the more productive areas. These considerations are also consistent with the ecological behavior and strategy of this group. Size affects nutrient solute and water fluxes across the plasmalemma, favoring the smaller cells in oligotrophic waters. This is due to the larger surface-to-volume ratio of small cells compared with larger ones, which makes the former efficient nutrient absorbers in very low nutrient conditions. Moreover, in stratified environments the probability of sinking out of the euphotic zone is greater for micro-sized cells than for smaller ones, so that the latter undergo a lower loss of organisms (Chisholm, 1992; Raven, 1998). However, this does not imply that Prokaryotes, and pico-phytoplankton in general, reach their maximum values in terms of chlorophyll concentration in oligotrophic conditions. In fact, Chisholm (1992) suggested that they usually achieve their "maximum potential biomass" (of about 0.5 mg m<sup>−3</sup>) in high nutrient conditions, in accordance with our estimates. We showed that, in the more productive regions of the basin, the chlorophyll concentrations of the pico and Prokaryotes classes correspond to a relatively small percentage of TChla (Chisholm, 1992). Indeed, our climatological maps show the dominance, in these conditions, of the micro component, with values reaching or exceeding 50% of the TChla, followed by the nano- and, lastly, by a minor contribution of the pico-phytoplankton (about 10–15%).

Within the micro group, the major contribution in the higher nutrient areas is clearly due to the Diatoms. This is explained by the ecological strategy of this functional group, which is physiologically better adapted to highly dynamic conditions and more efficient in nutrient absorption. Moreover, Diatoms are able to take up nutrients from the surrounding environment and store them in their large vacuoles, depriving other groups and supporting their own growth at the same time (Margalef, 1978; Falkowski et al., 2003; Litchman et al., 2007; Estrada and Vaqué, 2014). As for the Dinoflagellates, their contribution to the micro-phytoplankton is very low compared with that of the Diatoms. This is probably due to their different ecological strategy, well adapted to highly dynamic environments but with a higher affinity for low nutrient conditions. Moreover, Estrada and Vaqué (2014) suggested that the use of peridinin as a biomarker pigment for the identification of Dinoflagellates could cause an underestimation of their abundance, because it may be absent in some organisms of this class (Jeffrey and Vesk, 1997).

An important result of this study is the information on the Nanoflagellate distribution, mainly represented by the Haptophytes in our dataset. This is a valuable novelty, considering the lack of knowledge on the Nanoflagellate spatial distribution, which has improved only in recent years thanks to the more widespread use of chemotaxonomic and molecular techniques (Latasa et al., 2010). The widespread distribution of the nano component and its high contribution to the TChla in the whole basin confirm the results of Uitz et al. (2012), according to which the primary production in the Mediterranean Sea is mainly due to the nano-phytoplankton component.

The lack of data on phytoplankton biogeography at different spatio-temporal scales in the whole Mediterranean Sea, and the well-known difficulties in the long-term acquisition of in-situ data at basin scale, make remote sensing essential for a synoptic observation of the phytoplankton assemblage composition and its diversity. Our analysis revealed the importance of providing regional algorithms suited to the peculiar bio-optical properties of this basin. The statistical results demonstrated the good performance and the applicability of our models for estimating the abundance of PSCs and PFTs together.

Nowadays, in the context of international Climate Change Initiatives and cooperation, the Space Agencies, in collaboration with remote sensing scientists, are making a joint effort to identify the major gaps (both instrumental and scientific) that should be filled to improve the accuracy of satellite estimates of the phytoplankton groups and their variability (Bracher et al., 2017). In this framework, the following actions summarize our future perspectives to improve remote observations of the Mediterranean Sea: (a) to extend the validation and calibration of the new regional PFT algorithms by including new in-situ datasets of HPLC Total Chlorophyll a and diagnostic pigments acquired, in recent years, by the Mediterranean scientific community; (b) to improve the accuracy of the PSC algorithms with a new calibration and validation based only on TChla size-fractions; (c) to continue the in-situ bio-optical measurements to cover all the un-sampled Mediterranean regions, also with the intent to exploit different approaches (e.g., spectral response-based); (d) to extend this regionalization activity to new generation sensors (e.g., OLCI for Sentinel-3) to obtain higher resolution information, also for studies of phytoplankton dynamics at the mesoscale; (e) to analyse the Mediterranean PFT and PSC trends, thanks to the availability of consistent long-term satellite observation time series.

On time scales longer than the period we considered, climate- or human-induced changes in environmental conditions can modify the phytoplankton pigment composition and thus the pigment ratios to the Total chlorophyll a. This implies that the simple empirical relations used to compute the PFTs and PSCs from the chlorophyll observations need to be re-evaluated, or that a more sophisticated approach, which links the pigment ratios, the PFT and PSC composition, and the major environmental forcing, should be developed.

### AUTHOR CONTRIBUTIONS

All the authors contributed to the conception and design of the work and approved the final version of the manuscript for publication. AD carried out the work, performing the Diagnostic Pigment Analysis, the algorithm development and their analysis, supported by the experience of RS and SM, and drafting most of the manuscript. MS processed the satellite data, developed all the climatological maps and edited all the images of the manuscript. MS, SM, and RS critically revised the manuscript and contributed to its drafting.

# REFERENCES


### FUNDING

This research was supported by the European Commission in the framework of the Copernicus Marine Environmental Services—Ocean Color Thematic Assembling Center Project (Grant agreement 9836100). The research was also supported by the "Ministero dell'Istruzione, dell'Università e della Ricerca" in the framework of the Italian Flagship Project RITMARE (la Ricerca ITaliana per il MARE).

### ACKNOWLEDGMENTS

The authors would like to thank the OC-CCI and CMEMS Projects that generated the satellite chlorophyll data used in this paper, which are freely available at http://marine.copernicus.eu/. We also acknowledge the SeaBASS archive for the in-situ bio-optical dataset, freely available at https://seabass.gsfc.nasa.gov/. We want to thank ESA/ESRIN for the "Phytoplankton Diversity at Global and Regional Scale" session, within the "Color and Light from Earth Observation" (CLEO) workshop. We are grateful to Simone Colella for his invaluable scientific and technical advice. This work also benefited from discussions with Alessio Ansuini, Vittorio Brando, Riccardo Droghei, Marco Picone, Jaime Pitarch Portero, and Gianluca Volpe. We would also like to thank Vega Forneris and Flavio La Padula for their valuable technical support.


ocean color algorithms. IEEE Trans. Geosci. Remote Sens. 41, 2833–2843. doi: 10.1109/TGRS.2003.818020


of different trophic status in the northwestern Mediterranean Sea. Mar. Ecol. Prog. Ser. 407, 27–42. doi: 10.3354/meps08559


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Di Cicco, Sammartino, Marullo and Santoleri. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Extended Formulations and Analytic Solutions for Watercolumn Production Integrals

Žarko Kovač<sup>1</sup>\*, Trevor Platt<sup>2</sup>, Suzana Antunović<sup>3</sup>, Shubha Sathyendranath<sup>2,4</sup>, Mira Morović<sup>1</sup> and Charles Gallegos<sup>5</sup>

<sup>1</sup> Physical Oceanography Laboratory, Institute of Oceanography and Fisheries, Split, Croatia, <sup>2</sup> Plymouth Marine Laboratory, Plymouth, United Kingdom, <sup>3</sup> Faculty of Civil Engineering, Architecture, and Geodesy, Split, Croatia, <sup>4</sup> Plymouth Marine Laboratory, National Centre for Earth Observations, Plymouth, United Kingdom, <sup>5</sup> Smithsonian Environmental Research Center, Edgewater, MD, United States

The effect of biomass dynamics on the estimation of watercolumn primary production is analyzed, by coupling a primary production model to a simple growth equation for phytoplankton. The production model is formulated with depth- and time-resolved biomass, and placed in the context of earlier models, with emphasis on the canonical solution for watercolumn production. A relation between the canonical solution and the general solution for the case of an arbitrary depth-dependent biomass profile was derived, together with an analytical solution for watercolumn production in case of a depth dependent biomass profile described with the shifted Gaussian function. The analysis was further extended to the case of a time-dependent, mixed-layer biomass, and two additional analytical solutions to this problem were derived, the first in case of increasing mixed-layer biomass and the second in case of declining biomass. The solutions were tested with Hawaii Ocean Time-series data. The canonical solution for mixed-layer production has proven to be a good model for this data set. The shifted Gaussian function was demonstrated to be an accurate model for the measured biomass profiles and the shifted Gaussian parameters extracted from the measured profiles were further used in the analytical solution for watercolumn production and results compared with data. The influence of time-dependent biomass on mixed-layer production was studied through analytical solutions. Re-examining the Critical Depth Hypothesis we derived an expression for the daily increase in mixed-layer biomass. Finally, the work was placed in a remote sensing context and the time-dependent model for biomass related to the remotely sensed-biomass.

Keywords: primary production, watercolumn production integrals, analytic solutions, growth models, critical depth criterion, remote sensing

## 1. INTRODUCTION

In the ocean, phytoplankton form the foundation of the pelagic ecosystem and by virtue of their photosynthesis (primary production) act as a source of organic carbon for the remainder of the ecosystem (Chavez et al., 2011). The abundance and growth of virtually all marine life on earth depend on phytoplankton. Consequently, the world's largest fisheries are concentrated around ocean areas with high primary production (Cushing, 1971; Mann and Lazier, 2006). Moreover, the role of phytoplankton does not end with the food chain itself. Through the action of the so-called biological pump, a complex ecosystem process which starts with primary production (Volk and Hoffert, 1985), phytoplankton contribute to the transfer of carbon into the deep ocean (Longhurst and Harrison, 1989) and subsequently affect atmospheric carbon concentration on longer time scales (from decadal to millennial) (Honjo et al., 2008). With this in mind, prediction of primary production is important, not just for the open ocean, but also for coastal seas, and is also relevant to fisheries and climate change research. Given the vastness of the ocean, the basic means by which such predictions are made is through the combined use of primary production models and ocean-color data, acquired by satellites (Platt and Sathyendranath, 1991; Siegel et al., 2014).

### Edited by:

Katja Fennel, Dalhousie University, Canada

### Reviewed by:

Toru Hirawake, Hokkaido University, Japan

Alexandre Mignot, UMR7093 Laboratoire D'océanographie de Villefranche, France

\*Correspondence: Žarko Kovač kovac@izor.hr

### Specialty section:

This article was submitted to Ocean Observation, a section of the journal Frontiers in Marine Science

Received: 29 October 2016 Accepted: 12 May 2017 Published: 30 May 2017

### Citation:

Kovač Ž, Platt T, Antunović S, Sathyendranath S, Morović M and Gallegos C (2017) Extended Formulations and Analytic Solutions for Watercolumn Production Integrals. Front. Mar. Sci. 4:163. doi: 10.3389/fmars.2017.00163

In such applications prediction of the total amount of carbon assimilated in the water column in 1 day, i.e., daily watercolumn production, is the target (Platt et al., 1991b). Models for watercolumn production predict the amount of carbon assimilated by phytoplankton per unit area of ocean surface during the day (Platt and Sathyendranath, 1988, 1993). Typical models have chlorophyll concentration as an initial condition and use photosynthesis parameters to determine the response of phytoplankton to light (Platt et al., 1977). The light attenuation coefficient and daylength are also required to determine the depth interval in which photosynthesis takes place and the time interval during which photosynthesis occurs (Kirk, 2011). Surface photosynthetically-active radiation is the forcing variable, which integrated over daylength gives the amount of energy available for photosynthesis.

Various attempts have been made to relate watercolumn production mathematically to environmental factors. Models of various complexities have been proposed and equations have been derived for predicting the amount of primary production per unit area of ocean surface (Kirk, 2011). These equations are often referred to as estimators of watercolumn production (Platt and Sathyendranath, 1993). A straightforward application of such estimators is in converting satellite images of ocean color into primary production maps of the ocean (Platt and Sathyendranath, 1988; Campbell et al., 2002; Platt et al., 2008). For example, with such applications the global annual primary production has been estimated at ∼45–50 (Longhurst et al., 1995), ∼52 (Westberry et al., 2008), and 58 ± 7 (Buitenhuis et al., 2013) gigatonnes of carbon per year.

Some of the earliest primary production estimators date back to Ryther (1956), Ryther and Yentsch (1957), and Talling (1957), all semi-empirical. These were followed by Rodhe (1965), whose model was later used by Bannister (1974) and others (Smith and Baker, 1978; Eppley et al., 1985). Platt (1986) developed a linear model and finally, in 1990, the first and to date the only analytical solution of the nonlinear model for daily watercolumn production was given by Platt et al. (1990). A thorough historical review of the topic is found in Platt and Sathyendranath (1993). A common feature of the models mentioned is their assumption of daily time-independent, vertically-uniform biomass (Platt and Sathyendranath, 1993). Therefore, strictly speaking, they are valid only for calculating watercolumn production occurring in the mixed layer during non-bloom conditions (when net growth is low). These conditions do prevail over vast ocean areas and for prolonged periods of the year, but they fail precisely when the highest primary production occurs.

Since for the open ocean the depth of the mixed layer is of the same order of magnitude as the photic depth (de Boyer Montegut et al., 2004), vertical uniformity in biomass is thought not to be a severe limitation for calculating watercolumn production. However, during stratified periods the mixed layer production (i.e., production taking place from the ocean surface down to the base of the mixed layer) may no longer constitute a major segment of watercolumn production. The reason is that when stratification is strong, the mixed layer depth is often found to be much shallower than the photic depth (Longhurst, 1998). In such conditions biomass tends to develop vertical dependency below the base of the mixed layer, and the water column below it can contribute significantly to daily watercolumn production. These conditions tend to prevail during summer periods of intense surface sunlight and in oligotrophic environments (Mignot et al., 2014). In such conditions the correctness of applying models with vertical uniformity in biomass, for the whole photic depth, may be challenged.

Also, resolving the daily time dependence of biomass when calculating watercolumn production may be advantageous in some situations. Consider for example the period of a bloom. After stratification sets in, circulation of phytoplankton to greater depths is prevented and rapid growth can occur (Sverdrup, 1953; Sathyendranath et al., 2015). Certainly, during such a period the usage of time-independent biomass in watercolumn production models can be challenged. After all, the bloom is defined as the period of rapid growth in biomass. For estimating production during such conditions a model with time-dependent biomass would be more suitable. Another case in which the time dependence of biomass may be important is during periods of sharp decline in biomass, which may be caused by intense grazing or dilution losses from deepening of the mixed layer (Zhai et al., 2010). Thanks to remote sensing technologies, time dependence in biomass is easily seen in serial satellite records of chlorophyll (Platt and Sathyendranath, 2008; Racault et al., 2012; Cabre et al., 2016).

Therefore, during periods of blooming/declining biomass or periods of strong biomass stratification, a different specification of model biomass may be more suitable. In this work, we begin with an outline of the model, followed by mathematical descriptions of the aforementioned problems and derivation of the solutions. The basic model we use is already established in the literature. It was first put forward by Platt et al. (1990) and has since received numerous applications. This primary production model was coupled to hydrodynamical models (Platt and Sathyendranath, 1991), put in the context of Sverdrup critical depth theory (Platt et al., 1991a), used in the estimation of primary production from satellite data (Platt and Sathyendranath, 1993), in studying the interaction between the mixed layer and watercolumn production (Platt et al., 1994) and finally in explaining the dynamics of high nutrient low chlorophyll zones (Platt et al., 2003). Here we extend this model, providing additional solutions for watercolumn production and analyzing its relation to biomass dynamics through simple growth models. By doing so we broaden the range of applicability of the model and place it on a more rigorous foundation.

First we define the watercolumn production integral and proceed to discuss different ways of calculating watercolumn production based on the way biomass is specified. We then explore the general case of depth dependent biomass and give an exact solution for watercolumn production with the shifted Gaussian biomass. After that we explore the case of time dependent mixed layer biomass and provide analytical solutions for the case of growing and declining biomass. We test the new solutions on data collected at the Hawaii Ocean Time-series station located in the Pacific. Finally, we discuss the implications of this model for Sverdrup's Critical Depth Hypothesis and for remote sensing applications.

## 2. THEORY

### Watercolumn Primary Production

Let the z axis be positive downwards with the origin at the ocean surface (**Figure 1**). Let time t equal zero at sunrise and D at sunset. At an arbitrary depth z and time t, primary production P(z, t) (measured in mg C m<sup>−3</sup> h<sup>−1</sup>) is the product of phytoplankton biomass B(z, t) (measured in mg Chl m<sup>−3</sup>) and the biomass-normalized production P<sup>B</sup>(z, t) (measured in mg C (mg Chl)<sup>−1</sup> h<sup>−1</sup>):

$$P(z,t) = B(z,t)P^B(z,t). \tag{1}$$

In analytical models of primary production, the usual definition of watercolumn primary production is that of a double integral of the product of initial biomass B(z, 0) and the biomass-normalized production P<sup>B</sup>(z, t):

$$P_{Z,T} = \int\limits_{0}^{D} \int\limits_{0}^{\infty} B(z,0) P^{B}(z,t) \,\mathrm{d}z \,\mathrm{d}t.\tag{2}$$

The subscript Z denotes integration over depth, whereas subscript T denotes integration over time, following the notation of Platt et al. (1990). The biomass-normalized production is a function of irradiance and is specified with the photosynthesis irradiance function p<sup>B</sup>(I) (Jassby and Platt, 1976), which is stated as:

$$P^B(z,t) = p^B(I(z,t)),\tag{3}$$

where I(z, t) is the irradiance, calculated from surface irradiance I<sub>0</sub>(t) with the aid of a light penetration model (Kirk, 2011). Typically, production increases linearly with irradiance at low light; the rate of increase declines at higher irradiance until saturation occurs; and production is reduced if irradiance reaches high enough levels, i.e., photoinhibition occurs (Platt et al., 1980). Neglecting photoinhibition leaves the photosynthesis-irradiance function determined by two parameters: the initial slope α<sup>B</sup> and the assimilation number P<sub>m</sub><sup>B</sup> (Platt and Sathyendranath, 1988). The effects of nutrients and temperature on production are assumed to be included implicitly in the magnitude of the photosynthesis parameters, following the approach of Platt and Jassby (1976). When evaluating the integral (2), various approaches and assumptions may be adopted. It can either be integrated over depth, following integration over time, or vice versa. Depth dependence of biomass can be specified, or biomass can be set constant with depth. Similarly, surface irradiance can be considered as time-dependent or constant, spectrally-resolved or spectrally integrated. Different photosynthesis-irradiance functions can be used (Jassby and Platt, 1976). How to solve integral (2) depends in large part on mathematical convenience, in the context of the particular problem under study.

FIGURE 1 | Sketch of the basic model relations with respect to depth. With the information on surface irradiance I<sub>0</sub>(t) and the optical properties of the water column, the irradiance at depth I(z, t) (gray curve) is first calculated. Using I(z, t) and the photosynthesis irradiance function p<sup>B</sup>(I), along with the information on biomass B(z, t) (blue curve on the left hand side), normalized instantaneous production P<sup>B</sup>(z, t) is obtained (orange curve). Further on, taking the product of B(z, t) and P<sup>B</sup>(z, t) and integrating it over time we get the daily production at depth (blue curve on the right hand side). In this depiction watercolumn production P<sub>Z,T</sub> is the blue surface under the P<sub>T</sub>(z) curve. Blue arrows indicate mixing and Z<sub>m</sub> marks the mixed layer depth. Daily production from the surface down to Z<sub>m</sub> is marked with P<sub>Z<sub>m</sub>,T</sub>.

Using dimensional analysis Platt and Sathyendranath (1993) showed that the canonical form for the solution of integral (2), in the case of vertically uniform biomass B(z, 0) = B, is:

$$P_{Z,T} \sim \frac{B P_m^B D}{K} f(I_*^m),\tag{4}$$

where K is the diffuse attenuation coefficient for downward irradiance (Kirk, 2011) and I<sub>∗</sub><sup>m</sup> = α<sup>B</sup>I<sub>0</sub><sup>m</sup>/P<sub>m</sub><sup>B</sup> is the scaled noon irradiance, with I<sub>0</sub><sup>m</sup> as noon irradiance. The function f(I<sub>∗</sub><sup>m</sup>) is determined by the formulation of the production-light relationship. In the case of the Platt photosynthesis irradiance function (Platt et al., 1980):

$$p^B(I) = P_m^B\left(1 - \exp\left(-\alpha^B I/P_m^B\right)\right),\tag{5}$$

the function f(I<sub>∗</sub><sup>m</sup>) is:

$$\begin{aligned} f(I_*^m) = {} & \sum_{n=1}^{\infty} \frac{2\left(I_*^m\right)^{2n-1}}{\pi\,(2n-1)\,(2n-1)!} \frac{(2n-2)!!}{(2n-1)!!} \\ & - \sum_{n=1}^{\infty} \frac{\left(I_*^m\right)^{2n}}{2n\,(2n)!} \frac{(2n-1)!!}{(2n)!!}. \end{aligned}\tag{6}$$

In that case the exact solution to integral (2) is:

$$P_{Z,T} = \frac{B P_m^B D}{K} f(I_*^m). \tag{7}$$

When the water column is of a finite depth Z<sub>m</sub> (**Figure 1**) the solution is:

$$P_{Z_m,T} = \frac{B P_m^B D}{K} \left( f(I_*^m) - f(I_*^m e^{-KZ_m}) \right). \tag{8}$$

This solution was derived by Platt et al. (1990) and thus far no other analytical solution for watercolumn production has been reported in the literature. It is called the canonical solution. The assumptions of the model regarding light conditions are: optically-uniform water column with sinusoidally varying surface irradiance, such that the irradiance at depth is given by:

$$I(z,t) = I\_0^m \sin(\pi t/D)e^{-Kz}.\tag{9}$$

Inserting this expression into (5) gives the biomass-normalized production as:

$$P^B(z,t) = P\_m^B\left(1 - \exp\left(-\alpha^B I\_0^m \sin\left(\pi t/D\right)e^{-Kz}/P\_m^B\right)\right). \tag{10}$$

The time integral of P<sup>B</sup>(z, t) gives the daily normalized production P<sub>T</sub><sup>B</sup>(z). Here we shall use this same model setup but will relax the assumptions of homogeneous and constant biomass. We will allow for time- and depth-dependent biomass, seek the solution for P<sub>Z,T</sub> in specific situations, and explore the influence that a time- or depth-dependent biomass has on watercolumn production. Therefore, in this paper, a more complete definition for P<sub>Z,T</sub> would be:

$$P\_{Z,T} = \int\limits\_{0}^{D} \int\limits\_{0}^{\infty} B(z,t)P^B(z,t) \,\mathrm{d}z \,\mathrm{d}t.\tag{11}$$

A complete list of symbols with corresponding descriptions is given in Appendix A.
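The series (6) and the canonical solutions (7) and (8) are straightforward to evaluate numerically. The Python sketch below is an illustration only, not the authors' code. The function names `f_series` and `canonical_production` are hypothetical, the series is truncated at a fixed number of terms, and it is written as a single alternating sum over k whose odd and even terms correspond to the two sums in (6). The parameter values in the usage lines are the HOT medians quoted later in the text, and the scaled noon irradiance of 5 is an arbitrary choice.

```python
import math


def f_series(i_star, n_terms=60):
    """Evaluate the series (6) for the dimensionless function f(I*^m).

    Odd and even k reproduce the first and second sums in (6). The factor
    wallis[k] is the daily mean of sin^k(pi t / D).
    """
    wallis = [1.0, 2.0 / math.pi]      # means of sin^0 and sin^1
    power_over_factorial = 1.0         # running value of x^k / k!
    total = 0.0
    for k in range(1, n_terms + 1):
        power_over_factorial *= i_star / k
        if k >= 2:
            wallis.append(wallis[k - 2] * (k - 1) / k)
        sign = 1.0 if k % 2 == 1 else -1.0
        total += sign * power_over_factorial * wallis[k] / k
    return total


def canonical_production(biomass, p_max, day_length, k_att, i_star, z_m=None):
    """Canonical solution (7), or (8) when a finite depth z_m is given."""
    f_value = f_series(i_star)
    if z_m is not None:
        f_value -= f_series(i_star * math.exp(-k_att * z_m))
    return biomass * p_max * day_length / k_att * f_value


# Usage with the HOT median values quoted in the text; I*^m = 5 is arbitrary.
whole_column = canonical_production(0.072, 7.85, 10.0, 0.043, 5.0)
mixed_layer = canonical_production(0.072, 7.85, 10.0, 0.043, 5.0, z_m=54.75)
print(f"P_ZT = {whole_column:.1f}, P_ZmT = {mixed_layer:.1f} mg C m^-2 d^-1")
```

Because the series alternates, a plain double-precision sum of this kind is only suitable for moderate values of the scaled irradiance; very large values would need a more careful summation.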

### Biomass Specification

As formulated, this watercolumn production integral requires the specification of biomass as a function of depth and time. To specify biomass for integral (11) it is useful to write a differential equation for the time evolution of the biomass. In this way biomass dynamics are incorporated into the integral and its solution. The dynamics of biomass can be modeled by a simple equation of the following form:

$$\frac{\partial}{\partial t}B(z,t) = \frac{1}{\chi}P^B(z,t)B(z,t),\tag{12}$$

(Sathyendranath and Platt, 2007), assuming that the carbon-to-chlorophyll ratio χ is constant in time (Sathyendranath et al., 2009). This equation governs the time evolution of biomass resulting from photosynthesis and allows for the accumulation of biomass at each depth in accordance with (10).

Let B<sup>∗</sup>(z, t) be the solution to the growth equation (12). A direct approach for calculating P<sub>Z,T</sub> would be to insert B<sup>∗</sup>(z, t) into the integral (11) and solve for P<sub>Z,T</sub>:

$$P\_{Z,T} = \int\limits\_{0}^{D} \int\limits\_{0}^{\infty} B^\*(z,t) P^B(z,t) \,\mathrm{d}z \,\mathrm{d}t.\tag{13}$$

With the use of the solution B<sup>∗</sup>(z, t) in (11), watercolumn production is coupled to the biomass dynamics expressed by (12). However, this approach can be mathematically complex, depending on the form of the solution B<sup>∗</sup>(z, t). A simpler approach follows by recognizing that the process of biomass accumulation described by (12) is in fact primary production. Therefore, the difference between the final biomass B<sup>∗</sup>(z, D) and the initial biomass B<sup>∗</sup>(z, 0), multiplied by χ, equals daily primary production at depth z. Mathematically, the solution B<sup>∗</sup>(z, t) satisfies (12) and the following holds:

$$B^\*(z,t) = \frac{\chi}{P^B(z,t)} \frac{\partial}{\partial t} B^\*(z,t). \tag{14}$$

Inserting this expression into (13) and solving yields:

$$P\_{Z,T} = \chi \int\_0^\infty \left( B^\*(z, D) - B(z, 0) \right) dz. \tag{15}$$

This expression gives the watercolumn production as the difference between the final and initial biomass multiplied by the carbon to chlorophyll ratio. The two integrals, (13) and (15), are equivalent. The advantage of the second approach (15) is evident in cases when the biomass dynamics are governed by the simple growth equation, such as (12), whereas the first approach (13) is more useful when the solution to equation (12) is a simple mathematical expression. Both approaches will be used later.
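The equivalence of (13) and (15) can be checked numerically at a single depth. The following Python sketch is an illustration under stated assumptions rather than the authors' procedure: it uses the closed-form solution of (12), B<sup>∗</sup> = B<sub>0</sub> exp(∫P<sup>B</sup> dt/χ), evaluates the time integrals with the trapezoidal rule, and takes arbitrary example values (scaled irradiance 2 at the chosen depth, χ = 100 mg C (mg Chl)<sup>−1</sup>). The function names are hypothetical.

```python
import math


def normalized_production(t, day_length, p_max, scaled_irradiance):
    """Biomass-normalized production (10) at a fixed depth.

    scaled_irradiance stands for I*^m exp(-K z) at that depth.
    """
    light = scaled_irradiance * math.sin(math.pi * t / day_length)
    return p_max * (1.0 - math.exp(-light))


def compare_integrals(b0, chi, p_max, day_length, scaled_irradiance, n_steps=20000):
    """Return the time integral in (13) and the biomass budget in (15) at one depth."""
    dt = day_length / n_steps
    prev_pb = normalized_production(0.0, day_length, p_max, scaled_irradiance)
    prev_rate = b0 * prev_pb
    growth = 0.0      # running integral of P^B / chi, the exponent solving (12)
    direct = 0.0      # running integral of B* P^B, as in (13)
    biomass = b0
    for i in range(1, n_steps + 1):
        pb = normalized_production(i * dt, day_length, p_max, scaled_irradiance)
        growth += 0.5 * (prev_pb + pb) * dt / chi
        biomass = b0 * math.exp(growth)
        rate = biomass * pb
        direct += 0.5 * (prev_rate + rate) * dt
        prev_pb, prev_rate = pb, rate
    budget = chi * (biomass - b0)   # final minus initial biomass, in carbon units
    return direct, budget


direct, budget = compare_integrals(b0=0.072, chi=100.0, p_max=7.85,
                                   day_length=10.0, scaled_irradiance=2.0)
print(f"integral (13): {direct:.6f}, budget (15): {budget:.6f}")
```

The two numbers agree to within the discretization error of the trapezoidal rule, which is the single-depth statement of the equivalence; integrating over depth then gives (15).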

If in the time interval from t = 0 (sunrise) until t = D (sunset) the following expression holds for the solution B<sup>∗</sup>(z, t):

$$\frac{\partial B^\*(z,t)}{\partial t} \approx 0,\tag{16}$$

it is safe to assume B<sup>∗</sup>(z, D) ≈ B<sup>∗</sup>(z, 0) and justified to use initial biomass throughout the calculation of daily production. Biomass profiles of this type can be considered to be in, or close to, a steady state (Hodges and Rudnick, 2004). From the dynamical perspective this approach is crude, but solutions with initial biomass are valid in situations when the biomass does not change significantly during the course of 1 day. In this sense, integral (2) is a special case of integral (13) when (16) is valid. The accumulation of biomass due to photosynthesis is either not significant compared with the initial biomass, or is balanced by the loss processes. With this approach, the functional form of the initial biomass profile can be inferred from measurement, instead of by solving the model equations, which is an advantage. In fact, this is the standard practice in remote sensing applications (Platt and Sathyendranath, 1988; Behrenfeld and Falkowski, 1997).

### Stratified, Time-Independent Biomass

In summer periods of intense heating and weak mixing, strong stratification in temperature, as well as stratification in biomass, tend to develop below the mixed layer (Mann and Lazier, 2006). In this stratified region, the biomass tends to remain virtually constant in time, with only slight fluctuations in the shape of the chlorophyll profile as the season advances. In the tropical ocean stratification can be a permanent feature, whereas in the temperate regions stratification tends to be eroded during autumn and winter (Longhurst and Harrison, 1989). Given that stratification usually persists for intervals of time longer than 1 day, it is safe to assume constant biomass when calculating daily watercolumn production. Depth variation in biomass is assumed to have a larger influence on the magnitude of daily watercolumn production than time dependence does. Mathematically, integral (2) is appropriate in these conditions and model solutions with time-independent biomass are valid, because we assume that biomass does not change significantly during the time course of 1 day, as stated by (16). This assumption is valid in regions of the ocean where a stratified biomass profile is observed to be persistent on time scales longer than that of 1 day.

### General Case

When the biomass profile is held constant in time, B(z, 0) = B(z), a link between the canonical solution and the general solution to integral (2) can be established. To demonstrate this relation, we take advantage of the daily normalized production profile P<sub>T</sub><sup>B</sup>(z). With it, integral (2) becomes:

$$P\_{Z,T} = \int\_0^\infty B(z) P\_T^B(z) \,\mathrm{d}z.\tag{17}$$

In this way, watercolumn production is given as an integral of the product between the time-independent biomass B(z) and normalized daily production P<sub>T</sub><sup>B</sup>(z). The solution for the daily normalized production profile P<sub>T</sub><sup>B</sup>(z) in the case of the Platt et al. (1980) photosynthesis-irradiance function (5) is (Kovač et al., 2016a):

$$P\_T^B(z) = P\_m^B D f\_z(I\_\*^m e^{-Kz}),\tag{18}$$

where the function f<sub>z</sub>(I<sub>∗</sub><sup>m</sup>e<sup>−Kz</sup>) is related to the function f(I<sub>∗</sub><sup>m</sup>) in the following manner:

$$f\_z(I\_\*^m e^{-Kz}) = -\frac{1}{K} \frac{d}{dz} f\left(I\_\*^m \exp(-Kz)\right). \tag{19}$$

Inserting this expression into (17) and solving by partial integration gives:

$$P\_{Z,T} = \frac{P\_m^B D}{K} \left( B(0) f\left(I\_\*^m\right) + \int\_0^\infty \frac{\mathrm{d}B(z)}{\mathrm{d}z} f\left(I\_\*^m e^{-Kz}\right) \mathrm{d}z \right), \tag{20}$$

where we have used B(∞)f(I<sub>∗</sub><sup>m</sup>e<sup>−K∞</sup>) = 0. The first term on the right hand side is simply the canonical solution (7) and the second term is recognized as the contribution arising from the shape of the biomass profile (stratification term).
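For readers who wish to follow the partial integration, the intermediate step between (17) and (20) can be written out as below. This is a worked restatement using only (17), (18), and (19); it assumes that B(z) remains bounded at depth, so that the boundary term at infinity vanishes because f(0) = 0.

$$\begin{split} P\_{Z,T} &= P\_m^B D \int\_0^\infty B(z) f\_z\left(I\_\*^m e^{-Kz}\right) \mathrm{d}z = -\frac{P\_m^B D}{K} \int\_0^\infty B(z) \frac{\mathrm{d}}{\mathrm{d}z} f\left(I\_\*^m e^{-Kz}\right) \mathrm{d}z \\ &= -\frac{P\_m^B D}{K} \Big[ B(z) f\left(I\_\*^m e^{-Kz}\right) \Big]\_0^\infty + \frac{P\_m^B D}{K} \int\_0^\infty \frac{\mathrm{d}B(z)}{\mathrm{d}z} f\left(I\_\*^m e^{-Kz}\right) \mathrm{d}z. \end{split}$$

The lower limit of the boundary term contributes B(0)f(I<sub>∗</sub><sup>m</sup>) with a positive sign, which is the first term of (20).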

The interpretation of the first term is simple: it gives the watercolumn production in the case where surface biomass stretches over the entire watercolumn. According to the second term, any change in biomass with depth causes a deviation from the canonical solution. If there is an increase in biomass, dB(z)/dz > 0, this contribution is positive, whereas if there is a decline, dB(z)/dz < 0, this contribution is negative. The change in biomass dB(z)/dz is scaled by the function f(I<sub>∗</sub><sup>m</sup>e<sup>−Kz</sup>). The product dB(z)f(I<sub>∗</sub><sup>m</sup>e<sup>−Kz</sup>) equals the production that would occur below the depth z if the biomass from z to ∞ were equal to dB(z). The total contribution from all these infinitesimal changes in B(z) is accounted for by the integral on the right hand side of (20). With increasing depth, the contribution from biomass variation decreases, simply because production declines with increasing depth (10).

Expression (20) is a formal relation linking the canonical solution (7) to the solution for watercolumn production with stratified biomass (2). It is valid for an arbitrary biomass profile and clearly displays the role surface biomass B(0) has on the magnitude of P<sub>Z,T</sub>. Surface biomass appears as a leading factor in P<sub>Z,T</sub>. The significance of this result is emphasized given that surface biomass is readily accessible to satellite measurement. Therefore, if the remotely-sensed surface biomass is precise, and assuming the remaining parameters of the model are characteristic of the ocean region in question, the error in the estimated watercolumn production arises solely as a consequence of the error in estimating the biomass profile, which is inaccessible to remote sensing and has to be assigned based on prior information (Platt and Sathyendranath, 1988).

### Shifted Gaussian Biomass Profile

An important case of the time-independent biomass profile is that of the shifted Gaussian superimposed on a constant background. The shifted Gaussian is a suitable function for the description of the vertical structure of biomass for diverse regions of the oceans (Platt et al., 1991b), especially for the case of the widespread deep chlorophyll maximum (Longhurst, 1998; Mignot et al., 2014). In remote sensing applications it has been widely used to model biomass profiles in algorithms for primary production calculation (Platt and Sathyendranath, 1988, 1991). It is also used in operational oceanography (Platt et al., 2008), with well-established and tested procedures (Platt et al., 1988; Longhurst et al., 1995). The role of the shifted Gaussian in ocean color remote sensing has been studied and modeled by many authors, including: Morel and Berthon (1989), Sathyendranath and Platt (1989), Andre (1992), Stramska and Stramski (2005), Uitz et al. (2006), Xiu et al. (2008), and Mignot et al. (2011).

In this case the biomass profile equals:

$$B(z) = B\_0 + \frac{h}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(z - z\_m)^2}{2\sigma^2}\right),\tag{21}$$

where the integrated biomass beneath the Gaussian curve is given by h, the depth of the maximum is z<sub>m</sub>, and the width of the biomass peak is determined by σ. B<sub>0</sub> is the background biomass. The peak biomass above the background B<sub>0</sub> at z<sub>m</sub> is H = h/(σ√(2π)). Upon inserting this expression into integral (17) we get:

$$P\_{Z,T} = \int\_0^\infty B\_0 P\_T^B(z) \, \mathrm{d}z + \int\_0^\infty \frac{h}{\sigma \sqrt{2\pi}} \exp\left(-\frac{(z - z\_m)^2}{2\sigma^2}\right) P\_T^B(z) \, \mathrm{d}z. \tag{22}$$

The first integral can be replaced with the canonical solution and the second integral can be simplified with the help of (18) to give:

$$\begin{split} P\_{Z,T} &= \frac{B\_0 P\_m^B D}{K} f(I\_\*^m) \\ &+ \frac{h P\_m^B D}{\sigma \sqrt{2\pi}} \int\_0^\infty \exp\left(-\frac{(z - z\_m)^2}{2\sigma^2}\right) f\_z\left(I\_\*^m e^{-Kz}\right) \,\mathrm{d}z. \end{split} \tag{23}$$

With reference to (20) we label the second term on the right hand side as ΔP<sub>Z,T</sub> (stratification term). The exact mathematical derivation of the solution for ΔP<sub>Z,T</sub> is given in Appendix B. The final solution is:

$$\begin{split} \Delta P\_{Z,T} &= P\_m^B D \frac{h}{2} \Bigg[ \sum\_{n=1}^{\infty} \exp\left( \frac{z\_{2n-1}^2 - z\_m^2}{2\sigma^2} \right) \frac{2\left(I\_\*^m\right)^{2n-1}}{\pi \left(2n-1\right)!} \frac{(2n-2)!!}{(2n-1)!!} \left( 1 + \Phi\left(\frac{z\_{2n-1}}{\sqrt{2}\sigma}\right) \right) \\ & \quad - \sum\_{n=1}^{\infty} \exp\left( \frac{z\_{2n}^2 - z\_m^2}{2\sigma^2} \right) \frac{\left(I\_\*^m\right)^{2n}}{(2n)!} \frac{(2n-1)!!}{(2n)!!} \left(1 + \Phi\left(\frac{z\_{2n}}{\sqrt{2}\sigma}\right) \right) \Bigg], \end{split} \tag{24}$$

where Φ is the error function, z<sub>2n−1</sub> = z<sub>m</sub> − (2n − 1)σ<sup>2</sup>K and z<sub>2n</sub> = z<sub>m</sub> − 2nσ<sup>2</sup>K. The complete solution to (22) is now:

$$P\_{Z,T} = \frac{B\_0 P\_m^B D}{K} f(I\_\*^m) + \Delta P\_{Z,T},\tag{25}$$

where ΔP<sub>Z,T</sub> depends explicitly on the values of h, z<sub>m</sub>, σ, K, α<sup>B</sup>, P<sub>m</sub><sup>B</sup>, I<sub>0</sub><sup>m</sup>, and D. This expression gives the amount of carbon assimilated during 1 day per square meter of the ocean surface by phytoplankton distributed vertically according to the shifted Gaussian function (21).

The shifted Gaussian is flexible enough to describe various features in the measured chlorophyll profiles and therefore this solution covers a wide range of situations encountered in the field. That flexibility is achieved by altering the parameters of the function, namely: B<sub>0</sub>, z<sub>m</sub>, σ, and h. The disadvantage is that in addition to the six basic quantities, α<sup>B</sup>, P<sub>m</sub><sup>B</sup>, B<sub>0</sub>, I<sub>0</sub><sup>m</sup>, D, and K, which appear in the canonical solution, the solution for the shifted Gaussian has three more: z<sub>m</sub>, σ, and h. To apply the solution, the values of these quantities need to be specified.

### Time-Dependent, Mixed-Layer Biomass

Accounting for time dependence may be advantageous when considering mixed-layer production, especially during the period of a bloom. Blooms are typically initiated by stratification, either from heating, river discharge or from sea ice melt (Mann and Lazier, 2006). After the onset of stratification, the phytoplankton become trapped in the upper, well-illuminated, nutrient-rich layer, where conditions for growth are fulfilled. In this upper layer production is most intense, which can lead to rapid accumulation of biomass (Chiswell et al., 2015). Blooms last until nutrients are depleted, after which they crash (Levy, 2015), or are terminated by overgrazing. How rapidly the bloom develops will be determined by physical conditions, the physiological status of the phytoplankton population and by losses (Banse and English, 1994), resulting in diverse patterns of seasonal cycles of phytoplankton biomass, as evident in remotely sensed records of chlorophyll concentration (Vargas et al., 2009; Racault et al., 2012). In the model, the physiological status is described by the photosynthesis-irradiance function, whereas the physical conditions are represented by the mixed-layer depth, surface irradiance, and the attenuation coefficient. Next, we calculate primary production in the mixed layer and show how it is affected by rapidly growing or declining biomass.

### Increasing Mixed-Layer Biomass

Let us consider a mixed layer of depth Z<sub>m</sub> (constant in time) with uniformly distributed biomass at the initial time, B(z, 0) = B<sub>0</sub>, for z = 0 to z = Z<sub>m</sub>. To simplify the equations, we introduce the following notation for the total biomass in the mixed layer:

$$B\_{Z\_m}(t) = \int\_0^{Z\_m} B(z, t) \, dz. \tag{26}$$

At time t = 0 the total biomass is B<sub>Z<sub>m</sub></sub>(0) = B<sub>0</sub>Z<sub>m</sub>. Let us also assume that the newly synthesized mixed-layer biomass at time t is redistributed through the mixed layer during a time interval Δt, so that there is no stratification in biomass at t + Δt. Mixed-layer biomass at time t + Δt is now:

$$B\_{Z\_m}(t+\Delta t) = B\_{Z\_m}(t) + \frac{1}{\chi} P\_{Z\_m}(t)\Delta t,\tag{27}$$

with P<sub>Z<sub>m</sub></sub>(t) given as P<sub>Z<sub>m</sub></sub>(t) = ∫<sub>0</sub><sup>Z<sub>m</sub></sup> P(z, t) dz. In the limit of Δt → 0, that is, instantaneous mixing of newly synthesized biomass, this equation reduces to:

$$\frac{\partial}{\partial t}B\_{Z\_m}(t) = \frac{1}{\chi Z\_m} P\_{Z\_m}^{B}(t)B\_{Z\_m}(t). \tag{28}$$

The derivation of this equation is given in Appendix C. The solution for the mixed-layer production with time-dependent biomass described by (28) and initial biomass B<sub>Z<sub>m</sub></sub>(0) = B<sub>0</sub>Z<sub>m</sub> is:

$$P\_{Z\_m, T} = \chi B\_0 Z\_m \left[ \exp\left[ \frac{P\_m^B D}{\chi Z\_m K} \left( f(I\_\*^m) - f(I\_\*^m e^{-K Z\_m}) \right) \right] - 1 \right]. \tag{29}$$

The solution is found by the application of (15); details are given in Appendix C. The term in the exponent is recognized as the canonical solution (8) divided by χ, Z<sub>m</sub>, and B<sub>0</sub>. It is clear that the canonical solution (8) will underestimate production in comparison with this solution: because of photosynthesis, at any given moment there will be more biomass in the mixed layer than in the case of time-independent biomass.
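To make the comparison between (8) and (29) concrete, the sketch below evaluates both for the same inputs. It is a minimal illustration, not the authors' implementation: the function names are hypothetical, the series (6) is truncated at a fixed number of terms, the scaled noon irradiance of 5 is an arbitrary choice, and the remaining values are the HOT medians and the two carbon-to-chlorophyll ratios quoted later in the text.

```python
import math


def f_series(i_star, n_terms=60):
    """Series (6), written as one alternating sum over k."""
    wallis = [1.0, 2.0 / math.pi]      # daily means of sin^0 and sin^1
    power_over_factorial = 1.0         # running value of x^k / k!
    total = 0.0
    for k in range(1, n_terms + 1):
        power_over_factorial *= i_star / k
        if k >= 2:
            wallis.append(wallis[k - 2] * (k - 1) / k)
        sign = 1.0 if k % 2 == 1 else -1.0
        total += sign * power_over_factorial * wallis[k] / k
    return total


def canonical_mixed_layer(b0, p_max, day_length, k_att, i_star, z_m):
    """Canonical mixed-layer production (8) with constant biomass b0."""
    delta_f = f_series(i_star) - f_series(i_star * math.exp(-k_att * z_m))
    return b0 * p_max * day_length / k_att * delta_f


def growing_mixed_layer(b0, p_max, day_length, k_att, i_star, z_m, chi):
    """Mixed-layer production (29) with biomass growing according to (28)."""
    canonical = canonical_mixed_layer(b0, p_max, day_length, k_att, i_star, z_m)
    carbon_stock = chi * b0 * z_m      # initial mixed-layer biomass in carbon units
    # expm1 keeps precision when the exponent in (29) is small
    return carbon_stock * math.expm1(canonical / carbon_stock)


args = (0.072, 7.85, 10.0, 0.043, 5.0, 54.75)   # B0, P_m^B, D, K, I*^m, Z_m
print(f"canonical (8): {canonical_mixed_layer(*args):.1f} mg C m^-2 d^-1")
for chi in (150.0, 100.0):
    value = growing_mixed_layer(*args, chi=chi)
    print(f"growing (29), chi = {chi:.0f}: {value:.1f} mg C m^-2 d^-1")
```

Writing (29) as the initial carbon stock multiplied by exp(x) − 1, with x the canonical production divided by that stock, makes it plain that (29) approaches (8) when x is small, for instance when χ is large.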

### Declining Mixed-Layer Biomass

Given that the term P<sup>B</sup>(z, t) is always positive, we have considered so far only the case of growing biomass. Here we consider the case when this term is reduced in magnitude by a loss term, and investigate the effect biomass loss has on the magnitude of daily production. With this goal, equation (12) is modified by the addition of a loss term in the most general form L<sup>B</sup>:

$$\frac{\partial}{\partial t}B(z,t) = \left(\frac{1}{\chi}P^B(z,t) - L^B\right)B(z,t). \tag{30}$$

For there to be negative growth (a decline) in biomass, the loss term L<sup>B</sup> has to be greater than the growth term P<sup>B</sup>(z, t)/χ. Here we assume that it is much greater:

$$L^B \gg \frac{1}{\chi} P^B(z, t). \tag{31}$$

In this case the solution to equation (30) at time t is:

$$B(z,t) = B\_0 e^{-L^B t},\tag{32}$$

where we have used the initial condition B(z, 0) = B0. Time dependence of this type is often observed in satellite records of surface biomass, for example during the termination phase of a bloom (Cabre et al., 2016). Therefore, to calculate mixed layer production in this case, we insert the previous expression into (11) and obtain the following integral:

$$P\_{Z\_m,T} = \int\limits\_{0}^{D} \int\limits\_{0}^{Z\_m} B\_0 e^{-L^B t} P^B(z,t) \,\mathrm{d}z \,\mathrm{d}t.\tag{33}$$

This procedure is in accord with (13). The derivation of the solution to this integral is given in Appendix D. The final solution is:

$$\begin{split} P\_{Z\_m,T} &= \frac{B\_0 P\_m^B D}{K} \Bigg[ \left( e^{-L^B D} + 1 \right) \sum\_{n=1}^{\infty} \frac{\left( I\_\*^m \right)^{2n-1} - \left( I\_\*^m e^{-K Z\_m} \right)^{2n-1}}{\pi \left( 2n-1 \right)} \prod\_{m=1}^n \frac{1}{\left( -L^B D/\pi \right)^2 + (2m-1)^2} \\ &- \frac{\left( e^{-L^B D} - 1 \right)}{-L^B D} \sum\_{n=1}^{\infty} \frac{\left( I\_\*^m \right)^{2n} - \left( I\_\*^m e^{-K Z\_m} \right)^{2n}}{2n} \prod\_{m=1}^n \frac{1}{\left( -L^B D/\pi \right)^2 + (2m)^2} \Bigg]. \end{split} \tag{34}$$

In the expression inside the square brackets, the loss term L<sup>B</sup> and daylength D appear as a product L<sup>B</sup>D, which emerges as a dimensionless factor for the problem, determining just how much the continual loss of initial biomass reduces watercolumn production. This type of loss can occur when the grazing pressure on phytoplankton is significant, or by sinking out of the mixed layer. Potentially, the magnitude of the loss term could be determined from time series of satellite-estimated surface chlorophyll.

### Relation to the Canonical Solution

In the case of increasing mixed-layer biomass, the canonical solution (8) is easily recognized in the exponential function on the right hand side of expression (29). Writing the exponential as a sum and rearranging gives:

$$\begin{split} P\_{Z\_m,T} &= \frac{B\_0 P\_m^B D}{K} \left( f(I\_\*^m) - f(I\_\*^m e^{-KZ\_m}) \right) \\ &+ B\_0 \sum\_{n=2}^{\infty} \frac{(P\_m^B D)^n}{n! (\chi Z\_m)^{n-1} K^n} \left( f(I\_\*^m) - f(I\_\*^m e^{-KZ\_m}) \right)^n. \end{split} \tag{35}$$

The first term on the right hand side is the canonical solution (8) and the additional terms arise due to time-dependent biomass. When the biomass is time independent, as is the case with the canonical solution, these terms vanish. Therefore, when biomass is time dependent the canonical solution (8) can be interpreted as the first-order approximation for mixed-layer production, representing the lower limit on watercolumn production. It is important to note that biomass has to increase with time in accordance with (28) for this interpretation to hold.

On the other hand, when biomass is declining with time, as stated in (32), the canonical solution will be the upper limit on daily watercolumn production, given that at each time instant there is less biomass in the mixed layer than with constant biomass B<sub>0</sub>. In this case we were unable to find an exact link between solution (34) and the canonical solution. Instead, we have found through numerical experiments that the canonical solution with noon biomass B<sub>0</sub>e<sup>−L<sup>B</sup>D/2</sup>, in place of initial biomass B<sub>0</sub>, is a good approximation for the full solution (34):

$$P\_{Z\_m, T} \approx B\_0 \, e^{-L^B D / 2} \, \frac{P\_m^B D}{K} \left( f(I\_\*^m) - f(I\_\*^m e^{-K Z\_m}) \right). \tag{36}$$

It is important to emphasize the initial assumption behind solution (34). For it to hold, the loss rate has to be significantly larger than normalized production (31) so that the growth due to production is insignificant in comparison with losses and (32) is valid. Therefore, solution (34) is valid when (31) holds and the approximation (36) is only then justified.
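The declining-biomass solution (34) and the noon-biomass approximation (36) can be compared numerically with a short script. The sketch below is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the series is truncated at a fixed number of terms, and the products over m in (34) are generated by a recurrence for the daily mean of exp(−L<sup>B</sup>t) sin<sup>k</sup>(πt/D) rather than evaluated term by term. The inputs are the HOT medians and the loss rate quoted in the text, with an arbitrary scaled noon irradiance of 5.

```python
import math


def time_moments(loss_times_day, n_terms):
    """Daily means of exp(-L^B t) sin^k(pi t / D) for k = 0..n_terms."""
    a = loss_times_day / math.pi
    if loss_times_day > 0.0:
        m0 = -math.expm1(-loss_times_day) / loss_times_day
    else:
        m0 = 1.0
    m1 = (1.0 + math.exp(-loss_times_day)) / (math.pi * (1.0 + a * a))
    moments = [m0, m1]
    for k in range(2, n_terms + 1):
        moments.append(moments[k - 2] * k * (k - 1) / (a * a + k * k))
    return moments


def bracket_series(i_star, loss_times_day, n_terms=60):
    """One argument of the bracket in (34); reduces to (6) when L^B D = 0."""
    moments = time_moments(loss_times_day, n_terms)
    power_over_factorial = 1.0         # running value of x^k / k!
    total = 0.0
    for k in range(1, n_terms + 1):
        power_over_factorial *= i_star / k
        sign = 1.0 if k % 2 == 1 else -1.0
        total += sign * power_over_factorial * moments[k] / k
    return total


def declining_mixed_layer(b0, p_max, day_length, k_att, i_star, z_m, loss_rate):
    """Mixed-layer production (34) with biomass declining according to (32)."""
    ld = loss_rate * day_length
    delta = (bracket_series(i_star, ld)
             - bracket_series(i_star * math.exp(-k_att * z_m), ld))
    return b0 * p_max * day_length / k_att * delta


def noon_biomass_approximation(b0, p_max, day_length, k_att, i_star, z_m, loss_rate):
    """Approximation (36): canonical solution (8) evaluated with noon biomass."""
    canonical = declining_mixed_layer(b0, p_max, day_length, k_att, i_star, z_m, 0.0)
    return math.exp(-0.5 * loss_rate * day_length) * canonical


args = (0.072, 7.85, 10.0, 0.043, 5.0, 54.75)   # B0, P_m^B, D, K, I*^m, Z_m
exact = declining_mixed_layer(*args, loss_rate=0.1)
approx = noon_biomass_approximation(*args, loss_rate=0.1)
print(f"(34): {exact:.2f}, (36): {approx:.2f}, ratio: {approx / exact:.4f}")
```

Setting the loss rate to zero in `declining_mixed_layer` recovers the canonical solution (8), which gives a quick consistency check on the series.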

### 3. DATA

The models described so far and the solutions derived therefrom require parameter values to be implemented and tested against data. Oceanic time series are suitable sources of such information. For our model, the Hawaii Ocean Time-series (HOT) is an ideal testing ground. Data from it have already served for other model testing, with over 600 publications testifying to the quality of the data. The entire data set is publicly available with full documentation of the methods and procedures used. Information on the data set can be found in Karl et al. (2001) and Karl and Church (2014).

HOT is a program of oceanic measurements started in 1988 at station Aloha, located near the Hawaiian Islands at 22°45′N, 158°W. The basic set of measurements relevant for this model encompasses: primary production using the standard in situ <sup>14</sup>C method (Steemann Nielsen, 1952), fluorimetric determination of chlorophyll concentration (Strickland and Parsons, 1972), optical measurements using bio-optical profilers, surface photosynthetically available radiation (PAR) measured with an on-deck radiometer, and finally the mixed-layer depth determined from potential density.

There are in total 194 cruises with available data. Production and chlorophyll were measured at: 5, 25, 45, 75, 100, 125, 150, and 175 m. Surface PAR was given in µE m<sup>−2</sup> s<sup>−1</sup> and the conversion to W m<sup>−2</sup> was done using the procedure of Morel and Smith (1974). From it Ĩ<sub>0</sub><sup>m</sup> was determined as Ĩ<sub>0</sub><sup>m</sup> = Ĩ<sub>T</sub>π/(2D), where Ĩ<sub>T</sub> is the total irradiance received throughout the day. Daylength was provided from PAR measurement. The diffuse attenuation coefficient was calculated from the one percent light level, which was given for 150 cruises, based on optical profiles. For the remaining cases, the average value of 0.0435 m<sup>−1</sup> has been used. Mixed-layer depth Z<sub>m</sub> was estimated as the depth at which potential density exceeds the surface value by 0.125 kg m<sup>−3</sup> (de Boyer Montegut et al., 2004; Suga et al., 2004). Photosynthesis parameters were estimated from chlorophyll and production profiles using the method of Kovač et al. (2016a,b). The data on production, chlorophyll, PAR, one percent light depth and mixed-layer depth are publicly available at hahana.soest.hawaii.edu/hot, whereas the data on the photosynthesis parameters are publicly available at jadran.izor.hr/~kovac/parameters. There were no data on the carbon-chlorophyll ratio.
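The preprocessing steps described above amount to a few short formulas. The sketch below shows one way they might be coded. It is not the authors' code, and several details are assumptions made for illustration: the attenuation coefficient is derived from the one percent light depth assuming the exponential decay in (9), the mixed-layer depth is located by linear interpolation between sampled depths, and the function names and the density profile are invented.

```python
import math


def noon_irradiance(daily_total, day_length):
    """Noon irradiance of a sinusoidal day with the given daily total.

    With daily_total in W h m^-2 and day_length in h the result is in W m^-2.
    """
    return math.pi * daily_total / (2.0 * day_length)


def attenuation_from_one_percent_depth(z_one_percent):
    """Attenuation coefficient K, assuming exp(-K z) = 0.01 at the given depth."""
    return math.log(100.0) / z_one_percent


def mixed_layer_depth(depths, densities, offset=0.125):
    """Depth at which potential density first exceeds the surface value by offset.

    Linear interpolation between samples is an assumption of this sketch.
    Returns None if the threshold is never reached.
    """
    target = densities[0] + offset
    for i in range(1, len(depths)):
        if densities[i] >= target:
            span = densities[i] - densities[i - 1]
            fraction = (target - densities[i - 1]) / span
            return depths[i - 1] + fraction * (depths[i] - depths[i - 1])
    return None


# Invented density profile (kg m^-3 minus 1000), for illustration only.
example_depths = [0.0, 10.0, 20.0, 30.0, 40.0]
example_densities = [24.00, 24.00, 24.05, 24.20, 24.50]
print(mixed_layer_depth(example_depths, example_densities))
print(attenuation_from_one_percent_depth(105.866))
```

Under the exponential-decay assumption, the average attenuation coefficient of 0.0435 m<sup>−1</sup> quoted above corresponds to a one percent light depth of about 106 m.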

With the available data, we now proceed to test the solution for the mixed-layer production and the shifted Gaussian solution. We further use the solutions with time-dependent biomass to predict the influence that time-dependent biomass exerts on mixed-layer production. For the measured/known value of a variable/parameter we use a tilde, e.g., x̃, whereas for a model variable/parameter we use ordinary symbols, e.g., x.

### 4. RESULTS

### Testing the Canonical Solution for Mixed-Layer Production

A straightforward way of testing the canonical solution is simply to calculate mixed layer production with it. The mixed layer is by definition a region of uniform biomass and the assumption of uniformity in biomass required by the canonical solution is fulfilled. To test the model structure we calculate mixed-layer production with expression (8) and compare it with the measured mixed layer production, calculated with the trapezoidal rule from the measured production profile. The obtained results are displayed in **Figure 2**.

For the mixed-layer biomass in expression (8) the average value of the measured biomass from the first two measuring depths was used. As can be seen from the figure, the match between the modeled and the measured mixed-layer production is quite satisfactory. The coefficient of determination is r<sup>2</sup> = 0.94. Therefore, the canonical solution is a good model for mixed-layer production. Some discrepancy is seen at higher values of production. These errors may be caused by the way in which the mixed-layer depth was estimated. It may not always be the case that the mixed-layer depth estimated from potential density corresponds well with the depth of active mixing, and it is the depth of active mixing that is relevant for biomass homogenization (Franks, 2015). In some data sets we have observed the biomass not to be homogeneous from the surface all the way down to the base of the mixed layer estimated from potential density. This increase in biomass causes more production than would otherwise occur without it and is reflected in the slight deviation in **Figure 2**.
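The measured mixed-layer production used in this comparison is a trapezoidal integral of a discrete production profile, and the agreement is summarized by a coefficient of determination. The sketch below shows how both might be computed. It is an illustration only: the text does not state how the layer above the shallowest sample or the value at the base of the mixed layer were handled, so the surface extension and the linear interpolation used here are assumptions, as is the use of the squared Pearson correlation for r<sup>2</sup>. The profile values are invented.

```python
def integrate_profile(depths, values, z_max):
    """Trapezoidal integral of a profile from the surface down to z_max.

    Assumptions of this sketch: the shallowest value is extended to the
    surface, and the value at z_max is linearly interpolated. If z_max lies
    below the deepest sample, integration stops at the deepest sample.
    """
    zs = [0.0] + list(depths)
    vs = [values[0]] + list(values)
    total = 0.0
    for i in range(1, len(zs)):
        z0, v0, z1, v1 = zs[i - 1], vs[i - 1], zs[i], vs[i]
        if z0 >= z_max:
            break
        if z1 > z_max:
            v1 = v0 + (v1 - v0) * (z_max - z0) / (z1 - z0)
            z1 = z_max
        total += 0.5 * (v0 + v1) * (z1 - z0)
    return total


def r_squared(measured, modeled):
    """Squared Pearson correlation between measured and modeled values."""
    n = len(measured)
    mean_x = sum(measured) / n
    mean_y = sum(modeled) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(measured, modeled))
    sxx = sum((x - mean_x) ** 2 for x in measured)
    syy = sum((y - mean_y) ** 2 for y in modeled)
    return sxy * sxy / (sxx * syy)


# Invented production profile (mg C m^-3 d^-1) at the first four HOT depths.
profile_depths = [5.0, 25.0, 45.0, 75.0]
profile_values = [6.0, 5.5, 4.0, 2.0]
print(integrate_profile(profile_depths, profile_values, z_max=54.75))
```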

### Testing the Shifted Gaussian Solution

A prerequisite for applying the solution (25) is to know the values of the parameters of the shifted Gaussian. These have to be estimated from HOT data on chlorophyll profiles. To obtain the biomass parameters we have fitted the shifted Gaussian to the measured chlorophyll profiles by adjusting the parameter values. The conjugate gradient method was used (Baldick, 2006; Knyazev and Lashuk, 2008). The fit converged for each chlorophyll profile. The average concentration of background biomass B<sub>0</sub> is 0.085 mg Chl m<sup>−3</sup>, with a standard deviation of 0.025 mg Chl m<sup>−3</sup>. The average depth of the deep chlorophyll maximum z<sub>m</sub> is 104.12 m, with a standard deviation of 19.00 m. The average width of the biomass peak σ is 21.93 m, with a standard deviation of 8.89 m, and finally the average height of the biomass peak H is 0.175 mg Chl m<sup>−3</sup>, with a standard deviation of 0.075 mg Chl m<sup>−3</sup>.
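For orientation, the shifted Gaussian (21) can be evaluated with the average parameter values reported above. The sketch below is illustrative only: a profile built from averaged parameters need not resemble any individual cruise, the integrated biomass h is recovered from the reported peak height through H = h/(σ√(2π)), and the function name is hypothetical.

```python
import math


def shifted_gaussian(z, b0, h, z_m, sigma):
    """Biomass profile (21): constant background plus a Gaussian peak."""
    peak = h / (sigma * math.sqrt(2.0 * math.pi))
    return b0 + peak * math.exp(-(z - z_m) ** 2 / (2.0 * sigma ** 2))


# Average HOT values reported in the text.
B0, Z_M, SIGMA, PEAK_HEIGHT = 0.085, 104.12, 21.93, 0.175

# Integrated biomass under the Gaussian, from H = h / (sigma * sqrt(2 pi)).
H_INTEGRATED = PEAK_HEIGHT * SIGMA * math.sqrt(2.0 * math.pi)
print(f"h = {H_INTEGRATED:.2f} mg Chl m^-2")

# Profile evaluated at the HOT sampling depths.
for depth in (5, 25, 45, 75, 100, 125, 150, 175):
    value = shifted_gaussian(depth, B0, H_INTEGRATED, Z_M, SIGMA)
    print(f"{depth:4d} m  {value:.3f} mg Chl m^-3")
```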

With the estimated parameters we have further calculated the accuracy of representing the measured biomass profiles with the shifted Gaussian. The biomass given by the shifted Gaussian was calculated at the depth of each measurement and compared with the measured value. The results are shown in **Figure 3**. There are in total 1552 measurements of chlorophyll. As can be seen, the shifted Gaussian is a good model for the biomass profile at all the measuring depths except the last two (150 and 175 m). The coefficient of determination for the data from all depths is 87.84%. However, once the last two depths are excluded the coefficient of determination jumps to a high 98.39%, signifying that the shifted Gaussian is an even better model for the biomass up to the measurement depth of 125 m. The contribution to watercolumn production from depths >125 m is expected to be minimal.

The results of applying the analytical solution (24) are displayed in **Figure 4**. The solution did not exhibit convergent behavior for all the cruise data: out of 194, it converged for 168 cruises. The reason for divergence in the remaining 26 cruises comes from the behavior of the exponential terms exp((z<sub>2n−1</sub><sup>2</sup> − z<sub>m</sub><sup>2</sup>)/2σ<sup>2</sup>) and exp((z<sub>2n</sub><sup>2</sup> − z<sub>m</sub><sup>2</sup>)/2σ<sup>2</sup>), which upon summation in solution (24) diverge when σ is high and z<sub>m</sub> is small. This corresponds to the case of a wide chlorophyll maximum close to the surface. The solution behaves well when σ is small and z<sub>m</sub> large, which is the case of a narrow deep maximum, i.e., a deep chlorophyll maximum. As a measure of the applicability of the given solution the 3σ rule can be applied. For the Gaussian function more than 99% of the biomass above B<sub>0</sub> is located in the depth interval (z<sub>m</sub> − 3σ, z<sub>m</sub> + 3σ). When z<sub>m</sub> is larger than 3σ this biomass is located below the surface. That is the dominant situation at the HOT station and the solution converges, as evident in **Figure 4**.
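The stratification term (24) can be summed numerically as sketched below. This is a minimal illustration, not the authors' implementation, and its convergence behavior need not match theirs. The assumptions are: the two sums in (24) are merged into one alternating sum over k with z<sub>k</sub> = z<sub>m</sub> − kσ<sup>2</sup>K, the factor 1 + Φ(y) is computed as erfc(−y) to limit cancellation, and the sum is flagged as not converged if the exponential would overflow, or the erfc factor underflows, before the terms become negligible. The function names and the scaled noon irradiance of 5 are hypothetical choices; the other values are the HOT averages quoted in the text.

```python
import math


def stratification_term(h, z_m, sigma, k_att, p_max, day_length, i_star,
                        max_terms=200, rel_tol=1e-12):
    """Sum the series (24); returns (value, converged)."""
    scale = p_max * day_length * 0.5 * h
    wallis = [1.0, 2.0 / math.pi]      # daily means of sin^0 and sin^1
    power_over_factorial = 1.0         # running value of x^k / k!
    total = 0.0
    for k in range(1, max_terms + 1):
        power_over_factorial *= i_star / k
        if k >= 2:
            wallis.append(wallis[k - 2] * (k - 1) / k)
        z_k = z_m - k * sigma ** 2 * k_att
        exponent = (z_k ** 2 - z_m ** 2) / (2.0 * sigma ** 2)
        tail = math.erfc(-z_k / (math.sqrt(2.0) * sigma))   # equals 1 + Phi(...)
        if exponent > 700.0 or tail == 0.0:
            return scale * total, False   # floating-point range exhausted
        sign = 1.0 if k % 2 == 1 else -1.0
        term = sign * power_over_factorial * wallis[k] * math.exp(exponent) * tail
        total += term
        if abs(term) <= rel_tol * abs(total):
            return scale * total, True
    return scale * total, False


def peak_below_surface(z_m, sigma):
    """The 3-sigma rule of the text: the Gaussian peak lies below the surface."""
    return z_m > 3.0 * sigma


h_average = 0.175 * 21.93 * math.sqrt(2.0 * math.pi)   # from H = h / (sigma sqrt(2 pi))
value, converged = stratification_term(h_average, 104.12, 21.93, 0.0435,
                                       7.85, 10.0, 5.0)
print(f"Delta P = {value:.2f} mg C m^-2 d^-1, converged: {converged}, "
      f"3-sigma rule satisfied: {peak_below_surface(104.12, 21.93)}")
```

The value returned for a converged case can be cross-checked against a direct numerical quadrature of the second integral in (22), which is a useful safeguard when the parameters lie near the limits of applicability.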

### Predictions with the Time-Dependent Biomass Solutions

Application of the canonical and the shifted Gaussian solutions is straightforward, given that all the necessary parameters are available. The solutions with time-dependent biomass are more complex to apply due to the requirement for additional parameter values. However, even without knowledge of these parameter magnitudes, the solution can be used to estimate the effect of growth, or decline, in biomass on the magnitude of mixed-layer production. If the biomass is allowed to accumulate, then production is expected to increase in comparison with the case for a time-independent biomass. The opposite holds for the case of a decline in biomass. Just how strong these effects are can easily be calculated for hypothetical cases, but to be as close as possible to real scenarios we apply the solutions with the parameter values typical of HOT.

FIGURE 3 | Scatter plot of measured biomass B̃ and modeled biomass B with the shifted Gaussian (21). The first six measurement depths are given in orange, whereas the data from 150 m measurement depth is given in light gray and the data from 175 m measurement depth is given in gray. The gray line represents the 1:1 model vs. data ratio. For the points above/below the gray line the shifted Gaussian overestimates/underestimates the measured biomass.

FIGURE 4 | Scatter plot of measured watercolumn production P̃<sub>Z,T</sub>, calculated with the trapezoidal rule from the measured production profile, and modeled watercolumn production P<sub>Z,T</sub>, calculated with the analytical solution for the shifted Gaussian biomass (24). The solution did not converge for 26 out of 194 tested cruises.

To illustrate our point, we calculate mixed-layer production using the new solutions with the median values for the mixed-layer depth, assimilation number, mixed-layer biomass, and the attenuation coefficient from HOT. We plot the solutions as a function of the dimensionless noon irradiance to demonstrate the effect light has on the magnitude of mixed-layer production (**Figure 5**). In this exercise daylength D is set to 10 h. The blue curve on the figure gives the canonical solution for mixed-layer production with the median parameter values: Z<sub>m</sub> = 54.75 m, P<sub>m</sub><sup>B</sup> = 7.85 mg C (mg Chl)<sup>−1</sup> h<sup>−1</sup>, B = 0.072 mg Chl m<sup>−3</sup>, and K = 0.043 m<sup>−1</sup>. The orange curve is the solution (29) with the carbon-chlorophyll ratio equal to 150 mg C (mg Chl)<sup>−1</sup> and the light orange curve is the same solution with the carbon-chlorophyll ratio equal to 100 mg C (mg Chl)<sup>−1</sup>. As expected, the production is higher when the increase in biomass is accounted for. The red curve corresponds to the solution (34) with the dimensionless factor L<sup>B</sup>D = 1. In this example D = 10 h, which gives L<sup>B</sup> = 0.1 h<sup>−1</sup>, i.e., a decrease in biomass of roughly 10% per hour. At this rate the biomass at sunset declines to its e-folding value B(D) = B<sub>0</sub>/e. Finally, the pink curve is the approximation (36) of the solution (34), with the same parameter values.

In both cases the effect on mixed-layer production is significant. The effect of growth is more pronounced for phytoplankton with lower values of the carbon-chlorophyll ratio χ. This is simply understood by considering that for a higher value of χ more carbon is required for a unit increase of chlorophyll. Mathematically, χ appears in the denominator in equation (28), signifying that the change in biomass per unit time is inversely proportional to χ. A straightforward conclusion is that the growth effect on primary production will be more pronounced for phytoplankton with lower values of χ and vice versa. The effect of loss on primary production is also straightforward: the decline of biomass during the day results in lower production. The greater the loss, the greater the diminution in production in comparison with the canonical solution. The exact magnitude of the reduction is now easily calculated using solution (34). The agreement between the approximation (36) and the exact solution (34) is remarkable.

# 5. DISCUSSION

Taken together, the solutions presented cover a wide range of mixing and growth conditions (**Figure 6**), such as might be encountered in the field: intense mixing and low growth, covered by the canonical solution (8); intense mixing and high growth, covered by the increasing mixed-layer biomass solution (29); intense mixing and negative growth, covered by the declining mixed-layer biomass solution (34); and finally low mixing and low growth, covered by the shifted Gaussian solution (24). As all these conditions are indeed observable at times in the ocean, the assumptions of the outlined solutions are fulfilled to some extent, and the application of the solutions is justified. The only case not solved here is that of low mixing and high growth in biomass, which corresponds to the case of time-dependent stratified biomass. An analytical solution to this problem has yet to be found and is a potential topic for future theoretical work. The problem can be treated as formulated in this work, by first solving the growth equation for biomass (12) and inserting the solution directly into the integral for watercolumn production with time-dependent biomass (11).

Irrespective of the model, there are in essence three possible outcomes concerning the temporal evolution of biomass: it can be constant, accumulating, or declining. Time dependence in biomass is easily seen in satellite records of chlorophyll (Racault et al., 2012; Cabre et al., 2016), which can potentially serve for assessing whether or not time dependence in biomass should be accounted for when calculating primary production.

growth, blue for uniform biomass with intense mixing and low growth, and pink for time dependent biomass with intense mixing. Solution for watercolumn production with time dependent stratified biomass has yet to be found.


If the chlorophyll time series displays quiet periods with no temporal dependence in biomass, the canonical solution is then a valid model for mixed layer production. The model with increasing time-dependent biomass is adequate for primary production calculation during a blooming period. This period is also easy to diagnose from satellite chlorophyll time series. After the bloom crashes and the decline of chlorophyll begins to show in the record, the assumption behind the model with declining mixed-layer biomass becomes valid. The value of the loss rate can potentially be determined empirically from the chlorophyll record. Therefore, with respect to the annual cycle of biomass, the solutions presented in this work are each appropriate for a particular period of the year when their respective assumptions are met. Calculating annual watercolumn production in this manner can potentially be a topic for future research.
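One way to picture this diagnosis is a log-linear fit to a short chlorophyll record. The sketch below is only an illustration: the function names, the threshold `tol`, and the synthetic record are hypothetical, and the fitted slope is a net rate of change (growth minus loss), not a loss rate measured in isolation.

```python
import math

def net_rate_per_day(chl, days):
    """Least-squares slope of ln(chl) against time, in d^-1."""
    n = len(chl)
    y = [math.log(c) for c in chl]
    t_mean = sum(days) / n
    y_mean = sum(y) / n
    num = sum((t - t_mean) * (v - y_mean) for t, v in zip(days, y))
    den = sum((t - t_mean) ** 2 for t in days)
    return num / den

def choose_solution(rate, tol=0.01):
    """Map the net rate to a mixed-layer solution; tol is a hypothetical threshold."""
    if rate > tol:
        return "increasing biomass, solution (29)"
    if rate < -tol:
        return "declining biomass, solution (34)"
    return "constant biomass, canonical solution (8)"

# Synthetic post-bloom record: chlorophyll declining at 0.2 d^-1 (made-up values)
days = [0, 1, 2, 3, 4, 5]
declining = [0.30 * math.exp(-0.2 * d) for d in days]

rate = net_rate_per_day(declining, days)
print(round(rate, 3), choose_solution(rate))
```

During a decline, the negative of the fitted slope gives an empirical estimate of the net loss rate that could be supplied to the declining-biomass solution.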

In earlier literature, integrals for watercolumn production were usually formulated with time-independent biomass (Platt and Sathyendranath, 1993) and could therefore be applied at any stage during the course of the annual cycle. Calculating production in this manner proved suitable for ocean areas where field biomass was known not to change considerably during the interval over which production was calculated, and was in a sense mandatory from the practical standpoint, given that observations of biomass were predominantly performed once per day. However, this approach resulted in biomass accumulation due to primary production having no effect on daily production. Formulating production integrals with biomass constant in time presumes that incremental production due to newly-synthesized biomass is insignificant in comparison with production arising from initial biomass. The approach presented here alleviates this limitation and allows a positive feedback between biomass accumulation and primary production, with newly-synthesized biomass contributing to primary production.

The approach advocated here treats the temporal evolution of biomass as being governed by the growth equation, with the growth term dictated by instantaneous production. Coupling this equation to the watercolumn production integral was achieved through the reformulation of the integral via a time-dependent biomass term multiplying the normalized production term. Due to mathematical complexities of this formulation, different approaches for handling the problem were proposed. First, the problem of depth-dependent biomass was analyzed, with an important note that in this context it was viewed as a special case of a steady-state solution for biomass distribution. Assuming steady state is reasonable for non-bloom conditions in the oligotrophic ocean, during summer periods at higher latitudes and below the mixed layer (Fennel and Boss, 2003). Biomass profiles of this sort are assumed to be solutions to a more complex equation for biomass, which accounts for other processes besides growth, such as losses by grazing, mixing, and sinking (Beckman and Hense, 2007). These processes are easily included in vertically resolved models for biomass, but the price paid for their inclusion is the increase in model complexity, which makes them difficult to solve analytically, with numerical procedures stepping in as the method of choice to obtain solutions (Huisman et al., 2002; Taylor and Ferrari, 2011).

To circumvent the problem of finding an analytical, steady-state solution to a more general equation, involving not only the nonlinear, time-dependent production term, but also sinking and mixing, we have employed the shifted Gaussian as an approximation to the solution for the biomass profile at steady state. The types of profiles described by the shifted Gaussian are indeed in close agreement with the numerical simulations of the biomass profiles (Beckman and Hense, 2007; Liccardo et al., 2013). More importantly, they are often obtained as results of measurements, and biomass profiles observed in the open ocean are a prototype example for using the shifted Gaussian in the integral for watercolumn production (Platt and Sathyendranath, 1988; Platt et al., 1991b). For vast regions of the open ocean biomass is vertically structured (Longhurst and Harrison, 1989). The most common structure is that of the deep chlorophyll maximum (Navaro and Ruiz, 2013), which is a perfect match for the shifted Gaussian. The deep chlorophyll maximum is often observed to be a quasi-permanent feature, existing for months, if not for the duration of the whole annual cycle (Platt and Sathyendranath, 1988). In this case the processes that act to create the deep chlorophyll maximum, and sustain it, are in equilibrium on time scales longer than 1 day (Chiswell et al., 2015), justifying the assumption of time-independent biomass in daily watercolumn production calculations. Deep chlorophyll maxima are also observed on the majority of HOT cruises, with the shifted Gaussian demonstrated here to be a good model for HOT data (**Figure 3**). From this alone stems the legitimacy of using the shifted Gaussian as a model for the biomass profile in the watercolumn production integral for HOT.
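For readers who want to see the shape in question, the sketch below evaluates a shifted Gaussian profile. It assumes the form commonly used in the cited literature, a constant background plus a Gaussian peak, $B(z) = B_0 + \frac{h}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(z - z_m)^2}{2\sigma^2}\right)$; the parameter names and all numerical values are hypothetical and may differ from the notation of equation (21).

```python
import math

def shifted_gaussian(z, b0, h, sigma, z_m):
    """Background b0 plus a Gaussian peak of integrated amount h, width sigma, centered at z_m."""
    peak = h / (sigma * math.sqrt(2.0 * math.pi))
    return b0 + peak * math.exp(-((z - z_m) ** 2) / (2.0 * sigma ** 2))

# Hypothetical parameters, loosely shaped like a profile with a deep chlorophyll maximum
params = dict(b0=0.05, h=10.0, sigma=25.0, z_m=110.0)

# Sample the profile every 25 m from the surface to 200 m
profile = [(z, shifted_gaussian(z, **params)) for z in range(0, 201, 25)]
for z, b in profile:
    print(f"{z:4d} m  {b:.3f} mg Chl m^-3")
```

The background term keeps biomass finite far from the peak, while the peak depth and width control where the deep chlorophyll maximum sits and how sharp it is.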

In addition, when biomass remains stratified for prolonged periods of time it is safe to assume that mixing is not vigorous enough to distort the established stratification (Liccardo et al., 2013). Mixing itself is caused by various physical agents such as wind, waves, and convection (Franks, 2015), which increase turbulent kinetic energy in the mixed layer. In numerical models this process is parameterized with the mixing coefficient through turbulence closure schemes (Cushman-Roisin and Beckers, 2011). In the presented model however, mixing is not expressed explicitly, but rather the consequences of mixing are assumed implicit through the effect it had on shaping the initial biomass profile. In the field, uniform biomass profiles are most likely associated with strong mixing, whereas stratified biomass profiles are surely associated with less intense mixing. For if mixing were intense, stratification in biomass would not come about, because any incipient stratification would be quickly eroded. This process is well represented in numerical models of the biomass profile (Beckman and Hense, 2007). When assuming a uniform biomass profile we are in fact assuming that mixing was strong enough to cause homogenization of biomass.

This is of course valid for the mixed layer, in which biomass is homogeneous by definition, but it is important to make a distinction between the mixed layer and a layer of active mixing, as highlighted by Franks (2015). The solutions for mixed layer production presented here are strictly valid for a layer of active mixing, since we assume that mixing of newly-synthesized biomass is occurring instantaneously. However, it is often the case that mixed layer depth is determined based on density, or temperature, offset from the surface value (de Boyer Montegut et al., 2004), even though the mixed layer depth determined in this way may not always correspond well with the depth of active mixing (Franks, 2015). We suspect this to be the cause for the slight bias evident in **Figure 2**. For the model presented here, the assumption of active mixing is required to redistribute the newly-synthesized biomass so that growth occurs uniformly throughout the mixed layer.

Another relevant consequence of the assumption of active mixing is that it enabled the loss rate to be assumed vertically uniform in the mixed layer. In addition to vertical uniformity, not stating the loss rate explicitly left a certain flexibility in the presented model. The loss rate of the phytoplankton population is known to be a complex mixture of various processes such as mixing, sinking, predation, and mortality (Platt et al., 1991a; Zhai et al., 2010), and in the ocean these processes can combine to give a more complex pattern than simply a vertically-uniform loss rate. However, since the work of Sverdrup (1953), it is commonly assumed in theoretical considerations of mixed layer production that losses are in fact uniform due to mixing itself, and the model presented here follows this basic approach.

Contrary to the loss rate, a basic feature of the model, shared also by all the previous models of watercolumn production (Platt and Sathyendranath, 1993), is depth-resolved instantaneous production, caused by the decline of light intensity with depth. This allows for the accumulation of biomass in the growth equation to proceed in accord with the well established response of primary production to light, stated in (10). Therefore, production is given a more realistic treatment than losses are. The lack of detailed treatment of the loss terms is not a serious drawback for this model, because parameterizations for respiration, excretion, grazing by micro- and macrozooplankton, and sedimentation were already studied by Platt et al. (1991a) and Zhai et al. (2010). Inclusion of the parameterization for losses given in Zhai et al. (2010) is straightforward here. Losses have become as important as production in considerations of bloom dynamics and the model used here can shed some light on this topic, specifically on the Critical Depth Hypothesis.

### Implications for the Critical Depth Hypothesis

The new formulation of the mixed-layer production model also has consequences for the Critical Depth Hypothesis. To demonstrate, let us again return to the case of the mixed layer with uniform biomass and the mixing depth given by $Z_m$. According to Sverdrup (1953), if the mixing depth is shallower than the critical depth $Z_{cr}$ conditions for the initiation of a bloom are fulfilled (Siegel et al., 2002; Fischer et al., 2014). Critical depth is defined as the depth at which the mixed-layer production is balanced by mixed layer losses (Platt et al., 1991a; Sathyendranath et al., 2015) and is derivable from a conservation of biomass equation (Levy, 2015; Mignot et al., 2016). Mathematical formulations of the critical depth criterion are well established in the literature (Sverdrup, 1953; Platt et al., 1991a) and Sverdrup's criterion is usually acknowledged to be a necessary and a sufficient condition for bloom initiation, but not a sufficient condition for determining the amplitude of the bloom (Platt et al., 1994). It specifies whether the bloom can occur, but leaves the bloom amplitude unspecified. With the dynamic approach for primary production calculation, we now demonstrate that Sverdrup's criterion can also be used to calculate the daily increase in mixed layer biomass.

To elaborate, let us augment the production equation (28) with the loss term, so that the time evolution of mixed layer biomass becomes:

$$\frac{\partial}{\partial t}B_{Z_m}(t) = \frac{1}{\chi Z_m} \left( P_{Z_m}^{B}(t) - L_{Z_m}^{B} \right) B_{Z_m}(t),\tag{37}$$

where $L^B_{Z_m}$ represents total losses in the broadest sense, arising from respiration, excretion, grazing, sinking, and so on (Zhai et al., 2010). Let the growth and loss of the mixed layer biomass be on the same order of magnitude. We assume the loss rate to be vertically uniform and time independent, a justifiable assumption given the mixing. The solution to this equation at time $D + N$, where $N$ marks the night interval, is then:

$$B_{Z_m}(D+N) = B_{Z_m}(0) \exp\left(\frac{1}{\chi Z_m} \left(P_{Z_m,T}^B - L_{Z_m,T}^B\right)\right). \tag{38}$$

If there is to be an increase in biomass during a 24 h period, that is:

$$B_{Z_m}(D+N) > B_{Z_m}(0),\tag{39}$$

the term in the exponential function has to be greater than zero. It will be greater than zero when the mixed layer production surpasses the losses, that is $P^B_{Z_m,T} > L^B_{Z_m,T}$. Since the net production decreases with depth, because light intensity decreases and losses remain constant, there will be a depth at which the vertically-integrated production $P^B_{Z_m,T}$ equals vertically-integrated losses $L^B_{Z_m,T}$. The depth at which the two terms exactly balance, $P^B_{Z_{cr},T} = L^B_{Z_{cr},T}$, is recognized as the critical depth $Z_{cr}$, defined by Sverdrup (1953). If the mixing depth is shallower than the critical depth the term in the exponential function is positive.
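The sign argument can be made concrete by evaluating equation (38) directly. In the sketch below the daily integrated production and losses are hypothetical numbers, chosen only to show the three possible outcomes; the function name is illustrative.

```python
import math

def biomass_after_one_day(b0, chi, z_m, production, losses):
    """Equation (38): mixed-layer biomass at time D + N.

    production, losses: daily, depth-integrated normalized terms, mg C (mg Chl)^-1 m
    chi: carbon-chlorophyll ratio, mg C (mg Chl)^-1;  z_m: mixed-layer depth, m
    """
    return b0 * math.exp((production - losses) / (chi * z_m))

# B0 and Z_M are the HOT medians quoted earlier; chi = 100 is one of the two ratios used there
B0, CHI, Z_M = 0.072, 100.0, 54.75

# Hypothetical (production, losses) pairs: surplus, exact balance, deficit
for production, losses in [(900.0, 600.0), (600.0, 600.0), (600.0, 900.0)]:
    b1 = biomass_after_one_day(B0, CHI, Z_M, production, losses)
    print(f"P = {production:5.0f}, L = {losses:5.0f}, change in biomass = {b1 - B0:+.5f}")
```

The change in biomass has the same sign as the difference between production and losses, which is all that condition (39) requires.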

Invoking the canonical solution for $P^B_{Z_{cr},T}$ we get the implicit expression for the critical depth:

$$\frac{P_m^B D}{K} \left( f(I_*^m) - f(I_*^m e^{-KZ_{cr}}) \right) = L^B (D+N) Z_{cr}, \tag{40}$$

where we have used $L^B_{Z_{cr},T} = L^B (D + N) Z_{cr}$. Dividing both sides by $Z_{cr}$ gives:

$$\frac{P_m^B D}{K Z_{cr}} \left( f(I_*^m) - f(I_*^m e^{-K Z_{cr}}) \right) = L^B(D+N). \tag{41}$$

This expression states that the average production in the mixed layer equals the average loss when the mixed layer depth is equal to the critical depth, which is precisely the critical depth criterion of Sverdrup (1953). Therefore, when the mixed layer extends to the critical depth there is no accumulation of biomass. Mixing beyond the critical depth leads to losses in the mixed-layer biomass, and finally mixing not extending to the critical depth leads to accumulation of mixed-layer biomass. The mathematical condition expressing this latter statement is simply:

$$Z_m < Z_{cr}, \tag{42}$$

with Zcr given as a solution of (41). If this condition is met so is condition (39) and accumulation of mixed-layer biomass occurs. Whenever condition (42) is met, there will be a positive increase in mixed layer biomass over the course of 1 day of the following magnitude:

$$\Delta B_{Z_m} = B_{Z_m}(0)\left[\exp\left(\frac{1}{\chi Z_m}\left(P_{Z_m,T}^B - L_{Z_m,T}^B\right)\right) - 1\right].\tag{43}$$

The critical-depth criterion can now be restated as: when the mixed layer depth is shallower/deeper than the critical depth, there is an increase/decrease in mixed layer biomass of magnitude $\Delta B_{Z_m}$. If the critical depth and the mixed-layer depth are equal, the biomass remains constant.
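Because the critical depth appears on both sides of the balance, it has to be found numerically. The sketch below is one possible approach and not the authors' implementation: instead of the function $f$, it integrates an assumed photosynthesis-light response, $P^B_m \left(1 - e^{-I_*^m \sin(\pi t/D)\, e^{-Kz}}\right)$, over depth and daylength by the midpoint rule, and then locates the depth of exact balance by bisection. The values of `I_star` and `L_B` are hypothetical.

```python
import math

def daily_normalized_production(z_max, D, K, I_star, P_m, n=100):
    """Depth- and time-integrated normalized production from the surface to z_max.

    Midpoint-rule quadrature of P_m * (1 - exp(-I(z, t))) with the assumed light
    field I(z, t) = I_star * sin(pi * t / D) * exp(-K * z).
    """
    dz, dt = z_max / n, D / n
    total = 0.0
    for i in range(n):
        light_at_depth = I_star * math.exp(-K * (i + 0.5) * dz)
        for j in range(n):
            total += 1.0 - math.exp(-light_at_depth * math.sin(math.pi * (j + 0.5) * dt / D))
    return P_m * total * dz * dt

def net_daily_balance(z, D, N, K, I_star, P_m, L_B):
    """Daily production minus daily losses for a layer of depth z (the two sides of the balance)."""
    return daily_normalized_production(z, D, K, I_star, P_m) - L_B * (D + N) * z

def critical_depth(D, N, K, I_star, P_m, L_B, z_lo=1.0, z_hi=2000.0, tol=1e-6):
    """Bisection for the depth at which the balance changes sign."""
    args = (D, N, K, I_star, P_m, L_B)
    if net_daily_balance(z_lo, *args) <= 0 or net_daily_balance(z_hi, *args) >= 0:
        raise ValueError("bracket does not contain a critical depth")
    while z_hi - z_lo > tol:
        z_mid = 0.5 * (z_lo + z_hi)
        if net_daily_balance(z_mid, *args) > 0:
            z_lo = z_mid
        else:
            z_hi = z_mid
    return 0.5 * (z_lo + z_hi)

# P_m, K and D as quoted for HOT; I_star and L_B (mg C (mg Chl)^-1 h^-1) are hypothetical
setup = dict(D=10.0, N=14.0, K=0.043, I_star=5.0, P_m=7.85, L_B=0.5)
z_cr = critical_depth(**setup)
z_m = 54.75  # m, median HOT mixed-layer depth

outcome = "accumulation" if z_m < z_cr else "decline"
print(f"Z_cr = {z_cr:.1f} m, so Z_m = {z_m} m implies {outcome} of mixed-layer biomass")
```

With the analytical function $f$ available, the quadrature routine could be replaced by the closed-form left-hand side of the balance, leaving the bisection unchanged.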

### Implications for Remote Sensing

Applying the models presented in this work as estimators of watercolumn production, based on remotely-sensed data on ocean color, is straightforward. The solutions can be used as a part of existing remote sensing algorithms for watercolumn production, requiring only alterations to be made on the module that calculates watercolumn production, with the analytical solutions taking place of the commonly employed numerical ones. The models are formulated in a similar fashion to our previous ones and when implemented require information on the same parameters and variables (Platt and Sathyendranath, 1993), of which the following are accessible by remote sensing: chlorophyll concentration, surface irradiance and the attenuation coefficient. A relevant distinction from the previous models concerns the assumption of time dependence in biomass. This has implications for remote sensing applications, given that all prior models assumed biomass constant in time, implying that sampling biomass at any time of the day was sufficient for these estimators. However, in the newly presented models temporal evolution of biomass is accounted for and we can ask how the biomass sampled at a specific time of day relates to the initial biomass required by the models.

Acknowledging that ocean color satellites have access to surface chlorophyll (approximately up to the first photic depth 1/K) and in line with the models presented so far, we write the equation for the time evolution of surface chlorophyll B(0, t) as:

$$\frac{\partial}{\partial t}B(0,t) = \left(\frac{1}{\chi}P^B(0,t) - L^B\right)B(0,t),\tag{44}$$

ignoring advection and mixing, which makes it valid for laterally uniform fields, or for time scales short enough so that neither advection nor mixing causes significant changes in biomass over the course of integration. Let us assume that the satellite samples surface biomass at time $t_s$ and label it $\widetilde{B}(0, t_s)$. From the previous equation surface biomass at time $t_s$ is:

$$B^*(0, t_s) = B(0, 0) \exp\left(\int_0^{t_s} \left(\frac{1}{\chi} P^B(0, t) - L^B\right) dt\right). \tag{45}$$

Equating the remotely sensed biomass $\widetilde{B}(0, t_s)$ with $B^*(0, t_s)$ enables us to express the initial surface biomass as:

$$B(0,0) = \widetilde{B}(0,t_s) \exp\left(-\int_0^{t_s} \left(\frac{1}{\chi}P^B(0,t) - L^B\right)dt\right). \tag{46}$$

This expression takes the remotely sensed biomass $\widetilde{B}(0, t_s)$ and transforms it into the initial biomass $B(0, 0)$, under the assumption that biomass evolves according to equation (44). It corrects the remotely sensed surface biomass for dynamical processes of growth and loss, to yield the initial biomass. Taking the satellite overpass time to be at local noon, $t_s = D/2$, further enables us to express initial biomass explicitly as:

$$B(0,0) = \widetilde{B}(0,D/2) \exp\left(-\frac{D}{2} \left(\frac{P_m^B}{\chi} f_z(I_*^m) - L^B\right)\right). \tag{47}$$

According to these expressions, having dynamically evolving biomass in the model affects not only the magnitude of watercolumn production, but also the way in which initial biomass should be calculated from remotely sensed biomass, to compensate for growth and loss. It is important to note that the correction is not linear, but exponential, with respect to production and loss.
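The back-correction can be sketched numerically as follows. The integral in the exponent is evaluated by the midpoint rule, using an assumed surface photosynthesis-light response $P^B(0,t) = P^B_m \left(1 - e^{-I_*^m \sin(\pi t/D)}\right)$; the function names and the values of `I_star` and `L_B` are hypothetical, and the satellite value simply reuses the median HOT biomass for illustration.

```python
import math

def surface_rate_integral(t_end, D, chi, I_star, P_m, L_B, n=1000):
    """Integral from 0 to t_end of (P^B(0, t) / chi - L^B), by the midpoint rule.

    P^B(0, t) = P_m * (1 - exp(-I_star * sin(pi * t / D))) is an assumed response.
    """
    dt = t_end / n
    total = 0.0
    for j in range(n):
        t = (j + 0.5) * dt
        production = P_m * (1.0 - math.exp(-I_star * math.sin(math.pi * t / D)))
        total += (production / chi - L_B) * dt
    return total

def initial_surface_biomass(b_satellite, t_s, **kw):
    """Remove the growth and loss accumulated between sunrise and the overpass at t_s."""
    return b_satellite * math.exp(-surface_rate_integral(t_s, **kw))

def surface_biomass_at(b_initial, t_s, **kw):
    """Evolve the initial biomass forward from sunrise to time t_s."""
    return b_initial * math.exp(surface_rate_integral(t_s, **kw))

# P_m and D as quoted for HOT, chi = 100; I_star and L_B (h^-1) are hypothetical
params = dict(D=10.0, chi=100.0, I_star=5.0, P_m=7.85, L_B=0.02)
b_sat = 0.072        # mg Chl m^-3, taken here as the value seen at the noon overpass
t_s = params["D"] / 2.0

b0 = initial_surface_biomass(b_sat, t_s, **params)
print(f"initial biomass {b0:.4f}, recovered overpass value {surface_biomass_at(b0, t_s, **params):.4f}")
```

Running the correction backward and then forward returns the satellite value, which is a quick consistency check on any implementation of the two expressions.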

# 6. CONCLUSIONS

The work presented here extends the standard formulation of daily watercolumn production by allowing for depth- and time-resolved biomass. In the standard formulation, biomass was specified in advance and treated as unrelated to primary production (Platt and Sathyendranath, 1993), leaving prior models without proper dynamics in this regard. To avoid this problem, we proposed an alternative approach and stated a growth equation for biomass, thus allowing for a time-dependent solution in biomass. Coupling this equation to the watercolumn production integral was achieved by reformulating the integral with time-dependent biomass. Therefore, biomass was related to growth, and as such, subsequently used in the watercolumn production integral.

Depth-resolved biomass was set via an initial condition and we distinguished two possibilities regarding depth dependence in biomass: a stratified water column and a mixed layer. For the mixed layer, we further distinguished between growing and declining biomass, providing analytical solutions for watercolumn production in both cases. For the stratified water column we used the shifted Gaussian function to represent biomass profiles and derived an exact analytical solution for daily watercolumn production in this case. No analytical solutions to these problems have been reported in the literature until now.

The new solutions were tested with data from the HOT program. The shifted Gaussian was used to model biomass profiles and it was demonstrated to be a good model. The canonical solution for mixed layer production and the solution with the shifted Gaussian were applied as models of watercolumn production. Both analytical solutions proved to be good models for this open ocean station. The solutions for growing and declining biomass were used to predict mixed layer production in the cases where biomass was increasing/decreasing in accordance with the assumptions of the models.

Sverdrup's critical depth criterion was explored further and an exact expression for the mixed layer biomass increment during 1 day was derived. The final statement of the critical depth criterion remained unaltered, although it was based on the argument of growth, whereas prior statements of the criterion were based on the balance between watercolumn production and losses (Sverdrup, 1953; Platt et al., 1991a). The two approaches are now seen to be equivalent with respect to the final outcome, that being the critical depth criterion.

It was further discussed how to merge the temporally-evolving surface biomass with the remotely-sensed surface biomass, to get to the initial condition on biomass. We expect the processes of growth and loss to affect surface biomass significantly during the initiation or termination phases of a bloom, when the biomass is changing rapidly on time scales shorter than 1 day. A potential course for future research would be to implement this approach in producing maps of chlorophyll from remotely-sensed data.

# AUTHOR CONTRIBUTIONS

CG and TP had the original ideas. TP, SS, and ŽK formulated the mathematical models. ŽK, TP, and SA solved the problems analytically. SA did the mathematical analyses of the derivations. CG found the approximate solution. ŽK implemented the models. ŽK and TP wrote the original draft. SS, SA, MM, and CG provided critical reviews and commentary on the draft, and wrote the final manuscript.

### FUNDING

This work has been supported in part by the Croatian Science Foundation under the projects MARIPLAN (IP-2014-09-3606) and SCOOL (IP-2014-09-5747). This work is a contribution to the European Space Agency Projects "STSE Marine Primary Production: Model Parameters from Space" and "SEOM Photosynthetically Active Radiation for Primary Production." Additional support from the National Centre for Earth Observation of the Natural Environment Research Council (UK) is also acknowledged. TP acknowledges the support provided by a Jawaharlal Nehru Science Fellowship (Government of India).

### ACKNOWLEDGMENTS

We acknowledge the Hawaii Ocean Time Series for the publicly available database (hahana.soest.hawaii.edu/hot). We thank their scientists and staff, and in particular David Karl for his interest.

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fmars.2017.00163/full#supplementary-material


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Kovač, Platt, Antunović, Sathyendranath, Morović and Gallegos. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Synergistic Exploitation of Hyper- and Multi-Spectral Precursor Sentinel Measurements to Determine Phytoplankton Functional Types (SynSenPFT)

Svetlana N. Losa<sup>1</sup>\*, Mariana A. Soppa<sup>1</sup>, Tilman Dinter<sup>1,2</sup>, Aleksandra Wolanin<sup>2</sup>, Robert J. W. Brewin<sup>3,4</sup>, Annick Bricaud<sup>5,6</sup>, Julia Oelker<sup>2</sup>, Ilka Peeken<sup>1</sup>, Bernard Gentili<sup>5,6</sup>, Vladimir Rozanov<sup>2</sup> and Astrid Bracher<sup>1,2</sup>

<sup>1</sup> Alfred Wegener Institute, Helmholtz Centre for Polar and Marine Research, Bremerhaven, Germany, <sup>2</sup> Institute of Environmental Physics, University of Bremen, Bremen, Germany, <sup>3</sup> Plymouth Marine Laboratory, Plymouth, United Kingdom, <sup>4</sup> Plymouth Marine Laboratory, National Centre for Earth Observation, Plymouth, United Kingdom, <sup>5</sup> Sorbonne Universités, UPMC-Université Paris-VI, UMR 7093, Laboratoire d'Océanographie de Villefranche, Observatoire Océanologique, Villefranche-sur-Mer, France, <sup>6</sup> Centre National de la Recherche Scientifique, UMR 7093, LOV, Observatoire océanologique, Villefranche-sur-Mer, France

### Edited by:

Laura Lorenzoni, University of South Florida, United States

### Reviewed by:

Lionel Arteaga, Princeton University, United States

Raphael M. Kudela, University of California, Santa Cruz, United States

Cecile S. Rousseaux, USRA, United States

### \*Correspondence:

Svetlana N. Losa svetlana.losa@AWI.de

### Specialty section:

This article was submitted to Ocean Observation, a section of the journal Frontiers in Marine Science

Received: 08 March 2017 Accepted: 13 June 2017 Published: 05 July 2017

### Citation:

Losa SN, Soppa MA, Dinter T, Wolanin A, Brewin RJW, Bricaud A, Oelker J, Peeken I, Gentili B, Rozanov V and Bracher A (2017) Synergistic Exploitation of Hyper- and Multi-Spectral Precursor Sentinel Measurements to Determine Phytoplankton Functional Types (SynSenPFT). Front. Mar. Sci. 4:203. doi: 10.3389/fmars.2017.00203

We derive the chlorophyll a concentration (Chla) for three main phytoplankton functional types (PFTs) – diatoms, coccolithophores and cyanobacteria – by combining satellite multispectral-based information, which has a high spatial and temporal resolution, with retrievals based on highly resolved PFT absorption properties derived from hyperspectral satellite measurements. The multispectral-based PFT Chla retrievals are based on a revised version of the empirical OC-PFT algorithm applied to the Ocean Color Climate Change Initiative (OC-CCI) total Chla product. The PhytoDOAS analytical algorithm is used with some modifications to derive PFT Chla from SCIAMACHY hyperspectral measurements. To combine these two PFT products (OC-PFT and PhytoDOAS) synergistically, an optimal interpolation is performed for each PFT in every OC-PFT sub-pixel within a PhytoDOAS pixel, given its Chla and its a priori error statistics. The synergistic product (SynSenPFT) is presented for the period of August 2002 to March 2012 and evaluated against PFT Chla data obtained from in situ marker pigment data, the NASA Ocean Biogeochemical Model simulations, and satellite information on phytoplankton size. The most challenging aspects of the SynSenPFT algorithm implementation are discussed. Perspectives on SynSenPFT product improvements and prolongation of the time series over the next decades by adaptation to Sentinel multi- and hyperspectral instruments are highlighted.

Keywords: synergistic, Sentinel, satellite retrievals, phytoplankton functional type

# 1. INTRODUCTION

Phytoplankton supplies over 90% of the nutrition consumed by the higher trophic levels of the marine ecosystem and contributes 50% of the global primary production (Field et al., 1998). Therefore, it is very important for global biogeochemical fluxes (e.g., carbon) since it fixes atmospheric carbon, CO<sub>2</sub>, and produces organic carbon compounds. In combination with physical factors, this process helps to determine which part of the ocean is a sink or source of CO<sub>2</sub> (Laufkötter et al., 2016). Furthermore, global biogeochemical fluxes can be impacted by the size and composition of phytoplankton, in addition to the structure of the trophic community. Ocean color remote sensing has revolutionized our understanding of these processes in the past decades by providing globally continuous data on surface chlorophyll a concentration (Chla, mg m<sup>−3</sup>). Chla, however, is an index of total phytoplankton biomass within which each phytoplankton group has a specific morphology and photophysiology and plays a particular role in biogeochemical cycling. For instance, diatoms are the phytoplankton silicifiers, which contribute to most of the primary production and biomass during the spring bloom in temperate and polar regions (Buesseler, 1998). Their importance is related to the efficiency of carbon export through the direct sinking of single cells, key grazing pathways and through mass sedimentation events at the end of the spring blooms when nutrients are depleted (Le Quéré et al., 2005). Coccolithophores are the main planktonic calcifiers in the ocean: through building and releasing calcium carbonate plates, coccoliths, coccolithophores make a major contribution to the total content of particulate inorganic carbon in the open oceans (Milliman, 1993; Ackleson et al., 1988). Cyanobacteria regenerate nutrients and, therefore, influence the marine recycled production (Waterbury et al., 1986; Morán et al., 2004).

The ability to observe the phenology and variability of different phytoplankton groups simultaneously is "a scientific priority for understanding the marine food web, and ultimately predicting the ocean's role in regulating climate and responding to climate change on various time scales" (Bracher et al., 2017b). Highly resolved information about phytoplankton diversity is also required for a variety of socio-economic applications dealing with predicting harmful algal blooms, eutrophication, hypoxia and other events affecting water quality (see IOCCG 2009). Nearly all global ocean color products presently available are retrieved from multispectral ocean color sensors (IOCCG, 2014; Mouw et al., 2017), which provide information on water-leaving radiance at a number of spectral bands (up to 5 used for ocean color) with band widths varying from 7.5 nm to 20 nm. Compared to available sensors allowing for more detailed spectral information (hyperspectral), multispectral sensors provide data with relatively high spatial resolution (∼1 to 4.6 km) and temporal coverage (within a few days). This spatial and temporal resolution, as well as the legacy of the data, makes them especially attractive for studies of phytoplankton diversity and phenology.

Some of the well-known multispectral-based algorithms differentiating phytoplankton groups (IOCCG 2014) are applied to the total Chla (TChla) product, which, owing to the strong absorption of Chla (a pigment produced by all phytoplankton species) at 443 nm, can be retrieved quite accurately given the spectral resolution and radiometric sensitivity of the ocean color sensors (Mouw et al., 2017). To derive either the fraction or Chla of phytoplankton size classes (PSC) (Uitz et al., 2006; Brewin et al., 2010, 2015; Hirata et al., 2011) or phytoplankton functional types (PFT) (Hirata et al., 2011), these so-called abundance-based approaches use empirical relationships between TChla and in situ marker pigments determined using high-performance liquid chromatography (HPLC). The HPLC method itself is based on spectral absorption properties of the specific phytoplankton marker pigments, which allows particular phytoplankton groups to be identified. However, the empirical nature of their relation to TChla (usually estimated in a global context) leads to some limitations of the abundance-based approaches: they cannot predict atypical associations; the relationships derived may differ regionally (Soppa et al., 2014) and may vary with environmental conditions, so the model parameters may change in a future ocean state (Brewin et al., 2010; Ward, 2015; Brewin et al., 2017); and they cannot capture shifts in phytoplankton composition that occur without any change in TChla. An example is the shift from a diatom-dominated to a haptophyte-dominated composition observed in the Arctic Ocean under recent warming conditions (Nöthig et al., 2015).

Other so-called spectral-based approaches, besides being different in underlying physical principles and input information (radiance, absorption, backscattering), rely explicitly on spectral signatures of specific PFT or PSC retrieved directly from the input information (or satellite radiometric measurements) (for a comprehensive overview see Mouw et al. 2017) and, therefore, have the potential to discriminate blooms of phytoplankton groups which are not correlated with TChla. These methods also use empirical relationships to some extent, parameterizing their spectral models by, for example, choosing certain (mean) phytoplankton absorption spectra measured in the field as representative of specific PSC or PFT. The current discretization of multispectral-based inputs (up to five wavebands) limits the ability to differentiate among the optical imprints of different water constituents. Although phytoplankton types have different marker pigments, the differences in the spectral absorption structures can be small, since they also have many pigments in common. The small number of wavelength bands and the broadband resolution of multispectral sensors provide only limited information on the differences between the phytoplankton absorption structures (Bricaud et al., 2004; Organelli et al., 2011). Organelli et al. (2013) highlighted that the use of hyperspectral data (radiometric information with a spectral resolution higher than that of the current ocean color sensors) would improve the accuracy of spectral-based phytoplankton composition retrievals. This was also supported by Wolanin et al. (2016b), who applied PFT retrievals to a large synthetically simulated data set, testing different band settings for multispectral data and various resolutions of hyperspectral information. The authors showed that hyperspectral data are the most beneficial for various PFT retrievals.

**Abbreviations:** AWI, Alfred Wegener Institute; Chla, chlorophyll "a" concentration; c-micro, microphytoplankton Chla; CNRS, Centre National de la Recherche Scientifique; c-nano, nanophytoplankton Chla; c-pico, picophytoplankton Chla; coc, coccolithophores Chla; cya, cyanobacteria Chla; dia, diatom Chla; DOAS, Differential Optical Absorption Spectroscopy; DPA, Diagnostic Pigment Analysis; ESA, European Space Agency; f-micro, fraction of microplankton Chla to total Chla; f-nano, fraction of nanoplankton Chla to TChla; f-pico, fraction of picoplankton Chla to TChla; f-PSC, fraction of phytoplankton size class Chla to TChla; IOCCG, International Ocean-Color Coordinating Group; IOP, Inherent Optical Properties; IUP, Institute of Environmental Physics; LOV, Laboratoire d'Oceanographie de Villefranche; LUT, Look Up Table; MAE, Mean Absolute Error; MAD, Mean Absolute Differences; micro, microplankton; nano, nanoplankton; NASA, National Aeronautics and Space Administration; NOBM, NASA Ocean Biogeochemical Model; OCI, Ocean Color Instrument; OC-PFT, Algorithm of Hirata et al. (2011) to retrieve phytoplankton functional types; OC-CCI, Ocean Color Climate Change Initiative; OI, Optimal Interpolation; OLCI, Ocean Land Color Instrument; OMI, Ozone Monitoring Instrument; PACE, Plankton, Aerosol, Cloud, ocean Ecosystem; PCA, Principal Component Analysis; PFT, Phytoplankton Functional Type; PhytoDOAS, DOAS applied for retrieval of phytoplankton and PFT biomass; pico, picoplankton; PSC, Phytoplankton Size Class; PML, Plymouth Marine Laboratory; PVR, product validation report; QAA, Quasi-Analytical Algorithm; RMSD, root mean square difference; Rrs, Rotational Raman Scattering; RTM, Radiative Transfer Model; SCIAMACHY, Scanning Imaging Absorption Spectrometer for Atmospheric Chartography; SCIATRAN, Radiative Transfer Model and Retrieval Algorithm; SEOM, Scientific Exploitation of Operational Missions; Sf, phytoplankton size factor; SynSenPFT, Synergistic hyper- and multispectral satellite PFT product; SZA, Solar zenith angle; TChla, total chlorophyll "a" concentration; TC, Triple collocation; TOA, Top of Atmosphere; TROPOMI, TROPOspheric Monitoring Instrument; UV-A, Ultraviolet A; VRS, Vibrational Raman Scattering; WF, Weighting-function.

Former and current satellite instruments with a very high spectral resolution (1 nm or finer) provide the opportunity to distinguish multiple PFTs more accurately using spectral approaches. The capability to quantitatively retrieve major PFTs (even several PFTs simultaneously) based on their absorption properties has been demonstrated in the open ocean with the Phytoplankton Differential Optical Absorption Spectroscopy (PhytoDOAS) method (Bracher et al., 2009; Sadeghi et al., 2012) using hyperspectral satellite data from the sensor "SCanning Imaging Absorption Spectrometer for Atmospheric CHartographY" (SCIAMACHY). Having been originally developed for atmospheric applications, hyperspectral sensors like SCIAMACHY do not provide operational water-leaving radiance products as ocean color sensors do. Hence, the PhytoDOAS algorithm was designed to retrieve three PFTs directly from top-of-atmosphere radiances, which requires proper handling of strong atmospheric absorbers. Owing to the high spectral resolution of SCIAMACHY (<0.5 nm), it is possible to separate the high-frequency absorption features of each particular PFT from the optical signatures of the relevant atmospheric absorbers (broadband effects are accounted for by a low-order polynomial). However, the global exploitation of hyperspectral satellite data for ocean color applications has so far been very limited, since the pixel size of these data is very large (30 km by 60 km per pixel) and global coverage by these measurements is reached only within six days. This circumstance constrains any assessment of the retrieval accuracy with in situ point measurements. It also limits the application of hyperspectral-based PFT data products.

The aim of this study was to overcome the aforementioned shortcomings of current multispectral PFT products (supplying either dominant groups only, Bracher et al. 2017a, or data products with strong linkage to a priori information) and of current PhytoDOAS data products (low temporal and spatial coverage). This was done through the synergistic use of low spatial resolution hyperspectral data with higher spatial and temporal resolution multispectral data. In this study, quantitative estimates of the abundance (given as Chla in mg m<sup>−3</sup>) of the same (diatoms and cyanobacteria, i.e., prokaryotic phytoplankton) or similar (coccolithophores vs. haptophytes) PFTs are obtained by PhytoDOAS (Bracher et al., 2009; Sadeghi et al., 2012) and by the abundance-based OC-PFT algorithm (Hirata et al., 2011). Comparison to other satellite products bears the limitation that different aspects of the diversity of phytoplankton groupings are compared (see also Bracher et al. 2017a for mismatch between satellite products). **Figure 1** summarizes the different classifications of phytoplankton used as PFTs within the study (PhytoDOAS: Bracher et al. 2009, OC-PFT: Hirata et al. 2011). Note that OC-PFT retrieves haptophytes, while PhytoDOAS retrieves coccolithophores, a sub-group of haptophytes that often dominates. **Figure 1** also illustrates how the considered PFTs relate to the main PSCs (Brewin et al., 2010, 2015) and the size factor (Sf, Ciotti and Bricaud 2006).

This paper presents the first development of a synergistic algorithm (SynSenPFT) and shows its potential via evaluation against in situ PFT data (derived from HPLC) and intercomparison with satellite phytoplankton composition products focusing on the variation in phytoplankton size (Ciotti and Bricaud, 2006; Brewin et al., 2010, 2015) and with the PFT products from the NASA Ocean Biogeochemical Model (NOBM, Gregg and Casey 2007). Given the evaluation results, we discuss the potential of this method for future applications to data from other recent and upcoming sensors, such as the Ocean Land Color Instrument (OLCI) on Sentinel-3 (in operation since 2016), the hyperspectral sensor Ozone Monitoring Instrument (OMI) on Aura (in operation since 2004) and the TROPOspheric Monitoring Instrument (TROPOMI) on Sentinel-5P to be launched in 2017. Such a synergistic approach shall enable a PFT time series from 2002 onwards, which can then be extended by exploiting the data of the hyperspectral global Ocean Color Instrument (OCI) planned by NASA for launch within the "Plankton, Aerosol, Cloud, ocean Ecosystem" (PACE) mission in the early 2020s (see Gregg and Rousseaux 2017).

FIGURE 1 | Diagram describing the different phytoplankton classifications given by the different data products of this study and how they relate to each other: functional types (PFTs), size classes (PSCs) and size factor (Sf). In bold are the data used in this study: in the SynSenPFT algorithm (PFTs) and for validation (PSC and Sf). The numbers in the boxes indicate the corresponding references of the products used in this study: <sup>1</sup>Ciotti and Bricaud (2006), <sup>2</sup>Brewin et al. (2010, 2015), <sup>3</sup>Hirata et al. (2011), <sup>4</sup>Bracher et al. (2009) and Sadeghi et al. (2012). Note that different PFTs can vary among size classes, e.g., diatoms also can spread into the nanoplankton fraction.

The structure of the paper is as follows. Section 2 introduces the materials and methods used: the input PhytoDOAS and OC-PFT products with the theoretical description and updates to the algorithms provided; the independent PFT in situ observations and satellite retrievals; a method to evaluate the initial products; details on the synergistic algorithm SynSenPFT and its implementation using the OC-PFT and PhytoDOAS data products. Section 3 provides results and discussions on intercomparison of the initial OC-PFT and PhytoDOAS Chla products, introduces an example of the SynSenPFT product and discusses the SynSenPFT product evaluation. Summary and Outlook are presented in Section 4.

# 2. MATERIALS AND METHODS

# 2.1. Initial PFT Products

### 2.1.1. Multispectral Retrieval: OC-PFT

The OC-PFT, an abundance-based approach developed by Hirata et al. (2011), was applied to multispectral-based satellite-derived TChla (OC-CCI version 2, https://rsg.pml.ac.uk/thredds/catalogcci.html, OC-CCI 2015) with revised empirical relationships between TChla and PFTs or PSCs to retrieve diatom, haptophyte and prokaryotic phytoplankton Chla. Being an empirical approach, the OC-PFT requires refinement when additional data become available, to improve the retrievals for both the global ocean and under-sampled regions, as shown by Soppa et al. (2014). Thus, the OC-PFT algorithm of Hirata et al. (2011) was revised using a data set of in situ phytoplankton pigment data (Section 2.2.1) that is larger and more evenly distributed in space than that of Hirata et al. (2011).

Following Hirata et al. (2011), the Diagnostic Pigment Analysis (DPA) developed by Vidussi et al. (2001) and modified by Uitz et al. (2006) was used to derive PFTs and PSCs (microplankton, nanoplankton and picoplankton) from a large global in situ HPLC phytoplankton pigment data set (**Table 1** in 2.2.1; a detailed description is provided in Supplementary Section 1). Based on the fractions of PFTs (f-PFT) and TChla in each in situ sample, a statistical model of the relationship between TChla and f-PFT was built. Such a statistical model, when applied to satellite TChla data, allows the f-PFT to be retrieved on a global scale. The revised statistical models and the newly proposed ones were found to represent well the relationship between the f-PFTs (diatom, haptophyte and prokaryote, in particular) and TChla (see Supplementary Section 2). Based on the results, the model of Soppa et al. (2014) for f-Diatoms, with parameters revised here, and the new models for f-Haptophytes and f-Prokaryotes based on Hirata et al. (2011), also with revised parameters, were chosen to retrieve the f-PFTs.

Given the models, their statistical parameters and OC-CCI TChla, the diatom, haptophyte and cyanobacteria Chla were derived first in fractional form (f-PFT) and then in terms of abundance (mgChla m<sup>−3</sup>). To retrieve the PFT abundance, the f-PFT is multiplied by the TChla value of each sample/pixel. This OC-PFT Chla data product is provided globally over the ocean with ∼4 km (sinusoidal) spatial resolution over the period of August 2002–March 2012 on a daily time scale.

TABLE 1 | List of databases, two French campaigns (Atalante-3, KEOPS, and Bonus Good Hope) and German research vessels (RV) Maria S. Merian, Meteor, Polarstern, Poseidon and Sonne.

Cruises with RV Polarstern to the Southern Hemisphere are labeled as ANT and to the Northern Hemisphere as ARK; cruises with RV Poseidon are labeled as POS, with RV Maria S. Merian as MSM, with RV Meteor as M and with RV Sonne as SO. A campaign in Indonesian waters was conducted from a small fishing boat.

### 2.1.2. Hyperspectral-Based Retrieval: PhytoDOAS

The other PFT Chla data products considered in this study are based on the Differential Optical Absorption Spectroscopy (DOAS) method (Perner and Platt, 1979) exploited with respect to phytoplankton absorption properties (PhytoDOAS), as introduced by Bracher et al. (2009). The current PhytoDOAS PFT retrieval of Chla for cyanobacteria (that includes all prokaryotic phytoplankton), diatoms and coccolithophores was obtained with the algorithm PhytoDOAS version 3.3 applied to SCIAMACHY hyperspectral measurements of the radiance at the top of atmosphere (TOA) for the sensor's operation period of August 2002–March 2012. The cyanobacteria and diatom PhytoDOAS retrievals are based on the algorithm by Bracher et al. (2009). For coccolithophores the algorithm by Sadeghi et al. (2012) has been used. Both algorithms, however, have been slightly modified to obtain optimal results for the whole time series. In particular, the changes include the following:


using a radiative transfer model (RTM) simulated background spectrum as done in Dinter et al. (2015).


Within the current study we used seven-day composites of the PhytoDOAS PFT Chla retrievals interpolated onto a 0.5° × 0.5° grid covering the global ocean on a daily basis. Data are available at http://doi.pangea.de/10.1594/PANGAEA.870486 (Bracher et al., 2017b).

# 2.2. Independent PFT Estimates Used for Evaluation

### 2.2.1. In situ Observations

The in situ data set used for the SynSenPFT satellite validation and for the improvement of the OC-PFT approach was built with HPLC pigment data compiled from several databases and individual cruises (**Table 1**). The spatial distribution of the in situ data set is depicted in Figure S5 (Supplementary Section 3). Chla of PFTs was derived from the in situ phytoplankton pigment data using the Diagnostic Pigment Analysis of Vidussi et al. (2001) and Uitz et al. (2006), modified as in Hirata et al. (2011) and Brewin et al. (2015). The weights were revised in accordance with this enlarged in situ pigment data set (**Table 1**). We refer the reader to the Supplementary Material (Section 1) for more details. The final version of the global data set is available at https://doi.pangaea.de/10.1594/PANGAEA.875879 (Soppa et al., 2017).

### 2.2.2. Size Factor

The estimate of the dimensionless size factor (Sf), following Ciotti et al. (2002) and Ciotti and Bricaud (2006), is based on the fact that the shape of the phytoplankton absorption spectrum flattens with increasing cell size ("packaging effect"). Sf varies between 0 and 1, which represent the extreme situations (100% microphytoplankton and 100% picophytoplankton, respectively). The Sf value thus provides information on the dominant size of the population: values close to 0 (1) indicate dominance of microplankton (picoplankton). Note that the contribution of nanophytoplankton is not explicitly taken into account. Also, for a given population, the shape of the absorption spectrum (and therefore the Sf value) can be affected by photoacclimation to ambient light. On the other hand, as the method is spectral-based, Sf variations are not constrained by Chla variations. The Sf maps have been derived from remote sensing reflectance data from daily OC-CCI (version 2) products covering the same time period as the OC-PFT data set. First, the total absorption coefficients were computed using the Quasi-Analytical Algorithm (QAA) (Lee et al., 2002) version 6. This step was followed by an optimization procedure (revised from Bricaud et al. 2012) which retrieves three independent variables: Sf and two parameters related to Colored Detrital Matter absorption properties. The Sf data were obtained at daily temporal resolution binned globally on a 4 km sinusoidal grid.

### 2.2.3. Phytoplankton Size Classes

The model of Brewin et al. (2010), using global parameters from Brewin et al. (2015), was used to compute the fractions of total chlorophyll for three PSC (picophytoplankton < 2 µm, nanophytoplankton 2–20 µm and microphytoplankton > 20 µm). The model is designed to estimate the fractions of the three phytoplankton size classes as a continuous function of TChla. Based on the work of Sathyendranath et al. (2001), the model assumes that small cells are incapable of growing beyond a particular Chla, with an upper limit imposed possibly by a combination of bottom-up (e.g., nutrient control) and top-down (e.g., grazing) processes, and that, beyond this value, chlorophyll is added to a system solely by the addition of larger size classes of phytoplankton (see Raimbault et al. 1988; Chisholm 1992). For further details on the model, the reader is referred to Brewin et al. (2010, 2015). The model was applied to the same daily OC-CCI TChla data set as used for the OC-PFT product, for the period of August 2002–March 2012, producing global data on the fractions of the three size classes. The PSC data (f-micro, f-nano and f-pico) and TChla were obtained with the same spatial and temporal resolution as used for OC-PFT and Sf. TChla data were then multiplied by f-PSC to obtain the respective Chla of PSC (c-PSC).
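The partitioning of TChla into size classes described above can be sketched as follows. This is a minimal illustration only: it assumes a saturating-exponential form for the Chla of small cells, which is consistent with the description above, and the function names `size_class_fractions` and `size_class_chla` as well as all four default parameter values are hypothetical placeholders, not the global parameters of Brewin et al. (2015).

```python
import math


def size_class_fractions(tchla, cm_pn=1.0, s_pn=0.9, cm_p=0.1, s_p=6.0):
    """Split total Chla (mg m^-3) into pico-, nano- and micro-fractions.

    Assumed form: the Chla held by small cells saturates at an upper
    limit (cm_pn for pico+nano, cm_p for pico), so that any further
    increase in TChla is attributed to larger cells.
    All default parameter values are illustrative placeholders.
    """
    if tchla <= 0:
        raise ValueError("tchla must be positive")
    c_pn = cm_pn * (1.0 - math.exp(-s_pn * tchla))  # pico + nano Chla
    c_p = cm_p * (1.0 - math.exp(-s_p * tchla))     # pico Chla
    return {
        "f-pico": c_p / tchla,
        "f-nano": (c_pn - c_p) / tchla,
        "f-micro": (tchla - c_pn) / tchla,
    }


def size_class_chla(tchla, **params):
    """c-PSC: multiply each size-class fraction by TChla."""
    fractions = size_class_fractions(tchla, **params)
    return {name.replace("f-", "c-"): f * tchla for name, f in fractions.items()}
```

With these placeholder parameters the pico-fraction decreases and the micro-fraction increases as TChla grows, which is the qualitative behaviour the model is designed to reproduce; applying `size_class_chla` pixel by pixel would correspond to the last step of the paragraph above.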

### 2.2.4. Phytoplankton Functional Types from the NASA Ocean Biogeochemical Model

The NOBM (Gregg and Casey, 2007) is a global biogeochemical model with coupled circulation and radiative models (Gregg, 2002). NOBM simulations combine assimilated global SeaWiFS and MODIS Chla data with global data sets on nutrient distributions, sea surface temperature and current conditions to calculate four PFTs: diatoms, coccolithophores, cyanobacteria (as defined in PhytoDOAS) and chlorophytes. NOBM PFT data span from 84° S to 72° N on a 1.25° by 2/3° longitude/latitude grid. Data were obtained at daily and monthly temporal resolution at http://giovanni.gsfc.nasa.gov/giovanni/ for the same time period as the other aforementioned products.

### 2.3. Initial (Input) Product Evaluation

Given the PFT Chla information based on two distinct algorithms, different not only with respect to temporal and spatial resolution but also with respect to underlying physical and/or statistical principles, it is essential to compare these OC-PFT and PhytoDOAS PFT data against each other to evaluate whether/where and to what extent one could expect the retrieval algorithms to provide similar (different) results. For this purpose, we consider the triple collocation (TC) analysis (Stoffelen, 1998) known as a powerful tool for global scale product evaluation and intercomparisons (Gruber et al., 2016).

The triple collocation method requires analysing three independent products (but, following Zwieback et al. 2012, can be extended to a larger number of products) describing exactly the same state variable: Chla of PFT in our case. To complete the required triplet, the PFT Chla estimates of the NOBM (Gregg and Casey, 2007) were considered in addition to those of the PhytoDOAS and OC-PFT products. In contrast to the PSC and Sf algorithms, the NOBM provides estimates of diatom, coccolithophore and cyanobacteria abundance in terms of Chla, as PhytoDOAS and OC-PFT do (except that OC-PFT provides haptophyte Chla instead of coccolithophore Chla). The TC method is briefly described in the following subsection.

### 2.3.1. Triple Collocation Analysis

The TC analysis allows estimation of the absolute error variances (σ<sup>2</sup><sub>εi</sub>), or the root-mean-square deviations (RMSD, σ<sub>εi</sub>), of three collocated data sets (i = {1, 2, 3}, in our case i = {PhytoDOAS, OC-PFT, NOBM}) with unknown uncertainties, assuming uncorrelated errors. Among the other assumptions underlying the method are linearity, stationarity of the signal and errors, and independence of the errors from the variability of the measured signal itself (orthogonality). Under these assumptions, σ<sub>εi</sub> can be estimated from the unique terms (Q<sub>11</sub>, Q<sub>12</sub>, Q<sub>13</sub>, Q<sub>22</sub>, Q<sub>23</sub>, Q<sub>33</sub>) of the covariance matrix of the three data sets (McColl et al., 2014):

$$\sigma\_{\varepsilon\_i} = \begin{bmatrix} \sqrt{Q\_{11} - \frac{Q\_{12}Q\_{13}}{Q\_{23}}}\\ \sqrt{Q\_{22} - \frac{Q\_{12}Q\_{23}}{Q\_{13}}}\\ \sqrt{Q\_{33} - \frac{Q\_{13}Q\_{23}}{Q\_{12}}} \end{bmatrix} \tag{1}$$

It is worth mentioning that the TC analysis is usually used for calibration purposes (Stoffelen, 1998; Vogelzang et al., 2011). The estimates provided by the analysis are normally so-called "unscaled" and also include any biases, if present, of the particular data product with respect to the "truth". The "truth" is determined by the "joint covariance" σ 2 2 observed by all three analyzed data products. Here we consider the "unscaled" uncertainties σ<sub>εi</sub> as a measure of the difference between the distinct features of PFT Chla temporal variability observed and described by the satellite retrievals (PhytoDOAS and OC-PFT) and the NOBM numerical model over a particular time period.
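Equation (1) can be sketched in a few lines of code. The example below is a minimal illustration, not the processing chain used in the study: the function names are hypothetical, the three inputs are assumed to be already collocated time series of equal length for one grid box, and plain Python is used only to keep the sketch self-contained.

```python
import math


def _cov(a, b):
    """Sample covariance of two equally long sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)


def triple_collocation_rmsd(x1, x2, x3):
    """Unscaled error standard deviations of three collocated series.

    Implements Equation (1): each error variance is the product's own
    variance minus a ratio of cross-covariances. A negative error
    variance (an indication that the TC assumptions are violated) is
    returned as NaN instead of raising an error. A zero cross-covariance
    in a denominator raises ZeroDivisionError.
    """
    if not (len(x1) == len(x2) == len(x3)) or len(x1) < 2:
        raise ValueError("need three series of equal length >= 2")
    q11, q22, q33 = _cov(x1, x1), _cov(x2, x2), _cov(x3, x3)
    q12, q13, q23 = _cov(x1, x2), _cov(x1, x3), _cov(x2, x3)
    variances = (
        q11 - q12 * q13 / q23,
        q22 - q12 * q23 / q13,
        q33 - q13 * q23 / q12,
    )
    return tuple(math.sqrt(v) if v >= 0 else math.nan for v in variances)
```

Applied per grid box to the PhytoDOAS, OC-PFT and NOBM time series, such a function would return one σ<sub>εi</sub> value per product; boxes with a negative estimated variance would appear as missing values.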

### 2.4. Synergistic Product

The synergistic (SynSenPFT) product is obtained as Chla for diatoms, coccolithophores and cyanobacteria presented globally on a 4 km sinusoidal grid on a daily basis over the period of August 2002 – March 2012.

### 2.4.1. SynSenPFT Algorithm

The SynSenPFT combines OC-PFT and PhytoDOAS level-3 Chla products with an optimal interpolation (OI, Gandin and Hardin, 1965). In a generalized form, the OI method is formulated as follows:

$$\mathbf{x}^a = \mathbf{x}^b + W(\mathbf{y} - H\mathbf{x}^b) \tag{2}$$

x<sup>a</sup> denotes the state analysis, in our case the SynSenPFT product; x<sup>b</sup> is the background, which in this application is the OC-PFT product; y refers to the observations (PhytoDOAS); H is the observation operator projecting OC-PFT into PhytoDOAS space; W is a weight matrix reflecting the data error statistics.

In terms of Kalman-type filtering, the synergistic estimate of x, the so-called state vector analysis x<sup>a</sup>, is expressed as:

$$\mathbf{x}(t\_n)^a = \mathbf{x}(t\_n)^b + K\_n(\mathbf{y}\_n - H\mathbf{x}(t\_n)^b) \tag{3}$$

where x(t<sub>n</sub>)<sup>a</sup> and x(t<sub>n</sub>)<sup>b</sup> denote the analysis (SynSenPFT) and OC-PFT, respectively, at a certain time t<sub>n</sub> and at the OC-PFT grid points, y<sub>n</sub> denotes the PhytoDOAS observations available at t<sub>n</sub>, and K<sub>n</sub> is the so-called Kalman gain:

$$K\_n = P\_n^b H^T (H P\_n^b H^T + R)^{-1} \tag{4}$$

Here P<sup>b</sup><sub>n</sub> and R are the OC-PFT and PhytoDOAS error covariance matrices, respectively.

As seen, the SynSenPFT is an update of OC-PFT with PhytoDOAS values, weighted according to our degree of belief in the two input data products. Note that within the current version of the SynSenPFT algorithm the update is done for every sub-pixel of OC-PFT within a PhytoDOAS pixel (**Figure 2**). Thus, SynSenPFT in every OC-PFT sub-pixel is, on average, nudged toward the PhytoDOAS values as closely as allowed by the prescribed R matrix (considered diagonal for simplicity). The SynSenPFT spatial distribution within the PhytoDOAS pixel is based on the information carried by P<sup>b</sup><sub>n</sub>, which reflects the OC-PFT spatial structure if the P<sup>b</sup><sub>n</sub> matrix is estimated from the OC-PFT covariances within the PhytoDOAS grid cell.
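The update of Equations (3) and (4) within a single PhytoDOAS pixel can be sketched as follows. This is a simplified illustration under stated assumptions, not the SynSenPFT implementation: the function name `synergistic_update` is hypothetical, one scalar PhytoDOAS value is assumed per pixel (so R reduces to a single error variance), and the observation operator H is assumed to be the plain average of the OC-PFT sub-pixels.

```python
def synergistic_update(background, p_b, observation, obs_error_var):
    """Update OC-PFT sub-pixel Chla with one PhytoDOAS value.

    background    : list of n OC-PFT Chla values inside the pixel (x^b)
    p_b           : n x n background error covariance, list of lists (P^b)
    observation   : PhytoDOAS Chla for the pixel (y)
    obs_error_var : PhytoDOAS error variance for the pixel (scalar R)
    """
    n = len(background)
    h = [1.0 / n] * n  # assumed H: average over the sub-pixels
    hx = sum(hi * xi for hi, xi in zip(h, background))  # H x^b
    # P^b H^T, a column vector of length n
    ph_t = [sum(p_b[i][j] * h[j] for j in range(n)) for i in range(n)]
    # H P^b H^T, a scalar because there is a single observation
    hph_t = sum(h[i] * ph_t[i] for i in range(n))
    # Kalman gain, Equation (4); the matrix inverse reduces to a division
    gain = [v / (hph_t + obs_error_var) for v in ph_t]
    innovation = observation - hx
    # Analysis, Equation (3)
    return [xb + k * innovation for xb, k in zip(background, gain)]
```

For an assumed absolute PhytoDOAS error of 0.4 mgChla m<sup>−3</sup>, the call would use `obs_error_var=0.4 ** 2`. A smaller error variance pulls the sub-pixel mean closer to the PhytoDOAS value, while the off-diagonal structure of `p_b` controls how the increment is distributed among the sub-pixels.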

As an illustration, **Figure 3** shows examples of SynSenPFT diatom Chla (on the OC-PFT sub-grid) within three different PhytoDOAS pixels. As seen, on average within a PhytoDOAS pixel, SynSenPFT Chla is closer to the PhytoDOAS retrievals, subject to the assumed PhytoDOAS error statistics (an absolute error of 0.4 mgChla m<sup>−3</sup>, given in **Figures 3a–c**, or a relative error of 40% (0.15 mgChla m<sup>−3</sup>), given in **Figure 3d**). Comparing **Figure 3c** and **Figure 3d**, one can assess the sensitivity of the SynSenPFT product to the assumed PhytoDOAS errors: the more accurate the PhytoDOAS PFT product is assumed to be, the more it influences the SynSenPFT Chla estimates. In comparison with OC-PFT, however, the spatial variability of the SynSenPFT Chla will be smoother. In this study, as absolute PhytoDOAS errors approximating the R matrix, we assume 0.4, 0.3, and 0.1 mgChla m<sup>−3</sup> for diatoms, coccolithophores and cyanobacteria, respectively.

## 3. RESULTS AND DISCUSSIONS

### 3.1. Initial Product TC Analysis

Here we present the results of the TC analysis carried out with respect to the OC-PFT, PhytoDOAS, and NOBM PFT data products. **Figure 4** shows an example of the spatial distribution of PhytoDOAS, OC-PFT, and NOBM unscaled uncertainties for diatom, coccolithophore (haptophytes for OC-PFT) and cyanobacteria Chla, calculated with the TC analysis for each 0.5° by 0.5° box based on PhytoDOAS, OC-PFT and NOBM PFT information over the period of 2003–2009. When comparing σ<sub>ε</sub> of PhytoDOAS diatoms (**Figure 4A**) against those of OC-PFT (**Figure 4B**) and NOBM (**Figure 4C**), one can notice much higher PhytoDOAS diatom Chla temporal variability all over the World Ocean, except for Northern Hemisphere mid- and high latitudes and a narrow belt around Antarctica, while the OC-PFT diatom Chla product reveals higher σ<sub>ε</sub> in the upwelling regions and in the Arabian Sea (**Figure 4B**). The σ<sub>ε</sub> of PhytoDOAS coccolithophore Chla (**Figure 4D**) also exceeds the corresponding statistical estimates of the NOBM coccolithophore (**Figure 4F**) and OC-PFT haptophyte products (**Figure 4E**), now also in the high northern latitudes, and the OC-PFT shows higher haptophyte Chla ranges in the Arabian Sea and upwelling regions (**Figure 4E**). However, NOBM has very low coccolithophore Chla σ<sub>ε</sub> (also compared to NOBM diatom Chla σ<sub>ε</sub>) all over low and mid latitudes. The spatial distribution of cyanobacteria σ<sub>εi</sub> is depicted in **Figures 4G–I**. In contrast to OC-PFT, NOBM and PhytoDOAS show larger cyanobacteria Chla deviations in the oligotrophic and equatorial regions, with higher σ<sub>ε</sub> values in the PhytoDOAS data product (**Figure 4G**). These are the regions where we can expect most of the differences between the OC-PFT and synergistic cyanobacteria Chla.

### 3.1.1. Approximation of Prior Error Statistics

As concluded from subsection 2.4.1, the performance of the SynSenPFT strongly depends on the plausibility of the assumed quality of the input data products approximated by the R and P<sup>b</sup><sub>n</sub> matrices. As mentioned above, the P<sup>b</sup><sub>n</sub> matrix can be (and was) represented by the OC-PFT Chla covariances within a PhytoDOAS pixel. However, we can say little per se about the quantitative estimates of PhytoDOAS errors needed to properly approximate the R matrix. The following is dedicated to the opportunities and challenges of specifying the R matrix.

The PhytoDOAS error covariance matrix can be provided by the TC analysis (Crow and van den Berg, 2010), since the TC analysis in general provides quantitative information about the quality of the data products considered (Stoffelen, 1998; Gruber et al., 2016). Nevertheless, the estimates presented in **Figure 4** and discussed in Section 3.1 can be considered only as a measure of the differences between the collocated products unless the quality of at least one data product is well known. Not as an attempt to find a possible scaling for our TC estimates, but rather as an additional evaluation of our TC analysis, the OC-PFT TC unscaled uncertainties were compared against statistics of the OC-PFT match-ups with in situ observations. **Figure 5** illustrates the mean absolute error (MAE) of OC-PFT diatom (**Figure 5A**), haptophyte (**Figure 5C**), and cyanobacteria (**Figure 5E**) Chla relative to in situ Chla, calculated for the biomes determined following Hardman-Mountford et al. (2008). In addition, it presents the ratio of the MAE to the OC-PFT TC-based uncertainties (**Figures 5B,D,F**). Ratio values less than 1 indicate that the uncertainties obtained with the TC overestimate the product error statistics, while values larger than 1 correspond to the TC uncertainties being underestimated. For instance, the TC-based OC-PFT unscaled uncertainties for diatoms (haptophytes) are overestimated (underestimated) in the "high Chla" biome (Hardman-Mountford et al., 2008) covering, for instance, southwest mid-to-high northern latitudes, the southwest Southern Ocean, the edge of equatorial upwelling regions, the Arabian Sea and the shelf seas of South-East Asia (**Figure 5**). In the "low-intermediate Chla" biome, the TC σ<sub>ε</sub> for diatoms and haptophytes are in agreement with the MAE estimates. The MAE of OC-PFT diatom Chla is higher than the TC σ<sub>ε</sub> in the "high-intermediate Chla" biome (for example, in equatorial areas outside the upwelling regions and in the Pacific, eastern Atlantic and Indian Ocean sectors of the Southern Ocean). As seen from **Figure 5F**, the TC estimates for OC-PFT cyanobacteria are highly underestimated all over the ocean. That might indicate that PhytoDOAS overestimates cyanobacteria Chla. Following Gruber et al. (2016), an additional evaluation of the TC estimates was done (not shown) with the fractional mean-squared-error (fMSE<sub>i</sub>), which can also be calculated within the frame of the triple collocation analysis. The fMSE<sub>i</sub> criterion explains the existence of white dots in **Figure 4**, corresponding to negative σ<sup>2</sup><sub>εi</sub>, by too low values of the product noise-to-true-signal ratio, which shows that the joint variability of all three products does not always exist (Supplementary Section 4, Figure S6). Alternatively, owing to underestimation of the variability in one member of the collocated triplet, or because of violations of the assumptions essential for TC, the joint variability might not represent the truth. Before treating the TC estimates as product uncertainties, a more detailed evaluation of the plausibility of our TC-based PFT uncertainties, as well as a product evaluation with respect to violations of the assumptions underlying the method, might be needed. For instance, the errors of the NOBM PFT estimates and of OC-PFT can be correlated, since both are based on multispectral satellite data from the SeaWiFS and MODIS sensors: the OC-CCI Chla product used as input to the OC-PFT algorithm includes, in addition to MERIS, SeaWiFS and MODIS information, and NOBM assimilates SeaWiFS and MODIS TChla.
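The ratio diagnostic used with **Figure 5** can be illustrated with a short sketch. The function names, the tolerance argument and the verdict strings are hypothetical and serve only to make the interpretation of the ratio explicit; the inputs are assumed to be matched satellite and in situ Chla values for one biome and the corresponding TC-based σ<sub>ε</sub>.

```python
def mean_absolute_error(estimates, references):
    """MAE between matched satellite estimates and in situ values."""
    pairs = list(zip(estimates, references))
    if not pairs:
        raise ValueError("no match-ups available")
    return sum(abs(e - r) for e, r in pairs) / len(pairs)


def compare_mae_to_tc(mae, tc_sigma, tolerance=0.0):
    """Return the MAE / TC ratio and its interpretation.

    ratio < 1: the TC-based uncertainty overestimates the error statistics
    ratio > 1: the TC-based uncertainty underestimates them
    """
    ratio = mae / tc_sigma
    if ratio < 1.0 - tolerance:
        verdict = "TC overestimates"
    elif ratio > 1.0 + tolerance:
        verdict = "TC underestimates"
    else:
        verdict = "in agreement"
    return ratio, verdict
```

A non-zero `tolerance` would let ratios close to 1 be reported as agreement, which is how the "low-intermediate Chla" biome is described above.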

A more detailed assessment of the TC analysis of the SynSenPFT initial input products would in future allow the TC estimates to be used for approximating the R matrix. In this study, the values of the R diagonal are assumed constant and partly based on the current TC results. It is worth commenting on representation (representativeness) errors due to the 'physical' mismatch between the initial algorithms (Bracher et al., 2017a) and to up- or down-scaling of the products when regridding. These errors also impact the TC estimates. In data assimilation applications (Losa et al., 2012, 2014; Yang et al., 2016), the assumed σ<sub>εi</sub> is normally enlarged to account for such representation errors as well as for any possible errors in the P<sup>b</sup><sub>n</sub> approximation. We did the same in our study: the assumed errors exceed the PhytoDOAS unscaled uncertainties discussed in 2.3.1.

### 3.2. Example of SynSenPFT Product

**Figure 6** illustrates the monthly mean SynSenPFT Chla product for diatoms, coccolithophores and cyanobacteria in September 2006. **Figure 7** depicts the absolute differences between SynSenPFT and OC-PFT averaged over September 2006 and shows where the synergistically combined PFT information is influenced by the PhytoDOAS product and over which regions OC-PFT dominates. For instance, by directly comparing **Figures 6A,C**, **7A,C** one can conclude that at high latitudes the SynSenPFT diatom and cyanobacteria estimates contain mostly the OC-PFT signal (because of low dia-(cya-)PhytoDOAS coverage in these regions). Note that the spatial distribution of the mean absolute differences (MAD) between SynSenPFT and OC-PFT Chla corresponds to the patterns of either the PhytoDOAS or the OC-PFT TC uncertainties (**Figure 4**). Recall that the TC uncertainties are considered a measure of the differences between the OC-PFT and PhytoDOAS (and NOBM) products; they therefore indicate per se the areas where the largest updates of OC-PFT Chla by PhytoDOAS values are expected. Consequently, the dia-SynSenPFT Chla differs from the OC-PFT estimates most of all in the Southern Ocean (except for the very high latitudes), in the Arabian Sea and north of Australia. The mean differences reach 0.5 mgChla m<sup>−3</sup>. The differences between coc-PhytoDOAS and OC-PFT haptophytes Chla are also in agreement with the TC estimates, but hardly exceed 0.2 mgChla m<sup>−3</sup>, and are most apparent in the Equatorial Pacific, southwest of Africa, in the Arabian Sea and north as well as southeast of Australia. For the considered time period of September 2006, relatively high MAD values for cyanobacteria are distributed over the tropical areas, with a maximum of 0.05 mgChla m<sup>−3</sup> in the equatorial upwelling systems, in the northwest part of the Indian Ocean and north of Australia.

### 3.3. SynSenPFT Product Evaluation

The SynSenPFT Chla products were evaluated by comparison with in situ observations, other satellite PFT/PSC products and the NOBM model simulations over the period of August 2002–March 2012.

### 3.3.1. Match-Ups with In situ Observations

Daily SynSenPFT products at ∼4 km spatial resolution were matched with samples from the global in situ validation dataset. The in situ dataset was matched with satellite data from the OC-CCI Chla product (restricted to the lifetime of SCIAMACHY) at daily temporal resolution, and satellite values were retained when acquired on the same date and located within 4 km of the in situ measurement. Note that for OC-PFT model development the in situ pigment data not matching the OC-CCI TChla were used, such that the validation data are independent of the data used to train the model. Where more than one in situ sample fell within one satellite pixel, the respective match-ups were treated independently. To finally derive high-quality match-ups, only those were considered valid for which the coefficient of variation of the 3 × 3 pixels of satellite TChla (OC-CCI) around each in situ sample was lower than 0.15. This quality control is similar to that applied by Werdell et al. (2007), but here we applied it to TChla instead of Rrs data. In addition, only match-ups above a threshold of 0.01 mgChla m<sup>−3</sup> were selected. The rationale for this threshold is that the surface Chla values encountered in the clearest ocean waters (South Pacific gyre) were found to be in the range 0.01–0.02 mg m<sup>−3</sup> (Morel et al., 2007). Therefore, values below 0.01 mg m<sup>−3</sup> may be considered questionable.
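The two screening criteria described above (spatial homogeneity of the 3 × 3 TChla window and a minimum concentration) can be sketched as follows. This is an illustration under stated assumptions rather than the processing code of the study: the function name `is_valid_matchup` is hypothetical, the coefficient of variation is computed from the finite pixels with the sample standard deviation, and the 0.01 mg m<sup>−3</sup> threshold is assumed to apply to both the satellite and the in situ PFT Chla.

```python
import numpy as np

def is_valid_matchup(tchla_window, sat_pft_chla, insitu_pft_chla,
                     cv_max=0.15, chla_min=0.01):
    """Return True if a match-up passes the homogeneity and threshold tests.

    tchla_window : 3x3 array of satellite TChla centred on the in situ sample.
    cv_max       : maximum coefficient of variation (0.15 in the text).
    chla_min     : minimum Chla in mg m^-3 (0.01 in the text); applying it to
                   both values is an assumption of this sketch.
    """
    window = np.asarray(tchla_window, dtype=float)
    valid = window[np.isfinite(window)]
    # Assumption: the centre pixel must hold a valid retrieval.
    if valid.size < 2 or not np.isfinite(window[1, 1]):
        return False
    cv = valid.std(ddof=1) / valid.mean()
    return bool(cv < cv_max
                and sat_pft_chla > chla_min
                and insitu_pft_chla > chla_min)

homogeneous = np.full((3, 3), 0.2)
patchy = np.arange(1, 10).reshape(3, 3) / 10.0
print(is_valid_matchup(homogeneous, 0.05, 0.04))  # homogeneous scene
print(is_valid_matchup(patchy, 0.05, 0.04))       # strongly variable scene
```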

To compare the satellite SynSenPFT products with the in situ Chla of the PFTs, the mean absolute error (MAE), root mean squared difference (RMSD), unbiased RMSD and bias (as formulated in Sá et al., 2015) were used. The determination coefficient, slope and intercept (type II regression based on log<sub>10</sub> data) were also computed. **Figure 8** and **Table 2** present the results of the validation of SynSenPFT Chla against in situ PFT Chla considering the data all over the global ocean. High R-square values around 0.5 were achieved for coc-SynSenPFT and dia-SynSenPFT, with a slightly better slope, intercept, MAE, and RMSD for coc-SynSenPFT, but a lower bias for dia-SynSenPFT. The dia-SynSenPFT RMSD and bias are comparable to the statistics presented by Brewin et al. (2017) for different optical water types in the North Atlantic (RMSD varying from 0.29 to 0.60, bias varying from −0.30 to 0.23). For all three PFTs, the estimated SynSenPFT bias is less than reported in the study by Gregg and Rousseaux (2017).
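For readers who wish to reproduce this kind of match-up statistics, a minimal sketch is given below. It assumes standard textbook definitions computed on log<sub>10</sub>-transformed Chla and uses reduced major axis regression as one common form of type II regression; the exact formulations of Sá et al. (2015) are not reproduced here, and the function name `matchup_stats` is hypothetical.

```python
import numpy as np

def matchup_stats(sat, insitu):
    """Validation statistics in log10 space (illustrative definitions)."""
    x = np.log10(np.asarray(insitu, dtype=float))  # reference
    y = np.log10(np.asarray(sat, dtype=float))     # satellite estimate
    d = y - x
    bias = d.mean()
    mae = np.abs(d).mean()
    rmsd = np.sqrt((d ** 2).mean())
    urmsd = np.sqrt(((d - bias) ** 2).mean())      # unbiased RMSD
    r = np.corrcoef(x, y)[0, 1]
    # Type II (reduced major axis) regression: an assumed choice.
    slope = np.sign(r) * y.std(ddof=1) / x.std(ddof=1)
    intercept = y.mean() - slope * x.mean()
    return {"bias": bias, "mae": mae, "rmsd": rmsd, "urmsd": urmsd,
            "r2": r ** 2, "slope": slope, "intercept": intercept}

# Synthetic check: a satellite product that is exactly twice the in situ value.
insitu_demo = np.array([0.05, 0.1, 0.5, 1.0, 2.0])
stats_demo = matchup_stats(2.0 * insitu_demo, insitu_demo)
print(stats_demo)
```

With these definitions the squared RMSD equals the squared bias plus the squared unbiased RMSD, which is a convenient consistency check when comparing tables of statistics.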

Looking at the grouping of diatom match-ups according to latitude (**Figure 8A**), we can note the underestimation of diatom Chla at high southern latitudes, which could be related to the general underestimation of TChla when applying standard algorithms to Southern polar areas (Johnson et al., 2013). It is worth keeping in mind that south of 65°S most of the dia-SynSenPFT values are derived from OC-PFT, since PhytoDOAS pixel-based information was sparse at these latitudes and low SZA.

**Figure 8C** shows that the correlation between cya-SynSenPFT and in situ PFT data is not significant, which may be due to the limited spread of the in situ PFT data compared with the two other PFT products evaluated. Cya-SynSenPFT Chla reaches a maximum concentration at −0.8 log10Chla (0.18 mg Chla/m<sup>3</sup>). This implies a high weighting toward OC-PFT, since this feature is an artifact of the OC-PFT approach. Nevertheless, the amount of in situ cya Chla exceeding 0.18 mg Chla/m<sup>3</sup> is quite small (see Supplementary, Figure S7). **Figure 8D** depicts the same as **Figure 8C** but with the density of cya-SynSenPFT match-ups as a background. From this panel one can see that the densest match-ups are located close to the 1:1 line. As a result, the MAE and RMSD are lower for cya-SynSenPFT than for dia-SynSenPFT.

TABLE 2 | SynSenPFT match-ups statistics.


### 3.3.2. Comparison Against Satellite Sf and PSC and Model PFT

The comparison with other satellite-derived products and model simulations was performed by investigating time-latitude Hovmöller diagrams (longitudinally averaged from 180°W to 180°E). All analyses were based on monthly averages. Monthly averages of SynSenPFT were calculated from daily binned data averaged in log<sub>10</sub> space, as Chla data are typically log-normally distributed (Campbell, 1995). Monthly averages of Sf and c-PSC were calculated from daily data on a 0.25° spatial grid. Monthly NOBM PFT Chla were resampled to a 0.25° spatial grid in log<sub>10</sub> space. Those intercomparison data representing Chla concentration, such as the NOBM PFT and c-PSC products, were averaged in log<sub>10</sub> space, while Sf was averaged in linear space.
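The difference between averaging in log<sub>10</sub> space and in linear space, and the reduction of a gridded monthly field to one column of a time-latitude Hovmöller diagram, can be sketched as below. The helper name `log10_mean`, the array layout (day, latitude, longitude) and the synthetic data are assumptions for illustration, not the processing chain of this study.

```python
import numpy as np

def log10_mean(chla, axis=0):
    """Average Chla in log10 space (geometric mean), ignoring gaps.

    Non-positive or missing values are treated as gaps (NaN).
    """
    chla = np.asarray(chla, dtype=float)
    logc = np.log10(np.where(chla > 0, chla, np.nan))
    return 10.0 ** np.nanmean(logc, axis=axis)

# Two daily values spanning two orders of magnitude, with one gap.
sample = np.array([0.01, np.nan, 1.0])
print(log10_mean(sample))   # log-space mean, less dominated by the high value
print(np.nanmean(sample))   # linear mean, as used for a ratio such as Sf

# Synthetic daily fields with assumed layout (day, latitude, longitude).
rng = np.random.default_rng(0)
daily = rng.lognormal(mean=-1.0, sigma=0.5, size=(30, 4, 8))
monthly = log10_mean(daily, axis=0)           # monthly map, shape (lat, lon)
hov_column = log10_mean(monthly, axis=-1)     # zonal average, shape (lat,)
```

Stacking one such zonal-average column per month gives the time-latitude array shown in a Hovmöller diagram.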

The Hovmöller diagrams are presented in **Figures 9**–**12**, on the left, together with the climatological annual cycle, on the right. Generally, the NOBM model results are available year round at every latitude between 72°N and 80°S, as opposed to the satellite products (SynSenPFT Chla, c-PSC, and Sf), which are limited by light availability and require cloud-, ice-, glint- and foam-free conditions. While NOBM and SynSenPFT provide Chla for exactly the same PFT, c-PSC shows the Chla of the respective size class (micro, nano, pico), and Sf and 1-Sf show the contribution of pico- and microplankton, respectively, to total Chla.

When comparing c-micro and dia-NOBM (**Figures 9B,C**) directly, c-micro has in general higher Chla between 80°N and 40°S, as expected since c-micro includes not only diatoms but also dinoflagellates. However, south of 45°S dia-NOBM is higher than c-micro. Dia-SynSenPFT (**Figure 9D**) follows rather strictly the patterns of c-micro, but values are higher at low and mid latitudes and the gyres are not as pronounced as for c-micro and dia-NOBM. In the polar regions c-micro and dia-SynSenPFT show similar ranges. However, in the Arctic Ocean high dia-SynSenPFT values extend over a longer part of the year (Apr-Oct), while higher values are rather limited to Apr-Jun for c-micro. In the Southern Ocean dia-SynSenPFT and dia-NOBM have consistently higher Chla than c-micro. The 1-Sf (**Figure 9A**) shows a weaker seasonality and indicates a high contribution of microplankton throughout the observed seasons for the Arctic, which seems to agree more with the dia-SynSenPFT results than with c-micro. For the Southern Ocean region, 1-Sf shows rather an inverse seasonality compared with the Chla given by the other three products. The former peaks at 40°S to 60°S in late winter to spring (Aug to Oct), while the others show highest Chla in summer (Dec-Feb), which is, e.g., in agreement with the diatom phenology studied by Soppa et al. (2016). Recall that (1-Sf) represents the relative contribution of microplankton to total biomass; its seasonality can therefore differ from that observed for the other products, which consider the magnitude of biomass in terms of Chla.

Similar distributions can be found for coc-SynSenPFT and c-nano (**Figure 10**). Values are lower in most parts for coc-SynSenPFT, which is probably related to the fact that coccolithophores are only part of c-nano. C-nano might also include green algae, Phaeocystis sp. and small diatoms, for instance. However, coc-SynSenPFT values are a bit higher at 20°N to 40°N. In contrast, large differences are observed when c-nano and coc-SynSenPFT are compared with coc-NOBM (**Figure 10**). Especially south of 45°S, coc-NOBM shows an absence of coccolithophores, which is in disagreement not only with coc-SynSenPFT, but also with the findings of O'Brien et al. (2013), who show that coccolithophores have been sampled during many research campaigns up to 80°S. While coc-SynSenPFT and c-nano show elevated values at the equator with similar seasonality, coc-NOBM is low there, but enhanced north and south of the equator, as opposed to the two other products. However, it is worth pointing out that NOBM is the product that is not purely satellite-based (though it assimilates some satellite data), so it may not be surprising that it shows the most different patterns. All three products (c-nano, coc-SynSenPFT and coc-NOBM) show similar seasonal cycles at around 60°N.

Generally c-pico and cya-SynSenPFT (**Figure 11**) show very similar patterns and a low Chla concentration range. Both products show clear maxima in the Arctic and nearly as high values in the Southern Ocean and at the equator, while values in the subtropics are low, especially in the gyres. They show a similar seasonality at mid and high latitudes as the c-nano and coc-SynSenPFT products. In contrast, cya-NOBM shows quite different patterns: values are very low at latitudes higher than the subtropics and also show a minimum at the equator. Oligotrophic areas at around 20° to 35° appear as low values in all three products. However, there, cya-NOBM Chla increases from January till May, while for c-pico and cya-SynSenPFT an increase in Chla is observed from June till September (**Figure 11**, right panels). Again, concentrations of PSC, in this case c-pico, are higher than cya-NOBM, mainly north of 40°N, south of 40°S and just south of 0° (**Figure 11**). Between 0° and 40°N and between 5°S and 40°S, cya-NOBM has larger concentrations. Cya-NOBM is nearly absent at latitudes higher than 40° as a result of the cyanobacteria growth parametrisation used, in which the growth rate decreases in cold waters (Gregg and Casey, 2007).

Sf (**Figure 12**) clearly indicates the dominance of picoplankton in tropical waters, which reflects what can be concluded from comparing globally the three c-PSC and the three SynSenPFT products among each other. The Hovmöller diagram for Sf (**Figure 12A**) also shows that the phytopicoplankton contribution is higher (especially during summer) in the Antarctic than in the Arctic. A similar conclusion can be drawn from the Hovmöller plots of f-Cya-SynSenPFT (**Figure 12B**), calculated as the ratio of the cya-SynSenPFT product to the OC-CCI TChla. For SynSenPFT it should be noted, though, that the product is designed to identify three different PFTs which are only part of the whole phytoplankton community. Although cyanobacteria make up the largest fraction in the tropics, a high diversity of cyanobacteria and other phytopicoplankton has also been reported in the Atlantic Arctic region (e.g., Fram Strait and the Greenland Sea, Díez et al., 2012).

When comparing f-Cya-SynSenPFT against Sf, we see that the observed dominance of cyanobacteria in the northern tropical regions is weaker (although not in autumn) and reveals a more pronounced seasonality for the SynSenPFT product. f-Cya-SynSenPFT is also lower than Sf in the equatorial region, while f-Cya-SynSenPFT estimates exceed Sf in southern tropical areas. One can also notice differences in the temporal variability of picoplankton dominance in the Southern Ocean. It is worth emphasizing once more, however, that SynSenPFTs represent only a part of the whole pico-phytoplankton community.

Overall, all three SynSenPFT global products seem to reproduce the seasonality of their respective dominant phytoplankton size class (the c-PSC products), except that dia-SynSenPFT seems to be generally higher. Patterns of Sf and 1-Sf, indicating the contribution of pico- and microplankton to the phytoplankton community, also seem to be in agreement with the two former products in terms of spatial distribution. The temporal pattern is often different, which may be due to the difference between the relative contribution of a size class and its abundance. However, NOBM products, which identify the same PFTs as SynSenPFT, show for most regions quite different patterns and Chla ranges in the distribution of cyanobacteria and coccolithophores, while diatom results are closer to dia-SynSenPFT (and to c-micro). The model has parametrizations limiting the latitudinal extension of the two former groups (Gregg and Casey, 2007). The distribution of global HPLC marker pigments for PFTs (Peloquin et al., 2013), PFT abundance based on HPLC pigment data sets (Swan et al., 2016) and coccolithophore counts (O'Brien et al., 2013) clearly show the presence of cyanobacteria and coccolithophores also in temperate and high latitudes. In this respect, these studies as well as the satellite-based study by Alvain et al. (2008) support the global patterns of SynSenPFT, Sf and PSC.

# 4. SUMMARY AND OUTLOOK

A first version of a synergetic hyper- and multispectral-based satellite product for diatom, coccolithophore, and cyanobacteria (i.e., prokaryotic phytoplankton) Chla was developed globally (binned at 4 × 4 km resolution) on a daily basis from August 2002 to April 2012. As input data, OC-PFT Chla level-3 at the same resolution and daily 7-day composites of PhytoDOAS PFT Chla on a 0.5° by 0.5° resolution grid were used.

The SynSenPFT products were evaluated globally through validation against in situ PFT Chla data and intercomparison with satellite retrievals of Phytoplankton Size Classes Chla (c-PSC) of Brewin et al. (2015), Size Factor (Sf) of Ciotti and Bricaud (2006) and PFT Chla from the NASA Ocean Biogeochemical Model (NOBM) of Gregg and Casey (2007). When compared to in situ PFT data, good performance of SynSenPFT was achieved for all three products. Although cya-SynSenPFT does not correlate with the global in situ match-ups (maybe due to the limited Chla range), its RMSD and MAE were lower than for dia-SynSenPFT in the in situ validation. In addition, cya-SynSenPFT shows reasonable spatial and temporal patterns, consistent with the other satellite-derived estimates (especially c-pico) and with results in the literature based on in situ measurements.

Overall we can conclude that this first version of the SynSenPFT products compares reasonably well with in situ data and also with other satellite products. Nevertheless, the current version of the synergistic cya-SynSenPFT as well as dia-SynSenPFT at latitudes higher than 65° is mostly influenced by OC-PFT data because of large gaps in the PhytoDOAS version 3.3 diatom and cyanobacteria products at high latitudes.

It is worth emphasizing that we introduce SynSenPFT not only as a PFT product, but also as an established dynamic system (see **Figure 13**) that includes several components within a network, allowing the system to learn and develop further. Any future improvement of the components of the SynSenPFT system would lead to improved quality of the synergistic PFT product. In the following we consider each component (1–4) from the perspective of future development.

(1) Input initial product. To reduce the deficiencies of the PhytoDOAS diatom Chla product, the PhytoDOAS algorithm may be optimized by including the packaging effect when converting its DOAS fit factor to Chla via extending the LUT. This may reduce the observed overestimation of dia-SynSenPFT as compared to other products. Accounting for the packaging effect is probably not appropriate for the coc-SynSenPFT and cya-SynSenPFT products, since a very low pigment packaging effect is expected in coccolithophores and little, if any, in cyanobacteria.

The input OC-PFT and PhytoDOAS algorithms can be applied to various similar multi- and hyperspectral-based information from other satellite missions. For instance, OC-PFT is worth applying to OLCI (Sentinel-3), while the PhytoDOAS product might be based on information from OMI (Aura) or from the TROPOMI and UVN instruments onboard Sentinel-5P and Sentinel-4 and -5, respectively. OMI products, and especially the hyperspectral Sentinel-PFT information, might allow PhytoDOAS global coverage to be improved by a factor of 2–3 temporally, with spatial resolution also being improved by a factor of 4 for OMI and 40 for the upcoming hyperspectral Sentinel instruments.

PhytoDOAS is currently sensitive to the number of retrieved PFTs, as shown by the PhytoDOAS sensitivity study using radiative transfer simulations with SCIATRAN (Wolanin et al., 2016a) and the model study assessing the potential of the retrievals on remote sensing reflectance data (Wolanin et al., 2016b). The SCIAMACHY retrievals were limited to channel 3 cluster 15 of the sensor, which represents the wavelength range from 425 to 529 nm. Above and below this range the pixel resolution is worse (240 by 30 km). The OMI and TROPOMI sensors allow for a continuous exploitation from the UV to 500 nm. Therefore, the algorithms may be further revised and extended to enable accurate estimates of a larger number of simultaneously retrieved PFTs.


More precise information on data quality – measurement errors traced through particular retrieval algorithms as well as estimates of any possible representation errors – would allow for better SynSenPFT algorithm performance due to a more plausible approximation of the a priori error statistics of the input products.

There is also a need for additional metrics for a posteriori assessment of the SynSenPFT product as well as for evaluation of the initial input products: the PFT/PSC/Sf phenology intercomparison on a weekly basis (for example) for various biogeochemical provinces (e.g., according to Longhurst, 1998).

(4) SynSenPFT algorithm. The synergistic algorithm is to be further explored to include the PhytoDOAS level-2 information within a certain time window. The level-2 information can then be weighted according to the time interval between the date of the analysis and the date of the PhytoDOAS information used. A similar approach can be used to introduce a spatial radius of data influence. That would, however, require a calibration of the algorithm, since such a radius is spatially variable, being dependent on the system dynamics, which vary a lot between regions. The synergistic algorithm can also be extended by augmenting the input products (including, for instance, biogeochemical modeling). Details of the implementation and the algorithm design will depend on the model resolution; the output, nevertheless, could even represent the vertical structure of PFTs and cover the areas and time periods for which no satellite data are available.

# AUTHOR CONTRIBUTIONS

ABra is the initiator and leader of the SynSenPFT project and contributed significantly to the writing process. SL implemented the SynSenPFT algorithm, compiled the first draft and updated the manuscript according to the revisions by all the co-authors: ABra, MS, RB, ABri, TD, AW, JO, VR, and IP. MS produced the OC-PFT PFT, suggested the intercomparison design and wrote most of the Supplementary Material. TD, AW, JO, VR, BG, and ABra were involved in the PhytoDOAS version 3.3 development. ABri and BG contributed the Sf data product, and RB provided PSC data. ABri and RB as well as IP substantially participated in the PFT products intercomparison and discussion.

# REFERENCES


# FUNDING

This work was supported by the ESA SEOM SY-4Sci Synergy project (No 400112410/14/I-NB\_SEOM\_SY-4SciSynergy). Funding to SL was also supported by the SFB/TR 172 (AC)<sup>3</sup> "Arctic Amplification" subproject C03; DFG-Priority Program SPP 1158 "Antarktis" PhySyn BU2913/3-1, and by the Helmholtz Climate Initiative REKLIM (Regional Climate Change), a joint research project of the Helmholtz Association of German Research Centres (HGF).

# ACKNOWLEDGMENTS

We are thankful to Steven Delwart (ESA-ESRIN) for valuable and inspiring discussion throughout the SynSenPFT project. We thank the reviewers and Editor for their nice comments and constructive suggestions, which led to an improved manuscript. We thank ESA for OC-CCI data, MERIS and SCIAMACHY data; NASA Global Modeling and Assimilation Office (GMAO) for NOBM PFT output; and NASA Goddard Space Flight Center's Ocean Data Processing System (ODPS) for the MODIS and SeaWiFS data. We thank Dr. Rajdeep Roy (CSIR National Institute of Oceanography (NIO), Goa, India) and all the scientists and crews involved in HPLC data collection and analyses for providing their pigment data. PFT Chla data sets used and produced within our SynSenPFT study are available at https://doi.org/10.1594/PANGAEA.873210.

# SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fmars.2017.00203/full#supplementary-material


Series, Vol. 22, ed M. Suarez (Greenbelt, MD: NASA Technical Memorandum 2002-104606), 33.


package SCIATRAN. J. Quant. Spectr. Radiat. Transfer 133, 13–71. doi: 10.1016/j.jqsrt.2013.07.004


Differential Optical Absorption Spectroscopy (doas). Ocean Sci. 3, 429–440. doi: 10.5194/os-3-429-2007


sea ice data assimilation. Cryosphere 10, 761–774. doi: 10.5194/tc-10-761-2016


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Losa, Soppa, Dinter, Wolanin, Brewin, Bricaud, Oelker, Peeken, Gentili, Rozanov and Bracher. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Reproduction of Spatio-Temporal Patterns of Major Mediterranean Phytoplankton Groups from Remote Sensing OC-CCI Data

Gabriel Navarro<sup>1</sup> \*, Pablo Almaraz <sup>1</sup> , Isabel Caballero<sup>1</sup> , Águeda Vázquez <sup>2</sup> and Isabel E. Huertas <sup>1</sup>

<sup>1</sup> Department of Ecology and Coastal Management, Institute of Marine Sciences of Andalusia (CSIC), Puerto Real, Spain, <sup>2</sup> Department of Applied Physics, Higher Engineering School, University of Cadiz, Puerto Real, Spain

### Edited by:

Astrid Bracher, Alfred-Wegener-Institute Helmholtz Center for Polar and Marine Research, Germany

### Reviewed by:

Alison Palmer Chase, University of Maine, United States Takafumi Hirata, Hokkaido University, Japan

> \*Correspondence: Gabriel Navarro gabriel.navarro@icman.csic.es

### Specialty section:

This article was submitted to Ocean Observation, a section of the journal Frontiers in Marine Science

Received: 15 March 2017 Accepted: 19 July 2017 Published: 02 August 2017

### Citation:

Navarro G, Almaraz P, Caballero I, Vázquez Á and Huertas IE (2017) Reproduction of Spatio-Temporal Patterns of Major Mediterranean Phytoplankton Groups from Remote Sensing OC-CCI Data. Front. Mar. Sci. 4:246. doi: 10.3389/fmars.2017.00246

During the last two decades, several satellite algorithms have been proposed to retrieve information about phytoplankton groups using ocean color data. One of these algorithms, the so-called PHYSAT-Med, was developed specifically for the Mediterranean Sea due to the optical peculiarities of this basin. The method allows the detection from ocean color images of the dominant Mediterranean phytoplankton groups, namely nanoeukaryotes, Prochlorococcus, Synechococcus, diatoms, coccolithophorids, and Phaeocystis-like phytoplankton. Here, we present a new version of PHYSAT-Med applied to the Ocean Colour Climate Change Initiative (OC-CCI) database. The OC-CCI database consists of a multi-sensor, global ocean-color product that merges observations from four different sensors. This retuned version presents improvements with respect to the previous version, as it extends the temporal range (since 1998), decreases the cloud cover, improves the bias correction and includes a validation exercise performed in the NW Mediterranean Sea. In particular, the PHYSAT-Med version has been used here to analyse the annual cycles of the major phytoplankton groups in the Mediterranean Sea. Wavelet analyses were used to explore the spatial variability in dominance both in the time and frequency domains in several Mediterranean sub-regions, such as the Alboran Sea, Ligurian Sea, Northern Adriatic Sea, and Levantine basin. Results extended the interpretation of previously detected patterns, indicating the dominance of Synechococcus-like vs. prochlorophytes throughout the year at the basin level, and the predominance of nanoeukaryotes during the winter months. The method successfully reproduced the diatom blooms normally detected in the basin during the spring season (March to April), especially in the Adriatic Sea. According to our results, the PHYSAT-Med OC-CCI algorithm represents a useful tool for the spatio-temporal monitoring of dominant phytoplankton groups in Mediterranean surface waters. The successful application of other regional ocean color algorithms to the OC-CCI database will give rise to extended time series of phytoplankton functional types, with promising applications to the study of long-term oceanographic trends in a global change context.

Keywords: PHYSAT-Med algorithm, OC-CCI database, phytoplankton functional types, Mediterranean Sea, wavelet analysis

# INTRODUCTION

Since the launch of the Coastal Zone Color Scanner (CZCS) in the late 1970s, ocean color remote sensing has greatly improved our understanding of the ocean system by providing global estimates of the surface chlorophyll concentration (Chla), a parameter known to be a good proxy of phytoplankton biomass (e.g., McClain, 2009). Marine phytoplankton are located at the base of the marine food web (Chassot et al., 2010; and references therein), play a major role in the global biogeochemical cycles (Field et al., 1998) and participate actively in the regulation of the global climate (Sabine et al., 2004). During the last 40 years, observations of regional-to-global Chla data have been acquired by different ocean color sensors (IOCCG, 2012), such as the Sea-viewing Wide Field-of-view Sensor (SeaWiFS), Moderate Resolution Imaging Spectroradiometer (MODIS), Medium-Resolution Imaging Spectrometer (MERIS) and Visible Infrared Imager Radiometer Suite (VIIRS). In order to extend the existing time series beyond that provided by a single satellite sensor, the European Space Agency (ESA) has recently generated the Ocean Colour Climate Change Initiative (OC-CCI), a multisensor, global, ocean-color product mainly devoted to climate research (Storm et al., 2013) that merges observations from four different sensors: SeaWiFS, MODIS, MERIS, and VIIRS. As an ESA-funded CCI project, the OC-CCI focuses specifically on creating a consistent, error-characterized time series of ocean-color products, with a strong focus on climate-change studies (Brewin et al., 2015). Remote-sensing reflectance (Rrs) data from MODIS-Aqua and MERIS are then band-shifted to match the wavelengths of SeaWiFS by using an in-water bio-optical model (e.g., see Mélin and Sclep, 2015). The main reason behind this choice is that SeaWiFS is widely considered the highest-quality sensor, with the best match to in situ observations, and is commonly used in the peer-reviewed literature (Couto et al., 2016). This dataset improves the bias correction, thus reducing the sensitivity to medium-term changes and extending the method applicability beyond the lifetime of SeaWiFS. As a result, the current OC-CCI database allows for the examination of the spatial and temporal variability of surface Chla since September 1997 (Couto et al., 2016).

Even though remote sensing-derived phytoplankton types do not provide a full description of the marine ecosystem, their spatio-temporal distribution (including phenology, Kostadinov et al., 2017) and the identification of key groups give powerful insights into the dynamics of the marine food web and the ocean's role in climate regulation in the context of global change (Bracher et al., 2017). This relevance was recognized early by Platt et al. (2006), who concluded that detection of phytoplankton from remote sensing images was a major challenge in ocean optics. Therefore, over the last decade, several remote sensing algorithms have been developed to characterize the global distribution patterns of phytoplankton functional types (PFT) or size classes (PSC; e.g., Sathyendranath et al., 2004; Alvain et al., 2005, 2008; Ciotti and Bricaud, 2006; Raitsos et al., 2008; Aiken et al., 2009; Bracher et al., 2009; Brewin et al., 2010; Kostadinov et al., 2010; Hirata et al., 2011; Uitz et al., 2012; see the recent summaries in Table 2 in Bracher et al., 2017 and Table 3 in Mouw et al., 2017). A complete guide to the available approaches can be found in Mouw et al. (2017). Some of these algorithms are based on various spectral features, such as backscattering (e.g., Kostadinov et al., 2010), absorption (e.g., Ciotti and Bricaud, 2006; Bracher et al., 2009; Mouw and Yoder, 2010; Roy et al., 2013) or a hybrid between absorption and backscattering (Fujiwara et al., 2011). Other algorithms exploit second-order anomalies of reflectance spectra, which is the case of the so-called PHYSAT, first developed at a global scale by Alvain et al. (2005, 2008). The PHYSAT approach relies on the identification of specific signatures in the normalized water leaving radiance (nLw) spectra measured by an ocean color sensor (Alvain et al., 2005, 2008), thereby enabling the identification of nanoeukaryotes, haptophytes (a major component of the nanoflagellates), Synechococcus-like cyanobacteria, diatoms, Prochlorococcus, Phaeocystis-like phytoplankton, and coccolithophorids. The PHYSAT method has been successfully validated with phytoplankton in situ data and extensively used by many authors (e.g., Bopp et al., 2005; Arnold et al., 2010; D'Ovidio et al., 2010; Gorgues et al., 2010; Masotti et al., 2010, 2011; Alvain et al., 2012, 2013; Belviso et al., 2012; Demarcq et al., 2012; De Monte et al., 2013; Hashioka et al., 2013; Ben Mustapha et al., 2014; Thyssen et al., 2015).

Navarro et al. (2014) later proposed a regionalized version of the algorithm for the Mediterranean Sea (**Figure 1**), the PHYSAT-Med, using data from the MODIS era (2002–2013) to identify nanoeukaryotes, Prochlorococcus, Synechococcus-like cyanobacteria and diatoms; its output was compared with more than 3,000 high-performance liquid chromatography (HPLC) in situ measurements (see Table 3 in Navarro et al., 2014). The main utility of the PHYSAT-Med is that it allows for the tracking of specific features of phytoplankton community structure occurring in the basin, along with their associated bio-optical relationships that are heavily affected by continental inputs, such as desert dust events and river discharge (Bricaud et al., 2002; Claustre et al., 2002; Alvain et al., 2006; Loisel et al., 2011). Volpe et al. (2007) suggested early on that the unique phytoplankton assemblages of the basin could alter its spectral signature, therefore being responsible for the peculiar color of the Mediterranean. Due to these characteristics, standard remote sensing approaches tend to either overestimate or underestimate Chla levels in the Mediterranean. In fact, Volpe et al. (2007) also showed that NASA SeaWiFS standard chlorophyll products are affected by an uncertainty on the order of 100%. Specific algorithms have thus been developed to retrieve Chla in the region, namely DORMA-SeaWiFS (D'Ortenzio et al., 2002), BRIC-SeaWiFS (Bricaud et al., 2002), MedOC4-SeaWiFS (Volpe et al., 2007), MedOC3-MODIS (Santoleri et al., 2008), and MedOC4ME-MERIS (Santoleri et al., 2008).

Furthermore, the bio-optical characteristics of the basin described above clearly indicate the necessity to use customized algorithms to detect PFT or PSC in the Mediterranean Sea. Recently, Sammartino et al. (2015) described the temporal variability of PSC in the Mediterranean Sea using the model proposed by Brewin et al. (2011). Di Cicco et al. (2017) presented a new regional algorithm to identify simultaneously the contribution of each PSC and PFT group to the satellite estimates of total Chla concentration in the basin.

The Mediterranean (**Figure 1**) is the largest inland ocean basin on Earth, only connected to the rest of the world's oceans by the Strait of Gibraltar. It exhibits an oligotrophic regime (Krom et al., 1991), notwithstanding relatively high external inputs of essential nutrients (Ludwig et al., 2009; Huertas et al., 2012; Powley et al., 2016). Nevertheless, local physical structures generate convergence zones, which are reflected in the distinct biogeochemical properties of the two Mediterranean sub-regions, the Western and the Levantine. Thus, a notable decreasing gradient in Chla concentration is detected from the west to the east, which causes a significant longitudinal variation in primary production (Turley et al., 2000; Uitz et al., 2012). This gradient in oligotrophy is evidenced both by in situ measurements (Tanhua et al., 2013) and satellite data (D'Ortenzio and Ribera d'Alcalá, 2009). However, the seasonal evolution of Chla distribution still follows the typical succession of temperate regions, characterized by a phytoplankton biomass increase in late winter/early spring, a decrease during the summer season and a second smaller phytoplankton bloom in autumn (Siokou-Frangou et al., 2010; Sammartino et al., 2015).

Phytoplankton community structure in oligotrophic areas throughout the world's ocean is mainly composed of picoplankton and ultraplankton (Li et al., 1983; Brunet et al., 2006; Dandonneau et al., 2006). Nevertheless, the Mediterranean phytoplankton community structure reveals considerable variability over both temporal and spatial scales, and large dissimilarities in phytoplankton assemblage composition, along with other microorganisms, across the basin have also been highlighted (Siokou-Frangou et al., 2010). Many studies have pointed to the dominance of picoplankton as the fingerprint of the Mediterranean Sea and its overriding oligotrophy, but the occurrence of regional phytoplankton blooms causes the coexistence of numerous microalgal groups (Siokou-Frangou et al., 2010).

The satellite empirical model applied by Sammartino et al. (2015) encompassed this unusual and complex community structure in the Mediterranean Sea and allowed assessment of the spatio-temporal variability of the three phytoplankton size classes (micro-, nano-, and pico-plankton) during the entire SeaWiFS era (1998–2010). Previously, Navarro et al. (2014) had adapted the PHYSAT algorithm (Alvain et al., 2005, 2008) to the Mediterranean Sea's bio-optical characteristics to estimate the dominant functional phytoplankton types (Prochlorococcus, Synechococcus, diatoms, nanoeukaryotes, coccolithophorids, and Phaeocystis-like) from the MODIS sensor. More recently, Di Cicco et al. (2017) developed a new regional algorithm for satellite biomass estimates of PSC and PFT in the Mediterranean Sea and assessed their accuracy with respect to global models, improving the uncertainty and the spread by about one order of magnitude for all phytoplankton classes.

Regarding the distribution of chlorophyll, low values (less than 0.2 mg/m<sup>3</sup>) are found over vast areas of the basin, with the exception of large blooms observed in late winter and early spring in the North Western Mediterranean (Siokou-Frangou et al., 2010). Mesoscale activity also increases the chlorophyll concentration mainly in the Alboran Sea, Balearic-Catalan Sea, Adriatic Sea and the South Eastern Levantine Sea, by about one order of magnitude for all phytoplankton classes. In other coastal areas close to major rivers, such as the Po in the North Adriatic Sea, the Rhone in the Gulf of Lions and the Nile in the Levantine Sea, river discharge generates a large increase in chlorophyll levels (Siokou-Frangou et al., 2010). In the eastern basin, Chla rarely exceeds 0.5 mg/m<sup>3</sup>, with minima as low as 0.003 mg/m<sup>3</sup> (Siokou-Frangou et al., 2010). Low biomass values are generally associated with the dominance of cyanobacteria, prochlorophytes and picoplankton-sized flagellates (Siokou-Frangou et al., 2010 and references therein), which represent 59% of the total Chla and 65% of the primary production. However, nanoflagellates are the dominant group in terms of cell numbers throughout most of the year in the Mediterranean Sea. Finally, observed increases in Chla correlate with decreases in the contribution of picoplankton and nanoplankton, and increases in diatom concentration during February and March (Siokou-Frangou et al., 2010).

Bracher et al. (2017) recently highlighted the limited applicability of global satellite algorithms to determine the composition of phytoplankton at a regional scale as one of the major gaps in satellite research. Accordingly, these authors suggest a roadmap for future developments in regionally adapted algorithms. The main goal of this paper is thus to bridge the gap diagnosed by Bracher et al. (2017) by updating and improving the original version of the PHYSAT-Med method (Navarro et al., 2014) with the new OC-CCI database. First, this new version offers several advantages: (a) an increase in the temporal range (1997–2015); (b) a decrease in cloud cover due to the use of several ocean color sensors; (c) an improvement in the bias correction, thus reducing sensitivity to medium-term changes; and (d) the validation of the temporal range in the NW Mediterranean Sea using diagnostic pigments analysis (DPA, Vidussi et al., 2001). Second, wavelet analysis is applied to the retuned version in order to analyse the contributions of different temporal cycles to the variability in dominance of the major phytoplankton groups in the Mediterranean Sea.

# MATERIALS AND METHODS

### PHYSAT-Med OC-CCI Algorithm

The OC-CCI is a long-term, consistent and error-characterized dataset generated from merged normalized remote-sensing reflectance derived from four satellite sensors: SeaWiFS, MODIS, MERIS, and VIIRS (Storm et al., 2013; Jackson et al., in press). In this work, we have used OC-CCI v3.0, where more data have been included (VIIRS and SeaWiFS LAC). In addition, the bias correction has been improved, reducing sensitivity to medium-term changes and extending the method to work beyond the lifetime of SeaWiFS. Daily level 3 remote sensing reflectance data (Rrs) at 412, 443, 490, 510, 555, and 670 nm and the diffuse attenuation coefficient (Kd490) were downloaded from the OC-CCI website covering the period from January 1998 to December 2015 (**Figure 2**, step 1). These products are provided on a regular 4 km grid, with an equirectangular projection with constant longitude and latitude steps. Error specification (RMSE and bias) is based on comparison with match-up in situ data and extrapolation to the global ocean.

In a second step (**Figure 2**, step 2), the Chla concentration in the Mediterranean Sea was calculated using a regional algorithm (MedOC4, Mediterranean ocean color four-bands, Volpe et al., 2007) developed for the basin for SeaWiFS bands (or CCI),

$$\text{MedOC4-Chla} = 10^{\left(0.4424 - 3.686R + 1.076R^2 + 1.684R^3 - 1.437R^4\right)} \tag{1}$$

where

$$R = \log_{10}\left[\max\left(\frac{\mathrm{Rrs}_{443}}{\mathrm{Rrs}_{555}},\, \frac{\mathrm{Rrs}_{490}}{\mathrm{Rrs}_{555}},\, \frac{\mathrm{Rrs}_{510}}{\mathrm{Rrs}_{555}}\right)\right] \tag{2}$$

This bio-optical algorithm is based on a fourth-order polynomial regression between log-transformed Chla and the log-transformed maximum band ratio (MBR). It is known that using multiple Rrs ratios decreases the noise-to-signal ratio, thereby enhancing the algorithm's performance (O'Reilly et al., 1998). The MedOC4 algorithm was calibrated on a representative open-water bio-optical dataset collected in the Mediterranean Sea, and is the algorithm that best matches the requirement of unbiased satellite chlorophyll estimates (Volpe et al., 2007; Santoleri et al., 2008). At a global scale, the SeaWiFS algorithms have shown errors in the range of <5% for radiances and <35% for chlorophyll (Mueller and Austin, 1995; Gregg and Casey, 2004). The accuracy limit for chlorophyll using these standard algorithms has been shown to be unrealistic in the Mediterranean Sea, yielding a severe overestimation (>70% for chlorophyll <0.2 mg/m<sup>3</sup>; Volpe et al., 2007, 2012).
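As an illustration of Equations (1) and (2), the following minimal Python sketch evaluates the MedOC4 polynomial for a single pixel. The function name `medoc4_chla` and the handling of invalid input are assumptions made for this example and are not part of the original processing chain; only the coefficients are taken from Equation (1).

```python
import math

# Coefficients of Equation (1) (MedOC4, Volpe et al., 2007), lowest order first.
MEDOC4_COEFFS = (0.4424, -3.686, 1.076, 1.684, -1.437)


def medoc4_chla(rrs443, rrs490, rrs510, rrs555):
    """Return Chla (mg/m^3) for one pixel, or None if any Rrs is non-positive.

    Hypothetical helper: the name and the None convention are illustrative.
    """
    if min(rrs443, rrs490, rrs510, rrs555) <= 0.0:
        return None  # log10 of a non-positive ratio is undefined
    # Equation (2): log10 of the maximum blue-to-green band ratio.
    r = math.log10(max(rrs443, rrs490, rrs510) / rrs555)
    # Equation (1): fourth-order polynomial in R, evaluated with Horner's rule.
    exponent = 0.0
    for coeff in reversed(MEDOC4_COEFFS):
        exponent = exponent * r + coeff
    return 10.0 ** exponent
```

Because the linear term is negative, a larger blue-to-green ratio (clearer water) yields a lower Chla estimate in this sketch, as expected for a maximum band ratio algorithm.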

At the third step (**Figure 2**), the Rrs was converted to nLw using the nominal band solar irradiance (Fo, in mW cm−<sup>2</sup> µm−<sup>1</sup> ) for any specific spectral band (λ) of the SeaWiFS sensor (Gregg et al., 1993; Thuillier et al., 2003).

$$
nLw_{(\lambda)} = \mathrm{Rrs}_{(\lambda)} \times F_{o(\lambda)} \tag{3}
$$

During step 4 (**Figure 2**), a new Look-Up-Table (LUT, **Figure 3** and Table 1 in Supplementary Material) of nLwref(λ, Chla) was empirically generated for the Mediterranean Sea from a large dataset of OC-CCI Chla and nLw pixels for all daily images contained within the study period (January 1998 to December 2015). Turbid pixels (defined as nLw555 > 1.3 mW cm−<sup>2</sup> µm−<sup>1</sup> sr−<sup>1</sup>, Nezlin and DiGiacomo, 2005) were excluded in order to minimize the impact of high suspended matter loads. Briefly, nLwref is calculated from nLw data and the associated Chla computed from the MedOC4 algorithm within the concentration range between 0.01 and 10 mg/m<sup>3</sup> (41 narrow intervals). This number of intervals is similar to that used in the development of the PHYSAT-Med algorithm (Navarro et al., 2014).

Once the new LUT (**Figure 3**, Table 1 in Supplementary Material) for the Mediterranean Sea was calculated using the regional MedOC4-Chla algorithm, the radiance anomalies [Ra(λ), see **Figure 3**, step 5] were computed from the daily OC-CCI data using Equation (4) for all available wavelengths (412, 443, 490, 510, 555, and 670 nm). Ra(λ) is a dimensionless parameter independent of the Chla level, and hence also independent of the biomass. Ra(λ) thus represents the second order variation in nLw(λ) after removal of the first order effect of the Chla variation (Alvain et al., 2005):

$$Ra_{(\lambda)} = \frac{nLw_{(\lambda)}}{nLw^{ref}_{(\lambda,\,Chla)}} \tag{4}$$
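The LUT construction and Equation (4) can be sketched for a single wavelength as follows. This is only an illustration under stated assumptions: the text specifies 41 Chla intervals between 0.01 and 10 mg/m<sup>3</sup>, but the logarithmic spacing of the intervals, the use of the mean nLw as the reference value, and all function names are choices made for this example.

```python
import math


def chla_bin_edges(chla_min=0.01, chla_max=10.0, n_bins=41):
    """Edges of the Chla intervals (log spacing is an assumption of this sketch)."""
    lo, hi = math.log10(chla_min), math.log10(chla_max)
    return [10.0 ** (lo + (hi - lo) * i / n_bins) for i in range(n_bins + 1)]


def bin_index(chla, edges):
    """Index of the interval containing `chla`, or None if it is out of range."""
    if chla < edges[0] or chla > edges[-1]:
        return None
    for i in range(len(edges) - 1):
        if chla <= edges[i + 1]:
            return i
    return None


def build_lut(pixels, edges):
    """nLwref per Chla interval from (chla, nlw) pairs; mean is an assumed statistic."""
    sums = [0.0] * (len(edges) - 1)
    counts = [0] * (len(edges) - 1)
    for chla, nlw in pixels:
        i = bin_index(chla, edges)
        if i is not None:
            sums[i] += nlw
            counts[i] += 1
    # Intervals without any pixel have no reference value.
    return [s / c if c else None for s, c in zip(sums, counts)]


def radiance_anomaly(chla, nlw, lut, edges):
    """Equation (4): Ra = nLw / nLwref(Chla); None when no reference exists."""
    i = bin_index(chla, edges)
    if i is None or lut[i] is None:
        return None
    return nlw / lut[i]
```

In practice one such table would be needed per wavelength, and turbid pixels would be discarded before calling `build_lut`.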

The analyses by Alvain et al. (2005) showed that for a given Chla concentration, the particle scattering variability explains the largest fraction of the remotely sensed Ra spectral variability, especially when focusing on Ra magnitude changes. The labeling step was performed using the thresholds of Ra for each of the six phytoplankton groups examined in PHYSATv2008 (see Table 5 in Alvain et al., 2008), which is specifically set up for SeaWiFS channels. These thresholds were used to process daily images to calculate daily PFT maps (**Figure 3**, step 6). For a spectrum to be associated with one group, all criteria must be fulfilled. Thresholds (Table 5 in Alvain et al., 2008) were fixed in order to avoid any overlapping. Pixels with nLw values that were not classified into any phytoplankton group were cataloged as "unidentified (unid.)," and these can amount to a significant fraction (Navarro et al., 2014). PHYSAT-Med retrieves the dominant group for a given satellite image pixel (4 km) in the Mediterranean Sea, where a given phytoplankton group is the major contributor to the radiance anomaly. From this database (close to 6,600 daily images), 10-day and monthly maps of dominant phytoplankton groups were obtained by selecting, at each geographical pixel, the phytoplankton group that was present on the most days during the integration period (10-day or monthly, respectively), not including "unidentified" pixels. To estimate the proportion of each phytoplankton group in the entire basin and in several Mediterranean sub-regions (Alboran Sea, Ligurian Sea, Northern Adriatic Sea, and Levantine basin in **Figure 1**, Bricaud et al., 2002), the number of pixels of each PFT during each 10-day or monthly period was calculated for each area in proportion to all the identified pixels, excluding the unidentified pixels.
The box plot figures were created using Matlab® software (boxplot.m function), where the central mark corresponds to the median, the edges of the box are the 25th and 75th percentiles, the whiskers extend to the most extreme data points not considered outliers, and outliers are plotted individually. These statistics were calculated for each climatological month by considering all the values obtained during a particular month for the whole data series (from 1998 to 2015).
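The temporal compositing and the regional proportions described above can be sketched as follows. The label used for unidentified pixels, the function names, and the handling of ties (the group encountered first wins) are assumptions of this example, since the text does not specify them.

```python
from collections import Counter

UNIDENTIFIED = "unid"  # hypothetical label for unclassified pixels


def dominant_group(daily_labels):
    """Group present on the most days at one pixel, ignoring unidentified days.

    `daily_labels` holds one label per day (None for missing data).
    Returns None when no day was identified.
    """
    counts = Counter(
        label for label in daily_labels
        if label is not None and label != UNIDENTIFIED
    )
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def group_proportions(composite_labels):
    """Percentage of identified pixels assigned to each group within a region."""
    counts = Counter(label for label in composite_labels if label is not None)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {group: 100.0 * n / total for group, n in counts.items()}
```

Applying `dominant_group` to every pixel of a 10-day or monthly window gives the composite map, and `group_proportions` applied to the pixels of a sub-region gives the percentages used in the time series.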

### In situ Validation

In order to validate the PHYSAT-Med output, we have compared the temporal variability of PFT obtained from the remote sensing algorithm with diagnostic pigments analysis (DPA, Vidussi et al., 2001) obtained from HPLC in the North-Western Mediterranean Sea. A total of 5,400 samples were collected from the basin and analyzed by HPLC (**Figure 1**). This dataset consisted of samples from the DYFAMED (Dynamics of Atmospheric Fluxes in the Mediterranean Sea) time series included in the MAREDAT global database of HPLC (Peloquin et al., 2013) and the BOUSSOLE (Buoy for the acquisition of a Long-Term Optical Time Series) program (e.g., Antoine et al., 2006). Details of the HPLC methods used can be found in the aforementioned references. Here, we only considered samples within the first optical depth (Z90), which reduces the number of available pigment inventories to 1,615 samples spanning the temporal range analyzed (1998–2015). The first optical depth was calculated using daily Kd490 images from OC-CCI data [Z90 = 1/Kd490], and is about 15–35 m on average in the Mediterranean Sea (D'Ortenzio and Ribera d'Alcalá, 2009). The OC-CCI Kd490 product is computed from the inherent optical properties (IOPs) at 490 nm (Lee et al., 2005; Grant et al., 2016).
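The depth selection can be illustrated with a short Python sketch. Both function names are hypothetical, and Kd490 is assumed to be expressed in m<sup>−1</sup> so that Z90 is in meters.

```python
def first_optical_depth(kd490):
    """Z90 = 1 / Kd490 in meters (Kd490 in 1/m); None for non-positive input."""
    return 1.0 / kd490 if kd490 > 0.0 else None


def within_first_optical_depth(sample_depth_m, kd490):
    """True when an HPLC sample lies at or above the first optical depth."""
    z90 = first_optical_depth(kd490)
    return z90 is not None and sample_depth_m <= z90
```

For example, a Kd490 of 0.04 m<sup>−1</sup> corresponds to a first optical depth of 25 m, so a sample taken at 10 m would be retained and one taken at 30 m would be discarded.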

This comparison is based on many pigments specific to individual phytoplankton taxa or groups (e.g., Gieskes et al., 1988; Goericke and Repeta, 1993; Claustre and Marty, 1995; Jeffrey and Vesk, 1997). For instance, Divinyl Chlorophyll-a (dChla) is a typical marker of prochlorophytes (Goericke and Repeta, 1992; Claustre and Marty, 1995; Vidussi et al., 2001), whereas zeaxanthin (Zeax) is associated with cyanobacteria (Guillard et al., 1985). Fucoxanthin (Fuco) is the principal marker of diatoms (Jeffrey, 1980). For nanoplankton quantification, Vidussi et al. (2001) used three diagnostic pigments: alloxanthin (Allox), a pigment typical of the cryptomonads (Gieskes and Kraay, 1983); 19′-hexanoyloxyfucoxanthin (HexFuco), whose concentration is related to prymnesiophytes and chromophyte nanoflagellates (Wright and Jeffrey, 1987); and 19′-butanoyloxyfucoxanthin (ButFuco), a typical marker of chromophyte nanoflagellates (Wright and Jeffrey, 1987). Other pigments used in this method are total chlorophyll-b (TChlb, chlorophyll b + Divinyl-Chlorophyll b) and peridinin (Perid), which appears in small dinoflagellates (Jeffrey and Hallegraeff, 1987). This approach has been used at the global scale (Uitz et al., 2006) and particularly in the Mediterranean Sea (e.g., Vidussi et al., 2001; Marty et al., 2002; Sammartino et al., 2015; Di Cicco et al., 2017; Mayot et al., 2017).

In this study, we have compared the in situ chlorophyll concentration of nanoplankton and diatoms using the method recently applied to the Mediterranean Sea by Di Cicco et al. (2017). Following the DPA procedure, originally proposed by Vidussi et al. (2001) and later refined by Uitz et al. (2006) to scale diagnostic pigments to Chla, it is possible to apply DPA-based approaches to satellite-derived Chla:

$$\text{Chla}_{\text{diatom}} = \left(\frac{1.60\,[Fuco]}{\Sigma DPW}\right) \times \text{Chla} \tag{5}$$

$$\text{Chla}_{\text{nano}} = \left(\frac{1.18\,[HexFuco] + 0.57\,[ButFuco] + 2.70\,[Allo]}{\Sigma DPW}\right) \times \text{Chla} \quad \text{if TChla} > 0.08\ mg/m^3 \tag{6}$$

$$\text{Chla}_{\text{nano}} = \left(\frac{12.5\,[TChla] \times 1.18\,[HexFuco] + 0.57\,[ButFuco] + 2.70\,[Allo]}{\Sigma DPW}\right) \times \text{Chla} \quad \text{if TChla} < 0.08\ mg/m^3 \tag{7}$$

where

$$\begin{aligned} \Sigma DPW &= 1.18\,[HexFuco] + 0.57\,[ButFuco] + 2.70\,[Allo] \\ &+ 1.67\,[Perid] + 1.60\,[Fuco] + 0.88\,[TChlb] \\ &+ 1.79\,[Zeax] \end{aligned} \tag{8}$$
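A minimal Python sketch of the DPA scaling is given below for illustration. The pigment weights are those of Equation (8); the dictionary keys, function names and the treatment of an empty pigment inventory are assumptions of this example. The low-chlorophyll branch is read here as scaling the HexFuco term by 12.5 × TChla, which makes it continuous with the high-chlorophyll branch at 0.08 mg/m<sup>3</sup>; this reading is an assumption and should be checked against Di Cicco et al. (2017).

```python
# Weights of Equation (8); keys are hypothetical pigment identifiers.
DPA_WEIGHTS = {
    "HexFuco": 1.18, "ButFuco": 0.57, "Allo": 2.70, "Perid": 1.67,
    "Fuco": 1.60, "TChlb": 0.88, "Zeax": 1.79,
}
TCHLA_THRESHOLD = 0.08  # mg/m^3, threshold separating the two nano branches


def weighted_pigment_sum(pigments):
    """Sum of weighted diagnostic pigments (Equation 8); missing pigments count as 0."""
    return sum(w * pigments.get(name, 0.0) for name, w in DPA_WEIGHTS.items())


def chla_diatom(pigments, chla):
    """Equation (5): diatom share of Chla; None if no diagnostic pigment is present."""
    total = weighted_pigment_sum(pigments)
    if total <= 0.0:
        return None
    return DPA_WEIGHTS["Fuco"] * pigments.get("Fuco", 0.0) / total * chla


def chla_nano(pigments, chla, tchla):
    """Nanoplankton share of Chla for in situ total chlorophyll-a `tchla`."""
    total = weighted_pigment_sum(pigments)
    if total <= 0.0:
        return None
    # Assumed reading: below the threshold only a fraction 12.5 * TChla of the
    # HexFuco signal is attributed to nanoplankton.
    hex_scale = 1.0 if tchla > TCHLA_THRESHOLD else 12.5 * tchla
    weighted = (
        hex_scale * DPA_WEIGHTS["HexFuco"] * pigments.get("HexFuco", 0.0)
        + DPA_WEIGHTS["ButFuco"] * pigments.get("ButFuco", 0.0)
        + DPA_WEIGHTS["Allo"] * pigments.get("Allo", 0.0)
    )
    return weighted / total * chla
```

Here `chla` is the chlorophyll concentration to be partitioned (in situ or satellite-derived), whereas `tchla` only selects the branch.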

Alternatively, the method described by Alvain et al. (2005) was used to compare Prochlorococcus and Synechococcus time series, as this method uses the pigment ratio of divinyl-Chla and zeaxanthin, respectively, and has been applied previously for the Mediterranean Sea (Navarro et al., 2014):

$$P_{rel} = \frac{P}{\text{Chla} + d\text{Chla}} \tag{9}$$

where P is the measured pigment concentration (dChla or zeaxanthin), and Chla and dChla are the concentrations of chlorophyll-a and divinyl Chlorophyll-a, respectively.

As HPLC and PHYSAT-Med OC-CCI output data are measured in different units, we used Spearman's rank-order (non-parametric) correlations to assess the strength of the temporal association between both variables for each of the phytoplankton functional types. We used 10,000 bootstrap samples to construct 90% empirical confidence intervals for the correlations (Efron and Tibshirani, 1993). In order to check for possible time lags between the timing of seasonal blooms within the Ligurian Sea, as measured with both methods, we inspected the effect of different time lags in both variables on the strength of the association.
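The correlation analysis can be sketched in plain Python as follows. The text does not state which bootstrap scheme was used, so this example assumes simple resampling of paired observations with a percentile interval; the function names, the default seed and the handling of degenerate resamples are likewise illustrative.

```python
import random


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; None when either series has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return None
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rank-order correlation (Pearson correlation of the ranks)."""
    return pearson(average_ranks(x), average_ranks(y))


def bootstrap_ci(x, y, n_boot=10000, level=0.90, seed=0):
    """Percentile bootstrap interval for Spearman's correlation (pairs resampled)."""
    rng = random.Random(seed)
    n = len(x)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        rs = spearman([x[i] for i in idx], [y[i] for i in idx])
        if rs is not None:  # skip degenerate resamples with constant values
            stats.append(rs)
    if not stats:
        return None
    stats.sort()
    alpha = (1.0 - level) / 2.0
    lower = stats[int(alpha * (len(stats) - 1))]
    upper = stats[int((1.0 - alpha) * (len(stats) - 1))]
    return lower, upper
```

Time lags could be explored with the same functions by shifting one series against the other before calling `spearman`. Note that resampling pairs ignores temporal autocorrelation, which is a simplification of this sketch.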

### Wavelet Analysis

Wavelet analysis has emerged as a tool for characterizing periodicities in non-stationary time series, as it decomposes a time series both in the frequency and time domains (Percival and Walden, 2000). In this study, wavelet analysis has been used to characterize the different periodic components of the variability in dominance of the major phytoplankton groups in the Mediterranean Sea across time. Wavelet analysis performs a time-scale decomposition of the signal by estimating its frequency characteristics as a function of time (Torrence and Compo, 1998; Grinsted et al., 2004; Winder and Cloern, 2010). In order to normalize time series data and obtain the wavelet power spectrum of the different phytoplankton groups, the continuous Morlet wavelet transform was applied by using the Matlab® toolbox provided by Torrence and Compo (1998) and Grinsted et al. (2004) (http://atoc.Colorado.edu/research/wavelets/). The wavelet power spectrum identifies the periods that are the most important sources of variability across time. Additionally, it is possible to define a global wavelet spectrum, which identifies the variance associated with each period for a given time series, and is similar to Fourier spectra (Percival and Walden, 2000). Wavelet analysis was performed over the 10-day time series of nanoeukaryotes, Prochlorococcus, Synechococcus-like cyanobacteria and diatoms for the entire Mediterranean Sea and for four selected sub-regions (Alboran Sea, Ligurian Sea, Northern Adriatic Sea and Levantine Sea).
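To make the procedure concrete, the following Python sketch computes the Morlet wavelet power and the global wavelet spectrum by direct summation, following the definitions of Torrence and Compo (1998). It is not the Matlab toolbox used in this study: it omits zero padding, the cone of influence and significance testing, and the nondimensional frequency of 6 and all function names are assumptions of this example.

```python
import cmath
import math

OMEGA0 = 6.0  # Morlet nondimensional frequency (a common default, assumed here)


def morlet_scale_for_period(period):
    """Wavelet scale whose equivalent Fourier period equals `period`."""
    return period * (OMEGA0 + math.sqrt(2.0 + OMEGA0 ** 2)) / (4.0 * math.pi)


def wavelet_power(series, period, dt=1.0):
    """Wavelet power at one period for every time step of the series.

    The series is first normalised to zero mean and unit variance.
    """
    n = len(series)
    mean = sum(series) / n
    var = sum((v - mean) ** 2 for v in series) / n
    if var == 0.0:
        raise ValueError("constant series has no variance to decompose")
    x = [(v - mean) / math.sqrt(var) for v in series]
    s = morlet_scale_for_period(period)
    norm = math.sqrt(dt / s) * math.pi ** -0.25
    power = []
    for t in range(n):
        acc = 0j
        for k in range(n):
            eta = (k - t) * dt / s
            # Complex conjugate of the Morlet mother wavelet evaluated at eta.
            acc += x[k] * cmath.exp(-1j * OMEGA0 * eta) * math.exp(-0.5 * eta * eta)
        power.append(abs(norm * acc) ** 2)
    return power


def global_wavelet_spectrum(series, periods, dt=1.0):
    """Time-averaged wavelet power for each requested period."""
    return {p: sum(wavelet_power(series, p, dt)) / len(series) for p in periods}
```

For a 10-day series, `dt` and `periods` can both be given in days (for instance `dt=10` with periods near 182 and 365) to compare the 6-month and 12-month components. The direct summation scales quadratically with the series length, which is acceptable for a few hundred samples but slower than an FFT-based implementation.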

# RESULTS AND DISCUSSION

### Validation

**Figure 4** shows the temporal variability of several diagnostic pigments and PFT in the Ligurian Sea, where the DYFAMED and BOUSSOLE sampling stations are located. The comparison exercise covers the full range of the PFT analyzed, except for the zeaxanthin pigment, for which there were no values during 1998. Prochlorococcus showed maximum values in autumn over several years, at the end of the stratification period (**Figure 4A**). The maxima found by the PHYSAT-Med OC-CCI algorithm were in close agreement with the maxima in the pigment ratio for dChla measured by the HPLC method, which is indicative of prochlorophytes (Goericke and Repeta, 1992; Claustre and Marty, 1995; Vidussi et al., 2001). These results agree with the patterns reported by Vaulot et al. (1990) and Marty et al. (2002), estimated by flow cytometry and HPLC analysis, respectively. During summer, also coinciding with the stratification period, the dominant group is Synechococcus, and a maximum in zeaxanthin concentration is observed in all years (**Figure 4B**). This pigment is associated with cyanobacteria (Guillard et al., 1985) and has been widely used to estimate Synechococcus concentration in the Ligurian Sea (Vidussi et al., 2001; Marty et al., 2002). In contrast, during the spring bloom period, when the mixed layer depth is at its maximum (Marty et al., 2002), the diatom group and the diatom Chla concentration reached their highest values (**Figure 4C**). Finally, the nanoeukaryotes distribution presented maxima during winter, normally around January, coinciding with the maximum of nanoplankton chlorophyll concentration estimated using Equations 6 and 7 (**Figure 4D**; Di Cicco et al., 2017). Overall, the broad coincidence between PHYSAT-Med outputs and HPLC pigments in the temporal pattern suggests that the new version of the PHYSAT-Med algorithm using the OC-CCI v3.0 database is in agreement with the results obtained through long-term monitoring programs for phytoplankton distribution, at least in the Ligurian Sea area.

FIGURE 4 | Temporal percentage (10-day) of each PFT (black bars, left axis) and diagnostic pigments (DYFAMED, red dots; BOUSSOLE, green dots; right axis) in the Ligurian Sea. (A) Prochlorococcus and Divinyl-Chla. (B) Synechococcus and zeaxanthin. (C) Diatoms and Chladiatom. (D) Nanoeukaryotes and Chlanano.

Even though the OC-CCI database provides per-pixel errors (RMSE and bias) for all OC-CCI products, such error characterization is uncommon in the retrieval of phytoplankton functional types, except for the recent works published by Brewin et al. (2017) and Di Cicco et al. (2017) for the Mediterranean Sea. In fact, Di Cicco et al. (2017) showed the improvements obtained from the use of regional models with respect to the global models, with a reduction of bias of about one order of magnitude. As described above, the PHYSAT algorithm allows for the detection of dominant PFT. This approach is based on the analysis of the second order variation in nLw measurements after removal of the impact of Chla variation. Alvain et al. (2012) found acceptable results for diatoms (73%) and nanoeukaryotes (82%), but relatively low values for Prochlorococcus and cyanobacteria (61 and 57% of successful identification, respectively). For PHYSAT-Med, Navarro et al. (2014) found similar results for Synechococcus and nanoeukaryotes (61 and 74%, respectively).

**Table 1** shows the results of the validation exercise. For both nanoeukaryotes and diatoms the correlation between the PHYSAT-Med OC-CCI and the HPLC data is relatively large, with narrow bootstrapped confidence intervals that exclude 0. For Synechococcus the correlation is weaker but, again, the confidence interval does not contain 0. In contrast, for Prochlorococcus the association between PHYSAT-Med and HPLC data is weaker, and 0 is included within the confidence interval. In this latter case, the sample size is clearly lower. The reason might be that the largest Prochlorococcus abundance is located near the deep chlorophyll maximum, deeper than the first optical depth (Siokou-Frangou et al., 2010).

### Spatio-Temporal Patterns at the Basin Scale

**Figure 5** shows the monthly climatology (1998–2015) of the dominant phytoplankton groups in the Mediterranean Sea obtained with the OC-CCI database. These patterns are similar to those obtained by Navarro et al. (2014) using the PHYSAT-Med and the MODIS imagery for the period from July 2002 to May 2013. In addition, the analysis of the PFT is consistent with the previous knowledge of this area (Siokou-Frangou et al., 2010; Uitz et al., 2012; Sammartino et al., 2015; Di Cicco et al., 2017). It is evident that Synechococcus is the most abundant group detected at the basin scale, particularly during spring-summer months, whereas nanoeukaryotes seem to dominate during autumn-winter months. Prochlorococcus is preferentially distinguished during February and October in offshore waters, in contrast to diatoms, which prevail in coastal areas, such as the Gulf of Lions, the Ligurian Sea and the northern Adriatic Sea, mostly during the spring season. This last finding agrees with the microplankton distribution provided by Sammartino et al. (2015) and Di Cicco et al. (2017), who concluded that the fraction of microplankton significantly increases in the Northwestern Mediterranean Sea, reaching values from 30 to 57% (Sammartino et al., 2015). The presence of coccolithophorids in the basin is particularly evident along the Mediterranean coastline, especially in the surroundings of the large river mouths (Ebro, Rhone and Nile) and in the Adriatic Sea (**Figure 5**). However, it is worth highlighting that fluvial inputs of terrestrial matter or suspended solids may slightly mask the signals and affect phytoplankton distribution (Navarro et al., 2014). Even though PHYSAT-Med is also appropriate for detecting Phaeocystis-like phytoplankton, no signal of this group was found in the current study. These spatiotemporal patterns were subsequently corroborated by the time series of monthly climatology (**Figures 6B**, **7B**, **8B**, and **9B**), although only nanoeukaryotes, Prochlorococcus, Synechococcus-like cyanobacteria and diatoms were considered because these groups were compared with in situ pigment markers (**Figure 4**).

TABLE 1 | Validation results of the PHYSAT-Med dataset with HPLC data from the Ligurian Sea.

Shown are the sample size of the time series used, the Spearman's rank-order correlation between both time series (rs) and the 90% bootstrapped confidence interval. 10,000 bootstrapped samples of the original series were used to construct the intervals.

The abundance of nanoeukaryotes in the Mediterranean Sea (**Figure 6A**) follows recurrent 12-month cycles across time, as suggested by the power of the wavelet spectrum at this cycle (**Figure 6C**). Some weaker 6-month cyclic components can be observed during certain years (2003, 2006–2014). The mean annual cycle or monthly climatology for this group (**Figure 6B**) showed a maximum percentage of abundance during the winter months, mainly November, December and January. The global wavelet spectrum (**Figure 6D**) demonstrated that the 12-month periodicity was highly significant, with a minor peak at 6 months also contributing to the variance. The amplitude of the seasonal variations of nanoeukaryotes at the basin scale is similar to that described by Sammartino et al. (2015) and Di Cicco et al. (2017), who found minimum values of abundance for nanoplankton during summer and maximum values during winter, when the mixed layer depth (MLD) is also deeper (Siokou-Frangou et al., 2010). In the Levantine basin this group was the second most abundant in terms of Chla, but it was the main group in the western basin (Di Cicco et al., 2017). Nanoplankton make a dominant contribution (up to 43–50%) to total primary production throughout the year at the basin scale (Uitz et al., 2012).

For Synechococcus the wavelet power spectrum also revealed a persistent 12-month periodicity (**Figures 7C,D**), with virtually no secondary 6-month cycles. The highest percentages of Synechococcus abundance were observed during the summer season (**Figures 4**, **5**, and **7B**), particularly in June and July, coinciding with the stratification period (Siokou-Frangou et al., 2010).

Interestingly, the temporal patterns of Prochlorococcus (**Figure 8A**) exhibited a less periodic fluctuation in dominance at the basin scale as compared to those of Synechococcus (**Figure 7A**). The continuous wavelet spectrum (**Figures 8B,C**) suggests that prior to 2002, no cyclic component dominated (the time series conformed to a white-noise process). From this year onwards, a 6-month periodicity pattern became apparent, particularly from 2007. An annual cycle also appeared during this period, but the power was smaller as suggested by the global wavelet spectrum (**Figure 8D**). This indicates that most of the temporal variance in the dominance of Prochlorococcus in the Mediterranean Sea occurs at different periodicities, perhaps dominated by a seasonal period. Synechococcus tends to be more abundant in surface waters, whereas Prochlorococcus thrives mainly in the deep chlorophyll maximum (Marty and Chiavérini, 2002; Casotti et al., 2003).

The temporal patterns of diatoms were characterized by a robust periodicity of 12 months across time (**Figure 9**). In this case, the largest dominance was observed during spring, in agreement with the diatom blooms reported in the basin over this season (Marty and Chiavérini, 2002). In contrast, the minima occurred in September, coinciding with nutrient exhaustion.

### Spatio-Temporal Patterns at the Sub-Regional Scale

The PHYSAT-Med OC-CCI approach was also applied in four selected sub-regions in order to track the temporal evolution of phytoplankton groups at smaller spatial scales. To allow for meaningful comparisons, the chosen areas resemble those considered by Sammartino et al. (2015): the Alboran Sea, the Ligurian Sea, the North Adriatic Sea and the Levantine basin (**Figure 1**). **Figure 10** shows the monthly climatology in the percentage of dominance of the four phytoplankton groups for each sub-region. Nanoeukaryotes occurrence exhibited a marked longitudinal gradient, with a higher abundance in the western basin (Alboran Sea and Ligurian Sea) in relation to the Levantine basin. Nonetheless, a marked annual cycle is evident in all sub-regions, which is characterized by the presence of maxima over the winter months and a minimum in summer. This was confirmed by a regional wavelet analysis that clearly revealed a consistent 12-month periodicity (**Figure 11**). Interestingly, however, a longitudinal increase in the importance of 6-month periodicities also became apparent: both in the North Adriatic Sea and in the Levantine basin, recurrent seasonal periods contribute to the overall variability (**Figure 11**). Nevertheless, the overall abundance of this group remained above 20%, with the exception of the Levantine basin over the summer months. This temporal pattern resembles that of nanoplankton found in the selected areas by Sammartino et al. (2015), and reflects the constant contribution of this group to primary production, as previously reported (Vidussi et al., 2000, 2001; Uitz et al., 2012).

FIGURE 7 | Temporal patterns of variability for Synechococcus for the Mediterranean Sea. (A) Temporal percentage (10-day) of Synechococcus. (B) Box plot (red lines stand for the median, blue box spans from the first to the third quartiles, and black lines represent 5th and 95th percentiles, respectively) of monthly climatology of Synechococcus. (C) Continuous wavelet power spectrum for the 10-day time series. Red line indicates the cone of influence, which is the region affected by the edges of the data and should not be considered. (D) Global wavelet spectrum for the 10-day time series.

FIGURE 8 | Temporal patterns of variability for Prochlorococcus for the Mediterranean Sea. (A) Temporal percentage (10-day) of Prochlorococcus. (B) Box plot (red lines stand for the median, blue box spans from the first to the third quartiles, and black lines represent 5th and 95th percentiles, respectively) of monthly climatology of Prochlorococcus. (C) Continuous wavelet power spectrum for the 10-day time series. Red line indicates the cone of influence, which is the region affected by the edges of the data and should not be considered. (D) Global wavelet spectrum for the 10-day time series.

Prochlorococcus abundance in the four regions was higher over the late summer months and similar in terms of percentage between the Western and Eastern basins. This phytoplankton group is, however, less represented in the Northern Adriatic Sea. It is worth noting that an additional winter peak (February) of Prochlorococcus can be identified in the Levantine basin. This pattern of abundance at a sub-regional scale coincides with the wavelet analyses (**Figure 11**). The annual cycle of Synechococcus is also evidenced by the maximum of abundance in the four sub-regions during the summer months, coinciding with the stratification period, when Chla concentration in the basin is low (Volpe et al., 2007). During this season, the temporal climatology of Synechococcus reached values close to 100% in the Levantine basin, in agreement with the values given by Sammartino et al. (2015) for picoplankton. These authors indicated that this size class (closely corresponding to Synechococcus) seems to cover the entire Mediterranean Sea homogeneously, with percentages of abundance close to 70%, although a decreasing concentration gradient from west to east can still be observed, which was also revealed by our analysis, particularly for Prochlorococcus. According to our analysis (**Figure 10**), Synechococcus dominated in the Eastern basin, where ultraoligotrophic conditions are present, particularly during summer (Siokou-Frangou et al., 2010). During this season, primary production by the picoplankton exhibits a maximum (Uitz et al., 2012). It is well known that, owing to their high surface/volume ratio, Synechococcus (and also Prochlorococcus) can cope optimally with nutrient-impoverished environments (Le Quéré et al., 2005). The presence of Synechococcus in the Levantine basin has been widely reported (Uitz et al., 2012 and references therein), and the PHYSAT-Med OC-CCI clearly revealed its presence in the ultraoligotrophic Levantine basin and depicted a realistic annual cycle (**Figures 10**, **11**).

As expected, diatoms were the least abundant of the four phytoplankton groups analyzed in the Mediterranean sub-regions. In fact, with the exception of the Northern Adriatic Sea, the percentage of abundance of diatoms fell within the range of 10–20% in the Western sub-regions and in the Levantine basin during the whole year. A moderate spring maximum could still be detected, coinciding with the seasonal blooms normally described for this phytoplankton group along the Mediterranean (Marty and Chiavérini, 2002). As suggested by the wavelet analyses, the annual cycle for diatoms is rather less robust than that of the other groups (**Figure 11**). These findings agree with previous studies, where higher phytoplankton biomass, particularly diatoms, was found in the Adriatic Sea (Socal et al., 1999; Casotti et al., 2003). This group represented an important fraction (14%) of phytoplankton only in winter (Socal et al., 1999). Nevertheless, the contribution of picoplankton typically exceeds that of microplankton most of the year except during the winter-spring bloom (Uitz et al., 2012).

FIGURE 10 | Box plot for monthly climatology for each group in Mediterranean sub-regions. Each row represents each phytoplankton group (nanoeukaryotes, Prochlorococcus, Synechococcus, and diatoms) and each column corresponds to the same sub-region (Alboran Sea, Ligurian Sea, Northern Adriatic Sea, and Levantine Basin). In the box plot figure red lines stand for the median, blue box spans from the first to the second quartiles, and black lines represent 5th and 95th percentiles, respectively, of monthly climatology for each group.

FIGURE 11 | Continuous wavelet power spectrum for the 10-day time series for each group and four sub-regions. Each row represents each phytoplankton group (nanoeukaryotes, Prochlorococcus, Synechococcus, and diatoms) and each column corresponds to the same sub-region (Alboran Sea, Ligurian Sea, Northern Adriatic Sea, and Levantine Basin). Red line indicates the region of time and frequency affected by the edges of the data and should not be considered.

According to our assessment using the new OC-CCI database, the most abundant phytoplankton group during the winter months in the Mediterranean Sea was the nanoeukaryotes, particularly in the sub-regions of the Western Mediterranean, the Alboran and the Ligurian Seas. Regardless of the specific area, the convective mixing in winter over the basin that uplifts deep nutrients to the upper layer triggers the proliferation of the bigger phytoplankton groups (Marty and Chiavérini, 2010), which is evidenced here by the major presence of nanoeukaryotes in winter months (**Figures 5**, **10**). Our data also reproduce the basin-wide diatom peaks of abundance in spring (**Figure 9**), following the Mediterranean phytoplankton succession previously reported (Marty et al., 2002) and with a consistent pattern every year (**Figures 9**–**11**). It should be noted, however, that our approach presents certain limitations in diatom detection, as acknowledged by Navarro et al. (2014). Hence, it is likely that diatom abundance has been slightly underestimated across the basin, although studies on the extensive distribution of this group along the Mediterranean are too scarce to allow for an accurate comparison. Conversely, Synechococcus and Prochlorococcus, which are favored by stratified conditions during summer owing to their better efficiency under nutrient-depleted conditions, were successfully identified both at the basin (**Figure 5**) and sub-regional scales (**Figure 10**). Moreover, the well-known dominance of Synechococcus with respect to Prochlorococcus (Schauer et al., 2003), particularly in the ultraoligotrophic Eastern basin, was neatly reproduced in our study.

Overall, the spatio-temporal patterns obtained by applying the PHYSAT-Med to the satellite OC-CCI database are consistent with the previous distributions of the major phytoplankton groups observed in the Mediterranean Sea (Vidussi et al., 2000, 2001; Marty and Chiavérini, 2002, 2010; Marty et al., 2002; Siokou-Frangou et al., 2010; Navarro et al., 2014; Sammartino et al., 2015; Di Cicco et al., 2017). These results suggest that our approach is highly suitable at basin scale and in selected sub-regions. This new dataset for PFT could be an efficient tool for recording and understanding the response of the marine ecosystem to human pressures and thus for detecting eutrophication in the Mediterranean Sea (Vantrepotte and Mélin, 2010; Colella et al., 2016).

# CONCLUSIONS

This work presents an updated version of the PHYSAT-Med algorithm that has been specifically developed using the OC-CCI database. This ESA initiative aims at gathering ocean color measurements from four sensors since 1997. The distribution of the major phytoplankton groups in the Mediterranean basin during an 18-year period was consistent with previous knowledge of the distribution patterns of phytoplankton in the basin. In addition, the modeled distributions are in concordance with the distribution of HPLC pigments analyzed in the NW Mediterranean Sea for the whole temporal range. The utility of the updated approach was confirmed by the temporal analysis using the wavelet spectrum, which allowed for the identification of shifting patterns of periodicities across time for the dominant phytoplankton groups. Therefore, the new version of the PHYSAT-Med is appropriate for assessing the shifting spatio-temporal patterns of the most abundant phytoplankton groups in the Mediterranean Sea.

# AUTHOR CONTRIBUTIONS

All authors contributed to the final manuscript: GN and IH were responsible for writing and organizing the manuscript. IC and GN implemented the PHYSAT-Med algorithm with the new OC-CCI database. PA and AV performed the wavelet analysis. All authors discussed the results and drew the conclusions.

# FUNDING

This work was financially supported by the Junta de Andalucía Projects PR11-RNM-7722, PIE 201530I012 and the National Project CTM2014-58181-R.

# ACKNOWLEDGMENTS

The authors acknowledge the Ocean Colour Climate Change Initiative dataset, Version 3.0, European Space Agency, available online at http://www.esa-oceancolour-cci.org/, for providing access to remote sensing reflectance products. The in situ HPLC dataset was obtained from the BOUSSOLE Project and the MAREDAT database. We are grateful to two reviewers for their invaluable scientific improvements.

# SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fmars.2017.00246/full#supplementary-material




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Navarro, Almaraz, Caballero, Vázquez and Huertas. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Validation and Intercomparison of Ocean Color Algorithms for Estimating Particulate Organic Carbon in the Oceans

Hayley Evers-King<sup>1</sup>\*, Victor Martinez-Vicente<sup>1</sup>, Robert J. W. Brewin<sup>1</sup>, Giorgio Dall'Olmo<sup>1, 2</sup>, Anna E. Hickman<sup>3</sup>, Thomas Jackson<sup>1</sup>, Tihomir S. Kostadinov<sup>4†</sup>, Hajo Krasemann<sup>5</sup>, Hubert Loisel<sup>6</sup>, Rüdiger Röttgers<sup>5</sup>, Shovonlal Roy<sup>7</sup>, Dariusz Stramski<sup>8</sup>, Sandy Thomalla<sup>9, 10</sup>, Trevor Platt<sup>1</sup> and Shubha Sathyendranath<sup>1, 2</sup>

<sup>1</sup> Plymouth Marine Laboratory, Plymouth, United Kingdom, <sup>2</sup> National Centre for Earth Observation, Plymouth, United Kingdom, <sup>3</sup> Ocean and Earth Science, National Oceanography Centre Southampton, University of Southampton, Southampton, United Kingdom, <sup>4</sup> Department of Geography and the Environment, University of Richmond, Richmond, VA, United States, <sup>5</sup> Helmholtz-Zentrum Geesthacht, Center for Materials and Coastal Research, Geesthacht, Germany, <sup>6</sup> LOG, Laboratoire d'Océanologie et de Géosciences, Centre National de la Recherche Scientifique, University of Littoral Cote d'Opale, University Lille, UMR 8187, Wimereux, France, <sup>7</sup> Department of Geography and Environmental Sciences, and School of Agriculture, Policy and Development, University of Reading, Reading, United Kingdom, <sup>8</sup> Scripps Institution of Oceanography, University of California, San Diego, La Jolla, CA, United States, <sup>9</sup> Southern Ocean Carbon and Climate Observatory, Council for Scientific and Industrial Research, Stellenbosch, South Africa, <sup>10</sup> Department of Oceanography, Marine Research Institute, University of Cape Town, Cape Town, South Africa

### *Edited by:*

Hervé Claustre, Centre National de la Recherche Scientifique (CNRS), France

### *Reviewed by:*

Emmanuel Boss, University of Maine, United States

David Antoine, Curtin University, Australia

> *\*Correspondence:* Hayley Evers-King hek@pml.ac.uk

### *† Present Address:*

Tihomir S. Kostadinov, Division of Hydrologic Sciences, Desert Research Institute, Reno, NV, United States

### *Specialty section:*

This article was submitted to Ocean Observation, a section of the journal Frontiers in Marine Science

*Received:* 15 March 2017 *Accepted:* 21 July 2017 *Published:* 09 August 2017

### *Citation:*

Evers-King H, Martinez-Vicente V, Brewin RJW, Dall'Olmo G, Hickman AE, Jackson T, Kostadinov TS, Krasemann H, Loisel H, Röttgers R, Roy S, Stramski D, Thomalla S, Platt T and Sathyendranath S (2017) Validation and Intercomparison of Ocean Color Algorithms for Estimating Particulate Organic Carbon in the Oceans. Front. Mar. Sci. 4:251. doi: 10.3389/fmars.2017.00251

Particulate Organic Carbon (POC) plays a vital role in the ocean carbon cycle. Though relatively small compared with other carbon pools, the POC pool is responsible for large fluxes and is linked to many important ocean biogeochemical processes. The satellite ocean-color signal is influenced by particle composition, size, and concentration and provides a way to observe variability in the POC pool at a range of temporal and spatial scales. Providing accurate estimates of POC concentration from satellite ocean color data requires algorithms that are well validated, with uncertainties characterized. Here, a number of algorithms to derive POC using different optical variables are applied to merged satellite ocean color data provided by the Ocean Color Climate Change Initiative (OC-CCI) and validated against the largest database of in situ POC measurements currently available. The results of this validation exercise indicate satisfactory levels of performance from several algorithms (highest performance was observed from the algorithms of Loisel et al., 2002; Stramski et al., 2008) and uncertainties that are within the requirements of the user community. Estimates of the standing stock of POC can be made by applying these algorithms, and yield an estimated mixed-layer integrated global stock of POC between 0.77 and 1.3 Pg C. Performance of the algorithms varies regionally, suggesting that blending of region-specific algorithms may provide the best way forward for generating global POC products.

Keywords: satellite ocean color, particulate organic carbon, algorithms, validation, essential climate variables

# 1. INTRODUCTION

Total particulate organic carbon (POC or Co) in the ocean is a key currency used in studies of both the biological export of carbon from the surface to the deep ocean, and the availability of food for marine organisms. The pool of POC in the ocean is relatively small (estimates include 0.43 Pg C in the first light-attenuation depth [Gardner et al., 2006] and 2.28 Pg C over a 200 m surface layer [Stramska, 2009]). Despite the relatively small size of the POC compartment, its components (phytoplankton, bacteria, zooplankton, and organic detritus) are responsible for large fluxes in the ocean, because of their high turnover rates. The organic tissue generated by photosynthesis in the sunlit ocean is either exported from the surface via the "biological pump" (Volk and Hoffert, 1985; Ducklow et al., 2001), transferred to higher trophic levels through the food chain, transformed into detritus, or recycled via the microbial loop, with some of it going into the pool of dissolved organic (DOC) and inorganic carbon (DIC). Organic particles are therefore involved in two important carbon fluxes in the ocean, primary production and export to either the deep ocean or the DOC and DIC pools, in addition to being an integral part of the marine food web. In addition to the components of POC arising from local sources, POC may be transported to a particular location from distant sources: for example, by currents that move POC horizontally in the ocean, or by transport of POC of terrestrial origin to the oceans by river outflow. Though POC is typically treated as a single pool, there is growing awareness of the importance of different particles, such as those defined by their size, because of their variety of biogeochemical functions, and their effect on ocean optical properties. For example, it has been shown that around 40% of POC concentration in the oligotrophic regions may be associated with bacteria alone (Cho and Azam, 1990) and submicron detrital particles can also make a significant contribution to the POC pool (Mel'nikov, 1976). Similarly, relatively large particles (generally larger than a few micrometers) can play an important role in POC export (e.g., Boyd and Newton, 1995; Dall'Olmo et al., 2009). The importance of particle characteristics in determining the optical signal of POC has also been recognized.

The theoretical work of Stramski and Kiefer (1991), assuming spherical and homogeneous particles, indicated that small particles can make an important contribution to the backscattering signal in the oceans. Further work has shown the impact of non-sphericity and intracellular structures on optical properties, particularly backscattering (Meyer, 1979; Kitchen and Zaneveld, 1992; Quirantes and Bernard, 2004, 2006; Clavano et al., 2007; Matthews and Bernard, 2013; Robertson Lain et al., 2017). Work by Cetinić et al. (2012) linked variation in the beam attenuation coefficient with plankton community composition, and variability in particle backscattering with changes in particle composition due to remineralization. They also highlighted how measurement artifacts might influence the observed relationships between POC and optical properties. Further work has explored separation of the phytoplankton component in POC, through both indirect (Behrenfeld et al., 2005; Kostadinov et al., 2016) and direct methods (Graff et al., 2012, 2015). The contribution of phytoplankton to the POC pool leads to the covariance between chlorophyll a concentration ([Chl]) and POC concentrations, although some scatter exists in these relationships, as a result of variability in the phytoplankton community composition, physiological factors that can affect the carbon-chlorophyll ratio in phytoplankton, and the variable contribution of substances other than phytoplankton (including detritus and bacteria) to POC (see discussion and references within Stramska and Stramski, 2005; Sathyendranath et al., 2009).

POC is readily quantifiable by filtering seawater samples, and forms a key component of many biological ocean models. However, in situ samples are expensive to collect, leading to a scarcity of data that hinders efforts to both validate ocean models and develop a complete understanding of POC dynamics. Satellite ocean color data offer the opportunity to quantify POC at the global scale on an almost daily basis. Ocean color, or more specifically the water-leaving radiance and corresponding remote sensing reflectance spectra, and derived [Chl] are recognized as Essential Climate Variables (ECVs) by the Global Climate Observing System (GCOS, 2011). This is in recognition of their importance for studying various biological variables and processes in the ocean. In fact, of all the oceanic ECVs that are amenable to remote sensing, ocean color is the only one that targets a biological property. In response to the GCOS requirements, the European Space Agency (ESA) Ocean Color Climate Change Initiative (OC-CCI) has generated a time series of merged satellite products for climate research, using data from the ESA satellite sensor MERIS (MEdium spectral Resolution Imaging Spectrometer) and NASA (National Aeronautics and Space Administration) satellite sensors SeaWiFS (Sea-viewing Wide Field-of-view Sensor) and MODIS-Aqua (Moderate-resolution Imaging Spectroradiometer-Aqua). The products include the normalized remote-sensing reflectances, Rrs, at SeaWiFS wavelengths and [Chl], as well as some additional inherent optical properties (IOPs) such as absorption and backscattering coefficients of phytoplankton and other particulate matter, and the diffuse attenuation coefficient for downward plane irradiance, K<sub>d</sub>, at 490 nm. There is a recognized need in the user community for additional products from ocean color that deal directly with POC, including separation of the contribution of phytoplankton, and the size distribution of particles. Further, these products need to be regionally optimized and their uncertainties well characterized. Both of these requirements may be addressed via optical classification (e.g., Moore et al., 2009), whereby waters are classified according to their spectral or bio-optical properties. Optical classification allows specific algorithms to be applied to the different optical water types (resulting in a global, merged product) and provides a method (Moore et al., 2009; Jackson et al., in press) that can be used to calculate per-pixel errors. Other methods exist: e.g., formal error propagation (Lee et al., 2010), or estimation of uncertainties based on model-observation comparison (Maritorena et al., 2010). However, the users consulted within the OC-CCI project expressed a preference for uncertainties based on comparison with in situ data (Sathyendranath et al., in press).

Remote sensing of POC through ocean color radiometry requires the exploitation of some optical signal that is associated with the material. In fact, optically, the beam attenuation coefficient of particles (c<sub>p</sub>), particle scattering coefficient (b<sub>p</sub>), backscattering coefficient (b<sub>bp</sub>) and the attenuation coefficient of downward irradiance K<sub>d</sub> are all sensitive to particle abundance (to a first order), and to particle composition (through refractive index), size, shape and internal structure. It has been demonstrated that POC is correlated with in situ c<sub>p</sub> measured using transmissometers (Gardner et al., 1993; Bishop, 1999; Claustre et al., 1999; Stramska and Stramski, 2005), which has provided a robust optical method for measuring POC using in situ devices. Although b<sub>p</sub> and c<sub>p</sub> are not among the data products that are routinely retrieved from remote sensing, satellite-based algorithms exist for retrieving POC from all the optical properties listed above, as well as from remote-sensing reflectance values.

This paper compares five different algorithms for estimating POC concentrations, selected as being representative of varied approaches that are prevalent for POC retrieval from ocean-color data. Each algorithm is applied to different optical properties derived from satellite ocean color data, and each uses different formulations for linking the parameters to the POC concentration. Matchups between in situ measurements of POC and satellite ocean color allow for the validation, intercomparison of global performance, and estimation of uncertainties associated with the POC calculated using these algorithms.

# 2. METHODS

# 2.1. Collation of an *In situ* Database

For this study, POC concentration data were collected from a number of existing databases and from individual contributors. Databases collated included PANGAEA (https://www.pangaea.de/) and SeaBASS (Werdell and Bailey, 2005), and those compiled by Martiny et al. (2014) and the Biological and Chemical Oceanography Data Management Office (BCO-DMO, USA). Further data were included from the Atlantic Meridional Transect (AMT) (including data derived from both CTD and the ship's clean water supply) and other cruises in the Southern Ocean (see a description of the Good Hope line and associated data collection in Thomalla et al., 2017). Operationally, POC is defined as all the organic carbon that is retained on GF/F filters (nominal pore size of 0.7 µm). To measure POC, samples are collected on pre-combusted (450 °C) GF/F filters and dried overnight at 65 °C before analysis. To remove particulate inorganic carbon, filters are acidified either by adding low-carbon HCl directly or by overnight exposure to the fumes of a concentrated HCl solution in a desiccator. Filters are then dried, packed in pre-combusted tin capsules, and combusted at 960 °C in an elemental analyser to convert the organic carbon into CO<sub>2</sub>. The liberated CO<sub>2</sub> is finally detected by thermal conductivity (Sharp, 1974). Acetanilide is used as a standard. The procedure for applying a blank, however, is not always consistent across studies, and as such could be a source of bias within the data set collated here. Cetinić et al. (2012) (and references therein) have studied the consequences of different methodologies for treating POC blanks, summarizing that the effect of DOC adsorption on filters (if not accounted for with an adequate blank correction) can cause substantial bias at low POC concentrations. In the database used here, a large proportion of the samples from low POC regions (i.e., the oligotrophic gyres) are from the AMT cruise programme, where a multiple-volume intercept blank methodology is used to reduce potential bias from blanks. Where data were provided at depth, measurements were averaged over 10 m to provide the "surface" value for the matchup. Optical weighting of these measurements was considered; however, given the variability of the water types sampled and associated mixed layer depths, and the assumptions necessary to apply an optical model for this purpose, it was decided not to introduce additional sources of uncertainty for these data points.
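The surface-averaging step can be illustrated with a minimal Python sketch. It assumes that "averaged over 10 m" means an unweighted mean of all samples taken in the top 10 m of the profile; the function name `surface_mean` and the sample profile are hypothetical and are not taken from the original work.

```python
def surface_mean(depths_m, values, max_depth_m=10.0):
    """Unweighted mean of the measurements taken at or above max_depth_m.

    depths_m: sample depths in metres (positive downwards).
    values: POC concentrations measured at those depths.
    Returns None when no sample falls inside the surface layer.
    Assumption: no optical or depth weighting is applied.
    """
    shallow = [v for d, v in zip(depths_m, values) if 0.0 <= d <= max_depth_m]
    if not shallow:
        return None
    return sum(shallow) / len(shallow)


# Hypothetical profile: three samples in the top 10 m, one deeper sample.
surface_poc = surface_mean([2.0, 5.0, 10.0, 25.0], [40.0, 50.0, 60.0, 20.0])
```

Only the three shallow samples contribute to `surface_poc`; the 25 m sample is ignored.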

# 2.2. Extraction of Satellite-*In situ* Matchups

Matchup extraction was based on the procedure developed for the OC-CCI. The daily, 4 km, sinusoidally projected OC-CCI version 2 data (Sathyendranath et al., 2016) were searched to find satellite data associated with each in situ data point. The OC-CCI data are a merged product from three sensors, each with a different overpass time. However, these overpass times are generally around 12 p.m. ± 2.5 h, meaning a maximum time difference between the in situ and satellite data of <12 h. The OC-CCI data used contain all of the relevant optical and biogeochemical properties necessary for implementation of the different algorithms under consideration, as well as the water class membership of each pixel, which quantifies the similarities between the remote-sensing reflectance spectrum at that pixel and the characteristic mean and covariance spectra associated with each of the optical classes (Jackson et al., in press; see also the OC-CCI product user guide, http://www.esa-oceancolorcci.org/?q=documents). The OC-CCI data were interrogated to assess the availability of data covering the latitude and longitude of the in situ data point, on the same date as the in situ data collection. If the central pixel contained valid data, the data from the eight surrounding pixels were also extracted (a 3 × 3 pixel box, corresponding to a 12 km × 12 km region). The central value, the mean, median, and standard deviation, the number of valid pixels out of the nine pixels, the optical class with the dominant membership, and the calculated POC products using various algorithms applied to the central and mean pixel values were returned as output, along with the in situ data values and metadata.
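The 3 × 3 pixel box statistics described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the OC-CCI matchup code: the function name `matchup_window_stats`, the use of NaN to flag invalid pixels, the use of the sample standard deviation, and the decision to reject boxes that cross the grid edge are all hypothetical choices.

```python
import math
import statistics


def matchup_window_stats(grid, row, col):
    """Summarise the 3 x 3 pixel box centred on (row, col).

    grid: 2-D list of floats; invalid pixels are NaN (an assumption).
    Returns None if the central pixel is invalid or the box leaves the grid.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    if not (1 <= row < n_rows - 1 and 1 <= col < n_cols - 1):
        return None  # box would extend beyond the grid edge
    centre = grid[row][col]
    if math.isnan(centre):
        return None  # the matchup requires a valid central pixel
    window = [grid[r][c]
              for r in range(row - 1, row + 2)
              for c in range(col - 1, col + 2)]
    valid = [v for v in window if not math.isnan(v)]
    return {
        "centre": centre,
        "mean": statistics.fmean(valid),
        "median": statistics.median(valid),
        # Sample standard deviation; undefined for a single pixel, so use 0.0.
        "std": statistics.stdev(valid) if len(valid) > 1 else 0.0,
        "n_valid": len(valid),
    }


# Hypothetical 3 x 3 scene with one invalid (e.g., cloud-covered) pixel.
nan = float("nan")
scene = [[1.0, 2.0, 3.0],
         [4.0, 5.0, nan],
         [7.0, 8.0, 9.0]]
stats = matchup_window_stats(scene, 1, 1)
```

Here eight of the nine pixels are valid, so the mean and median are computed from those eight values only.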

# 2.3. Candidate Algorithms

Five different algorithms for determination of POC concentration were considered. For consistency of comparison, all algorithms were implemented using the appropriate variables from the OC-CCI product suite. Another reason for using the OC-CCI products is that a rigorous algorithm selection procedure for atmospheric correction (Müller et al., 2015) and in-water properties (Brewin et al., 2015) (including derivation of [Chl], IOPs and the diffuse attenuation coefficient), had been put in place, to ensure quality of products. Furthermore, being a merged product, OC-CCI coverage is higher than that available from single-sensor products, ensuring a higher number of match-up points with in situ data. However, we recognize that all the five candidate algorithms were initially developed, implemented and tested using other ocean-color products, and that any systematic differences between the OC-CCI products and the datasets used by the algorithm developers, could be a potential source of difference in performance. Therefore, in the description of the algorithms, we also provide details of how the algorithms were implemented in the original work.

We also note that there are differences in how the five algorithms compared here were developed and implemented in the original work. For example, two of the algorithms (algorithms A and B presented below) were derived solely from coincidentally collected in situ data, whereas Algorithm D (G06—described below) combined in situ measurements of POC and beam attenuation, with satellite-derived measurements of diffuse attenuation coefficient. Algorithms A and B are based on some 50 measurements, whereas the POC—beam attenuation coefficient relationship used in Algorithm D was based on over 3,000 measurements. Algorithm C (described below) relies on a large in situ database of [Chl] and backscattering ratio to estimate total particle scattering coefficient from particle backscattering coefficient, and then relies on an extensive literature review to find a conversion factor between particle scattering coefficient and POC. Here we use a common set of satellite data (OC-CCI) to compute POC using the different algorithms and compare the products against a common set of in situ data. Where there are differences between how the various steps in the algorithms were implemented in the original work, and how OC-CCI treated similar steps in its product generation, they are highlighted, since such differences could have potential impact on algorithm performance.

### 2.3.1. Algorithm A Based on Remote-Sensing Reflectance

This algorithm (designated Co(A)) proposed by Stramski et al. (2008) uses remote-sensing reflectance at 443 and 555 nm (equation 1) as inputs, and takes the following form:

$$C_o(A)\,(\text{mg m}^{-3}) = 203.2 \left[\frac{R_{\text{rs}}(443)}{R_{\text{rs}}(555)}\right]^{-1.034}. \tag{1}$$

The model parameters were determined using some 53 pairs of co-located in situ measurements of both POC and spectral values of Rrs from oligotrophic and upwelling waters of the East Atlantic and the South Pacific. The authors have provided various fits to the models for different pairs of wavebands for Rrs and for different selections of data. The one we have used here is based on all data, and for the 443–555 waveband pair, as recommended by the authors. This algorithm is currently used by the NASA Ocean Biology Processing Group to generate the global POC data product from ocean color data. The Rrs values in the OC-CCI product suite were used as input to this algorithm for the validation exercise presented here. We note that a similar POC algorithm based on the blue-to-green reflectance ratio has been developed and validated with the data from the Southern Ocean (Allison et al., 2010). For the Southern Ocean algorithm the best fit coefficients of the power function of the same form as Equation (1) were determined to be 189.29 and -0.87. A power function was found to provide better error statistics compared with those using modified fits of those typically used in maximum band ratio algorithms for [Chl].
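As a minimal sketch, the power law of Equation (1) can be written as a short Python function. The function name `poc_band_ratio`, its argument names, and the example reflectance values are illustrative assumptions; the default coefficients are those quoted above for Stramski et al. (2008), and the Southern Ocean coefficients quoted for Allison et al. (2010) are passed explicitly in the second call.

```python
def poc_band_ratio(rrs443, rrs555, a=203.2, b=-1.034):
    """POC (mg m^-3) from the blue-to-green reflectance ratio, Equation (1).

    rrs443, rrs555: remote-sensing reflectance at 443 and 555 nm (same units).
    a, b: power-law coefficients; defaults are the values quoted in the text.
    """
    if rrs443 <= 0 or rrs555 <= 0:
        raise ValueError("reflectances must be positive")
    return a * (rrs443 / rrs555) ** b


# Hypothetical reflectance values, chosen only to illustrate the call.
poc_global = poc_band_ratio(0.004, 0.002)
poc_southern = poc_band_ratio(0.004, 0.002, a=189.29, b=-0.87)
```

Because the exponent is negative, the estimated POC decreases as the blue-to-green ratio increases, consistent with clearer, bluer waters containing fewer particles.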

### 2.3.2. Algorithm B Based on Backscattering by Particles

This algorithm (Co(B)), also proposed by Stramski et al. (2008), uses bbp(555), the particulate backscattering coefficient at 555 nm, as input (Equation 2). The equation has the form:

$$C_o(\text{B})\ (\text{mg m}^{-3}) = 53606.7 \times b_{\text{bp}}(555) + 2.468. \tag{2}$$

Stramski et al. (2008) tested two approaches for calculating bbp for the eventual determination of POC. Firstly, the method of Maffione and Dana (1997), as refined by Boss and Pegau (2001), was used to derive the spectral backscattering coefficient from in situ measurements of the volume scattering function at 140°. Next, using a two-step empirical approach, Stramski et al. (2008) calculated bbp by removing the backscattering coefficient for pure water [they used pure-water backscattering coefficients proposed by Buiteveld et al. (1994) and by Morel (1974), and reported that the difference between the measurements of bbw was likely within the range of errors associated with the measurements themselves, and therefore did not substantially impact the performance of the algorithm for deriving POC]. Then, bbp was empirically related to POC concentrations (Equation 2). Stramski et al. (2008) provide various fits to this equation, but the parameters selected above correspond to the results (number of observations = 54) excluding upwelling data, which provided better uncertainty metrics, and may better reflect the data within the database used here. Secondly, to provide the remote-sensing context, Stramski et al. (2008) used either a direct empirical relationship between the backscattering coefficient (bb) and Rrs or the Quasi-Analytical Algorithm (QAA) approach of Lee et al. (2002) to derive bbp from Rrs. They found that the comparison with measured bbp was improved considerably when an empirical correction based on their measurements of bbp was applied to the QAA algorithm. Thus, this algorithm uses a two-step approach: first, bbp is derived from Rrs, and then that bbp is used in Equation (2) to calculate POC.

In the computation of OC-CCI bbp products, the QAA model was used, along with Zhang et al. (2009) for the pure-water backscattering coefficient (Sathyendranath et al., 2016). In QAA the spectral backscattering coefficient is calculated empirically using Rrs at 440 nm and at 555 nm. A power law is then used to calculate the particle backscattering coefficients at other wavelengths given the value of the spectral backscattering coefficient and bbp(555), estimated from an analytical relationship using Rrs(555) and the absorption coefficient (a(555)). These values of bbp were used to compute POC in the calculations presented here, without any empirical correction. For this algorithm a linear fit provided the best error statistics in the original study of Stramski et al. (2008).
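The second step of this algorithm, Equation (2), is a simple linear mapping and can be sketched as follows. The function name `poc_algorithm_b` is hypothetical, and the derivation of bbp(555) from Rrs (the QAA step) is assumed to have been carried out upstream; only the coefficients quoted in Equation (2) come from the text.

```python
import numpy as np

def poc_algorithm_b(bbp555, slope=53606.7, intercept=2.468):
    """POC (mg m^-3) from particulate backscattering at 555 nm (m^-1), Equation (2).

    bbp555 is assumed to be already derived from Rrs by an upstream IOP model.
    """
    bbp555 = np.asarray(bbp555, dtype=float)
    return slope * bbp555 + intercept

# Hypothetical bbp(555) values (m^-1), used only to exercise the function.
example = poc_algorithm_b([0.0005, 0.001, 0.005])
```

Note that the non-zero intercept sets a floor of 2.468 mg m<sup>−3</sup> on the estimate as bbp(555) approaches zero.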

### 2.3.3. Algorithm C Based on both Backscattering Coefficient and Chlorophyll-a Concentration

This algorithm (Co(C)) is based on a combination of bbp(490) and [Chl], and was proposed by Loisel et al. (2002) (Equation 3). They derived bbp and [Chl] from SeaWiFS data using the method of Loisel and Stramski (2000) and the OC4 algorithm (based on O'Reilly et al., 1998), respectively. They then used a relationship proposed by Twardowski et al. (2001) to compute the total particle scattering coefficient given the particle backscattering coefficient (bbp) and [Chl] (designated B in Equation 3). They then adopted a conversion value of 400 mg C m<sup>−2</sup> to go from total scattering coefficient to POC concentration. Combining these steps yields the following algorithm:

$$C_o(\text{C})\ (\text{mg m}^{-3}) = 41666.7 \times b_{\text{bp}}(490) \times B^{0.25}. \tag{3}$$

Algorithm C is implemented here using the Lee et al. (2002) QAA approach (for bbp), and a more recent OC4 (v6) for [Chl].
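A minimal sketch of Equation (3) is given below, assuming that bbp(490) and [Chl] have already been retrieved by the upstream models named above. The function name `poc_algorithm_c` and the treatment of non-positive chlorophyll as invalid are assumptions of the sketch.

```python
import numpy as np

def poc_algorithm_c(bbp490, chl, factor=41666.7, exponent=0.25):
    """POC (mg m^-3) from bbp(490) (m^-1) and [Chl] (B in Equation 3)."""
    bbp490 = np.asarray(bbp490, dtype=float)
    chl = np.asarray(chl, dtype=float)
    # Assumption of this sketch: non-positive chlorophyll is treated as invalid.
    chl = np.where(chl > 0, chl, np.nan)
    return factor * bbp490 * chl ** exponent

# Hypothetical inputs: the same bbp(490) at two chlorophyll concentrations.
example = poc_algorithm_c([0.001, 0.001], [1.0, 16.0])
```

The weak (fourth-root) dependence on [Chl] means that a sixteen-fold increase in chlorophyll only doubles the POC estimate for a given bbp(490).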

### 2.3.4. Algorithm D Based on the Diffuse Attenuation Coefficient for Irradiance and Beam Attenuation Coefficient

This algorithm (Co(D)), based on Gardner et al. (2006), uses a two-step relationship relating Kd(490) to beam attenuation coefficient for particles (cp), and then the beam attenuation coefficient to POC:

$$c_p = \exp\left(1.124 \times \log_{10}\left(K_d(490) + 1.1361\right)\right), \tag{4}$$

and

$$C_o(\text{D})\ (\text{mg m}^{-3}) = 12 \times (31.7 \times c_p + 0.785). \tag{5}$$

In Gardner et al. (2006), Kd(490) is obtained from SeaWiFS data where the algorithm of Mueller (2000) is used, based on water-leaving radiances at 490 and 555 nm. Gardner et al. (2006) used an extensive database (number of observations = 3,462) of concurrent measurements of POC and beam attenuation coefficient from the Atlantic, Pacific and Indian Oceans, to derive the parameters of Equations (4) and (5) empirically.

The Kd(490) values used in the comparison presented here are obtained from the OC-CCI version 2 data, which uses the method of Lee et al. (2005).

### 2.3.5. Algorithm E Based on Spectral Backscattering Coefficient

This algorithm (Co(E)) was developed by Kostadinov et al. (2009) and Kostadinov et al. (2016). It relates the slope (η) of backscattering as a function of wavelength to the slope (ξ) of the particle size distribution (PSD), which is assumed to follow a power law. The method has three steps. Firstly, η is calculated from the spectral bbp values at 490, 510, and 555 nm extracted from the OC-CCI matchup data [for which OC-CCI uses Lee et al. (2002)]. Note that in Kostadinov et al. (2009) and Kostadinov et al. (2016), bbp is instead retrieved using the formulation of Loisel and Stramski (2000). Secondly, look-up tables (LUTs) are used to retrieve the parameters of the PSD, namely the slope (ξ) and the differential number concentration at a reference diameter of 2 µm (No), given η and bbp at 443 nm. The LUTs are constructed from theoretical forward simulations using Mie code (Bohren and Huffman, 1983). Finally, to compute Co, the PSD is integrated to calculate particle volume in the 0.5 to 50 µm diameter range, and then volume is converted to carbon using existing allometric relationships derived from phytoplankton cultures (Menden-Deuer and Lessard, 2000), assuming biogenic origin for all the scattering particles. An empirical correction is applied to No (based on PSD validation statistics) to achieve more realistic absolute carbon concentration values (Kostadinov et al., 2016).
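The first step, estimating η from bbp at three wavelengths, can be illustrated with a short sketch. It assumes that the spectral dependence is modelled as bbp(λ) ∝ λ<sup>−η</sup> and that η is obtained by ordinary least squares on log-transformed values; the exact fitting procedure of the original work is not described in this section, so both the sign convention and the fitting method are assumptions, as is the function name `backscatter_slope`. The LUT and PSD integration steps are not reproduced.

```python
import numpy as np

def backscatter_slope(wavelengths_nm, bbp):
    """Estimate eta, assuming bbp(lambda) is proportional to lambda**(-eta).

    A straight line is fitted to log(bbp) against log(wavelength);
    eta is the negative of the fitted gradient.
    """
    log_wl = np.log(np.asarray(wavelengths_nm, dtype=float))
    log_bbp = np.log(np.asarray(bbp, dtype=float))
    gradient, _intercept = np.polyfit(log_wl, log_bbp, 1)
    return -gradient

# Synthetic spectrum with a known slope, used only to check the fit.
wavelengths = np.array([490.0, 510.0, 555.0])
synthetic_bbp = 0.002 * (wavelengths / 443.0) ** -1.3
eta = backscatter_slope(wavelengths, synthetic_bbp)
```

With only three wavebands the fit is sensitive to noise in any single band, which is one reason the choice of bbp retrieval method matters for this algorithm.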

## 2.4. Separation of Matchups by Optical Water Class and Calculation of Uncertainties

The OC-CCI product suite includes memberships of each pixel in 14 optical classes, following the fuzzy logic classification methodology of Moore et al. (2009), with some modifications as described in Jackson et al. (in press). The memberships of the 14 optical water classes associated with the satellite matchups were extracted alongside the radiometric and biogeochemical properties required for the validation. Each matchup point was then assigned to the optical class that had the dominant membership in the central match-up pixel. The statistical analyses were carried out for the global dataset as well as for subsets of the data grouped according to dominant optical class. These uncertainty metrics per optical class were then used to assign uncertainties at each pixel, by calculating the weighted average of the metrics associated with each of the water classes, with the membership of the classes in that pixel providing the weighting function.
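The class assignment and the membership-weighted averaging described above can be sketched as follows. The function names are hypothetical, the example uses three classes rather than 14 for brevity, and normalizing by the sum of memberships is an assumption of the sketch (fuzzy memberships need not sum to one).

```python
import numpy as np

def dominant_class(memberships):
    """Return the 1-based index of the class with the largest membership."""
    return np.argmax(np.asarray(memberships, dtype=float), axis=-1) + 1

def pixel_uncertainty(memberships, class_metric):
    """Membership-weighted average of a per-class metric (e.g., RMSD or bias).

    memberships: array of shape (..., n_classes) with fuzzy memberships.
    class_metric: array of shape (n_classes,) from the per-class validation.
    """
    m = np.asarray(memberships, dtype=float)
    s = np.asarray(class_metric, dtype=float)
    total = m.sum(axis=-1)
    with np.errstate(divide="ignore", invalid="ignore"):
        # Pixels with no membership in any class are left undefined (NaN).
        return np.where(total > 0, (m * s).sum(axis=-1) / total, np.nan)

# Hypothetical per-class RMSD values and memberships for two pixels.
class_rmsd = np.array([0.20, 0.30, 0.40])
pixels = np.array([[0.7, 0.2, 0.1],
                   [0.0, 0.5, 0.5]])
per_pixel_rmsd = pixel_uncertainty(pixels, class_rmsd)
classes = dominant_class(pixels)
```

The dominant class is used only to group matchups for validation; the per-pixel uncertainty uses all memberships, so it varies smoothly across class boundaries.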

Statistical analysis used in the assessment of each algorithm was based on that used by OC-CCI (see Brewin et al., 2015). The Kolmogorov-Smirnov test for normality of the in situ matchup data showed a significant deviation from normality for both log<sub>10</sub>-transformed and untransformed data (p < 0.001). Therefore, for completeness, the statistical analysis was conducted for log<sub>10</sub>-transformed data using parametric tests and for untransformed data using non-parametric, rank-based statistics. Statistical metrics computed were:


To provide an indication of the stability of the statistics and to compute confidence intervals on them, bootstrapping (Efron, 1979; Efron and Tibshirani, 1993) with random re-sampling and replacement was used to construct 1,000 different datasets from which confidence intervals were computed for some statistics. Statistics were computed for the whole dataset, and also after segregating the data according to dominant water class, at the central matchup pixel.
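A sketch of the bootstrap procedure for three of the metrics discussed in the results (bias, RMSD and centre-pattern RMSD) is given below. The metric definitions used are the standard ones and are assumed rather than taken from this section; the percentile method for the confidence interval, the function names and the fixed random seed are likewise assumptions of the sketch.

```python
import numpy as np

def matchup_metrics(est, obs):
    """Bias, RMSD and centre-pattern (unbiased) RMSD of est against obs."""
    d = np.asarray(est, dtype=float) - np.asarray(obs, dtype=float)
    bias = d.mean()
    rmsd = np.sqrt((d ** 2).mean())
    crmsd = np.sqrt(((d - bias) ** 2).mean())
    return {"bias": bias, "rmsd": rmsd, "crmsd": crmsd}

def bootstrap_ci(est, obs, n_boot=1000, alpha=0.05, seed=0):
    """Percentile confidence intervals from resampling matchups with replacement."""
    est = np.asarray(est, dtype=float)
    obs = np.asarray(obs, dtype=float)
    rng = np.random.default_rng(seed)
    n = obs.size
    samples = {"bias": [], "rmsd": [], "crmsd": []}
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample matchup pairs together
        for name, value in matchup_metrics(est[idx], obs[idx]).items():
            samples[name].append(value)
    lo, hi = 100 * alpha / 2, 100 * (1 - alpha / 2)
    return {name: tuple(np.percentile(vals, [lo, hi]))
            for name, vals in samples.items()}

# Hypothetical log10-transformed matchups, used only to exercise the functions.
obs_log = np.log10(np.array([10.0, 50.0, 100.0, 500.0, 1000.0]))
est_log = obs_log + np.array([0.2, -0.1, 0.05, 0.3, -0.2])
stats = matchup_metrics(est_log, obs_log)
intervals = bootstrap_ci(est_log, obs_log, n_boot=200)
```

With these definitions the squared RMSD equals the sum of the squared bias and the squared centre-pattern RMSD, which is why the latter is also described as the unbiased RMSD.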

# 3. RESULTS

## 3.1. Distribution of Validation Data and Matchups

Geographic distribution of the data in the in situ database is shown in **Figure 1**, with the color scale representing the average POC concentration (mg m<sup>−3</sup>) over measurements made in the top 10 m (total N = 63,704, depth averaged N = 19,282). **Figure 2** shows the in situ data with valid satellite data matchups (N = 3,891), colored according to the concentration of POC (mg m<sup>−3</sup>) measured. A number of expected patterns can be observed. Firstly, the number of valid matchups is much reduced relative to the total number of data points in the in situ database. This is a result of several factors, including the averaging of the in situ data over the top 10 m; elimination of data points outside of the OC-CCI period (1997–2012); and failure of matchup when there were no satellite observations corresponding to the date and location of the in situ sampling. Other factors are spatially heterogeneous, such that there is some regional skew in the likelihood of obtaining a matchup. For example, cloud cover has a spatially and temporally variable influence on matchup availability, such that a matchup is less likely to be found in areas where satellite coverage was relatively poor, e.g., in the tropics, or during winter in the mid latitudes. There is a high concentration of data in the Atlantic, as a result of the substantial AMT cruise data. Similarly, there is a high concentration of data from some coastal regions, particularly in the northern hemisphere. Observed concentrations of POC follow an expected distribution, with higher values in coastal and shelf regions, and lower concentrations in the oligotrophic gyres (**Figure 2**). A bimodal distribution can be observed in the in situ data, as a result of the high numbers of data from the AMT cruises, which go predominantly through the oligotrophic gyres, and from coastal regions, which tend to be predominantly high-POC areas (**Figure 3**).

## 3.2. Algorithm Performance

The histogram of the in situ data frequency distribution is replicated to a degree by all the algorithms (**Figure 3**). The histogram of Algorithm A is very similar to that of the in situ data, with both histogram shapes and peak locations reproduced. Algorithm B tends to overestimate the POC values relative to in situ data at the lower end of POC concentrations, resulting in the first peak in the histogram being offset toward the right of the figure (i.e., toward higher POC concentrations). By contrast, Algorithm C underestimates POC slightly, at the lower end. The range of estimates provided by Algorithm D is narrower than that of the in situ data. Algorithm E also has a narrower range, and in general underestimates POC concentrations. Both algorithms D and E show significant shifts in both peaks of the distribution relative to those of the in situ data.

As could be expected from the histograms (**Figure 3**), there is generally a high correlation between the algorithm estimates and in situ measurements of POC, with r values between 0.75 and 0.82 for all algorithms tested (**Figures 4A–E**). It is of note that all algorithms show a small cloud of underestimated concentrations associated with high in situ POC. These points are from different data sources, and from different regions, and as such, outliers do not appear to be related to any systematic errors in the in situ measurements. Furthermore, not all of these substantial underestimates are associated with common data points across all the algorithms. It is important to note, however, that although we compare the satellite-derived POC with in situ POC over a very broad range, extending to very high values >1,000 mg m<sup>−3</sup>, the original formulations of the algorithms were not, in general, implemented with such high POC data.

The scatter plots and uncertainty statistics for Algorithm A shown in **Figure 4A** and **Table 1** suggest that this algorithm, on average, performs better than the other algorithms shown in **Figures 4B–E**. Algorithm A has the smallest bias, a linear fit that is closest to the 1:1 line, with a slope of 0.92 and a relatively small intercept value (the second smallest after Algorithm C), and also the highest (albeit slightly) correlation coefficient and the smallest values (albeit slightly) of the root mean square difference (RMSD). The statistical parameters are consistently good for Algorithm A, though the results for Algorithm C also present some advantages. The linear fit for Algorithm C has a slope of 1 and the intercept closest to zero, but this algorithm has a slightly higher RMSD, CRMSD and bias compared with Algorithm A (**Figure 4C**). Algorithm B has relatively small bias; however, other parameters are clearly inferior compared with Algorithm A (**Figure 4B**). Some statistical parameters associated with the performance of Algorithm D are significantly inferior compared with those associated with algorithms A and C, especially the slope and intercept of the linear fit, which indicate a large deviation of this fit from the 1:1 line (**Figure 4D**). As a result, low values of POC are overestimated, and high values underestimated, compared with the in situ data. The statistical performance for Algorithm E is poorest, with significant negative bias and the best-fit regression deviating greatly from the 1:1 line (**Figure 4E**). Performance statistics are summarized for all algorithms in the first section of **Table 1**. A statistical analysis was also conducted for the untransformed data, and is provided in the second section of **Table 1**.

FIGURE 1 | Geographic distribution of Particulate Organic Carbon (POC) measurements within the in situ database. The color scale represents the averaged POC concentration (mg m<sup>−3</sup>) over the top 10 m.

## 3.3. Performance Per Water Class

To further understand algorithm performance, the matchups were separated by their optical water class. The total number of matchups per water class, and their distribution spatially, is shown in **Figures 5**, **6** respectively. The lower water classes are associated with oligotrophic regions, such as in the gyres, whilst the higher classes correspond to progressively more turbid shelf and coastal waters. Statistical performance of the algorithms across the different water classes is summarized in **Figure 7**.

Performing the statistical analysis across the different water classes reveals some similarities in performance across all algorithms, and some consistency with the overall performance (**Figure 7**). Algorithms A, C, and D show little difference between them in terms of RMSD across the water classes, with the exception of water class 14 (i.e., the most optically complex waters) (**Figure 7A**). The RMSD associated with Algorithm E follows the same broad pattern as algorithms A and D across the water classes, although its RMSD is substantially higher than those algorithms in all instances. Algorithm B shows slightly higher RMSD than the other algorithms in some of the more oligotrophic water classes (1–5), then largely follows the same patterns as algorithms A and D. The cloud of outlier points observed in the overall comparisons is associated with water classes 6, 7, and 8, where RMSD is relatively high for most of the algorithms. The estimates of bias for each of the water classes are consistent with the results from the global application, e.g., Algorithm A shows very little bias, and this is consistent across water classes, whilst Algorithm E generally underestimates across all classes (**Figure 7B**). Algorithm C shows slight negative bias across most of the water classes, except 12 and 13, whilst algorithms D and B show slight (larger) positive bias in the more oligotrophic classes. Algorithm B shows large positive bias in water class 13, as with Algorithm C; however, both algorithms show negative bias in class 14. The center-pattern (or unbiased) RMSD in **Figure 7C** shows the largest uncertainties are associated with water classes 6, 7, and 8 whilst the largest differences in algorithm performance across the water classes are found in water classes 12, 13, and 14. The regional variability in algorithm performance, which can be associated with the optical water classes, is discussed further in Sections 3.4, 3.5, and 4.2 below.

FIGURE 3 | Histograms summarizing distribution of in situ POC data and associated satellite based estimates for (A) Algorithm A (Stramski et al., 2008, Rrs based), (B) Algorithm B (Stramski et al., 2008, bbp based), (C) Algorithm C (by Loisel et al., 2002), (D) Algorithm D (by Gardner et al., 2006), (E) Algorithm E (by Kostadinov et al., 2016).

For log<sub>10</sub>-transformed data: r<sub>s</sub> is Spearman's correlation; slope and intercept were calculated with a Type-II linear regression model (Major Axis) and the statistics provided have uncertainty estimates (95% confidence interval), derived from 1,000 bootstrap realizations. For untransformed data: r<sub>p</sub> is Pearson's correlation; Ψ, δ and Δ are provided with uncertainty estimates (95% confidence interval), derived from 1,000 bootstrap realizations; slope and intercept were calculated with a Reduced Major Axis regression model; MAPD is the median absolute percent deviation between predictions and observations and is a measure of bias, and IQR is the interquartile range of the absolute percent error, and is a measure of precision. Bold italic numbers are the best results for each statistic: for some this is the highest value (e.g., r<sub>s</sub> or r<sub>p</sub>), for some the lowest (e.g., Ψ, δ, Δ, Intercept and MAPD) and for some the value closest to one (e.g., Slope).

## 3.4. Mapped Products

In addition to application to the matchup points, the POC algorithms can also be applied to global satellite data to compare algorithm performance at synoptic scales. POC concentrations were estimated by applying algorithms A–E to a sample set of OC-CCI monthly products from May 2005 (**Figure 8**). All algorithms produce the broad patterns that were observed in the in situ measurements and would be expected to be associated with POC, i.e., increased POC associated with regions of high [Chl] in upwelling zones, lower concentrations in the oligotrophic gyres, and higher concentrations in turbid shelf and coastal regions. However, there are some notable differences between the POC concentrations estimated by the different algorithms. Algorithms A and C perform similarly (**Figures 8A,C**). Algorithm B (**Figure 8B**) produces estimates that are generally higher relative to algorithms A and C, particularly at low POC concentrations. Algorithm D (**Figure 8D**) underestimates at higher POC concentrations relative to all other algorithms, whilst at low concentrations its estimates are generally higher than algorithms A and C, and lower than Algorithm B. In contrast, Algorithm E (**Figure 8E**) estimates lower POC concentrations relative to the other methods. A transect extracted along 20°W shows the regional differences in algorithm estimates for POC, and the associated OC-CCI [Chl] for reference (**Figure 8F**). Histograms of these products (not shown) show a similar range to the in situ data used for validation (≈ 10–1,000), though values greater than 1,000 mg m<sup>−3</sup> are scarce in the satellite products (though higher values are more frequent with Algorithm C than with Algorithm A). Also, the pronounced bimodal frequency distribution of the in situ data is absent in the satellite products, which show a unimodal distribution. This difference between the frequency distribution of in situ data and satellite products could have had an impact on the comparative statistics presented here.

Stramski et al. (2008) (bbp), (C) Loisel et al. (2002), (D) Gardner et al. (2006), (E) Kostadinov et al. (2016), and (F) POC associated with an extracted transect through the Atlantic at 20°W for each algorithm, and the associated [Chl] from the OC-CCI data.

monthly composite OC-CCI data from May 2005 (A) Stramski et al. (2008) (Rrs), (B) Stramski et al. (2008) (bbp), (C) Loisel et al. (2002), (D) Gardner et al. (2006), (E) Kostadinov et al. (2016), and (F) the associated OC-CCI water classes.

## 3.5. Mapped Uncertainties

Each algorithm has uncertainties associated with its performance for each water class, calculated from the validation exercise. These values can be used to estimate uncertainties for pixels outside of direct matchup locations, using a weighted average based on the percent membership to each of the classes. This procedure was applied to the data in **Figure 8** to calculate per-pixel RMSD and bias. As could be expected from the performance of the algorithms across the substantial in situ data set, Algorithm A (Rrs-based algorithm of Stramski et al., 2008) shows low RMSD and bias when uncertainties are calculated (**Figures 9**, **10**). Algorithms B and particularly Algorithm C show higher RMSD, particularly in the gyre regions, consistent with the distribution of the matchup-based estimates in **Figure 3**. Algorithm D has low RMSD and bias in the oligotrophic gyres, but relatively higher values appear in the more productive (upwelling and coastal) areas. Algorithm E shows high RMSD throughout the image. The highest positive bias estimates are associated with Algorithm B, relating to overestimation in the gyre regions (**Figure 10B**). Algorithm E (**Figure 10E**) shows negative bias globally, consistent with the general trend toward underestimation shown in **Figures 4E**, **8E**.

# 4. DISCUSSION

## 4.1. Variability in Algorithm Performance As a Result of Input Satellite Data, Choice of Optical Models, and Regional Optical Properties

The strength of the relationships between bio-optical properties and POC concentration has been quantified in the original studies where the algorithms examined above were formulated, and in some cases, validated against satellite data. To the extent that some of the satellite data used in the original studies are included in the match-up dataset used here, the impact on the results is likely to be small because the size of the match-up data used here is much larger than that used in any of the previous studies. Furthermore, the comparisons presented here are based on a common satellite product suite (OC-CCI). However, further insights are gained from the bigger in situ data base assembled for this study, and from the climate-quality data set provided by OC-CCI.

Results from the OC-CCI based validation are generally consistent with the observations made by Stramski et al. (2008). The in situ data used by Stramski et al. (2008) were limited to the south eastern Pacific and eastern Atlantic Oceans and covered a POC range of 12–270 mg m<sup>−3</sup>, whilst the data used here cover a broader range (2.7–8,097 mg m<sup>−3</sup>). The waters sampled by Stramski et al. (2008) ranged from upwelling to oligotrophic, with significant contributions of mineral particle matter to the particle assemblage at some stations. Consistent with the results here, Stramski et al. (2008) found that empirical relationships between Rrs and POC performed better than two-step approaches where an inherent optical property (IOP) is derived from the Rrs and then related to the POC. They also indicated relatively better performance of the Rrs relationship over that derived from bbp, highlighting the uncertainties in the derivation of bbp as one source of error in estimation of POC from IOPs. Additionally, the relationships between POC and IOPs would be expected to vary as a result of the particle size distribution (PSD), the refractive index of particles, and the fractional POC concentrations within different particle types in the assemblage. For example, significant variations in the POC-specific backscattering coefficient have been reported for different water bodies of the Southern Ocean (see Figure 1 in Stramski et al., 1999). Whilst variability in the POC-specific backscattering introduces uncertainty in total POC estimates, a better understanding of the relationship between particle characteristics and IOPs has the potential to provide further insight into the composition of the POC pool, and therefore to improve POC algorithms. Hence, it is important to pursue this line of algorithm development, even if the current performance of these methods might not be as good as that of some more empirical approaches. The line of investigation that accounts for the contributions of different types of POC to their optical properties is already yielding fruit (Stramski et al., 2008).

The algorithm of Loisel et al. (2002) (Algorithm C) is also a two-step approach, drawing on the relationship between bp and POC, via a relationship between bbp and [Chl]. Though Loisel et al. (2002) did not directly validate their POC estimation from SeaWiFS data, they found a good match between retrieved bbp and that measured in situ in a previous study (Loisel et al., 2001). Loisel et al. (2002) did indicate variability in the bbp:[Chl] relationship, linked to changes in the particulate pool; they highlighted the variable influence of small particles consisting of dead cells, grazers, and minerals. Gardner et al. (2006) (Algorithm D) also use a two-step approach, exploiting the relationship between the beam attenuation coefficient (cp) and POC. This relationship was shown to be strong when in situ POC was compared with transmissometer profiles. Although no fully-validated algorithm exists for routine derivation of cp from satellite ocean color measurements, Gardner et al. (2006) showed that in situ cp was strongly correlated with [Chl] (r = 0.845–0.897) and Kd(490) (r = 0.846–0.878) derived from SeaWiFS data over different oceanic regions.

Algorithm E, by Kostadinov et al. (2016), addresses some sources of variability between optical properties and POC, such as the influence of the particle size distribution, which was also identified as being important by Stramski et al. (2008). The method of Kostadinov et al. (2016) uses spectral values of bbp to derive a PSD, which is then converted to POC (and phytoplankton carbon) using allometric relationships. The focus of the Kostadinov et al. (2016) paper was on phytoplankton carbon, computed as 1/3 of POC. Relationships between the phytoplankton carbon estimated from in situ PSD measurements and direct analytical determinations showed r values between 0.5 and 0.714, depending on the limits of integration of the PSD, with wider limits resulting in the lower r. As discussed by Stramski et al. (2008), Kostadinov et al. (2016) also note the impact of uncertainties in retrieved backscattering arising from both measurement and theory. In particular, assumptions of sphericity and homogeneity used in Mie theory are likely to be violated in real seawater particle assemblages, particularly for backscattering and in coastal and more productive areas (which are included in the database used here). For a more detailed discussion of the sphericity and homogeneity assumption, see Kostadinov et al. (2009) and refs. therein. Future work needs to focus on developing and more widely adopting bio-optical models that relax the Mie assumptions (e.g., Quirantes and Bernard, 2004, 2006; Clavano et al., 2007; Matthews and Bernard, 2013; Robertson Lain et al., 2017). An understanding of PSD variability, how it relates to backscattering, and how particle composition affects scattering over broad marine regions is required to develop such detailed mechanistic approaches further.

(2006), (F) POC estimated using Kostadinov et al. (2016).

General sources of error associated with any ocean-color product include differences introduced by choice of sensor, sensor calibration, and the atmospheric correction procedure used to retrieve Rrs. In addition to these, a further consideration, particularly in the cases where algorithms use IOPs, is the methods used to derive the IOP product from the Rrs data. The OC-CCI processing uses the Quasi-Analytical Algorithm (QAA) of Lee et al. (2002) to calculate IOPs, including the bbp values used in this study. The original study by Kostadinov et al. (2016) used the method of Loisel and Stramski (2000) to estimate bbp. Stramski et al. (2008) also used different formulations to calculate bbp from Rrs, finding a corrected version of QAA produced a better estimate of bbp, and a strong relationship with POC (r = 0.933). The effect of the choice of method to derive bbp on the POC estimates requires further consideration, which goes beyond the scope of this study, as this IOP is particularly poorly understood and validated (Lee et al., 2002). The differences in algorithm performance across the different water classes indicate that regional variability in performance of the different algorithms can be expected. This is confirmed in the mapped regional distribution of uncertainties (**Figures 9**, **10**). These results suggest that algorithms either need to be selected carefully for applications in different regions, or that a selection of optimal algorithms may have to be blended for a global product (as done in Jackson et al., in press). This point is also raised in Stramski et al. (2008), where different formulations are provided for global application, and excluding upwelling data. Uncertainties in the underlying satellite data may also be responsible for a portion of this variability: for example, an IOP model may be more or less suitable to derive backscattering. 
It should also be noted that there can also be uncertainties in the in situ data and the validation process that can affect the assessment of uncertainties in algorithm performance. Ideally, multiple replicates will be taken to quantify uncertainties in the in situ measurement, and instruments will have a well-characterized calibration history, and be processed with community endorsed methodologies. For POC, the issue of blank correction was already highlighted in Section 2.1. Uncertainties resulting from variable methods used for the in situ data collated for this study may influence the results presented here, particularly at low POC concentrations. In terms of comparison to matchups, further uncertainties can be introduced by comparing values at different scales, i.e., point measurements may not represent the average over a pixel (in this case of 4 km in size). These uncertainties will limit the ultimate accuracy to which any satellite based product can be derived and validated. However, issues of spatial mismatch are beginning to be addressed with the use of underway systems (for example, Brewin et al., 2016).

Despite the difficulties highlighted above, the overall performance of the algorithms studied here is encouraging. Percentage error estimates based on the OC-CCI methodology show how well these algorithms can generate products suitable for the needs of the scientific community. For example, the percentage errors associated with the Stramski et al. (2008) Rrs algorithm applied to OC-CCI data in May 2005 (**Figure 11**), show that a majority of pixels fall within an error range that is widely accepted by the ocean color community for [Chl] (30%; GCOS, 2011).

## 4.2. Variability in the Ratio of Particulate Organic Carbon to Chlorophyll-a

Further perspective on the performance of the different algorithms can be gained by considering the covariance between POC and [Chl]. The relationship between the in situ POC data and satellite [Chl] is shown in **Figure 12A**, where the color indicates the associated dominant optical water class. These data then form the background for each of the subsequent panels of **Figure 12**, which show the relationship between the POC estimated by each algorithm and the satellite [Chl]. Algorithm A shares a commonality in method with the algorithms used to derive satellite [Chl], in that the same reflectance ratios are used to derive POC and [Chl] (at lower concentrations); hence it shows a very constrained relationship in this domain of the parameter space (**Figure 12B**). Other algorithms capture the scatter in the POC:[Chl] relationship to a greater or lesser degree, though offsets can be seen, associated with the behavior identified in the validation exercise, i.e., overestimation of POC relative to lower [Chl] in the case of Algorithm B (**Figure 12C**), and underestimation of the ratio at low [Chl] using POC from Algorithm C (**Figure 12D**; note that this algorithm also depends on [Chl] to derive POC). As with Algorithm A, Algorithm D shows a relatively constrained relationship between POC and [Chl]. Algorithm E produces similar variability between POC and [Chl] as seen in the in situ data, in terms of shape and scatter of the curve, but the bias of this algorithm toward lower estimates of satellite derived POC is clear (**Figure 12D**).

The ratio of POC to [Chl] is important in the context of the discussion here for two reasons. Firstly, this ratio is important in the context of biogeochemical modeling, and the ecological and physiological processes that influence this ratio. Secondly, empirical relationships between POC and chlorophyll have been developed, which can be applied to satellite derived estimates of [Chl]. As mentioned above, these algorithms are typically similar to those employing blue:green reflectance ratios (e.g., Algorithm A from Stramski et al., 2008), and as such were not initially considered in the algorithm intercomparison here. **Figure 13A** shows a number of these empirical relationships, against a background of the same in situ POC and [Chl] data as shown in **Figure 12**. **Figure 13B** shows POC estimated using these [Chl]-based algorithms on OC-CCI [Chl] as a function of the blue-green Rrs reflectance ratio. The same reflectance ratio is employed by Stramski et al. (2008) to derive POC, and is also used in a number of empirical [Chl] algorithms. A linear regression of the in situ POC concentrations against the satellite derived [Chl] results in an r<sup>2</sup> value of 0.70. Using the various relationships shown in **Figure 13A** to estimate POC based on the satellite [Chl] returns r<sup>2</sup> values between 0.63 and 0.69, lower than those returned for all the other algorithms assessed. The [Chl] based approaches shown in **Figure 13** produce RMSD values (between 0.27 and 0.47) and bias (between −0.03 and 0.117) in the same range as the other algorithms.
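The statistics quoted above (r<sup>2</sup>, RMSD, and bias) can be illustrated with a minimal sketch. The function name `validation_stats` is hypothetical, and the definitions used here (squared Pearson correlation, root-mean-square difference, mean signed difference) are common conventions assumed for illustration; the text does not state whether the values were log-transformed before the statistics were computed, so any such transformation is left to the caller.

```python
import math


def validation_stats(estimated, observed):
    """Return (r2, rmsd, bias) for paired estimated and observed values.

    Assumed definitions (not taken from the source):
      r2   = squared Pearson correlation (the r2 of a simple linear regression)
      rmsd = root-mean-square difference
      bias = mean of (estimated - observed)
    """
    n = len(observed)
    if n != len(estimated) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")

    mean_e = sum(estimated) / n
    mean_o = sum(observed) / n

    diffs = [e - o for e, o in zip(estimated, observed)]
    bias = sum(diffs) / n
    rmsd = math.sqrt(sum(d * d for d in diffs) / n)

    cov = sum((e - mean_e) * (o - mean_o) for e, o in zip(estimated, observed))
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    r2 = cov * cov / (var_e * var_o)
    return r2, rmsd, bias
```

A constant offset between estimate and observation leaves r<sup>2</sup> at 1 while producing a non-zero RMSD and bias, which is why the three statistics are reported together.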

Even though to first order Chl and POC are positively correlated in the global ocean, a residual scatter in the relationship remains, both in satellite observations (**Figure 12A**) and in situ observations (e.g., Kostadinov et al., 2012). Ideally, a POC algorithm should be able to retrieve POC independently of [Chl] and capture the variable POC/[Chl] ratio correctly. Note that this ratio can vary with the fraction of living phytoplankton carbon in the total POC pool, with the physiology and photoacclimation of the phytoplankton component of POC (Geider, 1987; Geider et al., 1998; Behrenfeld et al., 2005), and with species specific differences among phytoplankton themselves (Stramski, 1999). Therefore, independent knowledge of total POC, living phytoplankton carbon, and [Chl a] should be the goal of future bio-optical algorithm development.

### 4.3. Estimates of Total Pools of Carbon

The OC-CCI archive can be used to estimate total pools of POC in the mixed layer, taking into account interannual and regional variability, which is well captured by this merged dataset. Algorithms A-D were applied to the monthly OC-CCI version 2 data, and the values integrated over the mixed-layer depth (derived from MIMOC, Schmidtko et al., 2013), assuming homogeneity over the mixed layer. These were then averaged over all the months and for all the years of the OC-CCI version 2 (1998-2012) to provide estimates of the average standing pool of POC as follows: Algorithm A: 0.86 Pg C, Algorithm B: 1.3 Pg C, Algorithm C: 0.87 Pg C, Algorithm D: 0.77 Pg C. These are larger than the estimate of Gardner et al. (2006) and smaller than the estimate of Stramska (2009). Comparison of these estimates with those of phytoplankton carbon pools estimated in a parallel study (Martinez-Vicente et al., in review), indicates that phytoplankton carbon represents between 17 and 48% of the total POC pool. Whilst this ratio shows considerable variability, the often assumed value of 1/3 for phytoplankton carbon:POC falls within this range. High levels of variability in the phytoplankton carbon to POC ratio were also observed in situ by Graff et al. (2015). Satellite based estimates calculated by Kostadinov et al. (2016) (using a different set of mixed layer depth values) suggest a phytoplankton carbon standing stock of around 0.24 Pg C, implying a corresponding POC stock of around 0.72 Pg C when using the 1/3 assumption. Kostadinov et al. (2016) showed the estimated phytoplankton standing stock to be similar to estimates derived from the application of both the Stramski et al. (2008) bbp based algorithm, combined again with a 1/3 assumption and the method of Behrenfeld et al. (2005) to SeaWIFS data, and to model estimates from the Coupled Model Intercomparison Project 5 (CMIP5). The estimate of phytoplankton carbon standing stock from Kostadinov et al. 
(2016) is similar to that estimated by other size class based approaches, such as that of Roy et al. (2017) which used size classes derived from absorption to estimate a total phytoplankton carbon stock of 0.26 Pg C. Though the global estimates of POC from the different approaches assessed here are quite similar to each other, the differences are more pronounced at smaller scales, as can be seen in **Figure 8F**.
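The integration described above, and the 1/3 conversion between phytoplankton carbon and POC, can be sketched as follows. This is an illustrative outline under stated assumptions, not the processing used for the estimates in this section: the function names, the choice of mg m<sup>−3</sup> for POC, and the use of a per-pixel area are all assumptions introduced here.

```python
MG_PER_PG = 1e18  # 1 Pg = 1e15 g = 1e18 mg


def mixed_layer_poc_stock_pg(poc_mg_m3, mld_m, area_m2):
    """Total mixed-layer POC in Pg C.

    Assumes POC is homogeneous over the mixed layer, so the column stock of
    each pixel is concentration (mg m^-3) x depth (m) x pixel area (m^2).
    Pixels with a missing value (None) are skipped.
    """
    total_mg = 0.0
    for poc, mld, area in zip(poc_mg_m3, mld_m, area_m2):
        if poc is None or mld is None:
            continue
        total_mg += poc * mld * area
    return total_mg / MG_PER_PG


def poc_from_phyto_carbon(phyto_c_pg, phyto_fraction=1.0 / 3.0):
    """POC stock implied by a phytoplankton carbon stock and an assumed fraction."""
    return phyto_c_pg / phyto_fraction
```

With the phytoplankton carbon stock of 0.24 Pg C quoted above, `poc_from_phyto_carbon(0.24)` gives the 0.72 Pg C figure under the 1/3 assumption.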

# 5. CONCLUSIONS

A variety of POC algorithms were applied to matchup pixels extracted from the satellite OC-CCI ocean color data, and validated against in situ data. The database used here represents the largest collection of in situ POC data available, to the authors' knowledge. The five algorithms showed strong predictive capacity for estimating POC, with Algorithm A (based on Rrs; Stramski et al., 2008) and Algorithm C (based on Loisel et al., 2002) performing well across the broad range of the in situ dataset. Algorithms A and C performed consistently across different water types as defined in the OC-CCI data. From the water class based validation, errors can be estimated per pixel. For Algorithms A and C, the errors were mostly within the range requested by the user community. These results suggest a maturity in POC algorithms and their suitability for production of long term time series for climate related studies. However, several key points of development are highlighted from the intercomparison of the different algorithms and the various studies reviewed here. Greater knowledge of the composition of the particulate pool, and how it affects the IOPs of the oceans, may allow increased accuracy of POC algorithms (within the constraints of the sensitivities of current satellite ocean color radiometry), as well as providing further information on different types of particles, many of which play important roles in water quality and ocean biogeochemistry. To support this aim, further in situ data should be collected, including additional measurements to provide detail on phytoplankton community size structure, physiology, and photoacclimation. Further, it is recommended that future work seeks to use consistent methodology for blank correction of POC measurements, and to clarify any trends in the low POC region which may be influenced by these uncertainties. Further understanding of the sources of variability between POC and optical parameters can then be incorporated into future semi-analytical algorithms. New understanding of these relationships may also inform future sensor development (e.g., hyperspectral sensors) and optical modeling techniques.

# AUTHOR CONTRIBUTIONS

HE: All the calculations, preparation of figures, lead on writing the manuscript. VM: Project manager for Pools of Carbon project; development of matchup processing and statistical analysis. RB: Provision of code for statistical analysis based on OC-CCI methodology. GD: Provision of in situ data, content on bbp uncertainties and impact of particle sizing. AH: Perspectives on community requirements, particularly for ecosystem modeling. TJ: Provision of code for calculation of per optical water class uncertainties, based on the OC-CCI methodology. HK: Collation of the in situ database. TK: Algorithm provider, content on algorithm performance relating to particle size distributions. HL: Algorithm provider, content on regional variability in algorithm performance. SR: Input on statistical analysis and relative contributions of phytoplankton to the POC pool. RR: Collation of the in situ database. ST: Data contribution and guidance on POC data for the Southern Ocean. DS: Algorithm provider, content on history of POC algorithm development, and interpretation of comparative analysis of different algorithms. TP: Project leader, scientific advice. SS: Development of the concept and work plan, guidance of HEK, review and rewriting of various sections of manuscript. All authors reviewed and provided comments on the draft manuscript.

# FUNDING

The POCO project is funded by the European Space Agency (ESA) under the program of Science Exploitation of Operational Missions (SEOM) following Contract: 4000113692/15/I-LG. This study is a contribution to the Ocean Color Climate Change Initiative of the European Space Agency, and to the activities of the National Center for Earth Observations (NCEO) of the Natural Environment Research Council (NERC) of the UK. This study is also a contribution to the international IMBER project and was supported by the UK Natural Environment Research Council National Capability funding to Plymouth Marine Laboratory and the National Oceanography Center, Southampton. AMT data were supported by the UK Natural Environment Research Council National Capability funding to Plymouth Marine Laboratory and the National Oceanography Center, Southampton. This is contribution number 309 of the AMT programme. TK was supported by NASA grant #NNX13AC92G and by the Division of Hydrologic Sciences, Desert Research Institute. The contribution of HL was funded by the CNES/TOSCA program in the frame of the COYOTE project.

# ACKNOWLEDGMENTS

The authors would like to thank Peter Regner for his support and management of the POCO project. We would like to thank Oliver Fisher for his contributions to the project during his student internship. The authors would like to thank the participants of the Color and Light in the ocean from Earth Observation (CLEO) workshop, for their valuable discussions on POC, which contributed substantially to refining the approaches presented in this work. The authors would also like to thank the two reviewers who provided detailed and constructive comments which substantially improved this manuscript.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Evers-King, Martinez-Vicente, Brewin, Dall'Olmo, Hickman, Jackson, Kostadinov, Krasemann, Loisel, Röttgers, Roy, Stramski, Thomalla, Platt and Sathyendranath. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Phytoplankton Group Identification Using Simulated and In situ Hyperspectral Remote Sensing Reflectance

### Hongyan Xi\*, Martin Hieronymi, Hajo Krasemann and Rüdiger Röttgers

Department of Remote Sensing, Institute of Coastal Research, Helmholtz-Zentrum Geesthacht, Center for Materials and Coastal Research, Geesthacht, Germany

### Edited by:

Astrid Bracher, Alfred-Wegener-Institute Helmholtz Center for Polar and Marine Research, Germany

### Reviewed by:

Maycira Costa, University of Victoria, Canada Aleksandra Wolanin, Helmholtz-Zentrum Potsdam Deutsches Geoforschungszentrum (GFZ), Germany

> \*Correspondence: Hongyan Xi hongyan.xi@hzg.de

### Specialty section:

This article was submitted to Ocean Observation, a section of the journal Frontiers in Marine Science

Received: 15 March 2017 Accepted: 07 August 2017 Published: 22 August 2017

### Citation:

Xi H, Hieronymi M, Krasemann H and Röttgers R (2017) Phytoplankton Group Identification Using Simulated and In situ Hyperspectral Remote Sensing Reflectance. Front. Mar. Sci. 4:272. doi: 10.3389/fmars.2017.00272

In the present study we investigate the bio-geo-optical boundaries within which dominant phytoplankton groups can be identified from hyperspectral ocean color data. A large dataset of simulated remote sensing reflectance spectra, Rrs(λ), was used. The simulation was based on measured inherent optical properties of natural water and measurements of five phytoplankton light absorption spectra representing five major phytoplankton spectral groups. These simulated data, referred to as C2X data, contain more than 10<sup>5</sup> different water cases, including cases typical for the clearest natural waters as well as for extreme absorbing and extreme scattering waters. In the simulation, concentrations of chlorophyll a (representing phytoplankton abundance), Chl, range from 0 to 200 mg m<sup>−3</sup>, concentrations of non-algal particles, NAP, from 0 to 1,500 g m<sup>−3</sup>, and absorption coefficients of chromophoric dissolved organic matter (CDOM) at 440 nm from 0 to 20 m<sup>−1</sup>. A second, independent, smaller dataset of simulated Rrs(λ) used light absorption spectra of 128 cultures from six phytoplankton taxonomic groups to represent natural variability. Spectra of this test dataset are compared with spectra from the C2X data in order to evaluate to what extent the five spectral groups can be correctly identified as dominant under different optical conditions. The results showed that the identification accuracy depends strongly on the water optical conditions, i.e., contribution of and covariance in Chl, NAP, and CDOM. The identification in the simulated data is generally effective, except for waters with very low contribution by phytoplankton and for waters dominated by NAP, whereas contribution by CDOM plays only a minor role. To verify the applicability of the presented approach for natural waters, a test using an in situ Rrs(λ) dataset collected during a cyanobacterial bloom in Lake Taihu (China) is carried out, and the approach predicts blue cyanobacteria to be dominant. This fits well with observation of the blue cyanobacteria Microcystis sp. in the lake. This study provides an efficient approach, which shows promise for application to hyperspectral sensors, for identifying dominant phytoplankton spectral groups purely based on Rrs(λ) spectra.

Keywords: ocean color, remote sensing, phytoplankton spectral groups, light absorption, extreme case-2 waters

# INTRODUCTION

Phytoplankton play a fundamental role in oceans, seas, and freshwater basin ecosystems, as well as in related biogeochemical cycles. Phytoplankton communities are characterized by large taxonomic diversity that strongly determines their role in the ecosystem and their biogeochemical functioning (Uitz et al., 2015). The aquatic environment, whether inland, coastal, or open-ocean waters, is rarely comprised of a single algal class (IOCCG, 2014). Different phytoplankton groups adapt to environmental conditions such as high or low light, temperature, nutrient availability, and turbulence level (Aiken et al., 2008). Specific phytoplankton groups are characterized by specific pigments (biomarkers) and can thus be identified from pigment inventories derived from in situ samples (Alvain et al., 2005). Recently, different bio-optical and ecological models have been developed for identifying phytoplankton functional types (PFTs), phytoplankton taxonomic composition, and specific phytoplankton species (e.g., Craig et al., 2006; Astoreca et al., 2009) by means of light absorption spectra, spectral response based on reflectance anomalies, backscatter-based derivation of the particle size distribution, phytoplankton abundance, or look-up tables of Rrs(λ) that incorporate the range of absorption and scattering variability (e.g., Ciotti and Bricaud, 2006; Alvain et al., 2008, 2012; Hirata et al., 2008; Bracher et al., 2009; Kostadinov et al., 2009; Mouw and Yoder, 2010; Brewin et al., 2015; Lorenzoni et al., 2015). Two recent review articles provide an overview of the different methodological approaches, remote sensing algorithms, and a gap analysis for obtaining phytoplankton diversity from ocean color (Bracher et al., 2017; Mouw et al., 2017). One technical requirement for better phytoplankton identification is the use of hyperspectral ocean color data over the full visible range between 400 and 700 nm.
A limited traceability of uncertainties in connection with phytoplankton group information for all water types has been identified as a current gap of knowledge (Bracher et al., 2017).

With recent advances in optical measurements and future improvements in satellite sensors, approaches to phytoplankton group discrimination have been proposed based on various types of data from in situ measurements, model simulations and satellite sensors (Hunter et al., 2008; Lubac et al., 2008; Nair et al., 2008; Taylor et al., 2011; Isada et al., 2015). The rapid development of hyperspectral sensors provides more comprehensive remote sensing data on the spectral properties of water reflectance, covering the full range of visible light with more wavebands and higher spectral resolution. The increasing number of hyperspectral satellite missions, from the existing Hyperion (Folkman et al., 2001), CHRIS (Barnsley et al., 2004), and HICO (Corson et al., 2008) (terminated in 2014) to expected missions such as EnMAP (Foerster et al., 2015), PRISMA (Meini et al., 2015), HyspIRI (Lee et al., 2015), HYPXIM (Michel et al., 2011), and PACE (Gregg and Rousseaux, 2017), has provided and will provide much potential for applications of hyperspectral satellite data in aquatic ecosystems (Guanter et al., 2015; Xi et al., 2015). Band placement for improving PFT retrieval from remote sensing data was investigated by analyzing dominant spectral features in the absorption spectra of the PFTs determined with different methods, with the recommendation to use continuous hyperspectral data as they will provide better results (Wolanin et al., 2016). Attempts at hyperspectral identification and differentiation of phytoplankton taxonomic groups have been carried out with various approaches (e.g., Bracher et al., 2009; Torrecilla et al., 2011; Sadeghi et al., 2012; Uitz et al., 2015; Xi et al., 2015; Kim et al., 2016). Progress achieved so far has not only provided recommendations on the directions in which more effort needs to be put, but also revealed the constraints and difficulties inherent in these approaches.
Our previous study has shown that identification of phytoplankton taxonomic groups is successful when using light absorption spectra, but the identification performance varies in different water types when using remote sensing reflectance, Rrs(λ), as variability in water optical components changes Rrs(λ) spectra significantly, both in magnitude and spectral shape (Xi et al., 2015). Light absorption spectra of phytoplanktonic algae are determined by pigment composition and cellular pigment concentrations, both of which can change, e.g., with light conditions during growth (photoacclimation). Modeling and identification approaches that are based on phytoplankton absorption features also need to take this intra-taxon and intra-species variability into account, but owing to computational constraints they are usually based on just a few single spectra representing a taxonomic or spectral group.

Given that a commonly used parameter obtained directly from hyperspectral Earth observation sensors is the remote sensing reflectance of the water surface, we focused on phytoplankton identification using Rrs(λ) only. In a former study (Xi et al., 2015) we have also shown that absorption features of pure water in Rrs(λ) affect the identification performance when phytoplankton concentration is low. In the present study, based on five standard absorption spectra representing five phytoplankton spectral groups, an extensive dataset including 10<sup>5</sup> Rrs(λ) spectra was simulated using HydroLight with various water optical conditions. This simulated dataset is part of a database compiled within the ESA SEOM C2X project (C2X, 2015). An identification approach is proposed to determine phytoplankton groups with the use of the C2X database. The objectives of this study are (i) to test the skill of the identification, (ii) to investigate how and to what extent other water optical constituents impact the accuracy of this identification, and (iii) to show the applicability of this approach in natural waters using in situ data.

## DATA AND METHODS

### Absorption Data

In order to obtain spectral absorption coefficients of different phytoplankton groups, 128 cultures of various algal species from six major phytoplankton taxonomic groups were prepared. Cultures were prepared from 68 different species. These included 19 diatom species [Heterokontophyta (Bacillariophyceae)], 13 species of dinophytes [Dinophyta (Dinophyceae)], four species of prymnesiophytes [Haptophyta (Prymnesiophyceae)], three species of cryptophytes [Cryptophyta (Cryptophyceae)], 23 species of chlorophytes [Chlorophyta (Chlorophyceae, Picocystophyceae, and Trebouxiophyceae)], and six species of cyanobacteria (Cyanophyceae). Culture preparation, growth and light conditions are detailed in Xi et al. (2015) and included different light conditions for each species to introduce some spectral variability for each species. The absorption coefficient spectrum of each culture, aph(λ) (m<sup>−1</sup>), was measured with a Point-Source Integration-Cavity Absorption Meter (PSICAM) following the procedures outlined in Röttgers et al. (2007). All measurements were done at least in triplicate against pure water as the reference. The PSICAM offers accurate determinations of the absorption coefficient without errors induced by light scattered on the algal cells. aph(λ) spectra were measured and area-normalized in the full spectral range of photosynthetically active radiation, i.e., 400–700 nm (Xi et al., 2015).

### Datasets of Simulated Remote Sensing Reflectance

In-water radiative transfer simulations have been carried out using HydroLight (version 5.2; Sequoia Scientific, Inc., USA; Mobley, 1994). The numerical model computes radiance distributions and other related quantities such as remote sensing reflectance, Rrs(λ), for any given water body. Optical properties of the homogeneous water body are varied in a controlled light environment, i.e., clear maritime atmosphere, moderate wind, and the sun at its zenith. Two datasets of Rrs(λ) were modeled using HydroLight's "Case-2" model, assuming the same external conditions but differing in the number of representative spectra for the phytoplankton spectral groups. Thus, they have some similarities but are quasi-independent. The so-called C2X dataset (from ESA's Case-2 Extreme Water Project), which is based on five phytoplankton absorption spectra representing five spectral (taxonomic) groups, is used as the standard database. The second dataset, which comprises optical situations based on 128 phytoplankton absorption spectra from cultures, is used for testing. Detailed descriptions of the datasets are provided in Hieronymi et al. (2017) and Xi et al. (2015), respectively (the test dataset used here contains a larger number of CDOM absorption values and non-algal particle, NAP, concentrations than in the previous study of Xi et al., 2015). Basic information about the HydroLight input for the two datasets is provided in **Table 1**.

The main feature of the C2X database is that it covers most water types, from clearest oceanic Case-1 waters to CDOM-dominated (extreme absorbing) and sediment-dominated (extreme scattering) Case-2 waters. For example, the total (organic and inorganic) particulate backscattering coefficient at 510 nm, bbp(510), varies between 0.0007 and 15.4 m<sup>−1</sup> and the combined absorption coefficient of detritus and gelbstoff at 412 nm, adg(412), is between 0.004 and 120.2 m<sup>−1</sup>.

TABLE 1 | Specifications of the two used Rrs(λ) datasets, C2X database and test data, both simulated with the "Case-2" model of HydroLight (with references in Mobley and Sundman, 2013).

Two values in square brackets refer to a range.

On the basis of the Xi et al. (2015) study, five fundamental spectral shapes of chlorophyll-specific absorption were selected. The five absorption spectra (of different species of algae) are supposed to have the highest potential for identification of these five different spectral groups from a remote sensing reflectance spectrum. These normalized spectra are shown in **Figure 1A** and stand for: (1) a "brown spectral group" representing Heterokontophyta, Dinophyta, and Haptophyta, (2) a "green spectral group" representing Chlorophyta, (3) a group for Cryptophyta, (4) blue-green cyanobacteria, and (5) red cyanobacteria. The first four spectra are absorption spectra from single cultures that are close to the mathematical mean of all culture spectra from the respective group (from the 128 measured culture absorption spectra). As an example, the absorption spectrum for the "brown spectral group" was chosen from all the cultures in the brown group in **Figure 1B**. These culture spectra are realistic, as very similar spectra can be found in the HZG in situ database (unpublished data). The spectrum of the red cyanobacteria was obtained from field measurements in the Baltic Sea during a bloom of cyanobacteria (most likely Nodularia sp.). Culture spectra of this type (e.g., a red Synechococcus sp.) mostly exhibit much higher phycobilin-related absorption peaks around 570 nm. In order to account for natural variability in the simulation for the C2X database, the aph∗(λ) spectra actually used are always mixtures of two of the five groups with individual contributions of 80 and 20%, respectively. The total phytoplankton absorption, aph, is related to the spectral chlorophyll-specific absorption and the chlorophyll a concentration (denoted as [Chl] hereafter) by aph(λ) = aph∗(λ) × [Chl]. The natural variability of phytoplankton absorption is very high (e.g., Bricaud et al., 2004), and the full range of observed natural variability is included in the simulations (**Figure 2**).
The distributions, ranges, and covariances of optical properties and concentrations were estimated from several in situ datasets (e.g., Valente et al., 2016), but mainly from our HZG in situ data from the North Sea and Baltic Sea. The simulated data have been compared with in situ observations, e.g., bbp(510) and adg(412) vs. different reflectance band ratios (Hieronymi et al., 2016), and we generally found good agreement. We have also found some discrepancies, partly related to plausible measuring uncertainties and possibly due to model simplifications. In this context, it should be mentioned that the model assumptions for spectral scattering properties are identical for all five phytoplankton groups, i.e., the particle backscatter fraction depends on chlorophyll a concentration (Twardowski et al., 2001), but not on algae-specific (back-) scattering properties.
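The 80/20 mixing of two chlorophyll-specific absorption spectra and the scaling by [Chl] described above can be sketched in a few lines. This is a minimal illustration, not the C2X simulation code: the function names are hypothetical, the spectra are assumed to be sampled on the same wavelength grid, and the numbers in the usage note are made-up placeholders rather than measured values.

```python
def mix_specific_absorption(spec_major, spec_minor, w_major=0.8):
    """Blend two chlorophyll-specific absorption spectra, aph*(lambda).

    w_major is the contribution of the dominant group (0.8 in the text);
    the second group contributes the remainder (0.2).
    """
    if len(spec_major) != len(spec_minor):
        raise ValueError("spectra must share the same wavelength grid")
    w_minor = 1.0 - w_major
    return [w_major * a + w_minor * b for a, b in zip(spec_major, spec_minor)]


def phytoplankton_absorption(aph_star, chl):
    """aph(lambda) = aph*(lambda) x [Chl], applied at every wavelength."""
    return [a * chl for a in aph_star]
```

For example, mixing the placeholder spectra `[0.05, 0.02]` and `[0.03, 0.04]` and scaling by a [Chl] of 10 yields values close to `[0.46, 0.24]`.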

For the HydroLight simulations, the considered Rrs(λ) is fully normalized, i.e., the sun is at zenith and the viewing angle is perpendicular; the water is infinitely deep; and inelastic scattering, i.e., Raman scattering and Chl and CDOM fluorescence, is taken into account. Nonetheless, how inelastic scattering processes and their natural variability influence the results is beyond the scope of this work. Ultimately, the C2X database built with HydroLight simulations includes in total 1 × 10<sup>5</sup> Rrs(λ) spectra with five phytoplankton groups under various water optical conditions, while the test dataset includes 15,360 Rrs(λ) spectra, i.e., 120 different water conditions ([Chl] varies from 0.1 to 100 mg m<sup>−3</sup>, [NAP] from 0 to 50 g m<sup>−3</sup>, and CDOM from 0 to 2 m<sup>−1</sup>) combined with 128 phytoplankton absorption spectra (**Table 1**). The corresponding concentration values are listed in **Table 1**, and the rationale for the water condition settings is described and discussed in Xi et al. (2015), where this dataset was first used.

FIGURE 2 | Phytoplankton absorption coefficient at 440 nm, aph(440), vs. chlorophyll a concentration [Chl] used in simulations for the C2X database. The trend line is also shown in comparison to that by Bricaud et al. (2004).

### Phytoplankton Group Identification

The general scheme of the identification approach is illustrated in **Figure 3**. First, all Rrs(λ) spectra are area-normalized and then their second-order derivative is calculated. Details on the normalization and derivative transformation are described in Xi et al. (2015). To identify the corresponding phytoplankton groups in a test dataset, each Rrs(λ) spectrum in the test dataset is compared to all spectra in the C2X database using the similarity index (SI) as an angular distance (Millie et al., 1997):

$$SI = 1 - \frac{2}{\pi} \times \arccos\left(\frac{\mathbf{x}_1 \cdot \mathbf{x}_2}{|\mathbf{x}_1| \, |\mathbf{x}_2|}\right) \tag{1}$$

where x<sub>1</sub> is a second-derivative spectrum of Rrs(λ) in the C2X database, and x<sub>2</sub> is one in the test dataset. The SI is a number between 0 and 1, where 0 indicates no similarity and 1 indicates perfect similarity between the two spectra. It is noteworthy that only the second-derivative spectra of Rrs(λ) in the range of 420–620 nm were used for the SI calculation, to minimize the influence of noise at shorter wavelengths, where reflectance is often low, and that of strong water absorption features at longer wavelengths (Xi et al., 2015).
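The preprocessing and the similarity index of Equation (1) can be sketched as follows. This is a minimal illustration under stated assumptions rather than the implementation of Xi et al. (2015): the function names are hypothetical, area normalization is done with a trapezoidal integral, and the second derivative is a plain central finite difference on a uniform wavelength grid with no smoothing. Restricting the spectra to 420–620 nm before the comparison is left to the caller.

```python
import math


def area_normalize(wavelengths, spectrum):
    """Divide a spectrum by its trapezoidal integral over wavelength."""
    area = sum(
        0.5 * (spectrum[i] + spectrum[i + 1]) * (wavelengths[i + 1] - wavelengths[i])
        for i in range(len(wavelengths) - 1)
    )
    return [s / area for s in spectrum]


def second_derivative(wavelengths, spectrum):
    """Central finite-difference second derivative (uniform grid assumed)."""
    h = wavelengths[1] - wavelengths[0]
    return [
        (spectrum[i - 1] - 2.0 * spectrum[i] + spectrum[i + 1]) / (h * h)
        for i in range(1, len(spectrum) - 1)
    ]


def similarity_index(x1, x2):
    """SI = 1 - (2 / pi) * arccos(cosine of the angle between x1 and x2)."""
    dot = sum(a * b for a, b in zip(x1, x2))
    norm1 = math.sqrt(sum(a * a for a in x1))
    norm2 = math.sqrt(sum(b * b for b in x2))
    # Clamp to [-1, 1] to guard against floating-point round-off.
    cosine = max(-1.0, min(1.0, dot / (norm1 * norm2)))
    return 1.0 - (2.0 / math.pi) * math.acos(cosine)
```

Two spectra with identical shape give an SI of 1 and two orthogonal spectra give 0. As written, the expression becomes negative when the cosine is negative, which can occur for derivative spectra that take both signs; how such cases are treated is not specified in the text.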

This approach produces 10<sup>5</sup> SI values for each spectrum of the test dataset. The 20 spectra in the C2X database providing the highest SI values for each test spectrum are selected and their corresponding known phytoplankton spectral groups are recorded. The group that is dominant in these 20 spectra is taken as the identified spectral group for this test spectrum. For each test spectrum, one of the five spectral groups is thus identified as being dominant. Each taxonomic group in the test data is represented by 5 to 48 Rrs(λ) spectra (from 5 to 48 different cultures), and all taxonomic groups are categorized into five spectral groups. The spectral group identification accuracy is then determined by calculating the percentage of correctly identified Rrs(λ) spectra in each spectral group. In the end, given that there are 120 water optical conditions in the test data, 120 values for the identification accuracy of each group are calculated.
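The selection of the 20 most similar database spectra, the majority vote, and the accuracy calculation described above can be sketched as follows. The function names are hypothetical, and the handling of ties (the group that appears first among the highest-ranked spectra wins) is an assumption made for this sketch; the text does not say how ties are resolved.

```python
from collections import Counter


def identify_group(si_values, database_groups, k=20):
    """Return the spectral group that dominates the k database spectra
    with the highest SI for a single test spectrum."""
    ranked = sorted(range(len(si_values)), key=lambda i: si_values[i], reverse=True)
    votes = Counter(database_groups[i] for i in ranked[:k])
    # most_common keeps first-seen order among equal counts, so a tie goes
    # to the group whose spectra rank highest (an assumption of this sketch).
    return votes.most_common(1)[0][0]


def identification_accuracy(predicted, truth):
    """Percentage of test spectra whose identified group matches the known group."""
    correct = sum(1 for p, t in zip(predicted, truth) if p == t)
    return 100.0 * correct / len(truth)
```

In use, `identify_group` would be called once per test spectrum, and `identification_accuracy` once per spectral group and water optical condition, which gives the 120 accuracy values per group mentioned above.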

### In situ Rrs(λ) Data of Lake Taihu

A field campaign was carried out from 5 to 17 October 2008 in Lake Taihu (China). A set of in situ Rrs(λ) spectra was obtained by measuring the water-leaving radiances and sky radiances with a dual channel spectrometer, ASD FieldSpec Pro Dual VNIR (FieldSpec 931, ASD Inc., USA), following NASA ocean optics protocols (Mueller et al., 2003). During the measurements, the zenith and azimuth viewing angles of the two channels relative to the water surface were 40° and 135°, respectively. Radiances of a 25 cm by 25 cm plaque with 25% reflectivity, and water and sky radiances (each preceded by a dark offset reading), were measured, and the measurements were repeated five times. The measurements were performed at a location that minimized shading, reflections from superstructure, ship's wake, foam patches, and whitecaps. Moreover, the viewing direction was pointed away from the sun to reduce the sunglint effect. From the upward radiance (Lu), sky radiance (Lsky), gray plaque radiance (Lplaq), and the water-air interface reflectivity determined from the lake state at that time, Rrs(λ) was calculated following the method proposed by Mobley (1999). Details of the radiance measurements and the Rrs(λ) calculation are given in Ma et al. (2006). Water samples were taken simultaneously with the spectrometer measurements for laboratory determination of Chl, NAP, and CDOM concentrations. Absorption spectra of the total particles, ap(λ), and of the NAP, aNAP(λ), were determined by the quantitative filter technique (QFT) (Mitchell, 1990), and aph(λ) was obtained by subtracting aNAP(λ) from ap(λ). aCDOM(λ) was also measured spectrophotometrically in a 10 cm cuvette, using water samples filtered through 0.7 µm Whatman GF/F filter pads, with the same UV-2401 spectrophotometer. More details on the above determinations are described in Xi (2011).
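The Rrs(λ) calculation from the measured radiances can be sketched under a common above-water formulation, which is an assumption here since the exact processing is given in Ma et al. (2006): Rrs = (Lu − ρ·Lsky)/Ed, with the downwelling irradiance estimated from the plaque as Ed = π·Lplaq/Rplaq. The function name and the example numbers are hypothetical; the radiances are assumed to be dark-corrected and averaged over the repeated scans, and the interface reflectivity ρ must be supplied by the user.

```python
import math


def remote_sensing_reflectance(lu, lsky, lplaq, rho, plaque_reflectance=0.25):
    """Above-water Rrs per wavelength: (Lu - rho * Lsky) / (pi * Lplaq / Rplaq).

    lu, lsky, lplaq: radiance lists on the same wavelength grid.
    rho: water-air interface reflectivity (chosen from the surface state).
    plaque_reflectance: 0.25 for the 25% gray plaque mentioned in the text.
    """
    if not (len(lu) == len(lsky) == len(lplaq)):
        raise ValueError("all radiance spectra must share the same wavelength grid")
    rrs = []
    for upward, sky, plaque in zip(lu, lsky, lplaq):
        # Downwelling irradiance estimated from the plaque radiance.
        ed = math.pi * plaque / plaque_reflectance
        # Remove the sky radiance reflected at the water surface.
        rrs.append((upward - rho * sky) / ed)
    return rrs


# Illustrative single-wavelength example; rho = 0.028 is an assumed value,
# not the one used in the campaign.
example_rrs = remote_sensing_reflectance([0.5], [5.0], [2.5], rho=0.028)
```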

As one of the largest freshwater lakes in China, Lake Taihu covers an area of 2427.8 km<sup>2</sup>, with water quality varying strongly from area to area. Water types in Lake Taihu are mainly classified into two categories: optically deep waters (ODWs) and optically shallow waters (OSWs) (Xi, 2011). ODWs cover most of the lake, with highly eutrophic and turbid waters and frequent cyanobacteria blooms, while the southeastern area is mostly OSWs with clear waters and abundant aquatic plants. Data used here are from ODWs only, as Rrs(λ) from OSWs is strongly influenced by submerged aquatic plants and the lake bottom and is thus not suitable for this study. Due to the large area of the lake, water optical conditions in ODWs are also diverse: [Chl] varied from 4.0 to 180 mg m<sup>−3</sup>, [NAP] from 9.5 to 95 g m<sup>−3</sup>, and CDOM from 0.4 to 1.7 m<sup>−1</sup>. For the present study, Rrs(λ) spectra together with other optical parameters were obtained for 66 stations in ODWs. This additional "Taihu dataset" is used to test the applicability of the presented approach in natural waters.

# RESULTS

### Spectral Analysis of C2X Reflectances

The C2X database contains Rrs(λ) spectra simulated with a standard absorption spectrum for each of five different phytoplankton spectral groups as model input. These Rrs(λ) spectra show different spectral features reflecting various water optical conditions. Prior to utilizing the C2X data in the identification approach, the Rrs(λ) spectra in the database are first normalized and transformed to second derivative spectra. To give a general understanding of the C2X database, representative Rrs(λ) spectra of the five phytoplankton groups and their second derivatives are selected for a few water cases. For each water case, five Rrs(λ) spectra with similar water optical conditions representing the five phytoplankton groups are chosen. **Figure 4** shows examples of Rrs(λ) spectra and their second derivative spectra for different phytoplankton groups in five water cases. The five water cases are, however, not exhaustive. According to Hieronymi et al. (2017), 13 different water optical classes in total are distinguished by a fuzzy logic classification approach, but they are not completely included here as this study does not focus on water type interpretation. Only five example water cases are chosen to show spectral variations in different scenarios. As given in the caption of **Figure 4**, these five water cases have the following conditions: (1) low [Chl], [NAP], and CDOM; (2) low [Chl] and [NAP], moderate CDOM; (3) low [Chl], moderate [NAP] and CDOM; (4) high [Chl], moderate [NAP] and CDOM; and (5) moderate [Chl], extremely high [NAP], and moderate CDOM.


The corresponding water optical conditions of the five water cases and the variation of water optical components are listed in **Table 2**. Note that the ranges of "low," "moderate," and "high" concentrations vary somewhat from case to case, which may cause slight differences in the spectral magnitude as well as in the chlorophyll fluorescence. Water case (1) represents clear Case-1 waters (**Figure 4A**), where phytoplankton, CDOM, and pure water are the main contributors to Rrs(λ): phytoplankton absorption causes the depression at 440 nm, reflectance in the blue spectral region is high, and low scattering combined with high absorption by water at red and near-infrared wavelengths results in low reflectance in that region. In absorbing waters such as water case (2), where CDOM is moderate but other concentrations are low (**Figure 4D**), the lower reflectance in the blue band indicates strong CDOM absorption, and the peak at about 682 nm arises because Chl fluorescence was included in the simulations. In scattering-dominated waters such as water case (3) (**Figure 4G**), NAP is the dominating component; the reflectance is high in the whole visible region, the maximum shifts to longer wavelengths as the NAP concentration increases, and peaks and troughs attributable to pigment absorption are suppressed. In high-[Chl] waters such as water case (4), the contribution of different phytoplankton pigments to the Rrs(λ) spectra is clearly seen (**Figure 4J**), suggesting that it is relatively easy to identify phytoplankton groups in such waters. Scattering by NAP results in higher reflectance at longer wavelengths (>550 nm); therefore, when the sediment load is extremely high, as shown in **Figure 4M**, the reflectance increases with wavelength in the visible region. In NAP-dominated waters, little spectral difference can be observed among the different phytoplankton groups (**Figure 4N**); absorption and scattering by sediments mask the algal pigment features. This masking effect is a well-known limitation and source of uncertainty for remote sensing of biomass in turbid Case-2 waters (e.g., IOCCG, 2000).

Though phytoplankton groups exhibit distinct spectral features in some water cases, the corresponding second derivative spectra of Rrs(λ) in **Figure 4** (third column) show much variation in different water cases even for the same dominating phytoplankton group, due to the different contribution of other water optical constituents to the reflectance spectra. This indicates a possible difficulty in identifying phytoplankton groups for highly variable natural waters, by only inter-comparing the reflectance spectra without references. Given that, our theoretical basis is the C2X database that in the following is used as a lookup table (LUT) of standard reflectance spectra with information about the dominating phytoplankton groups, so that any test spectrum can be spectrally compared to the LUT and a certain phytoplankton group can be allocated to it. With the use of the simulated test data, the performance by the LUT identification approach can be evaluated for various water optical conditions.
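The preprocessing that every spectrum undergoes before it is compared with the LUT (area normalization followed by a second derivative restricted to 420–620 nm) can be sketched as below. The exact procedure is described in Xi et al. (2015) and is not reproduced here; this sketch assumes a uniform wavelength grid, trapezoidal integration over the supplied range, and a plain central finite difference without smoothing. All function names are hypothetical.

```python
def area_normalize(wavelengths, rrs):
    """Divide a spectrum by its trapezoidal integral over the supplied wavelengths."""
    area = sum(
        0.5 * (rrs[i] + rrs[i + 1]) * (wavelengths[i + 1] - wavelengths[i])
        for i in range(len(rrs) - 1)
    )
    return [value / area for value in rrs]


def second_derivative(wavelengths, values):
    """Central finite-difference second derivative (uniform grid, interior points)."""
    step = wavelengths[1] - wavelengths[0]
    return [
        (values[i - 1] - 2.0 * values[i] + values[i + 1]) / step ** 2
        for i in range(1, len(values) - 1)
    ]


def preprocess(wavelengths, rrs, lower=420.0, upper=620.0):
    """Return (wavelength, second derivative) pairs within [lower, upper] nm."""
    normalized = area_normalize(wavelengths, rrs)
    derivative = second_derivative(wavelengths, normalized)
    inner_wavelengths = wavelengths[1:-1]  # end points have no central difference
    return [
        (wl, d2) for wl, d2 in zip(inner_wavelengths, derivative)
        if lower <= wl <= upper
    ]
```

The values returned by `preprocess` for a LUT spectrum and a test spectrum would then be the two vectors entering the similarity index.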

### Phytoplankton Spectral Group Identification

To investigate how accurately the phytoplankton groups can be identified under different water optical conditions, Rrs(λ) spectra of the test data are compared with those in the C2X database following the identification approach described above. The identification accuracy for the 120 water conditions is generated for each spectral group and presented as a function of the water conditions in a ternary plot (**Figure 5**). Since a triangular diagram displays the proportions of three variables that sum to a constant, the absorption coefficients of phytoplankton and NAP are used here to represent [Chl] and [NAP], respectively; CDOM itself is typically represented by its absorption. The sum of these three absorption coefficients can be normalized to 1, ignoring the absorption of pure water, so that their proportions for all 120 water conditions can be displayed in a ternary plot. [Chl] and [NAP] are thus transformed to the corresponding absorption coefficients at 440 nm, aph(440) and aNAP(440), and used together with aCDOM(440). The wavelength of 440 nm is chosen because all components absorb light significantly at this wavelength and the contribution of pure water is low. The transformation from [Chl] to aph(440) is based on the relationship shown in **Figure 2**: aph(440) = 0.06 × [Chl]<sup>0.728</sup>, and aNAP(440) = a<sup>∗</sup>NAP(440) × [NAP], where the mass-specific NAP absorption at 440 nm is a<sup>∗</sup>NAP(440) = 0.0615 m<sup>2</sup> g<sup>−1</sup>, as shown in **Table 1**. Different colors and sizes of the dots are used to represent the actual [Chl] and [NAP], respectively, as the position in the ternary plot only shows the relative contribution of each component.

FIGURE 4 | Representatives of Rrs(λ) spectra (first column), the corresponding area-normalized Rrs(λ) spectra (second column) and second derivative spectra (third column) for five phytoplankton spectral groups and different water conditions. (A–C) Low [Chl], [NAP], and CDOM; (D–F) low [Chl] and [NAP], moderate CDOM; (G–I) low [Chl], moderate [NAP] and CDOM; (J–L) high [Chl], moderate [NAP] and CDOM; and (M–O) moderate [Chl], extremely high [NAP], and moderate CDOM. Note that the second derivative spectra were only for 420–620 nm, i.e., the spectral range used in the identification approach.

TABLE 2 | Corresponding variations of [Chl], [NAP] and CDOM for Rrs(λ) spectra in Figure 4.

FIGURE 5 | Ternary plots showing the identification accuracy of phytoplankton groups (A) Cyanobacteria (red and blue), (B) Chlorophyta (green spectral group), (C) Cryptophyta (red), and (D) Brown spectral group in different water conditions with respect to fractions of phytoplankton, NAP, and CDOM absorption at 440 nm. Contour lines indicate the accuracy of identification. Colors of dots indicate Chl concentrations; sizes of dots indicate NAP concentrations.
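The conversion from concentrations to the three ternary-plot coordinates can be written as a short sketch using the two relationships quoted in the text. The function name is hypothetical; [Chl] is assumed to be in mg m<sup>−3</sup>, [NAP] in g m<sup>−3</sup>, and aCDOM(440) in m<sup>−1</sup>.

```python
def absorption_fractions(chl, nap, a_cdom_440, a_nap_star_440=0.0615):
    """Fractions of phytoplankton, NAP and CDOM in the non-water absorption at 440 nm."""
    a_ph_440 = 0.06 * chl ** 0.728       # aph(440) = 0.06 * [Chl]^0.728
    a_nap_440 = a_nap_star_440 * nap     # aNAP(440) = a*NAP(440) * [NAP]
    total = a_ph_440 + a_nap_440 + a_cdom_440
    # The three fractions sum to 1 and give the position in the ternary plot.
    return a_ph_440 / total, a_nap_440 / total, a_cdom_440 / total


# Illustrative values only: 1 mg m^-3 Chl, 1 g m^-3 NAP, aCDOM(440) = 0.0785 m^-1.
f_ph, f_nap, f_cdom = absorption_fractions(1.0, 1.0, 0.0785)
```

Because only the fractions determine the position in the ternary plot, two waters with very different concentrations can share the same point, which is why dot color and size are needed to convey the actual [Chl] and [NAP].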

Contour lines of the identification accuracy are plotted for each spectral group (**Figure 5**). The contour lines show that the distribution of the identification accuracy differs among the groups. A finding common to all groups, however, is that low identification accuracy (50% contour line) is located in the plot area where [NAP] is high (bigger dots) and [Chl] is low (blue dots). This is not always the case: a closer look at the plots shows that the identification accuracy depends not only on the concentrations but also on the absorption contribution of each water component. In **Figure 5**, for simplification, blue and red cyanobacteria are combined, as they show distinct spectral features compared to other groups and their identification results are highly similar. Blue and red cyanobacteria (combined in **Figure 5A**) show a relatively distinct contour pattern: the contour lines run roughly parallel at low Chl contributions. For example, the 90% identification rate is reached approximately where aph(440) takes up only 10% of the total non-water absorption (aph + aCDOM + aNAP), and the 99% contour line lies between 10 and 20%, meaning that cyanobacteria can be successfully identified when aph(440) takes up more than 20% of the total non-water absorption. The approach also performs well on the green spectral group (Chlorophyta): **Figure 5B** shows that the identification rate falls to 90% only when [Chl] is extremely low [aph(440) contribution <5%] and [NAP] is as high as 50 g m<sup>−3</sup>; in all other conditions chlorophytes are correctly identified. The identification contour lines for cryptophytes in **Figure 5C** show more variation across water conditions, because there are only five Cryptophyta cultures in the test dataset and one culture showed higher similarity with red cyanobacteria and is thus misidentified at some water conditions, which lowers the identification rate by 20%. For the brown spectral group, 90% of the cultures from Heterokontophyta, Dinophyta, and Haptophyta are correctly identified as brown when aph(440) contributes more than 20% and aNAP(440) contributes <60% to the total non-water absorption (**Figure 5D**).

### Applicability Test Using Taihu In situ Dataset

Though the performance using the simulated test dataset was satisfactory at specific water conditions, the ultimate goal of our approach is to identify phytoplankton groups from reflectance spectra of natural waters. A set of in situ Rrs(λ) spectra of Lake Taihu is therefore used to perform an additional test of the applicability of the proposed approach. The second derivative spectra of the in situ Rrs(λ) in 420–620 nm are compared with those in the C2X database, and the spectra with the highest SIs are used to find the corresponding dominating phytoplankton groups. Results show that only two phytoplankton spectral groups are found, blue cyanobacteria and Chlorophyta. Of the 66 stations examined, the dominating phytoplankton are identified as cyanobacteria (blue) at 52 (80%) and as chlorophytes at 14 (20%). **Figure 6** shows the classified spectra of in situ Rrs(λ), the corresponding normalized Rrs(λ) in 400–700 nm, and the second derivative spectra. Most spectra identified as Chlorophyta clearly exhibit higher reflectance in 500–600 nm and lower chlorophyll fluorescence peaks around 690 nm, indicating higher sediment concentrations and lower Chl concentrations. In contrast, spectra identified as cyanobacteria show distinct absorption peaks at 675 nm and more pronounced fluorescence (**Figure 6B**). **Table 3** lists the identification results for Lake Taihu when stations are selected by different [Chl], showing that the identification rate of cyanobacteria increases with increasing minimum [Chl]: 90% when [Chl] > 10 mg m<sup>−3</sup>, 98% when [Chl] > 20 mg m<sup>−3</sup>, and 100% when [Chl] > 30 mg m<sup>−3</sup>.
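The threshold-based summary behind **Table 3** amounts to recomputing the share of cyanobacteria-identified stations among those exceeding a minimum [Chl]. A minimal sketch is given below; the function name and the four example stations are hypothetical and do not reproduce the Taihu data.

```python
def identification_rate_above(chl, labels, chl_min, target="cyanobacteria"):
    """Percentage of stations with [Chl] > chl_min that were identified as `target`."""
    selected = [label for value, label in zip(chl, labels) if value > chl_min]
    if not selected:
        return None  # no station exceeds the threshold
    hits = sum(1 for label in selected if label == target)
    return 100.0 * hits / len(selected)


# Hypothetical stations: [Chl] in mg m^-3 and the identified spectral group.
station_chl = [5.0, 15.0, 25.0, 40.0]
station_group = ["chlorophyta", "cyanobacteria", "cyanobacteria", "cyanobacteria"]
rate_all = identification_rate_above(station_chl, station_group, 0.0)
rate_above_10 = identification_rate_above(station_chl, station_group, 10.0)
```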

As noted above, [Chl] spans nearly two orders of magnitude, and water optical conditions varied strongly from station to station in Lake Taihu. The information we had from the campaign was that in waters where [Chl] was roughly 30 mg m<sup>−3</sup> or higher, cyanobacteria visibly aggregated and were the dominating group, whereas lower-[Chl] waters could be dominated by either green algae or cyanobacteria according to the identification. To explore whether the two identified groups are related to the absorption contributions of each water component, the proportions of aph(440), aNAP(440), and aCDOM(440) in the total non-water absorption at 440 nm were statistically summarized for cyanobacteria-identified stations and green-algae-identified stations, respectively (**Table 4**). The overall [Chl] and the contribution of aph(440) are higher at cyanobacteria-dominated stations than at green-algae-dominated stations: the mean aph(440) contribution is 16.1% for cyanobacteria but only 9.8% for green algae. The aNAP(440) contribution shows the opposite pattern, with a lower mean value (58.1%) for cyanobacteria and a slightly higher one for green algae; no difference is found in the CDOM contribution between the two groups. These statistical results in **Table 4** also reveal that CDOM has little influence on the identification, in agreement with results from the simulated test dataset. The lower phytoplankton contribution at green-algae-dominated stations might suggest that the identification of green algae is less accurate.

FIGURE 6 | Phytoplankton-group-classified spectra of (A) in situ Rrs(λ) in Lake Taihu in 400–700 nm, (B) the corresponding area-normalized Rrs(λ), and (C) second derivative spectra in 420–620 nm.

TABLE 3 | Phytoplankton groups identification in Lake Taihu with different [Chl] ranges.


# DISCUSSION

### Simulated C2X Data

Generally, the simulated reflectance spectra in the C2X database are plausible and consistent with in situ measurements. For example, many lakes have signal-dominating CDOM fractions and exhibit reflectance spectra as shown in **Figure 4D** (e.g., Eleveld et al., 2017), and **Figure 4M** shows spectra similar to those measured in a turbid estuary (e.g., Knaeps et al., 2012). Nonetheless, the simulations are based on some spectral assumptions that may lead to inaccuracies and therefore to uncertainties in the group identification. One simplification concerns the scattering properties of phytoplankton: some natural variability is included in the simulations, but due to the lack of reliable specific information, all phytoplankton groups are modeled with the same scattering assumptions. Because particle shapes and size distributions differ, however, the spectral scattering properties evidently vary (e.g., Morel, 1987; Evers-King et al., 2014; Harmel et al., 2016). Robertson Lain et al. (2017) showed the potentially considerable influence of different phytoplankton phase functions on modeled remote sensing reflectance over the entire visible range. A second source of model uncertainty, particularly in the range 650–700 nm, concerns inelastic scattering effects such as CDOM and phytoplankton fluorescence; in nature, the quantum yield efficiency of phytoplankton varies significantly depending on nutrient and light availability and algal species (e.g., Greene et al., 1994), whereas the standard quantum yield efficiency was used in the HydroLight simulations.

### Identification Approach and Its Skill

The proposed approach was chosen to examine whether phytoplankton groups can be identified from Rrs(λ) alone, a parameter obtained directly from satellite sensors. All other inversion models require information on water inherent optical properties, and their retrieval accuracy varies (e.g., Werdell et al., 2014; Wang et al., 2016). In our previous study, we compared the performance of using Rrs(λ) directly with that of using absorption spectra inverted from Rrs(λ) by the QAA for phytoplankton group differentiation (Xi et al., 2015). The results showed that the inverted absorption spectra performed less precisely than Rrs(λ). Because retrieval algorithms normally rely on theoretical or empirical relationships between the IOPs and AOPs, pigment information in the derived absorption spectra might be lost or distorted. Based on the extensive simulated Rrs(λ) database, the current approach directly compares the spectral shape of the second derivative of a test Rrs(λ) spectrum with those of the spectra in the database, requiring no knowledge of optical properties and avoiding the retrieval errors introduced by inversions. The benefit of this approach is that it provides a straightforward way to determine the dominant phytoplankton group (if it is one of the five) once Rrs(λ) is obtained from either in situ measurements or hyperspectral satellite data.

The skill of the proposed approach varies with water conditions. Results derived from the test dataset for various water optical properties indicate that the identification accuracy depends strongly on the water optical conditions: the identification is effective for waters with a high phytoplankton contribution but less effective in NAP-dominated waters, whereas CDOM has little influence even when it is extremely high. This not only agrees with the finding of Xi et al. (2015) that phytoplankton group differentiation is unsuccessful in waters with [Chl] lower than 1 mg m<sup>−3</sup>, but also indicates low efficiency in high-[NAP] waters. However, regarding the optical boundaries of successful identification, it should be clarified that the accuracy depends not only on the concentrations of water components but also on the contribution of each component's absorption to the total absorption. This means that the identification accuracy can be high both in clear Case-1 waters and in highly turbid productive (phytoplankton-abundant) waters. There are no clear concentration boundaries; instead, the optical boundaries in terms of absorption contribution are shown in the ternary plots (**Figure 5**), which display the contour lines of the identification accuracy derived from 120 points representing the 120 water optical conditions. Some general findings can be drawn from the ternary plots: the identification accuracy is higher than 90% when phytoplankton absorption takes up more than 20% and the NAP contribution is <60% of the total absorption; for cyanobacteria and green algae, the identification accuracy is even higher at these boundaries.

Though only 120 water optical conditions were considered in the test dataset, they cover most natural aquatic environments, from clear to moderately turbid and productive waters. The findings indicate that the proposed approach for phytoplankton group identification performs well except for waters where the phytoplankton contribution to the overall absorption is low and for NAP-dominated waters. The five spectral groups included in the C2X database are not exhaustive but are chosen to represent common natural groups. In addition, due to modeling and computing constraints, the phytoplankton groups taken into account in the simulation have to be representative, and the number of groups should be as low as possible to allow extensive simulations. The database can, however, be extended when the absorption features of other phytoplankton groups (or species with typical features) are important.

### Phytoplankton Group Identification in Lake Taihu

To test the applicability of the proposed approach in natural waters, in situ remote sensing reflectance data obtained from Lake Taihu were used. Though information on dominant phytoplankton species or pigment analysis is lacking for this campaign in October 2008, previous investigations of the phytoplankton community and composition in Lake Taihu can be taken as reference. Chen et al. (2003) revealed that the phytoplankton groups commonly observed in Lake Taihu are cyanobacteria, Chlorophyta, Bacillariophyta, and flagellates. A study on phytoplankton community structure succession in Lake Taihu from 1992 to 2012 by Deng et al. (2014) showed that Cryptomonas (Cryptophyta) was dominant in spring during the early 1990s. Dominance then shifted to Ulothrix (Chlorophyta) in 1996 and 1997. However, Cryptomonas again dominated in 1999, 2000, and 2002, with Ulothrix regaining dominance from 2003 to 2006. The bloom-forming cyanobacterial species Microcystis sp., a typical blue cyanobacterium, dominated in 1995, 2001, and 2007–2012. Another study revealed that Microcystis sp. is the most commonly seen cyanobacterial species, taking up approximately 85% of algal biomass and forming algal blooms each summer (Zhu et al., 2007). More importantly, a year-long investigation of dominant phytoplankton species conducted in the lake from October 2008 to October 2009 showed that Microcystis sp. dominated in October 2008, when our in situ data were collected, contributing more than 90% of total biovolume in most areas of the lake and coexisting with minor portions of the cyanobacterium Dolichospermum flos-aquae and the diatom Cyclotella meneghiniana (Ai et al., 2015). Our identification results showed good agreement with these investigations, except that chlorophytes were identified as the dominant group at some stations when [Chl] was moderate. A comparison of absorption contributions between cyanobacteria-identified and green-algae-identified stations shows that phytoplankton contribute <10% on average to the total non-water absorption at green-algae-identified stations (**Table 4**), leading to lower identification accuracy as indicated in **Figure 5**. It is highly likely that this is a misidentification, as cyanobacteria were still codominant.

The in situ data of Lake Taihu are used as a first example. Coincident data of spectral reflectance and phytoplankton taxonomic composition are still quite rare. Nevertheless, this first example gave an encouraging outcome and indicates that the approach is applicable in this optically complex lake. More datasets from different water types are being collected and processed, with the aim of further testing the identification approach in more natural waters.

# CONCLUSIONS AND OUTLOOK

A database of Rrs(λ) spectra, the C2X database, based on five phytoplankton groups was built using HydroLight simulations for various water optical conditions. A similarity-index approach was proposed to identify phytoplankton groups from remote sensing reflectance spectra only, by spectrally comparing an input test spectrum with the Rrs(λ) spectra in the C2X database. The performance of the approach was tested using another simulated Rrs(λ) dataset with 128 spectra of phytoplankton algae from six taxonomic groups arranged into five spectral groups. For the 120 water optical conditions, the identification accuracy was high in most cases, except for waters with a low phytoplankton contribution and for waters dominated by NAP. The influence of CDOM is less pronounced and only significant at extremely high levels. Though the proposed approach was based on simulated datasets, its applicability in natural waters was also tested using in situ Rrs(λ) spectra from Lake Taihu, China. Despite a possibly wrong identification of chlorophytes that could not be validated, cyanobacteria were successfully identified in Lake Taihu as the dominating group in high-[Chl] waters, indicating the applicability of the approach in natural waters when a single group is dominating.

The current approach is only capable of identifying spectral groups that are already present in the C2X database. However, the database can be expanded by running the same HydroLight simulations with the absorption characteristics of other phytoplankton groups. The following aspects might be worth investigating: (1) more validation with in situ data and measurements; (2) applicability in extreme events such as floating algae in highly turbid waters; (3) determination of the lowest spectral resolution required, for a wider use of hyperspectral Rrs(λ); and (4) examination of the use of hyperspectral satellite data, considering the influences of radiometric, spectral, and atmospheric effects on Rrs(λ) obtained from space.

# AUTHOR CONTRIBUTIONS

HX carried out the data analysis and the approach development with guidance from RR. MH simulated the hyperspectral remote sensing reflectance for the C2X database and the test data using HydroLight. HX drafted the main manuscript and MH refined the simulation part in Data and method. MH, RR, and HK all provided comments and suggestions and revised the manuscript.

# FUNDING

The work of HX is supported by the project "Preparations for the scientific use of the EnMAP data: Coastal and inland waters (50EE1257)." EnMAP is funded under the DLR Space Administration with resources from the German Federal Ministry of Economic Affairs and Energy. MH is partly supported by ESA through a Living Planet Fellowship. The C2X database was established by MH in the ESA SEOM C2X project (ESA ESRIN/C-No. 4000113691/15/I-LG). We thank ESA for sponsoring the publication.

# ACKNOWLEDGMENTS

We are thankful to Stephen Gehnke for measuring absorption spectra of various phytoplankton cultures. The in situ data of Lake Taihu were collected and measured by the Nanjing Institute of Geography and Limnology, Chinese Academy of Sciences. We thank Dr. Hongtao Duan and Prof. Yuanzhi Zhang for providing the in situ dataset under the support of the Provincial National Science Foundation of Jiangsu of China (BK20160049) and the National Key Research and Development Program of China (2016YFC1402000). The Open Fund at Jiangsu Key Laboratory of Remote Sensing of Ocean Dynamics and Acoustics in NUIST (KHYS1403) is also acknowledged. We finally thank the two reviewers for providing constructive comments to improve the manuscript.

# REFERENCES


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Xi, Hieronymi, Krasemann and Röttgers. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# An Exact Solution For Modeling Photoacclimation of the Carbon-to-Chlorophyll Ratio in Phytoplankton

### Thomas Jackson<sup>1</sup>\*, Shubha Sathyendranath<sup>1,2</sup> and Trevor Platt<sup>1</sup>

<sup>1</sup> Plymouth Marine Laboratory, Plymouth, United Kingdom, <sup>2</sup> National Centre for Earth Observation, Plymouth Marine Laboratory, Plymouth, United Kingdom

A widely-used theory of the photoacclimatory response in phytoplankton has, until now, been solved using a mathematical approximation that puts strong limitations on its applicability in natural conditions. We report an exact, analytic solution for the chlorophyll-to-carbon ratio as a function of the dimensionless irradiance (mixed layer irradiance normalized to the photoadaptation parameter for phytoplankton) that is applicable over the full range of irradiance occurring in natural conditions. Application of the exact solution for remote-sensing of phytoplankton carbon at large scales is illustrated using satellite-derived chlorophyll, surface irradiance data and mean photosynthesis-irradiance parameters for the season assigned to every pixel on the basis of ecological provinces. When the exact solution was compared with the approximate one at the global scale, for a particular month (May 2010), the results differed by at least 15% for about 70% of Northern Hemisphere pixels (analysis was performed during the northern hemisphere Spring bloom period) and by more than 50% for 24% of Northern Hemisphere pixels (approximate solution overestimates the carbon-to-chlorophyll ratio compared with the exact solution). Generally, the divergence between the two solutions increases with increasing available light, raising the question of the appropriate timescale for specifying the forcing irradiance in ecosystem models.

### Edited by:

Laura Lorenzoni, University of South Florida, United States

### Reviewed by:

Christoph Voelker, Alfred-Wegener-Institut für Polar- und Meeresforschung, Germany Greg M. Silsbe, University of Maryland Center For Environmental Sciences, United States

> \*Correspondence: Thomas Jackson thja@pml.ac.uk

### Specialty section:

This article was submitted to Ocean Observation, a section of the journal Frontiers in Marine Science

Received: 01 March 2017 Accepted: 21 August 2017 Published: 08 September 2017

### Citation:

Jackson T, Sathyendranath S and Platt T (2017) An Exact Solution For Modeling Photoacclimation of the Carbon-to-Chlorophyll Ratio in Phytoplankton. Front. Mar. Sci. 4:283. doi: 10.3389/fmars.2017.00283

Keywords: photoacclimation, phytoplankton, carbon-to-Chlorophyll, photo-physiology, primary production

# 1. INTRODUCTION

When quantifying the standing stock of marine phytoplankton or its rate of change, various metrics can be used, depending on the application envisaged. The possibilities include cell count, cell volume, carbon content, nitrogen content and chlorophyll concentration. Primary production (rate of production of organic material by phytoplankton through photosynthesis) is typically measured in carbon units, a convenient measure in studies of the global carbon cycle. It is also a practical unit in calculations of fluxes of material through the food chain or through the water column. On the other hand, chlorophyll-a concentration is by far the most commonly-used measure of phytoplankton abundance. There are many reasons for this choice also, including its principal role in the photosynthetic apparatus and in primary production; its presence in all types of phytoplankton, either in its common form or as derivates such as divinyl chlorophyll-a; and the ease with which it can be measured at a variety of scales, from single cells in the laboratory to ocean-basin scales using remote sensing by satellites.

The carbon-to-chlorophyll ratio, necessary to convert between these two common measures of phytoplankton biomass, is a dynamic, and highly-variable property of phytoplankton. Phytoplankton growing in high-light environments need to absorb only a small fraction of the available light, and they adapt to the ambient light field by reducing their pigment quota, resulting in a high carbon-to-chlorophyll ratio. The opposite is true in low-light conditions, for example in deep chlorophyll maxima in the ocean gyres, where chlorophyll concentration increases relative to the carbon concentration (Cullen, 1982, 2015; Morel and Berthon, 1989). Estimating such changes in carbon-to-chlorophyll ratio in response to variations in available light, i.e., due to photo-acclimation, is not a trivial task, but it is an essential step in many biogeochemical models. As reviewed by Halsey and Jones (2015), nutrients can also play a role in carbon-to-chlorophyll variations, although the sign of the change depends on the nutrient in question, with some nutrients being utilized for the production of pigments and others for photosystem reaction centers.

The links between carbon-to-chlorophyll-a ratios, photosynthesis and photo-acclimation are discussed in the works of Platt and Jassby (1976) and Geider (1987). Subsequently, Geider et al. (1996, 1997) developed a mechanistic model of photo-acclimation that has become commonly used to assign the chlorophyll:carbon ratio of phytoplankton populations in ecosystem models (Hickman et al., 2010; Dutkiewicz et al., 2015; Laufkötter et al., 2015). In a further development, Geider et al. (1998) dealt with the possible variations in photosynthetic parameters with nutrients and temperature. But the approximation used to derive the solution to the photoacclimation model (Geider et al., 1997) still limits the range of irradiance levels for which the solution holds. Some authors have addressed this problem by a numerical solution to the Geider et al. (1997) model rather than the approximation (e.g., Li et al., 2010), while others have imposed a numerical upper limit on the C:Chl ratio (Butenschön et al., 2016) to constrain model output.

Here, we present an exact solution that dispenses with the need for an approximation, removes the existing limitation and is therefore universally applicable. We examine conditions under which the differences between the approximate solution and the exact solution become significant, and discuss some of the implications for implementation of the model to compute carbon-to-chlorophyll ratios under natural environmental conditions. We show that, in some instances, the differences between the exact and approximate solutions depend on the assumptions in the model regarding the time scales on which photo-acclimation occurs in phytoplankton.

### 2. DATA

To demonstrate some applications of the new solution, a variety of datasets were used, which are described here briefly.

Monthly, climatological Photosynthetically Available Radiation (PAR) data from SeaWiFS (Frouin et al., 2002) are used for demonstrating an application of the new solution at large scales (http://oceancolor.gsfc.nasa.gov/cms/atbd/par). We used monthly composites to minimize data gaps. Climatological mixed-layer depth (MLD) was obtained from de Boyer Montégut et al. (2004) and also re-gridded onto a 9 km grid to match the input PAR data.

We used mean values of photosynthesis-irradiance parameters (the assimilation number P<sup>B</sup><sub>m</sub> and the initial slope α<sup>B</sup>, where the superscript B indicates normalisation to biomass B, in chlorophyll units; see **Table 1**) organized by season and by ecological provinces (as defined by Longhurst et al., 1995), from Mélin and Hoepffner (2004), which were then re-gridded, with a 30 × 30 pixel smoothing filter, to 9 km resolution to match the PAR data. These parameters can be used to calculate the chlorophyll-normalized production (P<sup>B</sup>) at any value I of photosynthetic irradiance (PAR), in the absence of photoinhibition, as described by Platt et al. (1980):

$$P^B = P\_m^B \left( 1 - \exp(\frac{-\alpha^B I}{P\_m^B}) \right) \,. \tag{1}$$

The P<sup>B</sup><sub>m</sub> and α<sup>B</sup> values allow the calculation of the photoadaptation parameter I<sub>k</sub>, defined as P<sup>B</sup><sub>m</sub>/α<sup>B</sup>. Surface Chl-a concentration from the Ocean Colour Climate Change Initiative (OC-CCI) dataset, Version 2.0 (European Space Agency, available online at http://www.esa-oceancolour-cci.org/) and the spectral light-transmission model of Sathyendranath and Platt (1988) were used to compute K<sub>d</sub>, the diffuse attenuation coefficient for photosynthetically-active radiation for the mixed layer. The daily average irradiance in the mixed layer (I<sub>m</sub>) was computed as

$$I\_m = \frac{\overline{I}\_0}{K\_d Z\_m} (1 - \exp(-K\_d Z\_m)),\tag{2}$$

where Ī<sub>0</sub> is the daily (24 h) average PAR at the sea surface and Z<sub>m</sub> is the mixed-layer depth (Platt et al., 1991; Cloern et al., 1995).
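As an illustration of how Equations 1 and 2 combine into the dimensionless irradiance used later, the following minimal Python sketch evaluates them for a single pixel. The function names and all numerical values are hypothetical and chosen only for illustration; they are not taken from the datasets described above.

```python
import math


def production_per_chl(irradiance, p_b_max, alpha_b):
    """Equation 1: chlorophyll-normalized production, no photoinhibition."""
    return p_b_max * (1.0 - math.exp(-alpha_b * irradiance / p_b_max))


def photoadaptation_parameter(p_b_max, alpha_b):
    """I_k, defined as the ratio of assimilation number to initial slope."""
    return p_b_max / alpha_b


def mixed_layer_irradiance(surface_par_daily_mean, k_d, mixed_layer_depth):
    """Equation 2: daily average irradiance in the mixed layer."""
    optical_thickness = k_d * mixed_layer_depth
    if optical_thickness == 0.0:
        # Limit of Equation 2 for a vanishingly thin layer.
        return surface_par_daily_mean
    return (surface_par_daily_mean
            * (1.0 - math.exp(-optical_thickness)) / optical_thickness)


# Hypothetical inputs for one pixel (illustrative only).
p_b_max = 4.0      # assimilation number
alpha_b = 0.08     # initial slope, per unit irradiance
surface_par = 100.0  # daily mean surface PAR, same irradiance units as I_k
k_d = 0.1          # diffuse attenuation coefficient, m^-1
z_m = 30.0         # mixed-layer depth, m

i_k = photoadaptation_parameter(p_b_max, alpha_b)
i_m = mixed_layer_irradiance(surface_par, k_d, z_m)
i_star = i_m / i_k  # dimensionless irradiance
print(i_k, i_m, i_star)
```

The only requirement on units is that the surface PAR and I<sub>k</sub> are expressed in the same irradiance units, so that their ratio is dimensionless.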

An in-situ bio-optical dataset of particulate organic carbon (POC), chlorophyll, and photosynthesis-irradiance parameters (Sathyendranath et al., 2009) was also used in this work. This dataset lacked information on PAR and MLD, which were filled in using the climatological data mentioned above.

## 3. EXACT SOLUTION FOR THE CHLOROPHYLL-TO-CARBON RATIO (θ) IN THE GEIDER ET AL. (1997) MODEL

According to Geider et al. (1997), the chlorophyll-to-carbon ratio, θ, is a function of irradiance I:

$$
\theta^2 = \theta\_m a \left( 1 - \exp\left(-\frac{\theta}{a}\right) \right),
\tag{3}
$$

where θ<sub>m</sub> is a prescribed model parameter, corresponding to the maximum attainable value of θ. The above equation is equivalent to equation A12 in Geider et al. (1997), noting that there is a typographical error in that equation, such that the denominator of the argument to the exponential term should be a, and not α<sup>B</sup>I. For conditions of balanced growth, Geider et al. (1997) point out that their parameter k<sub>chl</sub>, which represents the maximum proportion of photosynthesis that can be directed to chlorophyll-a synthesis, would be equivalent to the parameter θ<sub>m</sub>. We have applied the equivalence here, such that the solution would be valid only for balanced growth. The model development also assumes that the specific respiration rates of carbon and chlorophyll are either negligible or equal to each other.

We note that a = P<sup>C</sup><sub>m</sub>/(α<sup>B</sup>I), where P<sup>C</sup><sub>m</sub> is the carbon-specific, light-saturated photosynthesis. By definition, P<sup>C</sup><sub>m</sub> = P<sup>B</sup><sub>m</sub>θ, such that a = P<sup>B</sup><sub>m</sub>θ/(α<sup>B</sup>I). Substitution into the exponential term of Equation (3) gives:

$$\theta^2 = \theta\_m a \left( 1 - \exp\left( -\frac{\theta \alpha^B I}{P\_m^B \theta} \right) \right) . \tag{4}$$

Applying the equivalence I<sub>k</sub> = P<sup>B</sup><sub>m</sub>/α<sup>B</sup>, we get

$$
\theta^2 = \theta\_m a \left( 1 - \exp\left(-\frac{I}{I\_k}\right) \right),
\tag{5}
$$

and setting I/I<sub>k</sub> = I<sub>∗</sub>, a dimensionless irradiance, so that a = θ/I<sub>∗</sub>, the equation becomes

$$\theta^2 = \theta\_m \frac{\theta}{I\_\*} \left( 1 - \exp\left( -I\_\* \right) \right) \,. \tag{6}$$

The solution for θ is obtained by dividing both sides of the equation above by θ (for θ > 0):

$$\theta = \frac{\theta\_m}{I\_\*} \left( 1 - \exp\left( -I\_\* \right) \right). \tag{7}$$

The solution expresses θ as a function of I<sub>∗</sub>, such that the chlorophyll-to-carbon ratio can be calculated explicitly as a function of the dimensionless scaled irradiance (I<sub>∗</sub>). Note that the carbon-to-chlorophyll ratio χ = 1/θ. As I<sub>∗</sub> tends to zero, the exact solution (Equation 7) tends to θ<sub>m</sub>. As I<sub>∗</sub> tends to infinity, the solution tends to zero. However, this limit for high values of I<sub>∗</sub> is approached very slowly, well beyond reasonable values of I<sub>∗</sub> that might be expected in the natural environment. The solution remains well-constrained for plausible values of I<sub>∗</sub>.
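Equation 7 translates directly into code. The sketch below is one possible Python implementation; the function names are our own, and the small-argument branch is an implementation choice (the first-order series limit) added so that the function returns θ<sub>m</sub> smoothly as the dimensionless irradiance approaches zero.

```python
import math


def theta_exact(i_star, theta_m):
    """Equation 7: chlorophyll-to-carbon ratio for dimensionless irradiance."""
    if i_star < 0.0:
        raise ValueError("dimensionless irradiance must be non-negative")
    if i_star < 1e-8:
        # Series limit of (1 - exp(-x)) / x for very small x.
        return theta_m * (1.0 - 0.5 * i_star)
    # -expm1(-x) equals 1 - exp(-x) without loss of precision at small x.
    return theta_m * (-math.expm1(-i_star)) / i_star


def carbon_to_chl(i_star, theta_m):
    """Carbon-to-chlorophyll ratio, chi = 1 / theta."""
    return 1.0 / theta_exact(i_star, theta_m)


for i_star in (0.0, 0.5, 1.0, 2.0, 5.0):
    print(i_star, theta_exact(i_star, 0.02), carbon_to_chl(i_star, 0.02))
```

With θ<sub>m</sub> = 0.02, the carbon-to-chlorophyll ratio starts at its minimum of 50 in darkness and increases monotonically with the dimensionless irradiance.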

We note that the same solution is obtained when, instead of substituting P<sup>C</sup><sub>m</sub> = P<sup>B</sup><sub>m</sub>θ, we make the equivalent change of α<sup>B</sup> = α<sup>C</sup>/θ. The key to the solution is consistency: both parameters have to be normalized to the same quantity, carbon or chlorophyll; it does not matter which. The solution is indifferent to the choice as (apart from θ) it contains only the dimensionless quantity I<sub>∗</sub>. However, ecosystem models are often formulated to use carbon-normalized P<sup>C</sup><sub>m</sub> as input, along with α<sup>B</sup>, in which case Equation 7 becomes (see also Li et al., 2010):

$$\theta = \frac{\theta\_m P\_m^C}{I \alpha^B \theta} \left( 1 - \exp\left( -\frac{I \alpha^B \theta}{P\_m^C} \right) \right). \tag{8}$$

In this context, θ can be retrieved from the above equation iteratively.
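One way to carry out such an iteration is sketched below in Python. It uses simple bisection on Equation 3 with a = P<sup>C</sup><sub>m</sub>/(α<sup>B</sup>I); bisection is our choice for illustration, not a method prescribed by the text, and the function name, tolerance and input values are hypothetical. The bracket relies on two properties of Equation 3: the residual is positive at θ = θ<sub>m</sub>, and it is negative for sufficiently small positive θ.

```python
import math


def theta_from_carbon_specific(irradiance, p_c_max, alpha_b, theta_m,
                               tol=1e-12, max_iter=200):
    """Solve Equation 8 for theta by bisection (illustrative sketch)."""
    if irradiance <= 0.0:
        # In darkness the solution tends to its maximum value.
        return theta_m
    a = p_c_max / (alpha_b * irradiance)

    def residual(theta):
        # Equation 3 rearranged as f(theta) = 0.
        return theta * theta - theta_m * a * (-math.expm1(-theta / a))

    # residual(low) < 0 and residual(high) > 0 bracket the positive root.
    low = 0.5 * theta_m / (1.0 + theta_m / (2.0 * a))
    high = theta_m
    for _ in range(max_iter):
        mid = 0.5 * (low + high)
        if residual(mid) > 0.0:
            high = mid
        else:
            low = mid
        if high - low < tol * theta_m:
            break
    return 0.5 * (low + high)


# Hypothetical inputs (illustrative only).
theta = theta_from_carbon_specific(100.0, 0.05, 0.08, 0.02)
i_star = 100.0 * 0.08 * theta / 0.05  # I * alpha^B * theta / P^C_m
print(theta, i_star)
```

As a consistency check, the dimensionless irradiance implied by the converged θ can be substituted into Equation 7, which should return the same θ.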

It is also possible to calculate the relative sensitivity of θ to relative changes in I<sub>∗</sub>; we find

$$\left| \left( \frac{d\theta}{\theta} \Big/ \frac{dI\_\*}{I\_\*} \right) \right| = \frac{\left( 1 - \exp(-I\_\*)(1+I\_\*) \right)}{\left( 1 - \exp(-I\_\*) \right)} \le 1,\tag{9}$$

such that the relative error in θ will not be greater than that in I∗.
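The bound can be checked numerically. The Python sketch below evaluates the logarithmic derivative of Equation 7 analytically and compares it with a central finite difference; the function names and step size are our own choices. The derivative itself is negative (θ decreases as irradiance increases), and it is its magnitude that is bounded by 1.

```python
import math


def relative_sensitivity(i_star):
    """d ln(theta) / d ln(I*) for Equation 7, valid for i_star > 0."""
    return i_star * math.exp(-i_star) / (-math.expm1(-i_star)) - 1.0


def finite_difference_sensitivity(i_star, step=1e-6):
    """Central-difference estimate of the same logarithmic derivative."""
    def log_theta(x):
        # theta_m cancels in the logarithmic derivative, so it is omitted.
        return math.log(-math.expm1(-x) / x)

    upper = i_star * (1.0 + step)
    lower = i_star * (1.0 - step)
    return (log_theta(upper) - log_theta(lower)) / (math.log(upper) - math.log(lower))


for i_star in (0.1, 0.8, 2.0, 10.0):
    print(i_star, relative_sensitivity(i_star),
          finite_difference_sensitivity(i_star))
```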

### 3.1. The Approximate Solution

Geider et al. (1997) provided an approximate solution for θ using the first three terms of the Taylor expansion of exp (−θ/a):

$$
\theta^2 = \theta\_m a \left( 1 - 1 + \frac{\theta}{a} - \frac{\theta^2}{2a^2} \right) \,. \tag{10}
$$

For comparison with the exact solution (Equation 7), we can rearrange terms in the approximate solution, such that it is also expressed as a function of I∗. Following an initial simplification:

$$
\theta^2 = \theta\_m a \left(\frac{\theta}{a} - \frac{\theta^2}{2a^2}\right); \theta = \theta\_m \left(1 - \frac{\theta}{2a}\right). \tag{11}
$$

We can then substitute for a = P<sup>B</sup><sub>m</sub>θ/(α<sup>B</sup>I) = θ/I<sub>∗</sub> to find

$$
\theta = \theta\_m \left( 1 - \frac{I\_\*}{2} \right). \tag{12}
$$

Geider et al. (1997) noted that the approximation holds only for I<sub>∗</sub> < 1. This limitation is overcome by the analytic solution for θ (Equation 7), which is valid for all values of I<sub>∗</sub>.

The approximate solution (Equation 12) and the exact solution (Equation 7) are identical and equal to θ<sub>m</sub> as I<sub>∗</sub> tends to zero. But the approximate solution for θ becomes zero when I<sub>∗</sub> = 2, and becomes negative for higher values. This is why the approximate solution cannot be used for high values of I<sub>∗</sub>.

### 3.2. Effects of Nutrients and Temperature

We see from the exact solution (Equation 7) that θ depends on P<sup>B</sup><sub>m</sub> through I<sub>k</sub>. In the Geider et al. (1998) model, effects of nutrient limitation and ambient temperature on P<sup>B</sup><sub>m</sub> are accounted for, as follows:

$$P\_m^C = P\_{ref}^C \frac{N}{N + K\_N} f(T),\tag{13}$$

where P<sup>C</sup><sub>ref</sub> is the maximum C-specific rate of photosynthesis at a reference temperature, T is the ambient temperature, f(T) is the Arrhenius function, N is the nitrate concentration and K<sub>N</sub> is the half-saturation constant for nitrate uptake.

P<sup>B</sup><sub>m</sub>, defined as P<sup>C</sup><sub>m</sub> × θ, therefore contains implicitly the effects of temperature and nutrients on photosynthetic rates. Consequently, Equation 7 accounts for their effects on θ through P<sup>B</sup><sub>m</sub>. Since P<sup>B</sup><sub>m</sub> is more readily measured in the field than P<sup>C</sup><sub>m</sub>, the new solution facilitates the study of C:Chl ratio in the natural environment.
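For readers who wish to experiment with Equation 13, a minimal Python sketch follows. The text does not specify the form or parameters of the Arrhenius function, so the expression used for f(T), the reference temperature and the slope parameter below are all assumptions made for illustration only, as are the function names and input values.

```python
import math


def arrhenius(temp_kelvin, ref_temp_kelvin=293.15, slope_kelvin=4000.0):
    """Assumed Arrhenius form, equal to 1 at the reference temperature.

    Both default parameter values are hypothetical placeholders.
    """
    return math.exp(-slope_kelvin * (1.0 / temp_kelvin - 1.0 / ref_temp_kelvin))


def p_c_max(p_c_ref, nitrate, half_saturation, temp_kelvin):
    """Equation 13: nutrient- and temperature-dependent P^C_m."""
    nutrient_term = nitrate / (nitrate + half_saturation)
    return p_c_ref * nutrient_term * arrhenius(temp_kelvin)


# Hypothetical inputs (illustrative only).
print(p_c_max(3.0, 0.5, 0.5, 293.15))  # nitrate equal to K_N halves the rate
print(p_c_max(3.0, 0.5, 0.5, 283.15))  # cooler water lowers the rate further
```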

### TABLE 1 | Definitions of symbols.


### 4. RESULTS

# 4.1. Comparison between Exact and Approximate Solutions

### 4.1.1. Theoretical Comparison

The approximate solution (Equation 12) and the exact solution (Equation 7) for 1/θ = χ are shown in **Figure 1** for three values of θ<sub>m</sub>: 0.005, 0.01 and 0.02 (corresponding to carbon-to-chlorophyll ratios of 200, 100, and 50). For low values of I<sub>∗</sub> the exact and approximate solutions are practically indistinguishable from each other. But as I<sub>∗</sub> approaches and exceeds 0.8, the deviation between them becomes significant. As I<sub>∗</sub> approaches 2.0, the approximate solution for θ tends to zero and the inverse of θ (the carbon-to-chlorophyll ratio, χ) tends to infinity, whereas the exact solution remains stable. **Figure 1A** shows that the absolute error is dependent on both θ<sub>m</sub> and I<sub>∗</sub>. However, the relative error (**Figure 1B**) is independent of θ<sub>m</sub>. The approximation overestimates the carbon-to-chlorophyll ratio by around 15% when I<sub>∗</sub> = 0.8, by 50% at I<sub>∗</sub> = 1.235 and by 100% at I<sub>∗</sub> = 1.478.
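The relative overestimates quoted above follow from the ratio of Equations 7 and 12, in which θ<sub>m</sub> cancels. A short Python sketch (function names are our own) makes the calculation explicit:

```python
import math


def theta_ratio_exact(i_star):
    """theta / theta_m from the exact solution (Equation 7)."""
    return -math.expm1(-i_star) / i_star


def theta_ratio_approx(i_star):
    """theta / theta_m from the approximate solution (Equation 12)."""
    return 1.0 - 0.5 * i_star


def chi_overestimate(i_star):
    """Relative overestimate of C:Chl by the approximation, as a fraction."""
    if not 0.0 < i_star < 2.0:
        raise ValueError("the approximate C:Chl ratio is only positive for 0 < I* < 2")
    # chi = 1 / theta, so the ratio of chi values is the inverse ratio of theta.
    return theta_ratio_exact(i_star) / theta_ratio_approx(i_star) - 1.0


for i_star in (0.8, 1.235, 1.478):
    print(i_star, round(100.0 * chi_overestimate(i_star), 1), "%")
```

Because θ<sub>m</sub> does not appear in `chi_overestimate`, the relative error depends on the dimensionless irradiance alone.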

### 4.1.2. A practical example

To see whether the differences between the exact and approximate solutions are likely to be significant under conditions encountered in the natural environment, we made some calculations at the global scale, using a combination of satellite and in situ data. The sequence of images in **Figure 2** shows the input data fields (daily mean irradiance at the surface, mixed-layer depth, photoadaptation parameter I<sub>k</sub> and chlorophyll-a concentration) and resultant daily mean irradiance in the mixed layer (I<sub>m</sub>) and I<sub>∗</sub> for May 2010, where in this instance I<sub>∗</sub> = I<sub>m</sub>/I<sub>k</sub>. Of the valid ocean pixels in **Figure 2F**, 70.3% in the Northern hemisphere (which at the time would be the hemisphere of greater phytoplankton growth due to the spring bloom) have I<sub>∗</sub> values greater than 0.8, such that for these pixels the difference between the approximate and exact solutions would be greater than 15%. The error in the approximate solution is greater than 50% in some 24% of the Northern hemisphere pixels. During November a similar situation occurs in the Southern hemisphere, with I<sub>∗</sub> values greater than 0.8 in 61.5% of pixels (results not shown).

This demonstrates that phytoplankton in the surface oceans are frequently exposed to conditions in which the difference between the approximate and exact solution for θ is significant, and worth accounting for.

### 4.2. Computation of Phytoplankton Carbon in the Ocean

In this section, we first implement the analytic solution using the in situ bio-optical data to compute phytoplankton carbon at the observation points. Since it is known that θ<sub>m</sub> varies with phytoplankton type (Geider et al., 1997), we assigned values of θ<sub>m</sub> according to phytoplankton size classes. First, based on the work of Brewin et al. (2010), the chlorophyll-a concentration at each data point was used to estimate the proportions of the three phytoplankton size classes (micro-, nano- and pico-plankton) present in the sample. Next, based on the C:Chl ratios given in Sathyendranath et al. (2009) for different phytoplankton types sampled in the natural environment, θ<sub>m</sub> was set to 0.05, 0.02 and 0.008 for micro-, nano- and pico-phytoplankton, corresponding to minimum C:Chl ratios of 20, 50 and 125, respectively. These values are consistent with θ<sub>m</sub> values reported by Geider et al. (1997) for various phytoplankton species in culture and also by Li et al. (2010) in the natural marine environment. The θ<sub>m</sub> for the population was then computed as a weighted sum of the three components of the population. As θ<sub>m</sub> dictates a maximum Chl:C ratio, it also sets a minimum C:Chl ratio. The photosynthesis-irradiance parameters (P<sup>B</sup><sub>m</sub> and α<sup>B</sup>) in the database were then used to compute I<sub>k</sub> (in situ) and the daily average I<sub>∗</sub> for the mixed layer, given the daily average I<sub>m</sub> for the layer.

FIGURE 1 | [...] relative difference between the approximate and the exact solutions as a function of I<sub>∗</sub>.

For each sample in the in situ dataset taken at a depth within the climatological mixed-layer depth (410 samples), we calculated the C:Chl ratio χ using I<sub>∗</sub> and θ<sub>m</sub>, and then multiplied χ by the chlorophyll concentration measured in situ to estimate total phytoplankton carbon (C<sub>p</sub>). **Figure 3** shows measured POC plotted against computed phytoplankton carbon (C<sub>p</sub>). The model imposes no upper limit on the C:Chl ratio. Therefore, if the model parameters were incorrectly assigned, it could lead to many C<sub>p</sub> values being greater than the measured POC, which would clearly indicate an overestimation of phytoplankton carbon, since it should not exceed POC concentration. The C<sub>p</sub> estimated using the analytical solution and estimated θ<sub>m</sub> exceeds total POC in only 4 of the 410 points. Most of the C<sub>p</sub>:POC ratios lie in the range of ≈10–70% with a mean of 31%, which is consistent with existing in situ measurements from the Atlantic and Pacific oceans (Martinez-Vicente et al., 2013; Graff et al., 2015), suggesting that θ values are not grossly underestimated either. The results using the approximate solution are significantly higher (I<sub>∗</sub> > 0.8 and difference > 15%) in 130 of the 410 in situ measurements. The differences when using the in situ I<sub>k</sub> values were greater than when the calculations were performed using the province-based average I<sub>k</sub> values, demonstrating that sometimes, the errors from the approximate solution are reduced when using broadly-averaged fields of I<sub>k</sub>, since averaging eliminates extreme values.
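The per-sample carbon calculation described above can be sketched in Python as follows. The θ<sub>m</sub> values per size class are those given in the text; the size-class fractions are treated here as given inputs (the Brewin et al. (2010) model that produces them is not reproduced), and weighting θ<sub>m</sub> by those fractions is our reading of the "weighted sum". The function names and the sample values are hypothetical.

```python
import math

# theta_m per size class, as assigned in the text.
THETA_M_BY_CLASS = {"micro": 0.05, "nano": 0.02, "pico": 0.008}


def community_theta_m(fractions):
    """Weighted sum of size-class theta_m values (weights normalized to 1)."""
    total = sum(fractions.values())
    return sum(THETA_M_BY_CLASS[name] * share / total
               for name, share in fractions.items())


def phytoplankton_carbon(chl, i_star, theta_m):
    """C_p = chi * Chl, with chi = 1 / theta from Equation 7."""
    if i_star > 0.0:
        theta = theta_m * (-math.expm1(-i_star)) / i_star
    else:
        theta = theta_m
    return chl / theta


# Hypothetical sample (illustrative only).
fractions = {"micro": 0.2, "nano": 0.3, "pico": 0.5}
theta_m = community_theta_m(fractions)
c_p = phytoplankton_carbon(chl=0.5, i_star=1.0, theta_m=theta_m)
print(theta_m, c_p)
```

The resulting C<sub>p</sub> carries the units of the chlorophyll input multiplied by the (mass-based) C:Chl ratio, and can be compared directly with a measured POC value in the same units.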

As the calculations yielded plausible values of phytoplankton carbon when compared with measured POC values, we applied the method to the I<sub>∗</sub> map and the satellite-derived chlorophyll field shown in **Figure 2** to produce global maps of C:Chl ratio and C<sub>p</sub>. The results are compared with the approximate solution to the Geider et al. (1997) model and with the method of Sathyendranath et al. (2009) (see **Figure 4**), which implemented the equation C<sub>p</sub> = 64B<sup>0.63</sup>, where B is chlorophyll-a concentration (see their **Figure 1B**). As expected, the C:Chl ratios from the exact solution are lower than those from the approximate solution, with the largest differences occurring in regions of high I<sub>∗</sub>. The corresponding C<sub>p</sub> values are also lower for the exact solution. The distribution of C<sub>p</sub> values using the analytical solution appears more natural than those using the approximate solution, with fewer artificial boundaries present in the output fields.

The exact solution for C<sub>p</sub> is also closer (smaller mean absolute difference) than the approximate one to the results from the empirical approach of Sathyendranath et al. (2009), but some of the similarities have to be attributed to the use of θ<sub>m</sub> values from Sathyendranath et al. (2009) in this work. Both the exact solution and the method of Sathyendranath et al. (2009) show the anticipated increase in C:Chl ratio toward the subtropical gyres (associated with the dominance of pico-plankton in these areas), although the magnitudes differ. Similarly, in both these examples, the C:Chl ratio decreases toward the Southern Ocean. The similarities in patterns are encouraging. However, the exact solution provides a lower range for the C:Chl ratio globally, when compared with the outputs from the method of Sathyendranath et al. (2009). This is to be expected as the averaging of I<sub>k</sub> by province and by season removes extreme values, as well as any small-scale variability that might otherwise be present in a dynamic assignment of I<sub>k</sub>. On the other hand, we recognize that the method of Sathyendranath et al. (2009) is purely empirical and was designed to provide something of an upper limit to the carbon-to-chlorophyll ratio, whereas the Geider et al. (1997) model has a strong mechanistic basis and is able to account for the effects of photo-acclimation on θ. Clearly, more work is required to reconcile the differences between the empirical and theoretical approaches.

# 4.3. Application in Marine Ecosystem Models

In addition to the remote-sensing applications demonstrated above, the Geider et al. (1997) model is also used extensively in marine ecosystem models (Laufkötter et al., 2015). But to estimate the impact that the exact solution might have on the calculated fields of carbon-to-chlorophyll ratio, we have to consider the time scales over which light is averaged, before carbon-to-chlorophyll ratio is computed in the models. For example, in the European Regional Seas Ecosystem Model (ERSEM), the instantaneous light field is used to compute θ at each time step of the model (Butenschön et al., 2016). The common time step for ERSEM is 15 min. But other models, such as the "Darwin" model developed at MIT, perform these calculations at longer time steps (Dutkiewicz et al., 2015). A model with a 24 h time step might use daily-averaged light fields. Calculations that use short time-steps would have a greater range in I∗ values, relative to those that use daily averages.

FIGURE 2 | [...] Values of I<sub>∗</sub> around 0.8 or greater (yellow and warmer colors) will give a significant difference between the approximate and exact solutions for C:Chl.

An example of a calculation of θ done at a 2-h time-step is shown in **Figure 5**, where results are plotted for optical depths of zero (surface) to 4. Note that one optical depth is the depth at which light is reduced to 1/e of the initial value, and that only 1% of the surface value remains at an optical depth of 4.6. In this example, we used a fixed I<sub>k</sub> value of 50 W m<sup>−2</sup>, and a noontime maximum value of I at the surface of 400 W m<sup>−2</sup>, and set θ<sub>m</sub> = 0.01. The surface irradiance was allowed to vary, over a 12-h day, as described by a sine function. At noon, I<sub>∗</sub> values of 1.0 or greater occur even down to the first optical depth and the errors in the approximate solution are high in the surface waters for a large portion of the day. The value of irradiance averaged over 24 h at the optical depth of 1 (dashed lines shown for comparison) is well below the peak values seen at noon; and as expected, the difference between the exact and approximate solutions is reduced, though still significant (over 20%), for this case. Even in this instance, the errors would increase toward the surface, as average light increased. This is consistent with the findings of Moore et al. (2006) that for surface populations, the peak irradiance can be significantly higher than the measured I<sub>k</sub>.
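The set-up of this example can be reproduced approximately with the Python sketch below, using the values stated above (I<sub>k</sub> = 50 W m<sup>−2</sup>, noon surface irradiance of 400 W m<sup>−2</sup>, θ<sub>m</sub> = 0.01, a 12-h sinusoidal day, 2-h time step). Placing sunrise at 06:00 and attenuating light as exp(−optical depth) are our assumptions, and the function names are hypothetical; the sketch is not the code used to produce Figure 5.

```python
import math

I_K = 50.0           # W m^-2, fixed photoadaptation parameter
I_NOON = 400.0       # W m^-2, noon surface irradiance
THETA_M = 0.01
DAY_LENGTH_H = 12.0
SUNRISE_H = 6.0      # assumed, so that the maximum falls at 12:00


def surface_irradiance(hour):
    """Half-sine daylight curve; zero at night."""
    phase = (hour - SUNRISE_H) / DAY_LENGTH_H
    if 0.0 <= phase <= 1.0:
        return I_NOON * math.sin(math.pi * phase)
    return 0.0


def i_star_at(hour, optical_depth):
    """Dimensionless irradiance at a given time and optical depth."""
    return surface_irradiance(hour) * math.exp(-optical_depth) / I_K


def theta_exact(i_star):
    if i_star <= 0.0:
        return THETA_M
    return THETA_M * (-math.expm1(-i_star)) / i_star


def theta_approx(i_star):
    return THETA_M * (1.0 - 0.5 * i_star)


# I* every 2 h for optical depths 0 to 4.
for hour in range(0, 24, 2):
    print(hour, ["%.2f" % i_star_at(hour, z) for z in range(5)])

# 24-h mean irradiance at optical depth 1 (analytic mean of the half sine).
daily_mean_surface = I_NOON * (2.0 / math.pi) * (DAY_LENGTH_H / 24.0)
i_star_mean = daily_mean_surface * math.exp(-1.0) / I_K
overestimate = theta_exact(i_star_mean) / theta_approx(i_star_mean) - 1.0
print(i_star_mean, overestimate)
```

With these settings the 24-h mean at one optical depth gives a dimensionless irradiance just below 1, for which the approximate solution overestimates the C:Chl ratio by a little over 20%, in line with the statement above.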

# 5. DISCUSSION AND CONCLUSION

In this paper we have presented a new, exact solution for the Geider et al. (1997) model for estimating the C:Chl ratio in phytoplankton as a function of a dimensionless irradiance scaled to the photoadaptation parameter, I<sup>k</sup> . The result is directly applicable to remote-sensing and modeling of marine ecosystems, as demonstrated here, but finds further applications in modeling phytoplankton physiological properties, growth rates and stoichiometry (Sathyendranath et al., 2009; Dutkiewicz et al., 2015; Laufkötter et al., 2015). Using an in situ bio-optical database and the model, we have computed phytoplankton carbon, and shown that the derived ratios of phytoplankton carbon to POC were plausible.

The Geider et al. (1997) model was initially conceived to be implemented with P<sup>C</sup><sub>m</sub> and α<sup>B</sup> as inputs. The work presented here provides a new exact solution to the model. The advantage of the solution is that it allows the Geider et al. (1997) model to be implemented in any instance where there are direct measurements or indirect estimates of I<sub>k</sub>. So the starting point for implementation of the new solution would be estimates of I<sub>k</sub> or P<sup>B</sup><sub>m</sub> and α<sup>B</sup>. In this regard, the new solution takes the Geider et al. (1997) model in a new direction. However, in ecosystem models that are implemented with P<sup>C</sup><sub>m</sub> and α<sup>B</sup> as inputs, the value of θ can be found from the exact solution iteratively (note that Li et al., 2010 have also proposed a numerical solution). The extra computation required for an iterative solution would certainly be worth the effort, especially for I<sub>∗</sub> > 0.8, when errors in the approximate solution begin to be greater than 15% (**Figure 1**). Irradiance is a fundamental driver of phytoplankton growth, and phytoplankton employ a suite of strategies in response to the range of irradiance conditions in the global oceans. Some groups of cyanobacteria have genetically diversified into "high-light" and "low-light" variants (Moore et al., 1998) taking advantage of the stable irradiance conditions in the central gyres. In more dynamic regions it is essential for algae to be able to respond to changes in the light environment. Here we have presented a refinement of the Geider et al. (1997) mechanistic model of carbon-to-chlorophyll ratio allowing a smooth response in phytoplankton C:Chl ratios across a greater range of irradiance conditions. This allows a more accurate calculation of model results across a complete range of spatial and temporal scales.

Geider et al. (1997) give two solutions for the Chl:C ratio, both for balanced growth. One of them assumes that the chlorophyll-a losses due to respiration are zero (R<sup>B</sup> = 0) or that the chlorophyll-a specific degradation has the same dependence on specific growth rate as cellular carbon specific respiration (R<sup>B</sup> = R<sup>C</sup> = µξ, where µ is growth rate and ξ is the cost of biosynthesis). This is the option that has been pursued here, since it would be appropriate for use in models of gross primary production using photosynthesis-irradiance parameters that have already been corrected for respiration. If, instead, we were to use the model for the case where carbon respiration was not zero, an equivalent solution would exist, provided that a correction term were applied to θ<sub>m</sub> as suggested by Geider et al. (1997). But, given the uncertainties in θ<sub>m</sub>, and given that the correction term is typically found to be small, we can assume that the model discussed here is sufficient to cover such conditions as well, under our current state of knowledge. A more pertinent question is at what time scales the condition of balanced growth might be met. In fact, acclimation from one light level to another will take place over a finite period, with Geider et al. (1986) and Raven and Geider (2003) suggesting that the appropriate time scale for acclimation is of the order of hours to days, implying that balanced growth would hold on daily time scales. Moore et al. (2003, 2006) have provided examples where photoacclimation timescales were longer than those for surface mixing, and Talmy et al. (2013) highlighted the importance of surface irradiance, depth of mixing, and light attenuation using a resource allocation based model of photoacclimation. It is also apparent that when numerical models are run at short time steps (less than an hour), it will be increasingly important to account in some manner for non-balanced growth during the transition phase.

FIGURE 3 | Comparison of phytoplankton carbon estimates using the approximate and exact solutions with in situ I<sub>k</sub> data from around 400 samples, mostly from the N.W. Atlantic region. (A) Calculated phytoplankton carbon (θ ∗ B) in relation to POC measured for the BIO samples using the exact solution. Red, orange, yellow and green lines correspond to phytoplankton carbon equalling 100, 75, 50, and 25% of POC, respectively. The θ<sub>m</sub> values are calculated using an estimate of the community size structure calculated using the method of Brewin et al. (2010). (B,C) show the absolute and % difference between results from the exact and approximate solutions.

FIGURE 4 | [...] (2009) globally for May 2010. The I<sub>∗</sub> and Chl input fields can be seen in Figure 2.

[Figure caption fragment] due to values of I<sub>∗</sub> exceeding the limit of the Taylor expansion.

The solution for C:Chl can produce both high C:Chl values, in line with those exceeding 300 observed in cultures (Cloern et al., 1995), and the low values (25–70) observed in ocean samples (Riemann et al., 1989). That said, a suitable θ<sup>m</sup> is essential to obtain the correct result. In the example presented here (**Figure 3**), a three-component model of phytoplankton size classes is used in the assignment of θm. Although this allows a dynamic estimation of θ<sup>m</sup> it is still derived from fixed values for each group. Refinements in the estimation of θ<sup>m</sup> would also result in improved estimates of the realized C:Chl values.

Our application of the model at large scales using remote-sensing data (**Figure 2**) utilized average estimates of I<sub>k</sub> (by season and province), whereas in reality the values would be more variable. Dynamic assignment of parameters would lead to a greater range of I<sub>∗</sub> values, increasing the potential for errors when using the approximate solution for θ. The concept of dynamic estimates of photosynthesis parameters using environmental variables has been discussed by Platt and Sathyendranath (1993, 1995), Saux-Picart et al. (2013), and Silsbe et al. (2016).

The computed carbon-to-chlorophyll ratio depends strongly on available light. It raises the question of what would be the appropriate value of I to use in the calculations, given that phytoplankton experience changes in available light over a variety of time scales. These include changes at time scales of seconds, as the sun rises and sets and as clouds pass, to seasonal scale changes dictated by the Earth's declination. In addition, phytoplankton are at the mercy of vertical movement of the water column due to, for example, turbulence, internal waves or upwelling. But what would be the appropriate time scales for acclimation of carbon-to-chlorophyll ratio? As noted above, previous studies have indicated that it is of the order of 1 day. But further information on this point would be valuable. A related matter, from a modeling perspective, is that the photosynthetic response of phytoplankton to available light is instantaneous. So it is clear that computation of photosynthesis within numerical ecosystem models has to be driven by instantaneous light. If, along with such calculations, we need light fields averaged over some yet-to-be-defined time scale for computation of θ, simulation models would have to be designed to keep track of at least two values of available light, to be used as required. This time scale would be related to that appropriate for balanced growth, as discussed above.

The Geider et al. (1997) model presented here is re-formulated as a function of I∗, which requires only the photosynthesis parameter I<sub>k</sub> for implementation, in addition to data on available light. Bearing in mind the body of data on photosynthesis-irradiance parameters that exists, and the relative ease with which these parameters can be measured, compared with direct measurements of phytoplankton carbon in the field (see Casey et al., 2013; Graff et al., 2015), these results open up the possibility of significant augmentation of the information base on carbon-to-chlorophyll ratio in the marine environment. But when photosynthesis-irradiance parameters, available light and phytoplankton carbon are measured concurrently, we also have the possibility to estimate the parameter θ<sub>m</sub>, about which we have so little information from the field.

# AUTHOR CONTRIBUTIONS

SS conceived the problem. TJ, SS, and TP worked jointly to find an analytical solution. TJ made all calculations and figures. The preparation of the manuscript was led by TJ with all authors contributing significantly to the final text.

## FUNDING

This work was funded through the European Space Agency's MAPPS (MArine primary Production: model Parameters from Space) project as part of the Support to Science Element (STSE) Pathfinders Program and the POCO (Pools of Carbon in the Ocean) Project, which is a part of the SEOM (Scientific Exploitation of Operational Missions) Programme. This work is also a contribution to activities of the National Centre for Earth Observation (NCEO), UK. Additional support from Simons Foundation through the CBIOMES project is acknowledged. Finally the contribution of The Jawaharlal Nehru Science Fellowship to TP in the course of this work is gratefully acknowledged.

### ACKNOWLEDGMENTS

We thank ESA Ocean Colour Climate Change Initiative and NASA for the open-access remote-sensing products used in this work. We would like to thank all those who contributed to the in situ database used in this study, and Elizabeth Goult for performing sensitivity analysis checks. We thank Frédéric Mélin and Nicolas Hoepffner for making available their photosynthesis-irradiance parameter database for this work. We thank reviewers Christoph Voelker and Greg M. Silsbe for their helpful comments on the initial manuscript. We also thank Peter Regner and Diego Fernandez at ESA for their continuous support.

### REFERENCES

Brewin, R. J. W., Sathyendranath, S., Hirata, T., Lavender, S. J., Barciela, R., and Hardman-Mountford, N. J. (2010). A three-component model of phytoplankton size class for the Atlantic Ocean. Ecol. Model. 221, 1472–1483. doi: 10.1016/j.ecolmodel.2010.02.014

Butenschön, M., Clark, J., Aldridge, J. N., Allen, J. I., Artioli, Y., Blackford, J., et al. (2016). ERSEM 15.06: a generic model for marine biogeochemistry and the ecosystem dynamics of the lower trophic levels. Geosci. Model Develop. 9, 1293–1339. doi: 10.5194/gmd-9-1293-2016


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Jackson, Sathyendranath and Platt. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Characterizing Spatial Variability of Ice Algal Chlorophyll a and Net Primary Production between Sea Ice Habitats Using Horizontal Profiling Platforms

### Edited by:

Victor Martinez-Vicente, Plymouth Marine Laboratory, United Kingdom

### Reviewed by:

Jaume Piera, Institut de Ciències del Mar (CSIC), Spain; Thomas Jackson, Plymouth Marine Laboratory, United Kingdom; Karley Lynn Campbell, University of Manitoba, Canada

\*Correspondence: Benjamin A. Lange benjamin.lange@dfo-mpo.gc.ca

### † Present Address:

Benjamin A. Lange, Fisheries and Oceans Canada, Freshwater Institute, Winnipeg, MB, Canada

### Specialty section:

This article was submitted to Ocean Observation, a section of the journal Frontiers in Marine Science

Received: 23 February 2017 Accepted: 18 October 2017 Published: 08 November 2017

### Citation:

Lange BA, Katlein C, Castellani G, Fernández-Méndez M, Nicolaus M, Peeken I and Flores H (2017) Characterizing Spatial Variability of Ice Algal Chlorophyll a and Net Primary Production between Sea Ice Habitats Using Horizontal Profiling Platforms. Front. Mar. Sci. 4:349. doi: 10.3389/fmars.2017.00349

Benjamin A. Lange<sup>1,2\*†</sup>, Christian Katlein<sup>1</sup>, Giulia Castellani<sup>1</sup>, Mar Fernández-Méndez<sup>1,3</sup>, Marcel Nicolaus<sup>1</sup>, Ilka Peeken<sup>1,4</sup> and Hauke Flores<sup>1,2</sup>

<sup>1</sup> Alfred-Wegener-Institute Helmholtz Center for Polar and Marine Research, Bremerhaven, Germany, <sup>2</sup> Centre for Natural History (CeNak), Zoological Museum, University of Hamburg, Hamburg, Germany, <sup>3</sup> Norwegian Polar Institute, Fram Centre, Tromsø, Norway, <sup>4</sup> MARUM, Center for Marine Environmental Sciences, University of Bremen, Bremen, Germany

Assessing the role of sea ice algal biomass and primary production for polar ecosystems remains challenging due to the strong spatio-temporal variability of sea ice algae. Therefore, the spatial representativeness of sea ice algal biomass and primary production sampling remains a key issue in large-scale models and climate change predictions of polar ecosystems. To address this issue, we presented two novel approaches to up-scale ice algal chl a biomass and net primary production (NPP) estimates based on profiles covering distances of 100s to 1,000s of meters. This was accomplished by combining ice core-based methods with horizontal under-ice spectral radiation profiling conducted in the central Arctic Ocean during summer 2012. We conducted a multi-scale comparison of ice core-based ice algal chl a biomass with two profiling platforms: a remotely operated vehicle and a surface and under-ice trawl (SUIT). NPP estimates were compared between ice cores and remotely operated vehicle surveys. Our results showed that ice core-based estimates of ice algal chl a biomass and NPP do not representatively capture the spatial variability compared to the remotely operated vehicle-based estimates, implying considerable uncertainties for pan-Arctic estimates based on ice core observations alone. Grouping sea ice cores based on region or ice type improved the representativeness. With only a small sample size, however, a high risk of obtaining non-representative estimates remains. Sea ice algal chl a biomass estimates based on the dominant ice class alone showed a better agreement between ice core and remotely operated vehicle estimates. Grouping ice core measurements yielded no improvement in NPP estimates, highlighting the importance of accounting for the spatial variability of both the chl a biomass and bottom-ice light in order to get representative estimates.
Profile-based measurements of ice algae chl a biomass identified sea ice ridges as an underappreciated component of the Arctic ecosystem because chl a biomass was significantly greater in this unique habitat. Sea ice ridges are not easily captured with ice coring methods and thus require more attention in future studies. Based on our results, we provide recommendations for designing an efficient and effective sea ice algal sampling program for the summer season.

Keywords: ice algae, ice core, chl a, remotely operated vehicle, surface and under-ice trawl, net primary production, spectral irradiance, bio-optics

### INTRODUCTION

There is mounting evidence for an overall increase in Arctic-wide net primary production (NPP) as a result of the declining sea ice cover and increasing duration of the phytoplankton growth season (Arrigo and van Dijken, 2011, 2015; Fernández-Méndez et al., 2015). However, it remains uncertain how sea ice algae NPP will respond to continued changes of the sea ice environment. It has been suggested that a thinning Arctic sea ice cover, which will lead to increased light transmittance, will also result in increased sea ice algal NPP rates due to more available photosynthetically active radiation (PAR; Nicolaus et al., 2012; Fernández-Méndez et al., 2015). On the other hand, some forecasts predict increased snow precipitation in the Arctic (IPCC, 2013), which would result in less available light for bottom-ice algal growth during spring. Besides available light, other variables may have an equal or greater influence on Arctic primary production depending on region and season. Such variables include nutrient supply, temperature, and CO<sub>2</sub> intake (Tremblay et al., 2015). Declining sea ice may increase oceanic CO<sub>2</sub> intake, which would result in increased NPP, but could be counteracted by increased runoff and higher temperatures expected throughout the Arctic (Tremblay et al., 2015).

In the central Arctic Ocean, sea ice algae have been documented to contribute up to 60% of the NPP during summer (Gosselin et al., 1997; Fernández-Méndez et al., 2015). However, net sympagic (ice-associated) primary production is relatively low, accounting for 1–10% of total NPP in the Arctic Ocean (Dupont, 2012; Arrigo and van Dijken, 2015). Regardless of the overall low contribution of sympagic NPP, both sympagic and pelagic organisms showed a high dependency on ice algae-produced carbon within the central Arctic Ocean (Budge et al., 2008; Wang et al., 2015; Kohlbach et al., 2016, 2017). The key role of sea ice algae in Arctic food webs, particularly in terms of reproduction and growth of key Arctic organisms such as Calanus glacialis (Michel et al., 1996; Søreide et al., 2010), highlights the importance of timing and duration of ice algal growth, and the availability of algal biomass throughout different times of the year.

Spatial variability of springtime ice algal chl a biomass has been related to the distribution of snow on first-year sea ice (FYI), due to the large influence of snow on light transmission by the reflection and scattering of light near the surface. This relationship explains the similar patch sizes observed for snow and sea ice algae biomass on the same study sites. Between study sites, however, patch sizes had a large range between 5 and 90 m, which was the result of differences in the snow distribution and drifting patterns over relatively level FYI (Gosselin et al., 1986; Rysgaard et al., 2001; Granskog et al., 2005; Søgaard et al., 2010). In contrast, the undulating surface topography of multi-year ice (MYI) plays an important role in the distribution of snow, which has been linked to the presence of high ice algal chl a biomass at the bottom of thick MYI hummocks with little or no snow cover (Lange et al., 2015, 2017). Gradinger et al. (2010) identified sea ice ridges as important accumulation regions of sea ice fauna during advanced melt. This further highlights the ecological importance of thick sea ice features. Using traditional coring methods, however, it is very difficult to sample ridges and hummocks, resulting in sparse observations for ice algae at the bottom or within these features.

In summer, when the snow has melted and melt ponds are present, light availability has a less important role in controlling the distribution of ice algal chl a biomass. This is due to increased melt-induced algal losses during late spring and early summer, which become the limiting factor controlling the ability of algal communities to remain in the bottom-ice environment (Grossi et al., 1987; Lavoie et al., 2005). The spatial distribution of ice algal chl a biomass during mid- to late-summer, however, remains poorly understood and under-sampled, particularly in the central Arctic Ocean (Wassmann et al., 2011; Miller et al., 2015).

The high spatial and temporal variability of sea ice algae, in addition to sparse sampling, results in poorly constrained sea ice algal chl a biomass and primary production (PP) estimates for the central Arctic Ocean (Miller et al., 2015). Large-scale estimates of sea ice algal chl a biomass and PP are limited to modeling studies as satellites are unable to observe the underside of sea ice. Lee et al. (2015) demonstrated that pelagic phytoplankton PP models for the Arctic Ocean were highly sensitive to uncertainties in chlorophyll a (chl a) and performed best with in situ chl a data. In situ ice algal chl a estimates used in models, however, are typically based on a small number of ice core observations (e.g., Fernández-Méndez et al., 2015). A recent study comparing ice core chl a biomass to sea ice algal chl a biomass derived from an 85 m ROV transect of under-ice spectral radiation measurements showed large differences, which could carry high uncertainties for large-scale estimates based on these ice core data alone (Lange et al., 2016).

Miller et al. (2015) reviewed the different methods for PP measurements with spatial sampling resolution on the order of 0.01 m<sup>2</sup> for ice coring-based in vitro incubations (e.g., Gosselin et al., 1997; Gradinger, 2009; Fernández-Méndez et al., 2015) or in situ incubations (e.g., Mock and Gradinger, 1999; Gradinger, 2009). At larger scales the under-ice eddy covariance method integrates primary production over an area of 100 m<sup>2</sup> (Long et al., 2012). Thus there is a large gap in spatial coverage between the 0.01 and 100 m<sup>2</sup> scales, which is not resolved by these methods. It is within this spatial range that many sea ice and snow properties (such as thickness, porosity, temperature) can vary, which can have a large influence on light availability, ice melt and growth, nutrient availability, and therefore, the spatial distribution of ice algae. Typical patch sizes of snow have been reported in the range 20–25 m (Gosselin et al., 1986; Steffens et al., 2006). Surface properties such as albedo have patch sizes of ∼10 m (Perovich et al., 1998; Katlein et al., 2015a) and sea ice draft can vary at scales of around 15 m (Katlein et al., 2015a).

Here we present a novel approach to fill this important gap in the spatial scales of ice algal chl a biomass and NPP estimates by combining in vitro photosynthetic parameters of ice algae with chl a biomass derived from under-ice spectral radiation measurements and under-ice available PAR measurements obtained from a moving under-ice profiling platform, the ROV. Furthermore, we investigate the spatial patterns of chl a biomass and NPP estimates, using two under-ice profiling platforms: the ROV and Surface and Under Ice Trawl (SUIT), with special emphasis on sea ice ridges, and evaluate potential discrepancies between the up-scaled and ice core-based estimates. Based on our results, we provide recommendations for designing an efficient and effective sea ice algal sampling program for the summer season.

# MATERIALS AND METHODS

### The Profiling Platforms

All surveys were conducted during the RV Polarstern expedition PS80 to the central Arctic Ocean in August and September 2012. Under-ice profiling platform surveys were conducted using an under-ice Remotely Operated Vehicle (ROV) V8Sii-ROV (Ocean Modules, Åtvidaberg, Sweden) and a SUIT (van Franeker et al., 2009), with mounted sensor arrays, described in Nicolaus and Katlein (2013), David et al. (2015), and Lange et al. (2016). Simplified diagrams and images showing the deployment of the under-ice profiling platforms were presented in Lange et al. (2016). The ROV is an under-water vehicle with a mounted sensor array deployed through a small 2 × 2 m man-made hole in the sea ice, and is attached by a 300 m long fiber optic cable. The ROV is controlled remotely from a sheltered base station (e.g., tent) located adjacent to the deployment hole. A detailed description of the ROV spectral measurements, calibration and calculations, and ROV operation was provided by Katlein et al. (2015b) and Nicolaus and Katlein (2013). The V8Sii ROV was equipped with an altimeter (DST Micron Echosounder, Tritech, UK), a sonar (Micron DST MK2, Tritech, UK), one zoom camera (Typhoon, Tritech, UK), and one fixed focal length camera (Osprey, Tritech, UK). The SUIT is a net developed for deployment in ice-covered waters, typically behind an icebreaker, for sampling sea ice-associated zooplankton and micronekton in the upper 2 m of the water within the ice-water interface. During this cruise the sensor array was specifically enhanced to measure the variability of sea ice algae chl a biomass within the sea ice and sea ice habitat properties along the SUIT hauls. The new sensor package included an Aquadopp Acoustic Doppler Current Profiler (ADCP; Nortek AS, Rud, Norway), a Conductivity Temperature Depth probe (CTD; Sea and Sun Technology, Trappenkamp, Germany) with a built-in Cyclops 7 fluorometer (Turner Designs, Sunnyvale, CA, USA), a PA500/6S altimeter (Tritech International Ltd., Aberdeen, UK), one RAMSES-ACC irradiance sensor (TriOS GmbH, Rastede, Germany), one RAMSES-ARC radiance sensor (TriOS GmbH, Rastede, Germany) and a forward-looking video camera (GoPro Hero 2).

The ROV spectral surveys were conducted during seven ice stations (**Table 1**; **Figure 1**). The SUIT spectral surveys were conducted at six stations (**Table 1**; **Figure 1**). Stations conducted in relatively close proximity (<50 km) to each other were grouped into similar locations represented by the letters A to I (**Figure 1**). Two profiles separated by small distances were sampled using the SUIT (<10 km) at location B, and using the ROV (<500 m) at locations C and D. Incoming solar radiation was measured on-ice for ROV-based spectral measurements, and from a ship-mounted sensor for SUIT-based spectral measurements. To ensure high quality spectra, data were limited to observations at a distance to the ice bottom of ≤1 m and with a pitch and roll between −10° and 10°, as suggested by Nicolaus and Katlein (2013) and Katlein et al. (2016). Restricting the pitch and roll, and the distance to the ice bottom, also reduced the potential influence of spectral absorption by the water. Since the SUIT behaves less predictably near ridges (e.g., it hits the ridge and is redirected in an unpredictable direction), we manually inspected the spectra at sea ice ridges to separate reliable measurements from unreliable ones (e.g., noisy spectra). Less than 1% of the spectra were excluded from analyses.
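The geometric part of this quality filter can be sketched in a few lines of Python. This is an illustration only, not the authors' processing code: the `Spectrum` record and its field names are hypothetical, and only the two thresholds (≤1 m distance to the ice bottom, pitch and roll within ±10°) come from the text. The manual inspection of noisy spectra near ridges is not reproduced.

```python
from dataclasses import dataclass

MAX_DISTANCE_M = 1.0   # distance to the ice bottom must be <= 1 m
MAX_TILT_DEG = 10.0    # pitch and roll must lie within [-10, 10] degrees


@dataclass
class Spectrum:
    """Hypothetical record holding the attitude data of one spectral measurement."""
    distance_to_ice_m: float
    pitch_deg: float
    roll_deg: float


def passes_quality_filter(s: Spectrum) -> bool:
    """Return True if the measurement meets the distance and tilt limits."""
    return (
        s.distance_to_ice_m <= MAX_DISTANCE_M
        and abs(s.pitch_deg) <= MAX_TILT_DEG
        and abs(s.roll_deg) <= MAX_TILT_DEG
    )


def filter_spectra(spectra):
    """Keep only the measurements that pass the quality filter."""
    return [s for s in spectra if passes_quality_filter(s)]
```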

Sea ice draft was calculated based on sensor measurements of depth and distance to ice bottom, and corrected for pitch and roll angles as described in Lange et al. (2016) and David et al. (2015). Sea ice ridges were identified from the SUIT ice draft profiles using the Rayleigh criterion, following procedures described by Rabenstein et al. (2010) and Castellani et al. (2014) for the sea ice surface topography, and Castellani et al. (2015) for the sea ice bottom profile. Ice draft local minima (thicker sea ice draft values are more negative) identified along the SUIT profiles with a threshold of 1.5 m deeper than the surrounding ice, following Castellani et al. (2015), were selected as potential ridges. Adjacent minima needed a separation distance between points which was less than half the depth of the first minimum in order to be identified as two single elements not belonging to the same ridge. Ridge depth and width were measured in order to calculate ridge density (ridges km<sup>−1</sup>) and percent coverage of ridges. Here, ridge width was calculated as the width at half maximum. During one SUIT haul (station 358, location H) there were no altimeter measurements. Because the SUIT generally travels directly under the ice, the depth measurements can be used to reliably (R<sup>2</sup> = 0.78) derive level ice draft using a simple linear model (David et al., 2015). We could calculate ridge density, ridge coverage, and ridge width from these ice draft measurements without altimeter data. The absolute draft values at ridges, however, were less accurate and therefore excluded from analysis.
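A simplified Python sketch of the first selection step and of the two summary statistics is given below. It rests on assumptions that go beyond the text: the draft of the surrounding level ice is passed in by the caller as `level_draft_m`, the function names are hypothetical, and the Rayleigh criterion for merging adjacent minima and the width-at-half-maximum measurement are not implemented.

```python
RIDGE_THRESHOLD_M = 1.5  # a keel must be at least 1.5 m deeper than the surrounding ice


def candidate_keels(draft_m, level_draft_m):
    """Indices of local draft minima at least 1.5 m deeper than level ice.

    Drafts are negative downward, so "deeper" means "more negative".
    """
    keels = []
    for i in range(1, len(draft_m) - 1):
        is_local_min = draft_m[i] < draft_m[i - 1] and draft_m[i] <= draft_m[i + 1]
        if is_local_min and draft_m[i] <= level_draft_m - RIDGE_THRESHOLD_M:
            keels.append(i)
    return keels


def ridge_statistics(ridge_widths_m, profile_length_m):
    """Return (ridge density in ridges per km, percent of the profile covered by ridges)."""
    if profile_length_m <= 0:
        raise ValueError("profile length must be positive")
    density_per_km = len(ridge_widths_m) / (profile_length_m / 1000.0)
    coverage_percent = 100.0 * sum(ridge_widths_m) / profile_length_m
    return density_per_km, coverage_percent
```

For example, two ridges of 20 m and 30 m width on a 2 km profile give a density of 1 ridge km<sup>−1</sup> and a coverage of 2.5%.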

TABLE 1 | Summary of downwelling surface and bottom-ice light, chlorophyll a biomass (chl a), net primary production (NPP), and explained variance of NPP per location (shown in Figure 1) and sampling method (gear): ice cores (FM or LA), remotely operated vehicle (ROV) and surface and under-ice trawl (SUIT).

<sup>a</sup>"FM" corresponds to FM-cores from Fernández-Méndez et al. (2015); "LA" corresponds to LA-cores from Lange et al. (2016); "ROV" corresponds to the up-scaled remotely operated vehicle estimates; and "SUIT" corresponds to the up-scaled surface and under-ice trawl estimates.

<sup>b</sup>Downwelling surface PAR and bottom ice scalar PAR (I) are presented as mean ± sd to maintain consistency with Fernández-Méndez et al. (2015).

<sup>c</sup>Chl a and NPP are presented as median (interquartile range).

•Corresponds to FM-cores not representative of the corresponding up-scaled ROV estimates for that location, i.e., FM-core estimate outside the interquartile range of ROV estimates. (+) indicates over-estimate; (−) under-estimate of the FM cores compared to up-scaled ROV estimates.

\*Represents significant difference between the CORES (FM and LA cores combined) and the up-scaled ROV estimates.

All profiling platform-derived observations (i.e., transmittance, sea ice algal chl a, NPP, draft) were divided into 5 ice classes based on the sea ice draft values in the following ranges: (1) 0–0.5 m; (2) 0.5–1.0 m; (3) 1.0–1.5 m; (4) 1.5–2.0 m; and (5) >2.0 m. Furthermore, we separated profiling platform-derived observations into level ice and ridged ice. This was done by manually identifying all observations acquired under the identified ridges. We identified dominant ice classes for each location using the modal ice thickness (converted to draft by multiplying by 0.9) from electromagnetic induction sounding ice thickness surveys, using an EM31 instrument, of the entire floe (data presented in Boetius et al., 2013; Fernández-Méndez et al., 2015; Katlein et al., 2015b). We used these larger scale ice thickness surveys to assign the dominant ice class because these surveys were conducted specifically for the purpose of assessing the distribution of ice thickness at the ice floe. Ground-based EM surveys are a common method to representatively capture the spatial variability of ice thickness on floe scales (Haas et al., 1997; Haas, 2004).
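The ice class assignment and the thickness-to-draft conversion can be illustrated with a short Python sketch. Two details are assumptions rather than statements from the text: drafts are handled as positive magnitudes, and a value lying exactly on a class boundary (e.g., 0.5 m) is placed in the lower class. The function names are hypothetical.

```python
import bisect

CLASS_UPPER_EDGES_M = [0.5, 1.0, 1.5, 2.0]  # upper edges of classes 1-4; class 5 is > 2.0 m
THICKNESS_TO_DRAFT = 0.9                     # conversion factor stated in the text


def ice_class(draft_m):
    """Map a draft magnitude (m, positive) to an ice class from 1 to 5."""
    if draft_m < 0:
        raise ValueError("draft is expected as a positive magnitude")
    return bisect.bisect_left(CLASS_UPPER_EDGES_M, draft_m) + 1


def dominant_ice_class(modal_thickness_m):
    """Dominant ice class of a floe from its modal ice thickness."""
    return ice_class(THICKNESS_TO_DRAFT * modal_thickness_m)
```

With these assumptions, a floe with a modal thickness of 1.2 m has a draft of about 1.08 m and falls into class 3.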

… extent correspond to the 2012 September monthly mean (extent data acquired from NSIDC; Fetterer et al., 2002, updated 2011).

## Sea Ice Algal Chl a Biomass Estimates Derived from Under-Ice Spectral Radiation

Ice algal chl a biomass estimates were derived from under-ice profiling platform-based spectral transmittance observations using empirical orthogonal function (EOF) analysis combined with generalized linear models (GLM), as described in Lange et al. (2016). EOF analyses reduce the dimensionality of the data while maintaining the variability of key spectral absorption properties, which can then be related to chl a concentrations or other environmental variables. GLMs were fitted using ice core chl a concentrations as a response variable and EOF modes as predictor variables. All ice cores were extracted along ROV spectral radiation profiles. The best set of EOF modes used as predictor variables was selected by searching all possible combinations of EOF modes and using the Bayesian Information Criterion (BIC) to assess the quality of the GLM. The EOFs used represented the spectral variability that can best be explained by the variability within the ice algal chl a biomass. Furthermore, EOF analyses captured variability within multiple regions of the PAR light spectrum (400–700 nm) where chl a light absorption occurs. In addition, we used mean robustness R<sup>2</sup> and true prediction error estimates as ranking criteria to find the best predictive model for our data set. Each model was applied to 5 data subsets not used to fit the model; we then determined the predicted vs. observed R<sup>2</sup> value for each data subset and took the mean of these values as the mean robustness R<sup>2</sup>. To determine the true prediction error estimate we used 10-fold cross-validation (10 FCV). In 10 FCV, data are randomly separated into 10 data subsets (folds), then model fitting and error estimation are repeated 10 times. Each time the model is fitted to 9 folds and then applied to the 10th fold. This is repeated 100 times and the mean of all root mean square error (RMSE) values is used as the true prediction error estimate. Based on these criteria we determined that the combination of spectral transmittance, calculated according to Nicolaus et al. (2010), and the EOF approach resulted in the most reliable predictive model (EOF-Transmittance) with a predicted vs. observed chl a R<sup>2</sup> of 0.90, and a true prediction error estimate (10-fold cross-validated root mean squared error, RMSECV) of 1.8 mg chl a m<sup>−2</sup> (model M15 from Lange et al., 2016). In addition, the selected predictive model showed good agreement between chl a estimates derived from independent spectral data (spectra not used to fit the model) and ice core chl a concentrations, which were all extracted along the ROV profiles.
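The repeated 10-fold cross-validation can be sketched as follows. This is a minimal illustration under stated assumptions, not the model of Lange et al. (2016): a one-predictor least-squares line (`fit_line`, a hypothetical helper) stands in for the GLM on EOF modes, and the squared errors of all held-out points are pooled into one RMSE per repetition before averaging over repetitions, which is one possible reading of "the mean of all RMSE values".

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (stand-in for the GLM)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def repeated_kfold_rmse(xs, ys, k=10, repeats=100, seed=0):
    """Mean cross-validated RMSE over `repeats` random k-fold splits."""
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    rmses = []
    for _ in range(repeats):
        rng.shuffle(indices)
        folds = [indices[i::k] for i in range(k)]  # k disjoint folds
        squared_errors = []
        for held_out in folds:
            held = set(held_out)
            train = [i for i in indices if i not in held]
            a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
            # Predict only the points that were not used for fitting.
            squared_errors += [(ys[i] - (a + b * xs[i])) ** 2 for i in held_out]
        rmses.append(math.sqrt(sum(squared_errors) / len(squared_errors)))
    return sum(rmses) / len(rmses)
```

Because every prediction is made for a point left out of the fit, the resulting RMSE estimates the error expected for new spectra rather than the goodness of fit to the training data.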

# ROV Data Re-Sampling

We resampled the ROV chl a, ice draft and transmittance observations in order to account for potential spatial sampling biases (e.g., multiple or overlapping measurements at the same location; **Figure 2**), and variable footprint size of the under-ice ROV spectral measurements. Data were resampled to a grid (x, y) of equally spaced 1 m diameter circles (grid circles; **Figure 2**). A grid of circles was created for the ROV measurements (ROV circles) with each circle's center location determined by the measurement location (x, y) and the diameter determined by the footprint of the measurement (i.e., distance to ice bottom multiplied by 2, as described in Lange et al., 2016). For each grid circle with only one overlapping ROV circle, which had an overlapping area ≥0.2 m<sup>2</sup> (25% of the 1 m circle), the corresponding ROV-based transmittance and chl a were assigned to that grid circle. For each grid circle that had more than one overlapping ROV circle, of which at least one ROV circle had an overlapping area ≥0.2 m<sup>2</sup>, weighted means of the corresponding ROV-based transmittance, draft and chl a were assigned to the grid circle. Weighting factors were calculated for all overlapping ROV circles in each grid as the overlapping area of each ROV circle with the corresponding grid circle relative to the total ROV circle area. **Figure 2** shows a detailed diagram outlining the resampling process with an example calculation. SUIT data were not re-sampled because they represent a straight linear profile and therefore the measurements have no possibility to have overlapping footprints for the same regions.
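The resampling rule for a single grid circle can be sketched in Python as below. The circle overlap uses the standard formula for the intersection area of two circles; the function names and the tuple layout of a measurement are hypothetical, and the sketch handles one variable at a time.

```python
import math

GRID_DIAMETER_M = 1.0
MIN_OVERLAP_M2 = 0.2  # about 25% of the area of a 1 m diameter circle


def overlap_area(d, r1, r2):
    """Intersection area of two circles with radii r1, r2 and centre distance d."""
    if d >= r1 + r2:
        return 0.0                                # circles do not touch
    if d <= abs(r1 - r2):
        return math.pi * min(r1, r2) ** 2         # one circle lies inside the other
    a1 = r1 * r1 * math.acos((d * d + r1 * r1 - r2 * r2) / (2 * d * r1))
    a2 = r2 * r2 * math.acos((d * d + r2 * r2 - r1 * r1) / (2 * d * r2))
    tri = 0.5 * math.sqrt((-d + r1 + r2) * (d + r1 - r2) * (d - r1 + r2) * (d + r1 + r2))
    return a1 + a2 - tri


def resample_to_grid_circle(grid_xy, rov_measurements):
    """Weighted mean of ROV values for one grid circle, or None if unsupported.

    rov_measurements: iterable of (x, y, distance_to_ice_m, value).
    """
    gx, gy = grid_xy
    weighted = []
    accepted = False
    for x, y, dist_to_ice, value in rov_measurements:
        r_rov = dist_to_ice  # footprint diameter = 2 * distance, so radius = distance
        area = overlap_area(math.hypot(x - gx, y - gy), GRID_DIAMETER_M / 2, r_rov)
        if area <= 0.0:
            continue
        accepted = accepted or area >= MIN_OVERLAP_M2
        # Weight: overlapping area relative to the total ROV circle area.
        weighted.append((area / (math.pi * r_rov ** 2), value))
    if not accepted:
        return None
    total = sum(w for w, _ in weighted)
    return sum(w * v for w, v in weighted) / total
```

With a single overlapping ROV circle the weighted mean reduces to the measured value, so both cases described above are covered by the same function.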

# ROV-Derived Net Primary Production Estimates

All NPP estimates were calculated based on the re-sampled ROV observations of chl a and transmittance. Up-scaled daily ice algal NPP estimates, P (mg C m<sup>−2</sup> d<sup>−1</sup>), were calculated using the photosynthesis equation from Platt et al. (1980):

$$P = \int_{t} \left[ \left( P_s^B \left[ 1 - e^{-\alpha^B I_t / P_s^B} \right] e^{-\beta^B I_t / P_s^B} \right) B \right]$$

where P<sup>B</sup><sub>s</sub> is the chl a-normalized maximum fixation rate with no photoinhibition (mg C [mg chl a]<sup>−1</sup> h<sup>−1</sup>); α<sup>B</sup> is the initial slope of the saturation curve (mg C [mg chl a]<sup>−1</sup> h<sup>−1</sup> [µmol photons m<sup>−2</sup> s<sup>−1</sup>]<sup>−1</sup>); and β<sup>B</sup> is the strength of photoinhibition (same units as α<sup>B</sup>). P<sup>B</sup><sub>s</sub>, α<sup>B</sup>, and β<sup>B</sup> correspond to the photosynthetic parameters determined by Fernández-Méndez et al. (2015) using the <sup>14</sup>C method and incubating for 12 h, based on ice core samples collected from the same seven ice stations. Derivation of the photosynthetic parameters was conducted for upper-half and lower-half portions (mean: 0.58 m; range: 0.40–0.98 m) of the sea ice melted at 4°C in the dark for 24 h. NPP estimates were only calculated and compared for the bottom ice portions because previous in situ incubation studies demonstrated that bottom-ice had the highest primary production rates, despite lower irradiance levels (Mock and Gradinger, 1999). Furthermore, because sea ice algal chl a biomass typically accumulates in the bottom-ice portion it is safe to assume a large majority of the primary production also occurred in the bottom-ice. Accordingly, we used only chl a biomass estimates for the lower portion, where 75% of the total chl a biomass was observed (Fernández-Méndez et al., 2015). ROV-based chl a corresponds to the total chl a biomass within the entire ice column, therefore we multiplied by 0.75 to get the appropriate fraction of the total chl a in the bottom ice portion. B represents the bottom-ice algae chl a concentrations derived from ROV-based spectral transmittance measurements. I<sub>t</sub> is the hourly-averaged transmitted PAR (µmol photons m<sup>−2</sup> s<sup>−1</sup>) at the ice-water interface, converted to bottom-ice scalar irradiance according to Katlein et al. (2014), and calculated for each hour (t) over a 24 h period (t = 1, 2, …, 24) by multiplying the ROV spectral (PAR) transmittance by hourly-averaged (t) incoming PAR (µmol photons m<sup>−2</sup> s<sup>−1</sup>) measured during each ice station.
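The daily calculation can be sketched in Python as a sum of 24 hourly values of the Platt et al. (1980) function. This is an illustration under simplifying assumptions: the conversion of transmitted planar PAR to scalar irradiance (Katlein et al., 2014) is omitted, the function and argument names are hypothetical, and any parameter values passed in by a user would need to come from measurements such as those of Fernández-Méndez et al. (2015).

```python
import math

BOTTOM_FRACTION = 0.75  # share of the total chl a found in the bottom-ice portion


def hourly_npp(ps, alpha, beta, irradiance, chl_bottom):
    """Production during one hour (mg C m^-2 h^-1) after Platt et al. (1980).

    ps: maximum fixation rate without photoinhibition, alpha: initial slope,
    beta: photoinhibition strength, irradiance: hourly mean bottom-ice PAR,
    chl_bottom: bottom-ice chl a (mg m^-2).
    """
    light_saturation = 1.0 - math.exp(-alpha * irradiance / ps)
    photoinhibition = math.exp(-beta * irradiance / ps)
    return ps * light_saturation * photoinhibition * chl_bottom


def daily_npp(ps, alpha, beta, transmittance, incoming_par_hourly, chl_total):
    """Daily NPP (mg C m^-2 d^-1) summed over 24 hourly mean PAR values."""
    if len(incoming_par_hourly) != 24:
        raise ValueError("expected 24 hourly PAR values")
    chl_bottom = BOTTOM_FRACTION * chl_total
    return sum(
        hourly_npp(ps, alpha, beta, transmittance * par, chl_bottom)
        for par in incoming_par_hourly
    )
```

At light saturation without photoinhibition, each hourly term approaches P<sup>B</sup><sub>s</sub> × B, which gives a quick plausibility check on the output.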

### Statistical Analyses

All statistical analyses were conducted using R software Version 2.15.2 with all relevant packages (R-Development-Core-Team, 2012) listed after the corresponding analysis description.

Ice core chl a data used for comparison were presented in Fernández-Méndez et al. (2015), hereafter referred to as "FM" cores (1 core per station), and were melted in filtered sea water. Since the FM-cores were used to characterize the NPP for each ice station (Fernández-Méndez et al., 2015) we assessed the representativeness of the single cores compared to the up-scaled ROV surveys of chl a biomass and NPP. NPP was not measured on the LA-cores, thus we only compared NPP estimates for FM-cores with the ROV estimates, which had both chl a biomass and under-ice light measurements. FM-cores were considered representative of the area if they were within the interquartile range (IQR; 25–75 percentiles) of the up-scaled ROV and SUIT estimates.
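The representativeness criterion amounts to a simple interval check, sketched here in Python. The quartile definition is an assumption: the "inclusive" method of `statistics.quantiles` is used because it matches the default quantile type in R, but the text does not state which definition was applied. The function name and return labels are hypothetical.

```python
import statistics


def classify_core(core_value, upscaled_values):
    """Compare a single core estimate with the IQR of the up-scaled estimates.

    Returns "representative" if the core lies within the 25-75 percentile range,
    "+" if it over-estimates and "-" if it under-estimates.
    """
    q1, _median, q3 = statistics.quantiles(upscaled_values, n=4, method="inclusive")
    if core_value > q3:
        return "+"
    if core_value < q1:
        return "-"
    return "representative"
```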

Cores from Lange et al. (2016), hereafter referred to as "LA" cores (4–12 cores per station), were directly melted. For comparisons of chl a biomass between ice core and ROV-derived estimates and between level ice and ridged ice, FM and LA cores were grouped together, referred to as CORES. The significance of differences between these groupings was assessed using the non-parametric Wilcoxon rank sum test (Wilcoxon, 1945). We used a non-parametric test because the assumption of normality required for parametric tests (e.g., t-test) could not be achieved for the entire datasets using common data transformation methods (e.g., log, square root, squared, cube root).

The relative importance of each variable (B and I<sub>t</sub>), in terms of explaining the variance of NPP for each ROV station, was assessed using the coefficient of determination (R<sup>2</sup>) for all up-scaled NPP estimates (P<sub>t</sub>) vs. chl a (B) estimates (i.e., explained variance due to chl a), and NPP estimates (P<sub>t</sub>) vs. bottom-ice light (I<sub>t</sub>) observations (i.e., explained variance due to light). The R<sup>2</sup> was calculated for each hour (t) of the 24 h period to capture the diurnal variability of light conditions. Values provided in **Table 1** correspond to the daily mean R<sup>2</sup>.
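A minimal Python sketch of the daily mean R<sup>2</sup> follows. It assumes that R<sup>2</sup> is the squared Pearson correlation of a simple linear relationship, which the text does not specify, and it skips hours in which either variable shows no variance (for example, hours without light). The function names are hypothetical.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation; None if either variable has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy * sxy / (sxx * syy)


def daily_mean_r2(predictor_by_hour, npp_by_hour):
    """Mean of the hourly R2 values between a predictor and NPP.

    Each argument is a list with one entry per hour; every entry holds the
    values of all grid circles for that hour.
    """
    values = [r_squared(p, n) for p, n in zip(predictor_by_hour, npp_by_hour)]
    values = [v for v in values if v is not None]
    return sum(values) / len(values) if values else None
```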

### Spatial Autocorrelation Analyses

Spatial autocorrelation was used to investigate the horizontal patchiness of sea ice draft, transmittance, chl a biomass and NPP measured at the seven ice stations (**Table 1**). Autocorrelation was estimated using Moran's I (Moran, 1950; Legendre and Fortin, 1989; Legendre and Legendre, 1998), which was calculated for each of the eight sites at equally spaced (3 m) distance classes. Individual autocorrelation coefficients or Moran's I estimates were plotted for each distance class in the form of a spatial correlogram (Legendre and Fortin, 1989; Legendre and Legendre, 1998). These analyses were conducted using the "R" software function correlog from the "pgirmess" package. Autocorrelation coefficients for each distance class were assigned a two-sided p-value following methods in Legendre and Fortin (1989) and Legendre and Legendre (1998). Global significance was determined on the correlogram using the Bonferroni-corrected significance level. The presence of spatial autocorrelation (i.e., spatial patterns or patchiness) was determined if the correlogram was globally significant at p < 0.05. We identified the first x-intercept of globally significant correlogram lines as the patch size (P) of the variables (Legendre and Fortin, 1989; Legendre and Legendre, 1998). Here, patches were identified for sea ice draft (P<sub>d</sub>), transmittance (P<sub>t</sub>), chl a biomass (P<sub>c</sub>), and NPP (P<sub>p</sub>). This methodology is consistent with spatial autocorrelation analyses used in other snow and sea ice studies to identify patch sizes of both biological and physical variables (e.g., Gosselin et al., 1986; Rysgaard et al., 2001; Granskog et al., 2005; Søgaard et al., 2010).
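The correlogram calculation can be illustrated with the textbook form of Moran's I using binary weights within each distance class. The sketch below is in Python rather than R and does not reproduce the `correlog` function of the "pgirmess" package; the significance tests and the Bonferroni correction are omitted, and locating the first x-intercept by linear interpolation between class midpoints is an assumption. All function names are hypothetical.

```python
import math


def morans_i(points, values, d_min, d_max):
    """Moran's I for all pairs with separation d_min <= d < d_max (binary weights)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    denom = sum(d * d for d in dev)
    num, w_sum = 0.0, 0
    for i in range(n):
        for j in range(n):
            if i != j and d_min <= math.dist(points[i], points[j]) < d_max:
                num += dev[i] * dev[j]
                w_sum += 1
    if w_sum == 0 or denom == 0:
        return None  # no pairs in this class, or no variance in the data
    return (n / w_sum) * num / denom


def correlogram(points, values, class_width=3.0, n_classes=10):
    """Moran's I for consecutive, equally spaced distance classes."""
    return [
        morans_i(points, values, k * class_width, (k + 1) * class_width)
        for k in range(n_classes)
    ]


def patch_size(class_midpoints, morans):
    """Distance of the first change from positive to non-positive Moran's I."""
    for k in range(1, len(morans)):
        a, b = morans[k - 1], morans[k]
        if a is None or b is None:
            continue
        if a > 0 >= b:
            x0, x1 = class_midpoints[k - 1], class_midpoints[k]
            return x0 + (x1 - x0) * a / (a - b)
    return None
```

Positive values at short distances that decline to zero indicate patches whose size is approximated by the distance of the first zero crossing.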

We classified the correlograms according to correlogram curve patterns described in Legendre and Legendre (1998): (i) multiple-bumps; (ii) wave-like structure; (iii) single bump; (iv) gradient; (v) step; or (vi) random. Because we do not have fully gridded data, it is difficult to differentiate between (i) and (ii), or between (iv) and (v), as the correlograms are very similar. Therefore, we combined these pattern types, resulting in four categories: (1) multi-bump/wave; (2) one-bump; (3) gradient/step; and (4) random/noisy. Interpreting the correlograms together with the xy gridded maps allowed for a more detailed interpretation of the patterns (Legendre and Legendre, 1998). Patches or regions of high chl a biomass, high transmittance, thick draft, and high NPP were identified manually by visually inspecting the gridded maps. The identified patches were compared between variables to identify coincident patches for different variables.

# RESULTS

# Sea Ice Algal Chl a Biomass Estimates

The median chl a concentrations were generally low (<3.0 mg m−<sup>2</sup>) at sampling locations A-H, irrespective of the method used (**Table 1**). Only at location I were median chl a concentrations above 4 mg m−<sup>2</sup> for ice core and ROV estimates (**Table 1**). The range of chl a concentrations observed, however, appeared to be greater at locations G to I compared to locations A to F (**Figure 3A**).

At 5 of the 7 locations sampled for ice cores and ROV measurements (B-D, F-G), sea ice cores had significantly lower chl a biomass than ROV estimates (Wilcoxon test, p < 0.05). No significant differences were observed at locations H and I (Wilcoxon test, p > 0.05; **Table 1**; **Figure 3A**). On average, ice core-based estimates of chl a concentration were 63% of the ROV-based estimates from the same sampling sites. The range was 13–62% for locations B to H; the value for location I, however, was substantially larger at 182%. Excluding location H results in a mean underestimation of core-based estimates of 43% compared to ROV-based estimates. There was no significant difference between integrated estimates of sea ice chl a concentrations of ROV and nearby SUIT profiles (Wilcoxon test, p > 0.05).
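As an illustration of the kind of comparison reported above, the sketch below implements a two-sided rank-sum test for two independent samples. It assumes the unpaired (Wilcoxon rank-sum, or Mann-Whitney U) form of the test, which this section does not state explicitly, and it uses a tie-corrected normal approximation that is only rough for the small core sample sizes involved. The function name is hypothetical.

```python
from math import erfc, sqrt


def rank_sum_test(a, b):
    """Two-sided rank-sum test; returns (U statistic for a, approximate p)."""
    pooled = sorted((v, g) for g, sample in enumerate((a, b)) for v in sample)
    n = len(pooled)
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = avg_rank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    n1, n2 = len(a), len(b)
    r1 = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u1 = r1 - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u1, 1.0  # all values identical
    z = (u1 - mean_u) / sqrt(var_u)
    return u1, erfc(abs(z) / sqrt(2.0))  # two-sided normal p-value
```

For very small samples an exact test would be preferable to the normal approximation used here.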

FM-cores were not representative (i.e., within the IQR) of the ROV-derived chl a biomass estimates at all locations, except location B (**Table 1**). FM-cores at locations C, H, and I overestimated chl a biomass compared to the ROV-derived estimates (**Table 1**). At locations D, E, F, and G the FM-cores underestimated chl a biomass compared to ROV-derived estimates (**Table 1**). When chl a estimates were combined by FYI and MYI stations for each sampling method, mean FM-core chl a estimates were considerably lower than spatially integrated ROV- and SUIT-based estimates, but these differences were not significant due to the large variability of the datasets (Wilcoxon test, p > 0.05; **Table 2**). Regardless of the sampling method, MYI stations had consistently higher chl a concentrations and lower PP rates than FYI stations (**Table 2**).
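The representativeness criterion used here (a core value lying within the IQR of the ROV-derived estimates) can be sketched as follows. The function name and the choice of quartile method are assumptions; different quantile conventions give slightly different IQR bounds.

```python
from statistics import quantiles


def classify_core(core_value, rov_values):
    """Classify a core estimate relative to the IQR of ROV-derived values."""
    q1, _, q3 = quantiles(rov_values, n=4, method="inclusive")
    if core_value < q1:
        return "under"  # below the ROV 25th percentile
    if core_value > q3:
        return "over"  # above the ROV 75th percentile
    return "representative"
```

The same check applies to NPP by passing the up-scaled NPP estimates in place of the chl a values.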

All gridded ROV surveys of chl a, sea ice draft, transmittance and NPP are shown in Figures S1–S8. SUIT profiles of chl a, sea ice draft, and identified ridges are shown in Figures S9–S16.

### ROV-Derived Sea Ice Algal NPP

We accounted for the spatial variability of NPP by combining the variability of both chl a and bottom-ice light in the calculations of the larger-scale NPP estimates. All gridded ROV surveys of NPP are shown in Figures S1–S8. We then determined the explained variance of NPP by each variable individually. At locations B, C, F, G, and I, the spatial variability of bottom-ice light explained most of the spatial variability of the up-scaled NPP estimates, whereas at locations D and H, chl a explained most of the spatial variability of NPP (**Table 1**; **Figure 4**).

The largest diurnal variabilities of light levels and explained variances were observed at locations with the highest mean bottom-ice light levels (**Table 1**; **Figure 4**). At all stations, the explained variance of chl a was inversely related to light, which is expected since NPP is a function of both variables and chl a estimates were constant over the diurnal cycle while only light varied. The inter-location differences regarding which variable (chl a or light) explained most of the variance in NPP cannot be stated for certain as we observed no significant correlations between the explained variance for each station and any other station variable (e.g., nutrient concentration, median and IQR chl a or bottom-ice light).

FM-core NPP estimates were representative (i.e., within the IQR) of the up-scaled estimates at station group B and one ROV survey at station group C (**Table 1**; **Figure 3B**). FM-cores under-estimated NPP at station groups C, D, F, and G, and over-estimated NPP at station groups H and I compared to the up-scaled ROV-based NPP estimates (**Table 1**; **Figure 3B**). The differences between methods were likely the result of differences in chl a and/or light. Location B had similar chl a biomass and NPP for both the FM-core and up-scaled estimates (**Table 1**; **Figures 3A,B**). Station groups D, F, and G had higher up-scaled chl a biomass and NPP estimates compared to FM-core estimates (**Table 1**; **Figures 3A,B**). Conversely, station groups H and I had lower up-scaled chl a biomass and NPP estimates compared to FM-core estimates (**Table 1**; **Figures 3A,B**). Only station group C had higher chl a biomass but lower NPP estimates for the FM-cores compared to the up-scaled estimates (**Table 1**; **Figures 3A,B**). Furthermore, light levels were comparable (237a) or slightly higher (237b) for the FM-core derived NPP estimates compared to the ROV surveys (**Table 1**; **Figures 3A,B**). When FM-cores and the up-scaled NPP estimates were pooled into FYI and MYI stations, we observed no significant differences between the methods (Wilcoxon test, p > 0.05; **Table 2**). The median and IQR-values had large differences between sampling methods for the MYI stations but the mean values were similar (**Table 2**).

TABLE 2 | Ice algal chlorophyll a biomass and NPP summarized for sampling gears into MYI and FYI. Means, range (min–max), and sample size [N] are provided for comparison to values presented in Fernández-Méndez et al. (2015).

# Sea Ice Algal Chl a Biomass and NPP in Relation to Sea Ice Properties

### Sea Ice Classes

Chl a biomass and NPP estimates were divided into the five different ice classes. The values showed large variability between ice classes and locations, and within ice classes and locations (**Figure 5**). ROV-derived chl a biomass estimates at locations B and I were highest in the thickest sea ice class (2.0 m +; **Figure 5A**). Locations B and C had high ROV-derived chl a biomass in the thinnest ice class (0.0–0.5 m; **Figure 5A**). The three middle ice classes generally had uniform ROV-derived biomass estimates, with the exception of location H which had the highest ROV-derived chl a biomass in the 1.5–2.0 m ice class (**Figure 5A**). The SUIT-derived estimates were very low at location B for all ice classes and highly variable within the ice classes for all other stations with no obvious patterns (**Figure 5C**). In general, at each location ROV-derived NPP estimates showed a decreasing trend with increasing range of ice class thickness values (**Figure 5B**).

The dominant ice class surveyed by the ROV was identified by the modal sea ice draft of ice floes based on EM31 measurements (**Table 3**). Ice core and ROV chl a biomass estimates for the dominant ice classes differed significantly (Wilcoxon test, p < 0.05) at 2 locations (F, G; **Table 3**; **Figure 3C**). NPP estimates derived from FM-cores and ROV observations showed no obvious changes and maintained the same patterns (i.e., non-representativeness) for all locations. The most obvious differences were observed between the entire chl a biomass surveys and dominant ice class subsets for the SUIT at locations B, F and G, and for the ROV at locations H and I (**Tables 1**, **3**; **Figures 3A,C**). Furthermore, the separation between low chl a biomass locations B to F and high chl a biomass locations G to I is more obvious from the large scale dominant ice class estimates (**Figure 3C**).

Two sea ice regimes were identified at station 349 of group H: one thicker sea ice region and one thinner region (Figure S7). The thicker region (median: 1.9, IQR: 1.2–3.5 mg chl a m−<sup>2</sup> ) had significantly higher (Wilcoxon test, p < 0.05) chl a biomass than the thinner region (median: 1.3, IQR: 1.2–1.4 mg chl a m−<sup>2</sup> ). NPP, however, was significantly lower at the thicker region (median: 0.07, IQR: 0.04–0.19 mg C m−<sup>2</sup> d −1 ) compared to the thinner region (median: 0.14, IQR: 0.12–0.21 mg C m−<sup>2</sup> d −1 ). Ice cores from the thicker region had higher chl a biomass (median: 0.3, IQR: 0.2–0.5 mg chl a m−<sup>2</sup> ) compared to ice cores from the thinner region (median: 4.6, IQR: 2.8–6.2 mg chl a m−<sup>2</sup> ) although the p-value of the Wilcoxon test was 0.06 due to the low sample size.

### Sea Ice Ridges

At ice location B (station 224; **Figure 1**) we identified two sea ice ridges based on the ROV draft measurements (**Figure 6A**). Ridge 1 had a median sea ice draft of 4.5 m and ridge 2 had a median draft of 2.8 m based on ROV measurements (**Table 4**). Bottom-ice light was significantly higher in level ice compared to both ridges (p < 0.05; **Table 4**). Nonetheless, both ridges had significantly higher ice algal chl a biomass than the level ice (p < 0.05; **Table 4**; **Figure 6C**). Ridge 2, however, had significantly lower NPP compared to level ice, whereas ridge 1 had similar NPP compared to the level ice (**Table 4**; **Figure 6D**). Conversely, ridge 1 had both higher draft values and higher bottom-ice scalar irradiance (I) values compared to ridge 2 (**Table 4**; **Figures 6A,B,D**). In the level ice, chl a biomass and bottom-ice light explained comparable amounts of the NPP variance. At ridges 1 and 2, however, chl a biomass explained relatively more variance compared to bottom-ice light (**Table 4**).

Based on the ridge identification analysis for all SUIT stations we calculated a mean (min–max) ridge density of 7.5 ridges km−<sup>1</sup> (2.5–18.0), mean ridge width of 68.7 m (47.6–100.3), and a mean percent total ice coverage by ridges of 9.2% (2.5–15.4%). Ridge analysis summaries for each SUIT station are shown in **Table 5**. SUIT profiles with identified ridges are shown in **Figure 7** (station 223) and for all other stations in Figures S9–S16.

High chl a biomass sea ice ridges were also identified within three SUIT stations (station 223: **Figure 7**; stations 233, 285, and 358 Figures S11, S13, S16). These identified high chl a biomass ridges had chl a biomass estimates in the range 2–9 mg chl a m−<sup>2</sup> (**Table 5**), which was larger than the overall SUIT profile median values in the range 1.2–1.9 mg chl a m−<sup>2</sup> (**Table 1**). When comparing chl a biomass values at coincident identified sea ice ridges with chl a biomass at level ice for each SUIT haul separately, we observed significantly higher (Wilcoxon test, p < 0.05) sea ice ridge chl a biomass than level ice chl a biomass at 2 SUIT hauls (stations 223 and 233; **Table 5**). When comparing all SUIT observations combined, sea ice ridge chl a biomass (median: 0.7 and IQR: 0.2–1.4 mg chl a m−<sup>2</sup>) was significantly higher (Wilcoxon test, p < 0.05) than level ice chl a biomass (median: 0.3 and IQR: 0.0–1.0 mg chl a m−<sup>2</sup>).

# Spatial Variability of Sea Ice Properties, Algae Chl a Biomass, and NPP

Autocorrelation analyses for each station were conducted using correlograms (i.e., Moran's I vs. distance classes), and were all globally significant at the Bonferroni-corrected level (p < 0.05/n; n = the number of distance classes). Patch sizes, identified as the distance class at which the first zero value of Moran's I occurred in the correlograms, were highly variable between stations and between measured variables (**Table 6**). Patch sizes for chl a (Pc) had a lower range of values between 7 and 30 m, whereas patch sizes for transmittance (Pt), draft (Pd) and NPP (Pp) were slightly higher in the range 10–50 m (**Table 6**). Pt and Pp were comparable (within 5 m) at all ROV stations except 224, which had the two identified ridges. The shapes of correlogram curves were similar for transmittance and NPP for all station surveys (**Figure 8** and Figures S17–S24). Correlogram shape comparisons for all stations were highly variable with no obvious patterns for all other measured variables (**Figure 8** and Figures S17–S24).
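The global significance criterion can be written compactly. A minimal sketch, assuming one p-value per distance class and the usual rule that a correlogram is globally significant when at least one coefficient is significant at the Bonferroni-corrected level; the function name is hypothetical.

```python
def globally_significant(p_values, alpha=0.05):
    """True if any distance-class p-value is below alpha / n."""
    n = len(p_values)
    if n == 0:
        return False
    corrected_alpha = alpha / n  # Bonferroni-corrected level
    return any(p < corrected_alpha for p in p_values)
```

With 10 distance classes, for example, the corrected threshold is 0.005 per class.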

Based on the manually identified patches within the gridded maps, coincident patches of high transmittance and high NPP were observed at all stations. Coincident patches of only high chl a and thick draft values were observed at stations 224 and 237b, although the patches at 237b were more subtle (**Figure 6** and Figure S18). The two draft patches observed at 224 correspond to ridge 1 and ridge 2 (**Figure 6**) described in the previous section Sea Ice Ridges. Coincident patches of only high chl a, transmittance and NPP were observed at stations 224, 335f,m, and 360 (**Figure 6**, Figures S5, S6, S8). Coincident patches of only high chl a and NPP were observed at stations 255 and 349 (Figures S3, S7).

# DISCUSSION

# Overall Representativeness of the Ice Algal Chl a Biomass and NPP Estimates Using Different Sampling Methods

### Chl a Biomass

During land-based campaigns in coastal regions it is possible to achieve ice core sample sizes well over 50 ice cores (e.g., Gosselin et al., 1986; Rysgaard et al., 2001; Granskog et al., 2005; Mundy et al., 2007; Campbell et al., 2015). However, such studies are conducted over a period of weeks to months and are typically confined to a local study region. Furthermore, land-based studies are generally conducted on landfast sea ice, in regions dominated by seasonal sea ice. Thus, during the advanced melt stages in seasonally ice-covered regions, sea ice is typically not sampled because this period coincides with the termination of the algal bloom, and/or because of logistical and safety constraints. Where sea ice survives into late-summer (e.g., the central Arctic Ocean), ship-based sampling is the most effective sampling approach. Although ship-based sampling has some advantages (e.g., bringing the equipment and lab to the study region), the main disadvantage is that sampling is generally time-limited. Thus, ice core sampling during ship-based campaigns is generally limited to <10 algal chl a biomass or NPP cores per ice station, making it difficult to conduct spatial studies of sea ice algae (e.g., this study; Gosselin et al., 1997; Gradinger, 1999; Schünemann and Werner, 2005; Fernández-Méndez et al., 2015). Even during long term ship-based studies (e.g., Melnikov et al., 2002) ice core sampling was limited to a small number of cores for each sampling interval every 1–2 weeks.

Our results demonstrate large uncertainties in coring-based methods for capturing the larger-scale variability of ice algal chl a biomass observed by the ROV-based methods. However, assessing the magnitude of this uncertainty for other studies is not possible. In general, our ice coring results under-estimated ice algal chl a biomass at the relatively lower chl a biomass locations (B-F), which implies an overall under-estimation of total chl a biomass. Only at the higher chl a biomass locations (H and I) did the ice cores accurately capture the variability of ice algal chl a biomass. The higher chl a biomass observed at locations G-I was likely the result of less melt-induced algal losses due to thicker ice and lower melt rates at these high-latitude locations (Lange et al., 2016). This difference can be explained because at low chl a biomass stations relatively higher chl a biomass patches had a lower probability of being sampled by coring compared to higher chl a biomass stations, and hence were not accurately represented, whereas at high chl a biomass locations the probability of sampling higher chl a biomass locations was higher. We must also note that the possibility that the upscaled spectrally derived estimates over-estimated the true chl a biomass is unlikely, because the model for spectrally deriving chl a biomass had no directional bias related to chl a concentration in sea ice (Lange et al., 2016).

m−<sup>2</sup>) derived from ROV spectral radiation measurements; and (D) net primary production-NPP (mg C m−<sup>2</sup> s−<sup>1</sup>) derived from ROV measurements. R1 and R2 depict ridge 1 and ridge 2, respectively. Gray circles represent values greater than the scale maximum value.

The higher chl a biomass location I showed no significant difference between the cores and ROV-based chl a biomass estimates. In the individual core values, however (0.05, 6.46, 8.03, 8.00, and 11.83 mg chl a m−<sup>2</sup>), only one core was within the IQR (2.96–6.70 mg chl a m−<sup>2</sup>). In this sample size, one core with near-zero chl a biomass was highly influential and may have impeded the detection of significant differences. A similar pattern was also apparent at location H, which also showed no significant difference, but also had only one core within the IQR of the up-scaled chl a biomass estimates. The discrepancy between the ice core-based and ROV-derived chl a biomass estimates indicates the ice algal chl a biomass was highly variable at small scales (<2 m), which was difficult to capture with average measurement footprints between 1 and 2 m for ROV surveys. Individual data points of up-scaled estimates averaged chl a concentration over a larger area, and were thus less likely to capture small patches of extremely high chl a biomass or extremely low chl a biomass (i.e., values in the range 8–12 mg chl a m−<sup>2</sup> or with near-zero chl a biomass). These considerations highlight two important sampling constraints. First, the cores did not capture the large-scale variability; and second, we were unable to assess the small-scale variability below 2 m. The second limitation is less drastic since the signal received from the sensor under the ice does capture the small-scale variability within its measurement by averaging it over a larger distance. Since little is known or has been reported on summertime spatial variability of ice algal chl a biomass we propose that observations from both core-based and under-ice spectral profiling systems should be combined when making assumptions about multi-scale spatial variability of ice algal chl a biomass.

TABLE 3 | Modal sea ice draft from literature (Boetius et al., 2013; Katlein et al., 2015b) and ROV measurements, dominant ice class based on literature modal draft, interquartile range of chlorophyll a biomass observations for the dominant ice class using different gears (ice coring, ROV, and SUIT) summarized for each location.

<sup>a</sup>Literature modal ice thickness converted to draft by multiplying by 0.9. "–" indicates no data. Noteworthy Wilcoxon test results are indicated by \* for a significant difference at p < 0.05 for comparisons between cores and ROV chl a biomass for observations on ice within the dominant ice class. • Indicates a CORES NPP estimate outside the IQR of the ROV NPP estimates for observations within the dominant ice class. (+) indicates CORES greater than ROV 75th percentile; and (−) indicates CORES smaller than ROV 25th percentile. nd refers to no data.

TABLE 4 | Comparison of chlorophyll a biomass and net primary production between sea ice ridges and level ice at station 224.

Ridges are identified in Figure 7. \* Indicates a statistically significant (p < 0.05) Wilcoxon test between the corresponding Ridge and Level Ice.

<sup>a</sup>Value within square brackets represents the explained variance of NPP by the corresponding variable and data subset of ridge or level ice. "I" is the bottom ice light levels (PAR).

TABLE 5 | Summary of ridge identification analysis from the SUIT hauls conducted during PS80.

\* Indicates a statistically significant (p < 0.05) Wilcoxon test comparing chl a biomass in ridges and level ice.

The fact that no statistical differences (Wilcoxon test, p > 0.05; **Table 2**) were observed between ROV-based and ice core-based estimates (both chl a and NPP) when they were grouped into MYI and FYI stations, an approach taken by Fernández-Méndez et al. (2015), may suggest an improvement because it increased the probability of the ice cores being representative of the larger area. In this case the sample sizes and range of chl a biomass values were sufficient to obscure any differences at the station level. This method should only be considered when other options are not possible, because large uncertainties are still present even though significant differences were not found. For example, even though in this case the mean MYI FM-core chl a biomass values were not significantly different (Wilcoxon test, p > 0.05), each MYI FM-core value was higher or lower than the IQR of the larger-scale estimates. A similar pattern was observed with the FYI grouping comparison, although not as drastic because overall the values were smaller, particularly the range of values. Potential uncertainties of grouping ice cores should also be considered depending on the objectives of the study. Grouping the ice cores into MYI and FYI would result in mean ice core MYI algal chl a biomass estimates 160% larger than the ROV MYI estimates. In contrast, FYI ice core estimates would be around 60% of the ROV FYI estimates. For grouped ice core-derived NPP the difference for MYI is even more pronounced at 270% larger compared to ROV estimates. FYI NPP values, however, were comparable between both methods.

Photoacclimation may be another potential factor influencing the chl a to carbon ratios, which could in turn explain the increased chl a biomass at higher latitude stations due to increased chl a production under lower light conditions. Fernández-Méndez et al. (2015) measured lower photoacclimation indices for the higher latitude stations (Ik; mean: 30 µmol photons m−<sup>2</sup> s −1 , range 17–45) compared to the lower latitude stations (mean: 60 µmol photons m−<sup>2</sup> s −1 , range: 34–77). However, we did not observe any variability (<1 g C: g chl a) between high and low latitude stations in the chl a to POC ratios (data not presented here), therefore it is unlikely that photoacclimation explains the regional chl a biomass differences.

### NPP

In general, NPP sampling involves measuring available PAR levels through a hole in the ice (Gosselin et al., 1997), which may produce higher than expected values due to the hole. PAR available for bottom-ice algae may also be modeled by using simple light extinction models (Fernández-Méndez et al., 2015). Both methods are established and regularly employed; however, both are limited in that they do not account for the spatial variability of the bottom-ice PAR levels. During spring, ice algae are typically light-limited and therefore have higher chl a biomass where light levels are higher (e.g., Gosselin et al., 1986), assuming there is no or limited photo-inhibition. During our summer sampling period, however, we found no strong correlation at any station between the ROV-derived chl a estimates and available under-ice light (maximum Spearman correlation coefficient, r = 0.22). This means the under-ice light and chl a varied independently of each other. This behavior is expected, because in late-summer biomass losses due to high melt rates have a dominant influence on bottom-ice biomass (e.g., Grossi et al., 1987; Lavoie et al., 2005; Lange et al., 2016). These conditions would not have sustained a bottom-ice algal community, and therefore even if light conditions were suitable for high primary production rates the NPP would have been almost zero if no algae were present. With an additional variable (i.e., melt), which can influence NPP, the spatial distribution of NPP may be more complex in late-summer than during the spring to summer transition, making it even more important to understand and account for the spatial variability of both chl a biomass and the bottom-ice light field.

TABLE 6 | Summary of the autocorrelation analyses per location and ROV survey.

Patch sizes for chl a, Pc; transmittance, Pt; draft, Pd; and NPP, Pp. TM corresponds to transmittance, and NPP to net primary production.

\*All correlograms globally significant at the Bonferroni-corrected level (p < 0.05/n; where n is number of distance classes; Legendre and Legendre, 1998).

<sup>a</sup>Identifies correlogram curves which are similar in shape to each other (e.g., chl a-TM-NPP means the correlogram curves are similar for the chl a, transmittance and net primary production).

<sup>b</sup>Manually identified patches that are coincident in location to each other. The number of patches per ROV survey is followed by which patches are coincident (e.g., TM-NPP refers to a transmittance patch coincident to an NPP patch). Large/multi refers to a larger area with multiple small patches in close proximity.

Location B had similar NPP estimates for the FM-core and up-scaled observations (**Table 1**; **Figure 3B**), which we attributed to the similar chl a biomass estimates (**Table 1**; **Figure 3A**). Even though light levels and chl a biomass were only slightly larger at location B compared to groups C and D, group B had NPP estimates almost an order of magnitude larger than groups C and D. This was attributed to the substantially higher value of the photosynthetic parameter P<sup>B</sup><sub>s</sub> determined for this station (Fernández-Méndez et al., 2015), compared to all other stations. This demonstrates that the combination of data from several stations, an approach described by Fernández-Méndez et al. (2015) and used by others (Mundy et al., 2011; Campbell et al., 2016), was not only able to improve the spatial representativeness of light and chl a but also accounted for the potential variability of the derived photosynthetic parameters. Our results suggest pooling ice core samples increases the chance for the samples to be representative of both chl a biomass and NPP estimates. Because of the large range of chl a biomass and NPP estimates, and the small number of samples, however, this approach can still carry a high risk of obtaining non-representative estimates (e.g., overestimates of up to 270% for MYI).

The same directional difference of chl a biomass and NPP observed between up-scaled and FM-core estimates for all station groups, except group C, suggests the differences between the FM-cores and up-scaled NPP estimates were driven by the differences in chl a biomass. This was further confirmed by the fact that the bottom-ice light levels used for each method were comparable for each station (**Table 1**). The opposing pattern of chl a biomass and NPP between up-scaled and FM-core estimates at location C, even though light levels were comparable, suggests that the spatial variability of both the chl a biomass and bottom-ice light had a combined influence on the observed differences that is not apparent from the overall survey estimates. The explained variance of NPP by chl a and light showed large diurnal variability and large inter-location variability, which indicates a complex and highly variable relationship between ice algal chl a biomass and light levels during our sampling period. These results emphasize the importance of accounting for both the spatial variability of ice algal chl a biomass and the bottom-ice light field in order to make representative NPP estimates. We must also note the possible influence of nutrients since we found a significant (p < 0.05) positive correlation (r = 0.46) between explained variance of NPP by chl a with sea ice NO<sub>3</sub> concentrations (data from Fernández-Méndez et al., 2015), and a significant (p < 0.05) negative correlation between explained variance of NPP by bottom-ice light with sea ice NO<sub>3</sub> concentrations (r = −0.55). These correlations provide some indication that the sea ice nutrient regime could have also had some influence on the relative (inter-station) importance of chl a biomass vs. light on NPP.

Gosselin et al. (1997) measured ice algal NPP of up to 300 mg C m−<sup>2</sup> d−<sup>1</sup> in the high Arctic Ocean (>87°N) during August. Our results for September in the high Arctic Ocean (station 360) were over 3 orders of magnitude lower than those found by Gosselin et al. (1997) in August for the same area. The large difference in NPP estimates between the studies could partially be explained by the higher incoming solar irradiance in August compared to September. However, upscaling results from Fernández-Méndez et al. (2015) for August were also substantially lower with a mean (range) of 5.8 (0.06–42) mg C m−<sup>2</sup> d−<sup>1</sup> and with a similar range of daily mean incoming solar irradiance (101–249 µmol photons m−<sup>2</sup> s−<sup>1</sup>). Therefore, we applied the range of incoming irradiance values (∼125–214 µmol photons m−<sup>2</sup> s−<sup>1</sup>) observed at the high latitude stations (>87°N) during the Gosselin et al. (1997) study to this study's ROV survey at station 360. We used our observed chl a biomass and transmittance in order to calculate potential NPP under higher incoming irradiance conditions typical for August at these high latitudes. Overall NPP increased by nearly the same relative amount as the available light; however, with median values between 0.82 and 1.32 mg C m−<sup>2</sup> d−<sup>1</sup> (**Table 7**), this remains two orders of magnitude lower than ice algal NPP observed by Gosselin et al. (1997). This suggests that something other than available light is influencing these observed differences. This is likely explained by the fact that the Gosselin et al. (1997) estimates were dominated by the sub-ice algal species Melosira arctica, whereas Fernández-Méndez et al. (2015) measured primary production on ice samples with a lower contribution of M. arctica. Thus, our samples represent a good estimate for in-ice algal NPP, but a conservative estimate for overall ice-associated NPP (Fernández-Méndez et al., 2015).

The explained variance of NPP by bottom-ice light compared to chl a using the increased incoming irradiance levels, which were observed in August at high latitudes by Gosselin et al. (1997), showed interesting differences (**Table 7**). As the incoming irradiance increased, the explained variance of chl a biomass also increased, while the explained variance of the bottom-ice light decreased to nearly equal values of 0.39 and 0.48, respectively, at an incoming irradiance of 214 µmol photons m−<sup>2</sup> s−<sup>1</sup> (**Table 7**). This indicates that under increased irradiance levels the spatial variability of chl a biomass becomes more important in terms of contribution to overall NPP estimates. Furthermore, these results suggest a complex spatio-temporal relationship of the relative importance of chl a biomass and available bottom-ice irradiance for NPP estimates, which can only be accounted for by characterizing biomass and under-ice light at spatial scales from meters to 100s of kilometers, and temporal scales accounting for diurnal and seasonal variations.

TABLE 7 | Net primary production estimates for the ROV survey at station 360 with observed downwelling surface irradiance (PAR) and using different downwelling surface irradiance conditions as observed for the same region (>87°N) earlier in the season (∼ mid-August) by Gosselin et al. (1997).

<sup>a</sup>Downwelling surface irradiance data presented in Gosselin et al. (1997) from the same region as station 360.

<sup>b</sup>The bottom-ice scalar irradiance used to calculate NPP.

# Sea Ice Algal Chl a Biomass and NPP in Relation to Sea Ice Properties

### Sea Ice Classes

Electromagnetic (EM) sea ice thickness surveys are commonly used to representatively characterize the overall ice thickness distribution (Eicken, 2001; Haas and Eicken, 2001; Haas, 2004) and thus represent a reliable characterization of the dominant ice class for the surveyed floe and overall region. The differences between the ROV-derived and EM-derived modal draft values at several ice stations warranted the use of the EM data to determine dominant ice types. The range of modal ice thicknesses for locations B–G dominated by FYI (0.8–1.3 m) was consistent with previous studies that conducted large-scale airborne and floe-scale ground-based electromagnetic ice thickness surveys for the same region and season (Haas et al., 1997; Haas and Eicken, 2001; Rabenstein et al., 2010). The two locations H and I dominated by MYI had modal thicknesses between 1.6 and 1.8 m, which were also consistent with modal ice thickness values for second-year sea ice from the same region and season (Haas and Eicken, 2001).

Since the dominant ice type thickness value (i.e., modal ice thickness) is a commonly used metric to characterize the sea ice environment it stands to reason that sea ice algal chl a biomass from the dominant ice class would also provide a representative metric to describe the overall sea ice algal chl a biomass. Comparing the ice algal chl a biomass estimates solely from the dominant ice classes showed better agreement between ROV and ice core-derived values (**Figure 3C**). Therefore, we suggest that using chl a biomass estimates from the dominant ice class only may be an improvement on providing a single value, which is representative of the large scale sea ice algal chl a biomass for that region. There remain some limitations to this approach, since these estimates do not account for the chl a biomass of the other ice types/classes. Sampling other ice types/classes may be of particular importance in regions of low chl a biomass (e.g., station 224) where high chl a biomass features such as ridges may have a substantial contribution to the overall large-scale ice algae chl a biomass. A further step to improve these overall chl a biomass values could be to use the larger-scale ice thickness density distributions (data not available for this study) to provide weighting factors for chl a biomass values of each ice type/class.
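The weighting step suggested at the end of the paragraph above can be sketched as an area-weighted mean. This is a minimal illustration rather than the authors' implementation: the function name, the class labels, the biomass values and the areal fractions are all hypothetical, since the larger-scale ice thickness distributions were not available for this study.

```python
def weighted_chl_biomass(biomass_by_class, fraction_by_class):
    """Area-weighted mean chl a biomass (mg chl a m^-2) over ice classes.

    biomass_by_class  : representative chl a biomass for each ice class
    fraction_by_class : areal weight of each class, e.g. taken from a
                        large-scale ice thickness distribution
    """
    if set(biomass_by_class) != set(fraction_by_class):
        raise ValueError("both inputs must describe the same ice classes")
    total_fraction = sum(fraction_by_class.values())
    if total_fraction <= 0:
        raise ValueError("areal fractions must sum to a positive value")
    weighted_sum = sum(
        biomass_by_class[name] * fraction_by_class[name]
        for name in biomass_by_class
    )
    # Dividing by the total lets the weights be given as fractions or counts.
    return weighted_sum / total_fraction

# Hypothetical classes, biomass values and weights, for illustration only.
biomass = {"0.5-1.0 m": 0.8, "1.0-1.5 m": 1.5, "ridge": 4.0}
weights = {"0.5-1.0 m": 0.3, "1.0-1.5 m": 0.6, "ridge": 0.1}
print(weighted_chl_biomass(biomass, weights))
```

The same function could be applied to per-class NPP estimates, provided the weights come from a thickness survey that is representative of the region of interest.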

The observed trend of higher chl a biomass at higher latitude stations was more obvious within the dominant ice class estimates (**Figure 3C**). This was previously attributed to enhanced melt-induced algal losses at lower latitude stations, although based on a smaller number of stations (Lange et al., 2016). Here we have a larger sample size covering a larger geographic region and confirmed the pattern related to latitude and the presence of thicker ice. Enhanced melt is a common mechanism for substantial losses of bottom-ice algae in summer (Grossi et al., 1987; Lavoie et al., 2005). Gosselin et al. (1997) also observed a shift from low to high bottom-ice chl a biomass with a shift from low to high latitude, which is consistent with our observed trend. Furthermore, the higher dominant ice class chl a biomass estimates between 2.5 and 5.1 mg chl a m<sup>−2</sup> observed at the three high latitude, thicker ice (1.4–1.9 m) locations G–I are consistent with previous studies from high latitude regions of the central Arctic Ocean with bottom-ice algae concentrations in the range of 3–14 mg chl a m<sup>−2</sup> (Gosselin et al., 1997), and up to 22 mg chl a m<sup>−2</sup> (Melnikov, 1997).

Castellani et al. (2017) introduced a pan-Arctic Sea Ice Model for Bottom-Algae (SIMBA) coupled with a 3D sea-ice-ocean model and also showed that within the eastern Eurasian basin during late summer (this study's sampling region/period) there was an increasing trend in bottom-ice algal chl a biomass from lower to higher latitudes. The SIMBA model, however, showed the opposite trend, with increasing chl a biomass from higher to lower latitudes, in the region from the North Pole toward the northern coast of Canada and Greenland (the Lincoln Sea), where the thickest ice in the Arctic Ocean is located. During late summer, Castellani et al. (2017) identified sea ice thickness as a main factor controlling bottom-ice chl a biomass by limiting basal melt-induced algal losses during a period of advanced melt. Based on observations alone, a purely latitudinal effect may have been identified as driving the large-scale spatial patterns of sea ice algae. Therefore, the modeling results of Castellani et al. (2017) emphasize the need for more observations in the region between Canada, Greenland and the North Pole (a region coined the "Last Ice Area"), and show that modeling studies are essential for interpreting large-scale patterns of sea ice algae observations. Furthermore, the SIMBA model included different ice classes, such as sea ice ridges. The SIMBA results showed that ridges exhibited lower peak chl a biomass than level ice but reached their maximum chl a biomass later in the season. More information is required in order to accurately parameterize sea ice features such as ridges in pan-Arctic models; however, SIMBA is a big step forward in terms of including ice classes within models, which we have identified as an important component of sea ice algal spatial variability.

In contrast to chl a biomass, NPP estimates showed no improvement when comparing only the dominant ice class (**Figure 3D**). This suggests that NPP estimates require a different approach for up-scaling and parameterizing models. The complex and highly variable relationship between ice algal chl a biomass and light levels during our sampling period suggests that more representative sea ice algal NPP estimates may be achieved by accounting for the relative contribution of NPP within each ice type. This would involve using larger scale ice thickness estimates to assign weighting factors to each ice class's NPP estimate. In the absence of larger scale observations it is not possible to discover the spatial patterns of sea ice algal chl a biomass and NPP, or to assess whether the ice cores are actually representative of the area. To further improve upon the large scale pan-Arctic NPP and chl a biomass estimates, we suggest integrating our five ice classes, together with weighting factors for each ice class (based on large-scale ice thickness surveys), into pan-Arctic studies (e.g., Fernández-Méndez et al., 2015). Accurately assessing sea ice associated NPP in models is of particular importance since it can represent a dominant portion of total (water plus sea ice) NPP in regions covered by sea ice for most of the year (e.g., Gosselin et al., 1997; Fernández-Méndez et al., 2015). In general, pan-Arctic models of NPP that include sea ice contributions are very limited (e.g., Lee et al., 2015), highlighting the need for improved sea ice algae model parameterizations.

### Sea Ice Ridges

One source of variability in sea ice chl a concentrations, light transmittance and derived NPP may be topographical features of sea ice, such as ridges. Sea ice ridges are often under-sampled due to the logistical challenges of sampling this type of ice. Despite this, sea ice ridges have been reported to host high abundances of sea ice fauna during advanced melt (Gradinger et al., 2010). Furthermore, in the northern Baltic Sea high chl a biomass was observed within the ice along the upper sides of sea ice ridges and within the interstitial spaces typically present within the unconsolidated aggregation of ice blocks that form ridge keels (Kuparinen et al., 2007). Therefore, we specifically investigated sea ice ridges with the hypothesis that they could host high abundances of ice algae during advanced melt due to lower melt rates in these locations.

We showed that the identified sea ice ridges at ROV station 224 and all SUIT stations (measurements grouped together) had significantly higher chl a biomass than measurements under relatively more level ice (i.e., areas that are not ridges). It can be assumed that ridges were under-represented in the ROV sampling due to a preference for relatively uniform sampling sites. In SUIT profiles, the natural distribution of ridges was likely well-represented, because the sampled profile cannot be chosen after the deployment of the net. The overall difference between median level ice chl a biomass and median ridge chl a biomass from the SUIT surveys, however, was relatively small (0.4 mg chl a m<sup>−2</sup>). The small difference is likely the result of not all ridges having high chl a biomass.

Our results of sea ice ridge densities between 2.5 and 18.0 ridges km<sup>−1</sup> are within the range of larger scale airborne surveys with mean ridge sail densities between 4.3 and 7.2 ridges km<sup>−1</sup> (Rabenstein et al., 2010). With the high resolution (0.5 m) under-ice topography measurements, we were able to accurately estimate the widths of the ridges and determined that these features represented up to 10% of the total sea ice area. Together with the higher chl a biomass observed at sea ice ridges, this indicates that these features require more in-depth investigations and may have a significant impact on overall chl a biomass estimates and availability of food for under-ice organisms.
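As a rough illustration of how ridge density and ridge coverage can be derived from an evenly spaced draft profile, the sketch below counts contiguous runs of points whose draft exceeds a threshold. The threshold criterion, the function name and the synthetic profile are assumptions made for this example; the ridge identification criteria actually used in this study are not restated in this section. The coverage returned is a fraction of profile length, used here only as a proxy for areal coverage.

```python
def ridge_stats(drafts_m, spacing_m=0.5, threshold_m=2.5):
    """Return (ridges per km, fraction of profile length covered by ridges).

    drafts_m    : ice draft values along a transect, evenly spaced
    spacing_m   : distance between consecutive draft values
    threshold_m : hypothetical draft above which a point counts as ridge
    """
    if not drafts_m:
        raise ValueError("draft profile is empty")
    n_ridges = 0
    ridge_points = 0
    inside_ridge = False
    for draft in drafts_m:
        if draft >= threshold_m:
            ridge_points += 1
            if not inside_ridge:
                # A new contiguous run of thick ice starts a new ridge.
                n_ridges += 1
                inside_ridge = True
        else:
            inside_ridge = False
    profile_length_km = len(drafts_m) * spacing_m / 1000.0
    return n_ridges / profile_length_km, ridge_points / len(drafts_m)

# Synthetic 1 km profile: level ice with two 50 m wide ridges.
profile = [1.0] * 300 + [4.0] * 100 + [1.0] * 800 + [4.0] * 100 + [1.0] * 700
density, coverage = ridge_stats(profile)
print(density, coverage)
```

A fixed threshold is the simplest possible criterion; a real survey would likely need a definition relative to the local level-ice draft.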

Gradinger et al. (2010) showed that sea ice ridges had elevated concentrations of ice meiofauna and under-ice amphipods, which was attributed to the flushing of the sea ice and low-salinity stress imposed at the thinner sea ice environment. Sea ice ridges may also extend into higher salinity water below the highly stratified, fresher surface melt water, which accumulates adjacent to the ridges under thinner ice (Gradinger et al., 2010). These results suggest that higher ice algal chl a biomass at ridges may be the result of reduced flushing and lower environmental stress. Furthermore, the presence of high algal chl a biomass as a food source may provide an additional explanation for the accumulation of organisms at ridges observed by Gradinger et al. (2010).

In addition to the possibility of reduced flushing and lower environmental stress at ridges, we suggest that the thicker ice experienced lower melt rates than the surrounding level ice resulting in lower algal losses. Perovich et al. (2003) indicated that sea ice ridges experienced an overall greater amount of melt than the surrounding undeformed sea ice, which may appear to contradict our premise. The higher overall melt observed at ridges by Perovich et al. (2003), however, was partially attributed to a few very thick ridges extending deep into the water, which were experiencing melt the entire year even during winter. Except for one weekly measurement in August, the melt rates for ridges were lower than the mean and were among the lowest of all ice types for that entire month during advanced melt (Perovich et al., 2003).

NPP estimates for sea ice ridges showed some interesting patterns at ROV station 224. Although both ridges had significantly higher chl a biomass than the level ice, ridge 2 had significantly lower NPP rates than ridge 1 and the level ice, whereas ridge 1 and level ice were not significantly different. These differences were due to the available light measured under the different types of sea ice. The higher chl a biomass at ridge 1 compensated for lower light levels compared to the level ice, resulting in similar NPP estimates compared to the level ice. However, the chl a biomass at ridge 2 was not sufficient to compensate for the lower bottom-ice light levels. Even though ridge 2 had a thinner median draft (2.8 m) value compared to ridge 1 it still had lower light levels. This shows that ridges can have a considerable impact on the complex relationship between chl a biomass and available PAR for NPP estimates at larger spatial scales. Furthermore, these results imply that sea ice features such as ridges have a different and perhaps more complex relationship between available light and chl a biomass than the surrounding sea ice. As a consequence, ridges must be sampled representatively, and both the variability of bottom-ice light levels and the variability of chl a biomass are required to make representative large-scale ice algal chl a biomass and NPP estimates.

The identification of sea ice ridges as potential chl a biomass and NPP hotspots warrants further dedicated research of these features. Further work should include dedicated modeling of the (bio)optical properties of sea ice ridges, which would require ice core chl a biomass estimates from ridges and high spatial resolution spectral radiation measurements under ridges.

# Spatial Variability and Patchiness of Sea Ice Properties, Algae Chl a Biomass, and NPP

Our results indicated high variability of patch sizes between locations, which suggests that there is large regional and temporal variability of ice algal chl a biomass. Patch sizes of algal chl a biomass were within the range of springtime chl a biomass patch sizes between 5 and 90 m (Gosselin et al., 1986; Rysgaard et al., 2001; Granskog et al., 2005; Søgaard et al., 2010). However, the upper limit of this range is nearly the scale of some ROV surveys. The above-mentioned studies were limited to the spring and found that ice algal chl a biomass and NPP typically followed the light regime (Gosselin et al., 1986; Rysgaard et al., 2001; Granskog et al., 2005), which is primarily controlled by the overlying snow pack (Perovich, 1996). Furthermore, these studies were conducted on uniform, landfast sea ice from coastal regions and thus are not representative of sea ice from the central Arctic Ocean. During our study, we also found that patch sizes and spatial variability of NPP were controlled primarily by light availability, albeit in the absence of snow. This was evident from the high explained variance of NPP by bottom-ice light, the similarity of NPP and transmittance correlogram curves, and the coincidence of high NPP patches with high light transmittance. Similar to a recent study by Campbell et al. (2017) that demonstrated a disconnect between ice algal carbon biomass and NPP over the spring to summer transition period, our chl a biomass patches did not always follow the light and NPP regimes, which clearly illustrates one key difference between the spring and summer ice algal communities.
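To make the notion of a patch size read from a correlogram concrete, the following sketch computes the sample autocorrelation of an evenly spaced transect and reports the first lag at which it drops to zero or below. Taking the first zero crossing as the patch size is an assumption made for this illustration and may differ from the definition used in this study; the function names and the synthetic transect are hypothetical.

```python
import math

def autocorrelation(values, lag):
    """Sample autocorrelation of an evenly spaced series at a given lag."""
    n = len(values)
    mean = sum(values) / n
    variance_sum = sum((v - mean) ** 2 for v in values)
    if variance_sum == 0:
        raise ValueError("series has zero variance")
    covariance_sum = sum(
        (values[i] - mean) * (values[i + lag] - mean) for i in range(n - lag)
    )
    return covariance_sum / variance_sum

def patch_size(values, spacing_m, max_lag=None):
    """Distance (m) of the first lag with autocorrelation <= 0, else None."""
    if max_lag is None:
        max_lag = len(values) // 2
    for lag in range(1, max_lag + 1):
        if autocorrelation(values, lag) <= 0:
            return lag * spacing_m
    return None  # no decorrelation found within the lags examined

# Synthetic transect: 1 m spacing with a 40 m periodic pattern.
transect = [math.sin(2 * math.pi * i / 40.0) for i in range(400)]
print(patch_size(transect, spacing_m=1.0))
```

Note that a patch size can only be resolved if the transect is several times longer than the patch itself, which is the concern raised above about the upper limit of the published range.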

We also demonstrated that patches of high NPP were associated with patches of high chl a biomass in the absence of high light availability. The fact that both chl a and transmittance show spatial patterns consistent with NPP patterns is not surprising given that NPP estimates were calculated from light and chl a biomass. However, this emphasizes the need to account for the spatial variability of both the bottom-ice light and chl a biomass to properly characterize the spatial variability of NPP in order to make accurate large-scale estimates. At a few stations (most notably 360), however, we did observe high chl a biomass patches directly adjacent to high transmittance locations (e.g., melt ponds). NPP was also high at the high transmittance locations and the adjacent high chl a biomass patches, creating one high NPP patch. We propose that the presence of high chl a biomass adjacent to high transmittance regions could be explained by a combination of lower melt rates in the thicker ice adjacent to high transmittance regions and increased bottom-ice light levels due to horizontal light scattering from, for example, melt ponds. This would have allowed for higher NPP rates and increased accumulation of chl a biomass while having reduced melt-induced losses; however, we note that more work is needed to confirm this hypothesis.

## Sea Ice Algae Sampling Recommendations

In this section we provide some recommendations for conducting the most representative sea ice algae sampling possible under the typical time limitation of an ice station on this cruise of ∼8 h. We assume that the dominant ice class (e.g., modal ice thickness) is known before sampling. Knowledge of the dominant ice class is important to ensure representative sampling; however, this depends on the objectives of the study. Knowledge of the spatial distribution for all ice types and classes will provide the best sampling protocol since a representative sample of each ice type/class will provide the most accurate and reliable estimates for the region.

### Ice Core Chl a Biomass and NPP

A nested approach has been outlined in Miller et al. (2015), which identifies four hierarchical levels of sea ice sampling. We suggest, however, some modifications for sampling during summer. If the main objective of the study is to acquire one representative ice algal chl a biomass value for that ice station, then we suggest following the quaternary scale of the nested approach according to Miller et al. (2015) by extracting replicate cores (N = 3) within a small area (<2 m). This accounts for the small scale variability. The tertiary scale of the nested approach should be selected based on ice type/class. Here we have demonstrated that a representative chl a biomass value may be best estimated by the dominant ice class (NOTE: this should only be considered if additional sampling is not possible). Therefore, ice cores should be sampled in triplicate (quaternary scale) at three different dominant ice class locations (tertiary scale).

To capture the spatial variability of chl a biomass using ice coring alone, all ice classes should be considered. The nested approach should sample triplicate ice cores (quaternary scale) at 10 m intervals (tertiary scale) based on our observed patch sizes between 10 and 30 m. We further suggest that the replicates and direction of tertiary scale transects should be designed to capture all ice classes. A systematic approach would be to classify the sea ice using 0.5 m interval classes (as presented here). The sample design must also consider other ice types such as melt ponds, bare ice and thick ice features (e.g., ridges and hummocks). We must also note that the time requirements for conducting a spatial variability study using ice coring will be highly variable depending on season and ice conditions. For example, to quantify the spatial variability of thick MYI in early spring over a distance of 100 m (e.g., 3 cores at 11 sites = 33 cores) would take roughly 30 h (based on previous experience coring spring MYI). This same task could be accomplished by an ROV with a typical deployment time of 8 h for two perpendicular survey transects of 100 m.
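The coring effort in the example above is simple arithmetic, sketched below. The function name and the `hours_per_core` parameter are hypothetical; a value of about one hour per core is only what the figures quoted for spring MYI imply, and it will vary with season and ice conditions. Counting both end points, a 100 m transect sampled every 10 m has 11 sites.

```python
def coring_effort(transect_m, interval_m, replicates=3, hours_per_core=1.0):
    """Return (sites, cores, hours) for a nested ice coring transect.

    transect_m     : transect length in metres
    interval_m     : spacing between coring sites (tertiary scale)
    replicates     : cores per site (quaternary scale)
    hours_per_core : assumed time per core, excluding laboratory processing
    """
    if interval_m <= 0:
        raise ValueError("interval must be positive")
    # Sites at 0, interval, 2*interval, ... up to the end of the transect.
    sites = int(transect_m // interval_m) + 1
    cores = sites * replicates
    return sites, cores, cores * hours_per_core

sites, cores, hours = coring_effort(transect_m=100, interval_m=10)
print(sites, cores, hours)
```

Comparing the resulting hours with the typical 8 h ROV deployment gives a quick feasibility check when planning an ice station.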

We demonstrated that NPP estimates have a complex relationship between light and chl a biomass. Therefore, in order to acquire a representative estimate the spatial variability of both the under ice light field and chl a biomass must be accounted for. We suggest a nested approach similar to that proposed for assessing the spatial variability of ice algal chl a biomass. Triplicate ice cores (quaternary scale) should be sampled at 10 m intervals (tertiary scale). In general, nested NPP sampling schemes should be conducted at the 5 different ice classes, as proposed earlier (N = 15).

### ROV Chl a Biomass and NPP

ROV surveys should be conducted either over a grid or along perpendicular transects with axis lengths of at least 60 m in both directions for chl a biomass (two times the maximum chl a patch size) and at least 100 m for NPP estimates (two times the maximum patch size of ∼50 m for TM and NPP). This ensures that the survey crosses the boundary of a patch at least once. The survey should be chosen so that it covers these dimensions, depending on the objectives of the study. However, the main criterion for setting transect/grid dimensions should be that all 5 ice classes are surveyed (or all identified ice classes for the study site). Of particular importance is the inclusion of unique and under-sampled sea ice features such as ridges or MYI hummocks (Lange et al., 2017). As we have shown, even over distances of 100 m the dominant ice classes and all ice features such as ridges were not representatively sampled. This is partly due to the location chosen and also due to filtering of the data, resulting in inconsistent sample spacing and coverage. This in turn may over-sample some regions compared to others and result in non-representative surveys for certain ice classes. Therefore, we recommend that care is taken to choose ROV transects or grids that cover all ice classes, and to ensure ROV measurements are conducted while minimizing distance to the ice bottom, and pitch and roll angles.

During data analyses one should always consider the dominant ice class for the corresponding region based on larger scale ice thickness surveys. Because universal algorithms are not yet available for deriving chl a biomass from spectral radiation, ice cores should always be collected at as many locations as possible along the ROV surveys for training bio-optical models, deriving photosynthetic parameters for up-scaling ROV NPP estimates and subsequently parameterizing algae models. Because the time requirements for ice coring (not including laboratory processing times) at the distances required for spatial variability studies (e.g., >100 m) are likely much greater than a typical ROV deployment of 8 h, we strongly recommend conducting both ROV and ice core sampling, particularly for spatial variability studies of both chl a biomass and NPP.

This method does have limitations in terms of assessing the temporality of ice algal chl a biomass and NPP due to the limited period of sampling and logistical constraints. This is a common drawback in observational sea ice biogeochemistry, which results from the limitations of sample processing and incubation times, and the sheer difficulty of sampling within the Arctic Ocean in order to cover the necessary periods of weeks to months. However, our approach was applied successfully to spatially extensive datasets and is thus an ideal approach that should be applied to long-term studies (e.g., ice-tethered sensor arrays; Nicolaus et al., 2010) in order to assess the short- to long-term temporal variability of ice algal chl a biomass and NPP.

# CONCLUSIONS

We provided, for the first time, a detailed multi-scale comparison of ice-core based ice algal chl a biomass and NPP estimates with estimates derived from under-ice spectral radiation measurements conducted over distances of tens to thousands of meters. These approaches demonstrated substantial improvements regarding representative sea ice algae observations. Our results showed that ice core-based estimates of summertime ice algal chl a biomass and NPP do not representatively capture the spatial variability compared to the spatially more extensive estimates of moving platforms. This may carry similar uncertainties, with an overall negative bias of ∼60%, for pan-Arctic estimates based on ice core observations alone.

Our autocorrelation analyses showed patch sizes of algal chl a biomass (10–30 m) and NPP (10–50 m) that were highly variable between locations and with scales of variability unlikely to be captured by ice coring alone. Based on our results we presented sampling recommendations depending on the objectives of the study. To estimate ice algal chl a biomass alone, taking a representative sample (N = 3) of each ice type/class using the ice core method should provide a reliable estimate of the overall area if there is also knowledge/observations of the ice thickness distribution on large scales (>1 km). Upscaling chl a biomass estimates would benefit from sampling all ice classes and factoring in weights for the spatial coverage of different ice classes in the region of interest. For NPP estimates, however, a combination of larger scale (>100 m) under ice light and ice algal chl a biomass is required because of the independent relationship between light and chl a biomass during the end of summer. In order to get the most representative estimates and to address the spatial variability of chl a biomass and NPP, we recommend that future sea ice sampling should combine ice-core based methods with the larger-scale under-ice spectral profiling approaches presented and described here and in Lange et al. (2016). This combined approach is also logistically justified since the time requirements for ice coring, which does not include processing times, at the distances required for spatial variability studies are typically much greater than a typical ROV deployment of 8 h.

We also identified high chl a biomass ridges within several upscaled surveys, which have been generally neglected in sea ice biogeochemical studies. Sea ice ridges had significantly higher chl a biomass than the level ice and accounted for up to 10% of the total areal ice coverage. This suggests that these features may represent important regions for sea ice algal growth that are not easily captured by ice coring methods due to logistical difficulties of coring such thick sea ice. Further dedicated sea ice ridge studies are warranted particularly in terms of ice algal chl a biomass, nutrients, primary production and bio-optical properties.

# AUTHOR CONTRIBUTIONS

This study was designed by BL, CK, MN, IP, and HF. Data acquisition was performed by BL, CK, MF-M, MN, IP, and HF. Data analyses were performed by BL, GC, CK, MF-M, MN, IP, and HF. Interpretation of the results was performed by BL, CK, GC, and HF. Drafting the first version of the manuscript was done by BL with critical revisions and important intellectual additions by all other authors (CK, MN, GC, IP, MF-M, and HF) during all stages of manuscript preparation. All authors give final approval for the publication of this manuscript in its current form.

## ACKNOWLEDGMENTS

We thank Captain Uwe Pahl, the crew, and scientific cruise leader Antje Boetius of RV Polarstern expedition PS80.3 (ARK27-3; IceArc), for their excellent support and guidance with work at sea. We thank Martin Schiller for his technical expertise and operational support during ROV deployments. We thank Jan Andries van Franeker (IMARES) for kindly providing the SUIT and Michiel van Dorssen for technical support. SUIT was developed by IMARES with support from the Netherlands Ministry of EZ (project WOT-04-009-036) and the Netherlands Polar Program (project ALW 866.13.009). We acknowledge the collaboration and technical support by Ocean Modules, Sweden for development and deployment of the ROV. This study is part of the Helmholtz Association Young Investigators Group Iceflux: Ice-ecosystem carbon flux in polar oceans (VH-NG-800). We also acknowledge the Alfred-Wegener-Institut, Helmholtz-Zentrum für Polar- und Meeresforschung for essential financial and logistical support. All data are available from the PANGAEA databases: doi: 10.1594/PANGAEA.833292; and doi: 10.1594/PANGAEA.834221.

# SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fmars.2017.00349/full#supplementary-material

# REFERENCES

the Arctic Ocean. Deep Sea Res. Part I Oceanogr. Res. Pap. 49, 1623–1649. doi: 10.1016/S0967-0637(02)00042-0


Wilcoxon, F. (1945). Individual Comparisons by Ranking Methods. Biometr. Bull. 1, 80–83. doi: 10.2307/3001968

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The reviewer TJ and handling Editor declared their shared affiliation.

Copyright © 2017 Lange, Katlein, Castellani, Fernández-Méndez, Nicolaus, Peeken and Flores. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Primary Production: Sensitivity to Surface Irradiance and Implications for Archiving Data

Trevor Platt<sup>1</sup>, Shubha Sathyendranath<sup>2</sup>\*, George N. White III<sup>3</sup>, Thomas Jackson<sup>1</sup>, Stéphane Saux Picart<sup>4</sup> and Heather Bouman<sup>5</sup>

<sup>1</sup> Plymouth Marine Laboratory, Plymouth, United Kingdom, <sup>2</sup> Plymouth Marine Laboratory, National Centre for Earth Observation, Plymouth, United Kingdom, <sup>3</sup> Bedford Institute of Oceanography, Dartmouth, NS, Canada, <sup>4</sup> Centre de Météorologie Spatiale, Météo-France, Lannion, France, <sup>5</sup> Department of Earth Sciences, Oxford University, Oxford, United Kingdom

An equation is derived to express the sensitivity of daily, water-column production by phytoplankton in the ocean to variations in irradiance at the sea surface. Assuming no spectral effects, and a vertically uniform chlorophyll profile, the sensitivity is a function only of the dimensionless irradiance. Spectral effects can be accounted for as a function of the chlorophyll concentration. At the global scale, the relative reduction in daily production consequent on halving the surface irradiance (representing the expected scope for variation in surface irradiance under natural conditions) is found to be from 30 to 40%. Choice of data source for irradiance may incur a further systematic error of up to 15%. Given that local irradiance (the principal forcing for primary production) may vary from day to day, the issue of how to archive production data for the most generality is discussed and recommendations made in this regard.

### Edited by:

Chris Bowler, École Normale Supérieure, France

### Reviewed by:

Oliver Zielinski, University of Oldenburg, Germany Thomas Lacour, UMI3376 TAKUVIK, Canada

> \*Correspondence: Shubha Sathyendranath ssat@pml.ac.uk

### Specialty section:

This article was submitted to Ocean Observation, a section of the journal Frontiers in Marine Science

Received: 21 March 2017 Accepted: 16 November 2017 Published: 05 December 2017

### Citation:

Platt T, Sathyendranath S, White GN III, Jackson T, Saux Picart S and Bouman H (2017) Primary Production: Sensitivity to Surface Irradiance and Implications for Archiving Data. Front. Mar. Sci. 4:387. doi: 10.3389/fmars.2017.00387

Keywords: primary production, data archiving, sensitivity analysis, irradiance, photophysiology, photosynthesis measurements

## 1. INTRODUCTION

Conventionally, there are two approaches to the measurement of phytoplankton production at sea, both of which require incubation of phytoplankton samples for a finite time (**Table 1**). One is the so-called in situ method (or its variant, the simulated in situ method) (Lohrenz et al., 1992; Lohrenz, 1993). The object here is to produce data representing the vertical distribution of phytoplankton production through the photic zone. The irradiance that drives the photosynthesis is solar irradiance, attenuated by the sea itself (or by other filters in the case of the simulated in situ method). The estimated vertical profile of phytoplankton photosynthesis can be integrated over depth to calculate production in the water column during the period of the incubation. If the duration of incubation is less than that of the light day, the data may be extrapolated to obtain daily water-column production. The result expresses the daily, autotrophic carbon flux under unit area of sea surface at the place and time from which the sample was drawn, under all prevailing conditions, physical and biological.

The other method is through construction of photosynthesis–light curves. Here, the samples are incubated in artificial light at a sufficient number of irradiance levels that the curve of photosynthetic response can be established, fitted to a standard equation, and the parameters (minimum two for a normal range of irradiances) extracted (Platt and Jassby, 1976).



These parameters index the photosynthetic performance of the phytoplankton present at the time and place from which the sample was drawn. The results can be applied in mathematical models to calculate daily water-column production (Platt et al., 1977; Sathyendranath and Platt, 1989; Sathyendranath et al., 1989; Morel, 1991; Morel et al., 1996). This serves the same purpose as in situ measurements of primary production, but calculations using the photosynthesis parameters can be tailored for any reasonable time interval, to suit any application. Light-dependent models of primary production are used in remote sensing (Platt and Sathyendranath, 1988; Mélin and Hoepffner, 2004, 2011) and in large-scale simulation models of the marine ecosystem (Laufkötter et al., 2015). The photosynthesis parameters are fundamental bio-optical properties of phytoplankton and have many other applications beyond estimation of water-column production, for example in the calculation of chlorophyll-to-carbon ratio in phytoplankton (Jackson et al., 2017).

The photosynthesis parameters may therefore be considered to contain more information than estimates of in situ production (which they subsume through application of a mathematical model). In the planning of present-day oceanographic expeditions on which phytoplankton production estimates are required, a choice is usually made in favor of the photosynthesis–parameter method (which also imposes fewer restrictions on the movement of the ship than does the in situ method).

The question then arises: How should these data be archived? In the case of in situ production estimates, the picture is clear, except possibly for the time scale involved. The data represent the photosynthetic carbon flux per unit area of the sea surface for the duration of the incubation, perhaps extrapolated to a time scale of 1 day, and can be archived as such. For the (preferred) photosynthesis–response method, the parameters can be archived but they do not in themselves constitute an estimate of phytoplankton production. If they are to be applied to the calculation of daily water-column production, what should be taken as the forcing irradiance? Does it matter? Here, we analyse the sensitivity of daily, water-column production to variable surface irradiance, and discuss the archiving of data on primary production, for example as used in climate studies.

## 2. THEORETICAL BACKGROUND: NON-SPECTRAL-LIGHT, UNIFORM-BIOMASS CASE

The photosynthesis–irradiance curve relates phytoplankton photosynthesis $P$ to irradiance $I$. It is convenient, and more general, to work with the normalized photosynthetic rate $P^B$, where the superscript $B$ indicates normalization to the chlorophyll biomass. Thus,

$$P^B(I) = p^B(I; \text{parameters}),\tag{1}$$

where $p^B$ is a function to be specified and where various choices are available for the parameter sets. We have shown that, regardless of the parameter set chosen and for any plausible choice of $p^B$, the various manifestations of Equation (1) can all be recast into a single common form (Platt and Sathyendranath, 1993). The discussion to follow is thus robust against alternative choices of $p^B$ and the parameters. Therefore, with no loss of generality, we shall use as parameters the assimilation number $P_m^B$ and the initial slope $\alpha^B$ of the photosynthesis–irradiance curve. Then,

$$P^B(I) = p^B(I; \alpha^B, P_m^B). \tag{2}$$

In the sea, irradiance is a function of depth z and of time of day t, such that we can write

$$P^B(I) = p^B(I(z,t); \alpha^B, P_m^B). \tag{3}$$

The desired result is the double integral of Equation (3) over time and depth, which is the daily water-column production $P_{Z,T}$.

$$P_{Z,T} = B \int_0^D \int_0^\infty p^B(I(z,t); \alpha^B, P_m^B) \, \mathrm{d}z \, \mathrm{d}t,\tag{4}$$

where D is the length of day in hours from sunrise and where it is assumed that there is no production outside the interval 0 ≤ t ≤ D. The factor B is taken outside the integral signs because we assume that the chlorophyll biomass is independent of depth. The choice of infinity as the upper limit on the integral over z avoids any ambiguity over depth of the photic zone, without invoking any simplifying assumption. Contributions to the integral from all depths below the photic zone are negligible for all practical purposes.

Under clear-sky conditions, the irradiance at the sea surface $I(0, t)$ can be calculated for any latitude and date according to standard astronomical functions (Bird, 1984; Bird and Riordan, 1986; Sathyendranath and Platt, 1988; Platt et al., 1990). Analytically, the time course of clear-sky irradiance $I(t)$ can be described by a function with two parameters ($D$ and the irradiance at local noon $I_0^m$). Thus, $I(0, t) = I_0^m g(t; D)$, where $g(t)$ is a function to be specified. The effect of clouds may be represented through a reduction in the magnitude of $I_0^m$ according to the proportion of the sky covered by clouds.

Given observations of $\alpha^B$ and $P_m^B$ (and a choice of $p^B$) we can evaluate the double integral of Equation (4) for any station and time. The result will depend upon the magnitude of $I_0^m$. For archival purposes, is it appropriate to use the clear-sky value of $I_0^m$ (which would return the maximum possible value of production under the prevailing conditions at that time and place)? Or is it more appropriate to use the value of $I_0^m$ corresponding to the sky conditions at the station and time (which would return the best value of production for the conditions, but which would lack any generality for other sky conditions)? Would the differences be significant?

To address these issues, we need to look at the derivative of Equation (4) with respect to irradiance. Let us first select $p^B(I) = P_m^B \left(1 - \exp(-I/I_k)\right)$ as the formulation of the photosynthesis–light curve, where the photoadaptation parameter $I_k = P_m^B/\alpha^B$. The ratio $I/I_k$ is a dimensionless irradiance that we designate as $I_*$. The dimensionless noon irradiance is $I_*^m$. We assume a vertically-homogeneous water column so that $B(z) = B$, and light attenuation can be characterized by a single coefficient $K$ such that $I(z) = I(0) \exp(-Kz)$ where the sea surface is at $z = 0$ and $z$ is positive downwards. Then Equation (4) may be written as

$$P_{Z,T} = B P_m^B \int_0^D \int_0^\infty \left(1 - \exp\left(-I_*^m g(t) \exp(-Kz)\right)\right) \mathrm{d}z \,\mathrm{d}t. \tag{5}$$

With a change of variable $x = I_*^m g(t) \exp(-Kz)$, we have

$$P_{Z,T} = \frac{BP_m^B}{K} \int_0^D \int_0^{I_*^m g(t)} \left(\frac{1 - \exp(-x)}{x}\right) \,\mathrm{d}x \,\mathrm{d}t.\tag{6}$$
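For readers who want the intermediate step between Equations (5) and (6), the change of variable can be written out at fixed $t$ as follows. This is a worked step added for clarity; it uses only the definitions already given.

$$x = I_*^m g(t)\, e^{-Kz} \;\Rightarrow\; \mathrm{d}x = -K x \,\mathrm{d}z, \qquad z = 0 \mapsto x = I_*^m g(t), \qquad z \to \infty \mapsto x \to 0,$$

$$\int_0^\infty \left(1 - e^{-x(z)}\right) \mathrm{d}z = \frac{1}{K} \int_0^{I_*^m g(t)} \frac{1 - e^{-x}}{x} \,\mathrm{d}x .$$

The minus sign from $\mathrm{d}z = -\mathrm{d}x/(Kx)$ is absorbed by reversing the limits of integration.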

The inner integral (on $x$) is a standard form, the entire exponential integral $\operatorname{Ein}\left(I_*^m g(t)\right)$, so that

$$P_{Z,T} = \frac{BP_m^B}{K} \int_0^D \operatorname{Ein} \left( I_*^m g(t) \right) \,\mathrm{d}t. \tag{7}$$
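The equivalence of Equations (5) and (7) can be checked numerically. The following is a minimal sketch, not the authors' code: it evaluates Ein by Simpson quadrature, takes for $g(t)$ the parabolic form adopted later in this section, truncates the depth integral at 30 e-folding depths, and uses purely illustrative parameter values (all function names and numbers are assumptions).

```python
import math

def simpson(func, a, b, n=200):
    """Composite Simpson rule on [a, b] with n (even) sub-intervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * func(a + i * h)
    return total * h / 3.0

def ein(x):
    """Entire exponential integral, Ein(x) = int_0^x (1 - exp(-u)) / u du."""
    def integrand(u):
        return 1.0 if u == 0.0 else -math.expm1(-u) / u
    return simpson(integrand, 0.0, x)

def g(t, D):
    """Parabolic time course of surface irradiance: 0 at t = 0 and t = D, 1 at noon."""
    return 1.0 - (1.0 - 2.0 * t / D) ** 2

def production_eq7(B, PBm, K, I_star_m, D):
    """Equation (7): single time integral of Ein."""
    return B * PBm / K * simpson(lambda t: ein(I_star_m * g(t, D)), 0.0, D)

def production_eq5(B, PBm, K, I_star_m, D, e_folds=30.0):
    """Equation (5): brute-force double integral, depth truncated at e_folds / K."""
    z_max = e_folds / K
    def depth_integral(t):
        surface = I_star_m * g(t, D)
        return simpson(lambda z: -math.expm1(-surface * math.exp(-K * z)),
                       0.0, z_max, n=400)
    return B * PBm * simpson(depth_integral, 0.0, D)

# Illustrative (hypothetical) values: B in mg Chl m^-3, PBm in mg C (mg Chl)^-1 h^-1,
# K in m^-1, D in h; I_star_m is dimensionless.
B, PBm, K, I_star_m, D = 1.0, 5.0, 0.1, 5.0, 12.0
p7 = production_eq7(B, PBm, K, I_star_m, D)
p5 = production_eq5(B, PBm, K, I_star_m, D)
print(f"Eq. 7: {p7:.2f}   Eq. 5: {p5:.2f}   (mg C m^-2 d^-1)")
```

The two routes should agree to within the quadrature error, which illustrates that the infinite upper limit on depth costs nothing in practice.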

In the specification of the surface irradiance, the dimensionless irradiance at noon $I_*^m$ can be considered as a scale factor, with the function $g(t)$ describing the time course of variation through the day. Our immediate interest is in the derivative of $P_{Z,T}$ with respect to this scale factor $I_*^m$,

$$\frac{\mathrm{d}P_{Z,T}}{\mathrm{d}I_*^m} = \frac{BP_m^B}{K} \int_0^D \frac{\mathrm{d}}{\mathrm{d}I_*^m} \operatorname{Ein}\left(I_*^m g(t)\right)\mathrm{d}t.\tag{8}$$

By virtue of the definition of the function $\operatorname{Ein}(\cdot)$ as an integral, the differentiation returns the integrand of the definition:

$$\frac{\mathrm{d}P_{Z,T}}{\mathrm{d}I_*^m} = \frac{BP_m^B}{K} \int_0^D g(t) \left( \frac{1 - \exp\left(-I_*^m g(t)\right)}{I_*^m g(t)} \right) \mathrm{d}t. \tag{9}$$

The functions g(t) in numerator and denominator cancel to give the simple result

$$\frac{\mathrm{d}P_{Z,T}}{\mathrm{d}I_*^m} = \frac{BP_m^B}{K I_*^m} \int_0^D \left(1 - \exp\left(-I_*^m g(t)\right)\right) \mathrm{d}t.\tag{10}$$

The choice of function $g(t) = 1 - (1 - 2t/D)^2$ gives a good representation of the time course of surface irradiance through the day. With this choice, and with the substitution $\theta = \pi t/D$, Equation (10) becomes

$$\frac{\mathrm{d}P_{Z,T}}{\mathrm{d}I_*^m} = \frac{BP_m^B}{K I_*^m} \frac{D}{\pi} \int_0^{\pi} \left(1 - \exp\left(-I_*^m \left[1 - (1 - 2\theta/\pi)^2\right]\right)\right) \mathrm{d}\theta. \tag{11}$$

Given the symmetry of g(θ) about θ = π/2, we can express Equation (11) as

$$\frac{\mathrm{d}P_{Z,T}}{\mathrm{d}I_*^m} = \frac{BP_m^B}{K I_*^m} \frac{D}{\pi} \left\{\pi - 2 \int_0^{\pi/2} \exp\left(-I_*^m \left[1 - \left(1 - 2\theta/\pi\right)^2\right]\right) \mathrm{d}\theta\right\}.\tag{12}$$

At this point it is convenient to make the substitution $s = \sqrt{I_*^m}\,(1 - 2\theta/\pi)$. Then,

$$\frac{\mathrm{d}P_{Z,T}}{\mathrm{d}I_*^m} = \frac{BP_m^B}{K I_*^m} \frac{D}{\pi} \left\{ \pi - \frac{\pi e^{-I_*^m}}{\sqrt{I_*^m}} \int_0^{\sqrt{I_*^m}} e^{s^2} \, \mathrm{d}s \right\}.\tag{13}$$

Using the definition of Dawson's integral:

$$\operatorname{Daw}(x) = e^{-x^2} \int_0^x e^{s^2} \, \mathrm{d}s \tag{14}$$

with $x = \sqrt{I_*^m}$, we see that Equation (13) can be written as

$$\frac{\mathrm{d}P_{Z,T}}{\mathrm{d}I_*^m} = \frac{BP_m^B D}{K I_*^m} \left\{ 1 - \frac{1}{\sqrt{I_*^m}} \,\operatorname{Daw}\left(\sqrt{I_*^m}\right) \right\}.\tag{15}$$

This form is convenient because there exist robust numerical forms for computing Daw(x).
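As an illustration of Equation (15), the sketch below evaluates Dawson's integral by direct quadrature and compares the closed-form derivative with a central finite difference of Equation (7), both expressed per unit of the scale factor $BP_m^B D/K$. It is a hedged example under stated assumptions rather than a reference implementation: the helper names are hypothetical, and in practice a library routine such as `scipy.special.dawsn` would normally be preferred to the simple quadrature used here.

```python
import math

def simpson(func, a, b, n=200):
    """Composite Simpson rule on [a, b] with n (even) sub-intervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * func(a + i * h)
    return total * h / 3.0

def ein(x):
    """Entire exponential integral, Ein(x) = int_0^x (1 - exp(-u)) / u du."""
    def integrand(u):
        return 1.0 if u == 0.0 else -math.expm1(-u) / u
    return simpson(integrand, 0.0, x)

def dawson(x):
    """Dawson's integral, Equation (14), by quadrature."""
    # Folding exp(-x^2) into the integrand keeps it bounded by 1 (no overflow).
    return simpson(lambda s: math.exp(s * s - x * x), 0.0, x)

def f(I_star_m):
    """Day-mean of Ein(I*m g) with tau = t/D, so that P_ZT = (B PBm D / K) * f."""
    return simpson(lambda tau: ein(I_star_m * (1.0 - (1.0 - 2.0 * tau) ** 2)),
                   0.0, 1.0)

def derivative_eq15(I_star_m):
    """Equation (15), divided by the scale factor B PBm D / K."""
    root = math.sqrt(I_star_m)
    return (1.0 - dawson(root) / root) / I_star_m

def derivative_finite_difference(I_star_m, h=1e-3):
    """Central difference of f, i.e. of Equation (7) per unit scale factor."""
    return (f(I_star_m + h) - f(I_star_m - h)) / (2.0 * h)

I_star_m = 5.0  # illustrative dimensionless noon irradiance
print(derivative_eq15(I_star_m), derivative_finite_difference(I_star_m))
```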

We want to assess the sensitivity of the daily production estimates to changes (errors) in the surface irradiance $I_0^m$. Noting that $\mathrm{d}I_0^m = I_k \,\mathrm{d}I_*^m$, we can rewrite Equation (15) as

$$\frac{\mathrm{d}P_{Z,T}}{\mathrm{d}I_0^m} = \frac{BP_m^B D}{K I_0^m} \left\{ 1 - \frac{1}{\sqrt{I_*^m}} \,\operatorname{Daw}\left(\sqrt{I_*^m}\right) \right\}.\tag{16}$$

It was shown in Platt and Sathyendranath (1993) that all solutions to Equation (4) can be written in the canonical form

$$P_{Z,T} = A f(I_*^m),\tag{17}$$

where $A = BP_m^B D/K$ is a scale factor with dimensions of daily production, and $f(I_*^m)$ is a (known) function of the dimensionless irradiance whose particular form depends on the choice of the photosynthesis-light curve. Dividing both sides of Equation (16) by $A f(I_*^m)$ and multiplying by $I_0^m$, we obtain

$$\left(\frac{\mathrm{d}P_{Z,T}}{P_{Z,T}}\right)\bigg/\left(\frac{\mathrm{d}I_0^m}{I_0^m}\right) = \frac{1}{f(I_*^m)}\left\{1 - \frac{1}{\sqrt{I_*^m}}\operatorname{Daw}\left(\sqrt{I_*^m}\right)\right\}.\tag{18}$$

Here, we can see that the relative change in daily primary production for a given relative change in surface irradiance depends only on a function of the scaled irradiance.
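Equation (18) is straightforward to tabulate. The sketch below is a minimal illustration under the assumptions of this section (exponential photosynthesis–light curve, parabolic $g(t)$, uniform biomass, non-spectral light); here $f(I_*^m)$ is taken as the day-mean of $\operatorname{Ein}(I_*^m g)$, consistent with Equations (7) and (17), and all function names are hypothetical.

```python
import math

def simpson(func, a, b, n=200):
    """Composite Simpson rule on [a, b] with n (even) sub-intervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * func(a + i * h)
    return total * h / 3.0

def ein(x):
    """Entire exponential integral, Ein(x) = int_0^x (1 - exp(-u)) / u du."""
    def integrand(u):
        return 1.0 if u == 0.0 else -math.expm1(-u) / u
    return simpson(integrand, 0.0, x)

def dawson(x):
    """Dawson's integral, Daw(x) = exp(-x^2) * int_0^x exp(s^2) ds."""
    return simpson(lambda s: math.exp(s * s - x * x), 0.0, x)

def f(I_star_m):
    """Dimensionless daily production: day-mean of Ein(I*m g), tau = t/D."""
    return simpson(lambda tau: ein(I_star_m * (1.0 - (1.0 - 2.0 * tau) ** 2)),
                   0.0, 1.0)

def sensitivity(I_star_m):
    """Equation (18): relative change in production per relative change in irradiance."""
    root = math.sqrt(I_star_m)
    return (1.0 - dawson(root) / root) / f(I_star_m)

for I in (0.01, 0.1, 1.0, 5.0, 10.0, 20.0):
    print(f"I*m = {I:6.2f}   sensitivity = {sensitivity(I):.3f}")
```

Under these assumptions the tabulated values should fall from near one at very low scaled irradiance toward smaller values as $I_*^m$ grows.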

## 3. ESTIMATING DAILY, WATER-COLUMN PRODUCTION

In the previous section, we found an expression for the sensitivity of primary production to variations in surface irradiance, of an idealized water column chosen for mathematical tractability. We next consider the detailed calculation of primary production in operational mode and the sensitivity of the results to changes in the surface irradiance.

Various procedures exist for estimating daily, water-column production by phytoplankton, given information on the two photosynthesis parameters $P_m^B$ and $\alpha^B$ (Platt et al., 1977, 1990, 2008; Platt and Sathyendranath, 1988, 1993). All require an estimate of surface irradiance. Briefly, the differences among them depend mainly on whether a depth-independent biomass profile is assumed and on whether the (known) wavelength dependence of light penetration and photosynthetic response is included or suppressed. A depth-independent, spectrally-neutral treatment provides a frame of reference (Sathyendranath et al., 1989; Platt and Sathyendranath, 1991); it has the canonical solution given in Equation (17).

… to changes in dimensionless irradiance at noon ($I_*^m$), spectral results by numerical integration for different chlorophyll concentrations. Non-spectral result also shown, for comparison.

At the other extreme, a wavelength-dependent and non-uniform biomass treatment has no analytic solution (although Equation 17 remains a reliable guide) and must be integrated numerically (Sathyendranath et al., 1989; Platt and Sathyendranath, 1991). This is the preferred calculation, and its reliability has been demonstrated (Platt and Sathyendranath, 1988). Applications of the various models have been compared (Platt et al., 1991; Kyewalyanga et al., 1992). In the most detailed models, the angular distribution of the irradiance is included (Platt and Sathyendranath, 1988; Sathyendranath and Platt, 1989; Sathyendranath et al., 1989), the light field being separated into its direct and diffuse components. The direct component is affected much more strongly by the presence of clouds, and the diffuse component (which may be one half of the total) will still be available for photosynthesis even under 100% cloud cover. For this reason, separation of the forcing irradiance into direct and diffuse components is a key step in addressing the sensitivity of water column production to variations in cloud cover.

For the calculations presented here, we use a spectral- and angular-distribution-resolving primary-production model (Platt and Sathyendranath, 1988; Longhurst et al., 1995; Platt et al., 1995; Sathyendranath et al., 1995), forced by remotely-sensed chlorophyll and light data, and information derived from ship-based in situ measurements on phytoplankton physiology and vertical biomass profile parameters. These model parameters were organized according to season and ecological province, as in Longhurst et al. (1995). Photophysiological parameter ($P_m^B$ and $\alpha^B$) values for defined biogeochemical provinces were taken from the work of Mélin and Hoepffner (2004). The vertical structure of the phytoplankton biomass profile was described by three parameters: the depth of maximum chlorophyll concentration ($Z_m$), the thickness of the subsurface peak in chlorophyll concentration ($\sigma$) and the ratio of the peak chlorophyll concentration to the background chlorophyll concentration ($\rho$), also provided on a provincial basis following Mélin and Hoepffner (2011). To provide a smooth transition of parameter estimates across province boundaries, a smoothing filter was applied to the province values of Mélin and Hoepffner (2004) where values at a given pixel were averaged from a 30 × 30 pixel box, with a pixel size of 9 km, centered on the pixel of interest.

The chlorophyll profile parameters were used in conjunction with the remotely-sensed chlorophyll data from the OC-CCI v1.0 products (Sathyendranath et al., 2016), to create chlorophyll profiles for each 9 km pixel. Average sea-surface irradiance (photosynthetically-active radiation, PAR) for May 2011 was taken from three data sources. The first PAR data source was the NASA MODIS PAR product (OBPG, 2014b); the second data source was the NASA MERIS PAR product (OBPG, 2014a); the third source was a new ESA processing of MERIS as documented in the Algorithm Theoretical Baseline Document (ATBD) for the PAR and Primary Production (PPP) SEOM project.

In each of the three cases, the remotely-sensed PAR was used to scale the results of a spectral clear-sky model to provide spectrally-resolved irradiance for input to the primary production model. The propagation of spectral light to various depths in the water column accounted for attenuation by water, phytoplankton and other colored and scattering substances, assuming all optical properties were related to chlorophyll concentration (open-ocean conditions). The profile of light was then combined with the vertical profile of chlorophyll and photosynthetic parameters to obtain estimates of depth-resolved primary production. The calculations were repeated for 12 time steps during half the day length. The results were then integrated over time and depth to yield total primary production per day and per unit area.
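The structure of this calculation (a light profile combined with a chlorophyll profile, stepped through half the day and integrated over depth) can be sketched in a few lines. The example below is a deliberately simplified, non-spectral stand-in and not the model used here: it assumes a single constant attenuation coefficient `K`, a parabolic daily time course of surface irradiance, a Gaussian peak on a constant background in which `rho` is taken as peak height relative to background, and purely illustrative parameter values.

```python
import math

def chl_profile(z, B0, rho, z_m, sigma):
    """Background plus Gaussian subsurface peak (assumed form; rho = peak height / background)."""
    return B0 * (1.0 + rho * math.exp(-((z - z_m) ** 2) / (2.0 * sigma ** 2)))

def daily_production(I0m, D, PBm, alphaB, K, B0, rho, z_m, sigma,
                     n_time=12, dz=1.0, z_max=200.0):
    """Midpoint-rule integration over half the day and over depth, doubled by symmetry."""
    dt = (D / 2.0) / n_time
    n_depth = int(z_max / dz)
    total = 0.0
    for i in range(n_time):
        t = (i + 0.5) * dt
        # Parabolic time course of surface irradiance (assumption).
        surface = I0m * (1.0 - (1.0 - 2.0 * t / D) ** 2)
        for j in range(n_depth):
            z = (j + 0.5) * dz
            light = surface * math.exp(-K * z)  # constant K is a simplification
            rate = PBm * (1.0 - math.exp(-alphaB * light / PBm))
            total += chl_profile(z, B0, rho, z_m, sigma) * rate * dz * dt
    return 2.0 * total

# Hypothetical inputs: I0m in W m^-2, D in h, PBm in mg C (mg Chl)^-1 h^-1,
# alphaB in mg C (mg Chl)^-1 h^-1 (W m^-2)^-1, K in m^-1, B0 in mg Chl m^-3, depths in m.
params = dict(D=12.0, PBm=5.0, alphaB=0.1, K=0.08, B0=0.2, z_m=60.0, sigma=15.0)
full = daily_production(I0m=400.0, rho=2.0, **params)
half = daily_production(I0m=200.0, rho=2.0, **params)
uniform = daily_production(I0m=400.0, rho=0.0, **params)
print(f"with peak: {full:.1f}  uniform: {uniform:.1f}  half light: {half:.1f}  (mg C m^-2 d^-1)")
```

A spectral treatment would replace the single `K` with wavelength-dependent attenuation tied to chlorophyll, and would resolve the direct and diffuse components of the surface irradiance.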

## 4. SENSITIVITY OF $P_{Z,T}$ TO VARIATIONS IN SURFACE IRRADIANCE

For the non-spectral approximation, with vertically-uniform biomass profile, the sensitivity of daily, water-column production is given by Equation (18), shown in **Figure 1**. The spectral analog, calculated by numerical integration for the case of uniform biomass, is shown in **Figure 2**. The spectral results lie in the same range as that for the non-spectral case, varying slightly according to the chlorophyll concentration. The differences are greater for higher values of chlorophyll and higher values of normalized irradiance. We see that, in both spectral and non-spectral calculations, sensitivity is high and approaches one (corresponding to the case when the relative change in primary production is the same as the relative change in surface irradiance) when the dimensionless noon-time irradiance approaches zero. As the scaled noon-time irradiance increases, the sensitivity decreases, reaching values less than 0.5 for $I_*^m$ greater than 10. For the spectral model, there is an additional dependence of the answer on chlorophyll concentration: the sensitivity is less than that of the non-spectral case at low $I_*^m$, but the opposite holds for high $I_*^m$ values. The additional dependence of the results on chlorophyll concentration arises from the effect of phytoplankton absorption on the spectral quality of the underwater light field, which is not taken into account in the non-spectral model. When phytoplankton absorption increases (high-chlorophyll conditions), the water turns progressively green, and less suitable for phytoplankton absorption, introducing a decrease in light available for photosynthesis. The effect on primary production is equivalent to decreasing $I_*^m$ in a non-spectral model. For chlorophyll concentration less than 1 mg m<sup>−3</sup>, which would be typical of most open-ocean waters, the spectral and non-spectral models yield results that are quite close to each other, such that the analytical solution may be taken to be a reasonable guide to realistic open-ocean conditions.

In global-scale computations, in addition to the spectral effects, we also have to explore the effect of vertical structure in chlorophyll concentration. We have estimated the global-scale effect of variations in surface irradiance as follows. We selected the month of May, 2011 to illustrate the results. First, we calculated primary production using a detailed, spectral model integrated numerically (**Figure 3C**), forced by the typical irradiance at each pixel for the month concerned (**Figure 3A**), and with the photosynthesis parameters and chlorophyll profile parameters assigned according to provinces and season, as noted earlier. Then, we repeated the calculation (**Figure 3B**), but with irradiance only one half of that in the previous calculation (**Figure 3D**). The difference between values of $P_{Z,T}$ calculated for the two irradiances is a measure of the sensitivity of the daily, water-column production to changes in surface light (**Figure 3E**) and the relative sensitivity (change in production normalized to change in surface irradiance) is shown in **Figure 3F**.

The results, which vary with region (especially with latitude), are not symmetrical about the equator: they represent Spring in the northern hemisphere, but Winter in the southern hemisphere, with corresponding changes to the surface light field (**Figure 3A**). The results also reflect the regional assignment of photosynthesis parameters. The relative reduction in daily, water-column production lies generally in the range from 30 to 40%, consequent on halving the surface irradiance (**Figure 3E**). The sensitivity of daily production to changes in surface irradiance is typically in the range from 60 to 80% (**Figures 3F**, **4**), showing that the relative change in primary production is almost always less than the relative change in incoming solar radiation. The relative effect is stronger in areas with low surface irradiance, consistent with the diagnosis of the analytical solution.
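The halving experiment can be mimicked with the analytical, non-spectral, uniform-biomass solution of Equation (17). The sketch below is an illustration only, with hypothetical function names: it computes the relative reduction in daily production when the noon irradiance is halved, and the same quantity normalized by the 50% change in irradiance, for a few illustrative values of $I_*^m$. It does not reproduce the spectral, non-uniform global computation.

```python
import math

def simpson(func, a, b, n=200):
    """Composite Simpson rule on [a, b] with n (even) sub-intervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * func(a + i * h)
    return total * h / 3.0

def ein(x):
    """Entire exponential integral, Ein(x) = int_0^x (1 - exp(-u)) / u du."""
    def integrand(u):
        return 1.0 if u == 0.0 else -math.expm1(-u) / u
    return simpson(integrand, 0.0, x)

def f(I_star_m):
    """Dimensionless daily production of Equation (17) for the parabolic g(t)."""
    return simpson(lambda tau: ein(I_star_m * (1.0 - (1.0 - 2.0 * tau) ** 2)),
                   0.0, 1.0)

def relative_reduction(I_star_m, factor=0.5):
    """Fractional drop in daily production when noon irradiance is scaled by `factor`."""
    return 1.0 - f(factor * I_star_m) / f(I_star_m)

for I in (1.0, 5.0, 20.0):
    drop = relative_reduction(I)
    print(f"I*m = {I:5.1f}   reduction = {drop:.2f}   normalized = {drop / 0.5:.2f}")
```

Because $f$ is concave and vanishes at zero irradiance, the reduction from halving the light is always smaller than 50% in this idealized case, and it is largest where the scaled irradiance is lowest.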

… reduction in surface irradiance on all sea pixels in Figure 3E.

Systematic errors may be introduced into the calculation of primary production through the choice of the surface field of photosynthetically-active radiation (PAR). We have examined this possibility by comparing the results arising from use of MERIS (**Figures 5A,C**) and MODIS (**Figure 5E**) fields of PAR (**Figures 5B,D,F**). We find that the differences in surface PAR between the two data streams are usually in the range from 5 to 20% (**Figure 5E**), implying a potential systematic error in primary production of some 15% or less arising from the choice of PAR field. Though these differences may be small, they are systematic; their effect could be significant in studies associated with climate change: when results from multiple sensors are merged to create a long time series, differences between PAR data from individual sensors may lead to an erroneous conclusion on trends in primary production, which is to be avoided.
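As a rough consistency check on these figures, one can propagate the PAR difference through the relative sensitivity to first order. This is an approximation added for illustration: the symbol $S$ for the relative sensitivity is introduced here, and it is treated as constant over the PAR difference, using the range of 60 to 80% quoted earlier in this section.

$$\frac{\Delta P_{Z,T}}{P_{Z,T}} \approx S \,\frac{\Delta I_0^m}{I_0^m}, \qquad S \approx 0.6 \text{ to } 0.8, \quad \frac{\Delta I_0^m}{I_0^m} \approx 0.05 \text{ to } 0.20 \;\;\Rightarrow\;\; \frac{\Delta P_{Z,T}}{P_{Z,T}} \approx 0.03 \text{ to } 0.16 .$$

The upper end of this range is close to the figure of some 15% quoted in the text.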

## 5. DISCUSSION

In this paper, we have examined the sensitivity of modeled primary production to changes in the solar radiation available at the sea surface. Using an analytical solution derived for a non-spectral model assuming uniform chlorophyll concentration in the water column, using a spectral model with uniform chlorophyll concentration, and finally using a spectral model with non-uniform chlorophyll profile, we have demonstrated that the relative changes in primary production are likely to be always less than the relative change in surface irradiance. The effect of surface light on primary production decreases as the scaled surface irradiance increases. In the non-spectral model, the only parameter that affects the sensitivity is the photoadaptation parameter I<sup>k</sup> . In the spectral model, the chlorophyll concentration has an additional modulating effect on the sensitivity.

In the light of these results, how might we enhance the management of primary production data? Given an archived set of photosynthesis-light parameters, what is the most useful information that could be archived about the corresponding phytoplankton production? We suggest the following: First, calculate the phytoplankton production under the climatological value of the clear sky irradiance for the day number and latitude concerned. This represents the maximum phytoplankton production that could be achieved with the given photosynthesis parameters on that day of the year at that latitude. Next, calculate the production under 100% cloud cover, representing the minimum production that could be achieved under the prevailing conditions. These two calculations would establish the range of possible values of phytoplankton production expected at the relevant time and place under most of the possible values of surface irradiance. In fact, because the surface irradiance has both diffuse and direct components, 100% cloud cover will correspond to a reduction in $I_0^m$ of nominally 50% (loss of direct component); all light reaching the sea surface will be in the diffuse component. The effect on daily production of halving the surface irradiance varies with region and season. Taking 50% as a rough estimate of the scope for variation in surface irradiance at a given time and place, we have shown that the implied relative variation in daily, water-column production is roughly 30–40%. The relative sensitivity, as given by Equation (18), will be independent of the magnitude chosen for the reduction in surface irradiance. A further systematic difference of about 15% may arise from the choice of data source for the surface irradiance field, indicating the value of improving the satellite-derived estimates of PAR.

FIGURE 5 | Systematic differences in surface PAR arising from choice of data stream, exemplified by difference between MERIS and MODIS streams for May 2011. (A) PAR field from MERIS using ESA protocol (test product); (C) PAR field from MERIS using NASA product; (E) PAR field from MODIS Aqua using NASA product; the differences between the products are shown in (B,D,F).

The most useful particular value of phytoplankton production would be that at the climatologically-averaged value of surface irradiance at the time and place concerned. This average would take into account the effect of latitude on day length and also the local effect of cloud cover. As an example, for the Northwest Atlantic Ocean, we have developed from the SeaWiFS irradiance an empirical function that yields the climatological total daily irradiance as a function of latitude and day number (Platt et al., 2009). If longitudinal differences in cloud cover are to be taken into account, one could use the climatological data on cloud cover to estimate the relevant irradiance.
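The archiving suggestion can be made concrete with a small sketch. The function below is hypothetical and uses the non-spectral, uniform-biomass solution of Equation (17) as a stand-in for whatever production model is actually applied; the field names, the 50% overcast factor (the nominal value discussed above) and all input values are assumptions for illustration.

```python
import math

def simpson(func, a, b, n=200):
    """Composite Simpson rule on [a, b] with n (even) sub-intervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * func(a + i * h)
    return total * h / 3.0

def ein(x):
    """Entire exponential integral, Ein(x) = int_0^x (1 - exp(-u)) / u du."""
    def integrand(u):
        return 1.0 if u == 0.0 else -math.expm1(-u) / u
    return simpson(integrand, 0.0, x)

def f(I_star_m):
    """Dimensionless daily production of Equation (17) for the parabolic g(t)."""
    return simpson(lambda tau: ein(I_star_m * (1.0 - (1.0 - 2.0 * tau) ** 2)),
                   0.0, 1.0)

def archive_record(B, PBm, alphaB, K, D, I0m_clear, I0m_clim, overcast_fraction=0.5):
    """Production estimates to store alongside the photosynthesis parameters."""
    I_k = PBm / alphaB          # photoadaptation parameter
    A = B * PBm * D / K         # scale factor of Equation (17)
    return {
        "clear_sky_max": A * f(I0m_clear / I_k),
        "overcast_min": A * f(overcast_fraction * I0m_clear / I_k),
        "climatological": A * f(I0m_clim / I_k),
    }

# Hypothetical station: irradiances in W m^-2, other units as in Equation (17).
record = archive_record(B=1.0, PBm=5.0, alphaB=0.1, K=0.1, D=12.0,
                        I0m_clear=400.0, I0m_clim=300.0)
for name, value in record.items():
    print(f"{name:>15}: {value:8.1f} mg C m^-2 d^-1")
```

Storing the three values together with the parameters themselves keeps the record general: any other irradiance can be applied later without returning to the original incubation data.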

Paradoxically, the estimate of phytoplankton production made with the irradiance measured at the same time as the parameters themselves were measured (if available) is often the least useful (lowest generality) of all the estimates. Of course, the matter depends on the question being addressed. If the goal is to close a local carbon budget or a local energy budget over a short time period including the day of measurement, then the estimate of phytoplankton production made with the actual irradiance for that particular day would be the one to choose.

However, we expect that most applications of the archived data would be more general than this. In such cases, we would seek estimates of phytoplankton production that were suitably general, and the options we have indicated above would probably be preferred.

The light field in the sea is the resultant of a complex and reciprocal interplay between physics and biology (**Figure 6**). The effect on daily, water-column production consequent on changes in irradiance at the sea surface depends, at least, on the processes shown in **Figure 6**. A guide to the scale of the effect (not accounting for spectral dependence) is provided by Equation (18). Spectral effects can be parameterized as a function of chlorophyll concentration (**Figure 2**).

## AUTHOR CONTRIBUTIONS

TP, SS, and HB originated the work and organized assembly of the global data bases. GW and TP developed the theoretical background. TJ and SSP made most of the calculations. TP wrote the first draft of the paper. All authors contributed to revision of the paper.

## FUNDING

Funding for this work was provided through the European Space Agency (ESA), contracts 4000113690/15/I-LG and 4000110968/14/I-BG; and through the National Centre for Earth Observation (NCEO), grant PR140015. Additional support from the Simons Foundation CBIOMES project is acknowledged.

## ACKNOWLEDGMENTS

This work is a contribution to the "PAR for Primary Production" Project and the "Photosynthesis Parameters from Space" Project of the European Space Agency. Support from the Jawaharlal Nehru Science Fellowship to TP is gratefully acknowledged. Additional support was received from the National Centre for Earth Observation (NCEO) of the Natural Environment Research Council of UK. The authors thank the two reviewers for their helpful comments.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Platt, Sathyendranath, White, Jackson, Saux Picart, and Bouman. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Intercomparison of Ocean Color Algorithms for Picophytoplankton Carbon in the Ocean

Víctor Martínez-Vicente<sup>1</sup>\*, Hayley Evers-King<sup>1</sup>, Shovonlal Roy<sup>2</sup>, Tihomir S. Kostadinov<sup>3,4</sup>, Glen A. Tarran<sup>1</sup>, Jason R. Graff<sup>5</sup>, Robert J. W. Brewin<sup>1,6</sup>, Giorgio Dall'Olmo<sup>1,6</sup>, Tom Jackson<sup>1</sup>, Anna E. Hickman<sup>7</sup>, Rüdiger Röttgers<sup>8</sup>, Hajo Krasemann<sup>8</sup>, Emilio Marañón<sup>9</sup>, Trevor Platt<sup>1</sup> and Shubha Sathyendranath<sup>1,6</sup>

<sup>1</sup> Plymouth Marine Laboratory, Plymouth, United Kingdom, <sup>2</sup> Department of Geography and Environmental Sciences, School of Agriculture, Policy and Development, University of Reading, Reading, United Kingdom, <sup>3</sup> Department of Geography and the Environment, University of Richmond, Richmond, VA, United States, <sup>4</sup> Division of Hydrologic Sciences, Desert Research Institute, Reno, NV, United States, <sup>5</sup> Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, United States, <sup>6</sup> Plymouth Marine Laboratory, National Centre for Earth Observation, Plymouth, United Kingdom, <sup>7</sup> Ocean and Earth Science, National Oceanography Centre Southampton, University of Southampton, Southampton, United Kingdom, <sup>8</sup> Helmholtz-Zentrum Geesthacht, Center for Materials and Coastal Research, Geesthacht, Germany, <sup>9</sup> Departamento de Ecología y Biología Animal, Universidade de Vigo, Vigo, Spain

### Edited by:

Hervé Claustre, Centre National de la Recherche Scientifique (CNRS), France

### Reviewed by:

Emmanuel Devred, Fisheries and Oceans Canada, Canada

Severine Alvain, Centre National de la Recherche Scientifique (CNRS), France

### \*Correspondence:

Víctor Martínez-Vicente vmv@pml.ac.uk

### Specialty section:

This article was submitted to Ocean Observation, a section of the journal Frontiers in Marine Science

Received: 03 April 2017 Accepted: 10 November 2017 Published: 11 December 2017

### Citation:

Martínez-Vicente V, Evers-King H, Roy S, Kostadinov TS, Tarran GA, Graff JR, Brewin RJW, Dall'Olmo G, Jackson T, Hickman AE, Röttgers R, Krasemann H, Marañón E, Platt T and Sathyendranath S (2017) Intercomparison of Ocean Color Algorithms for Picophytoplankton Carbon in the Ocean. Front. Mar. Sci. 4:378. doi: 10.3389/fmars.2017.00378

The differences among phytoplankton carbon (C<sub>phy</sub>) predictions from six ocean color algorithms are investigated by comparison with in situ estimates of phytoplankton carbon. The common satellite data used as input for the algorithms is the Ocean Color Climate Change Initiative merged product. The matching in situ data are derived from flow cytometric cell counts and per-cell carbon estimates for different types of pico-phytoplankton. This combination of satellite and in situ data provides a relatively large matching dataset (N > 500), which is independent from most of the algorithms tested and spans almost two orders of magnitude in C<sub>phy</sub>. Results show that no single algorithm outperforms the others when all matching data are used. Concentrating on the oligotrophic regions (Chlorophyll-a concentration, B, less than 0.15 mg Chl m<sup>−3</sup>), where flow cytometric analysis captures most of the phytoplankton biomass, reveals significant differences in algorithm performance. The bias ranges from −35 to +150% and unbiased root mean squared difference from 5 to 10 mg C m<sup>−3</sup> among algorithms, with chlorophyll-based algorithms performing better than the rest. The backscattering-based algorithms produce different results in the clearest waters and these differences are discussed in terms of the different algorithms used for optical particle backscattering coefficient (b<sub>bp</sub>) retrieval.

Keywords: phytoplankton carbon, carbon-to-chlorophyll, ocean color remote sensing, picophytoplankton, flow cytometry, optical water class, algorithm uncertainty

# 1. INTRODUCTION

One of the standard products from ocean-color remote sensing is the concentration of chlorophyll-a (B) in the surface layers of the ocean, which is an estimate of phytoplankton abundance. This product has proven to be extremely useful for various applications (e.g., Platt and Sathyendranath, 2008). More recently, there has been a growing interest in monitoring the standing stock of phytoplankton in carbon units (CEOS, 2014), in addition to chlorophyll units. There are many reasons for this interest, which include calculation of primary production using carbon-based models (Behrenfeld et al., 2005; Westberry et al., 2008); estimating phytoplankton loss rates (Zhai et al., 2008, 2010); comparison with estimates of phytoplankton biomass in carbon units from marine ecosystem models (Dutkiewicz et al., 2015); and establishing the budget of the pools of carbon in the ocean (CEOS, 2014), their turnover rates (Casey et al., 2013), and their exchanges with the atmospheric and terrestrial domains (CEOS, 2014). With increasing appreciation of the different roles of various phytoplankton functional types in the oceanic biogeochemical cycles (Le Quéré et al., 2005), there is a corresponding need to know the pools of carbon associated with the different phytoplankton types, rather than just the total phytoplankton carbon.

A handful of algorithms have been proposed for deriving phytoplankton carbon from satellite data. These include methods based on particle back-scattering coefficient (bbp) at a single wavelength (Behrenfeld et al., 2005; Martínez-Vicente et al., 2013); empirical relationships based on chlorophyll concentration (Sathyendranath et al., 2009; Marañón et al., 2014); and methods based on allometric considerations combined with either the spectral slope of the particle back-scattering spectrum (Kostadinov et al., 2009, 2016) or with the phytoplankton absorption characteristics (Roy et al., 2017). Of these, the method proposed by Martínez-Vicente et al. (2013) dealt with a fraction of the phytoplankton community (diameter < 20µm), whereas those of Behrenfeld et al. (2005), Sathyendranath et al. (2009), and Marañón et al. (2014) dealt with the whole phytoplankton community. The methods based on allometric structure (Kostadinov et al., 2009, 2016; Roy et al., 2017), on the other hand, have the advantage of being able to target the whole of the phytoplankton community, and partition phytoplankton carbon among any user-defined size-intervals. Comparison of these algorithms is not straightforward, because of the differences in approaches used and the products obtained. Furthermore, they have been subjected to varying degrees of validation, with differences in the number of validation points used and in their regional and seasonal coverage. Another difficulty lies with having access to in situ data in sufficient quantity and comprehensive enough for algorithm assessment.

Various methods for in situ measurements of phytoplankton carbon in the laboratory or in the field have been reviewed by Casey et al. (2013). Some of the in situ methods require a proxy measurement, which is then calibrated against phytoplankton carbon. Subsequently, the carbon concentration is inferred from measurements of the proxy, which would typically be easier to measure than the carbon concentration itself. The proxies include adenosine triphosphate (ATP) (Sinclair et al., 1979); the refractive index of phytoplankton cells (Stramski, 1999); and the forward light scatter by phytoplankton cells in a flow cytometer (Casey et al., 2013). Redalje and Laws (1981) used chlorophyll-a labeling and showed that the specific activity of carbon in chlorophyll-a became equivalent to that of total phytoplankton carbon in incubations of 6–12 h, and so chlorophyll-a labeling could be used to infer phytoplankton carbon and growth rates. Graff et al. (2015) used flow cytometer cell sorting (Graff et al., 2012) to measure phytoplankton carbon in sorted samples, thereby avoiding contamination of results by non-pigmented particles. An accepted approach to estimating phytoplankton carbon at sea is to use a flow cytometer to count phytoplankton cells sorted into different types. Using laboratory-based estimates of carbon per cell and typical (or measured mean) cell diameters for those phytoplankton types, the total carbon is computed by adding the carbon contribution of each phytoplankton cell type. This contribution is obtained by multiplying the number of cells enumerated with the flow cytometer by the carbon per cell (DuRand et al., 2001; Oubelkheir et al., 2005; Martínez-Vicente et al., 2013). Such methods have an upper limit on measured cell size, depending on how the flow cytometer is set up (typically D < 50 µm).

We present in this work a comparison of six different algorithms for estimating phytoplankton carbon from space. The algorithms have been selected as representative of all existing state-of-the-art approaches. The comparisons are based on a newly-compiled, global, flow-cytometric dataset that is used to compute the in situ picophytoplankton carbon, matched with satellite data from the same location, and for the same day. The performance of these products is explored in different optical water classes. The comparison is limited to picophytoplankton, because the flow-cytometric database dealt largely with this size class. The objective of the comparison is to learn more about the advantages and limitations of the algorithms, rather than to rank them. We expect that the results will allow a more informed use of phytoplankton carbon products from satellites, for example, when they are compared with model outputs, and serve to identify areas where improvements are needed and potential avenues for achieving them. The analysis also brings to light some of the limitations of the in situ database, and highlights areas where progress is needed, to enable better validation of satellite data.

# 2. METHODOLOGY

## 2.1. In Situ Dataset

### TABLE 1 | Summary of in situ data.

As part of this study, more than 12,000 observations of picophytoplankton abundance have been collated from coastal and oceanic regions (**Table 1**), building upon a dataset compiled by the modeling community through MAREDAT (http://maremip.uea.ac.uk/maredat.html) (Buitenhuis et al., 2012). Additional data come from a long-term observation program, the Atlantic Meridional Transect (AMT), as well as recently available data collected independently during AMT-22 and in the Pacific (Graff et al., 2015) and from other regions in the Atlantic Ocean (Taylor et al., 2013). The dataset assembled consists of cell counts (in cells per milliliter), from water samples originating between 0 and 200 m depth, and collected in the period between 1997 and 2014, to match the available satellite observations. Flow cytometry analysis of the samples provides cell abundances segregated into different types of phytoplankton. At this stage, the database consisted of 12,431 sample entries. Only the picophytoplankton cells (<2 µm) were available in the MAREDAT dataset, which were separated into Prochlorococcus spp., Synechococcus spp. and picoeukaryotic phytoplankton. For consistency, only information on the same phytoplankton types was extracted from the additional data sources (Zubkov et al., 1998; Tarran et al., 2001, 2006; Taylor et al., 2013; Graff et al., 2015; Tarran, 2015; Tarran and Bruun, 2015) (see **Table 1**). The carbon concentration (Cphy, in mg C m<sup>−3</sup>) for each phytoplankton group (i) and for each sample (j), Cphy(i, j), was calculated as follows:

$$C_{\text{phy}}(i,j) = 10^{-6} \times N(i,j)\,\varepsilon(i) \tag{1}$$

where N(i, j) is cell abundance (cell mL<sup>−1</sup>) for each of the three phytoplankton types (i = Prochlorococcus spp., Synechococcus spp. or picoeukaryotic phytoplankton) at sample j; and ε(i) is the carbon content per cell (fgC cell<sup>−1</sup>) for each of the picophytoplankton types. The factor 10<sup>−6</sup> converts mL to m<sup>3</sup> and fgC to mgC. We used the mean ε(i) for each phytoplankton type proposed by Buitenhuis et al. (2012): 60 fgC cell<sup>−1</sup> for Prochlorococcus spp., 154 fgC cell<sup>−1</sup> for Synechococcus spp. and 1319 fgC cell<sup>−1</sup> for picoeukaryotic phytoplankton. These values of ε(i) are comparable to values from the Bermuda Atlantic Time-series Study (BATS) (Casey et al., 2013) for Prochlorococcus spp. and Synechococcus spp., whereas picoeukaryotic phytoplankton ε(i) values are lower than in BATS. The total picophytoplankton carbon concentration per sample j, Cphy(j), is the sum of the contributions from each picophytoplankton type, Cphy(i, j), and will hereafter be referred to as Cphy at a given location and depth.
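As an illustration of Equation (1), the minimal Python sketch below converts cell abundances into picophytoplankton carbon using the per-cell carbon values quoted above. The function names, dictionary keys and the sample abundances are hypothetical and chosen only for illustration; they are not taken from the dataset or from any code used in the study.

```python
# Mean carbon per cell (fg C cell^-1), as quoted in the text from Buitenhuis et al. (2012).
CARBON_PER_CELL_FG = {
    "prochlorococcus": 60.0,
    "synechococcus": 154.0,
    "picoeukaryotes": 1319.0,
}

# fg C mL^-1 -> mg C m^-3: 1 fg = 1e-12 mg and 1 mL = 1e-6 m^3, hence the factor 1e-6.
FG_PER_ML_TO_MG_PER_M3 = 1e-6


def group_carbon(cells_per_ml, group):
    """Equation (1): carbon (mg C m^-3) of one picophytoplankton group in one sample."""
    return FG_PER_ML_TO_MG_PER_M3 * cells_per_ml * CARBON_PER_CELL_FG[group]


def total_pico_carbon(abundances):
    """Sum Equation (1) over groups; `abundances` maps group name -> cells mL^-1."""
    return sum(group_carbon(n, group) for group, n in abundances.items())


# Hypothetical sample (illustrative abundances only, not data from the study).
sample = {"prochlorococcus": 100_000, "synechococcus": 10_000, "picoeukaryotes": 1_000}
cphy = total_pico_carbon(sample)  # mg C m^-3
```

In this invented sample, picoeukaryotes contribute about as much carbon as Synechococcus despite being ten times less abundant, which shows how strongly the result depends on the choice of ε(i).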

The choice of phytoplankton types included in this computation, as well as the parameters used for the conversion to carbon, matches the modeling community approach as represented in Buitenhuis et al. (2012). The choice of phytoplankton types is such that phytoplankton types with diameter >2 µm are not taken into account. Furthermore, the choice of a mean carbon concentration per cell for each phytoplankton type does not permit accounting for any spatial or temporal variations in size or cellular carbon within each type of phytoplankton. To test our choice of carbon conversion parameters, we compared direct measurements of Cphy with estimates computed using the conversion factors above. In samples from AMT-22 (N = 15) (Graff et al., 2015), the slope of the regression between direct measurements of Cphy and computed Cphy was 0.8 (r<sup>2</sup> = 0.6, p < 0.05). According to this result, the estimates of picoplankton Cphy in our dataset are significantly correlated with direct estimates of phytoplankton carbon, and could be an overestimate of direct observations of Cphy, which include nanophytoplankton, although a larger sample is required to support this conclusion.

## 2.2. In Situ and Satellite Match-up Selection

The in situ database described above was matched with merged ocean-color satellite data from the Ocean Color Climate Change Initiative (OC-CCI) (Sathyendranath et al., 2012). These merged products were used to maximize the possibility of finding matching in situ data as well as to provide a set of common inputs to the different algorithms. The OC-CCI version 2 data were daily, binned on a sinusoidal projection, with a 4 km spatial resolution. Two kinds of satellite data were extracted. First, the inputs for the Cphy algorithms: total B from OC4v6 and bbp from the Quasi-Analytical Algorithm (QAA) v5 (a modification to v4 in Lee et al., 2002, 2007, which does not include Raman scattering; Westberry et al., 2013; Lee and Huot, 2014). Second, the optical water class membership (Moore et al., 2001, 2009; Jackson et al., 2017).

The procedure for match-up selection was the same as that used for particulate organic carbon (POC) data (Evers-King et al., 2017). The day of year on which the in situ sample was collected was matched with the same day of year in the merged satellite products. Then all relevant data were extracted from a 3 × 3 pixel set with the sample location at the center. The number of valid data within the 3 × 3 grid, and the mean and standard deviation of the valid points, were recorded for each computed Cphy product. The 3 × 3 grid was used to identify where sufficient satellite data were available. In this dataset only 11 matched points had 3 valid pixels or fewer. The Cphy algorithms were applied to the central pixel of the matched satellite data. The match-up process reduced the sample size considerably. Further reduction came from depth-averaging (between 0 and 10 m) the Cphy profiles that matched the satellite data, and ignoring the deeper samples, leaving 647 data points. Finally, to remove outliers, the top and bottom 2 percentiles were removed from the dataset, leaving N = 557 for the analysis. The geographical distribution of the match-up database for the picophytoplankton carbon concentration, Cphy, is given in **Figure 1**. The match-up dataset usable for the algorithm comparison was only about 5% of the initial data (**Table 1**). It is worth emphasizing that the match-up dataset has not been used for the calibration or development of most of the algorithms compared (see section 4).
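The 3 × 3 extraction step described above can be sketched as follows. This is a minimal illustration, not the processing code used in the study: it assumes that the satellite product is held as a 2-D list in which invalid pixels are flagged as NaN, uses the sample standard deviation, and the function name and example tile are hypothetical.

```python
import math
import statistics


def window_3x3_stats(grid, row, col):
    """Return (n_valid, mean, std, central) for the 3 x 3 window centred on (row, col).

    `grid` is a 2-D list of floats; invalid pixels are NaN (an assumption about
    how missing satellite data are flagged).
    """
    values = []
    for r in range(row - 1, row + 2):
        for c in range(col - 1, col + 2):
            # Skip positions that fall outside the grid.
            if 0 <= r < len(grid) and 0 <= c < len(grid[r]):
                v = grid[r][c]
                if not math.isnan(v):
                    values.append(v)
    n_valid = len(values)
    mean = statistics.fmean(values) if n_valid else math.nan
    # Sample standard deviation needs at least two valid pixels.
    std = statistics.stdev(values) if n_valid > 1 else math.nan
    return n_valid, mean, std, grid[row][col]


# Hypothetical chlorophyll tile (mg Chla m^-3) with two invalid pixels.
nan = math.nan
tile = [
    [0.10, 0.12, nan],
    [0.11, 0.13, 0.14],
    [0.12, nan, 0.15],
]
n_valid, mean, std, central = window_3x3_stats(tile, 1, 1)
```

The count of valid pixels can then be used to flag match-ups with poor satellite coverage, while the central value is the one passed to the Cphy algorithms.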

## 2.3. Ocean-Color Phytoplankton Carbon Algorithms

The following section describes the six algorithms compared in this exercise. All the phytoplankton algorithms were implemented using as input data the appropriate OC-CCI product for consistency and to isolate the effects of the different algorithms. **Table 2** provides a comparison of the input data and the phytoplankton size range that is included in the outputs of each algorithm. These are important characteristics of the algorithms, required for the interpretation of the results. For phytoplankton carbon, Cphy in mg C m<sup>−3</sup>, six products were derived and they are briefly described in this section. According to their common characteristics, they can be grouped into chlorophyll-based, backscattering-based and allometric algorithms.

TABLE 2 | Summary of Cphy algorithms main characteristics and median values predicted for the in situ match-up database (N = 557).

Chlorophyll concentration, B, in mg Chla m<sup>−3</sup>; optical particulate backscattering coefficient, bbp, in m<sup>−1</sup>; phytoplankton absorption coefficients, aphy, in m<sup>−1</sup>; picophytoplankton carbon concentration, Cphy, in mg C m<sup>−3</sup>.

### 2.3.1. Chlorophyll-Based Algorithms

This family of algorithms uses chlorophyll concentration, B (in mg Chla m<sup>−3</sup>), as an input. Chlorophyll in this study is obtained from the OC-CCI merged dataset with the algorithm OC4v6, which is a band-switching algorithm, mainly a fourth-order polynomial relationship between remote sensing reflectance in the blue and green bands. The two algorithms in this group use the same input and have a similar formulation; however, the assumptions made in their construction, and hence their definitions of Cphy, are different. Algorithm A (Sathyendranath et al., 2009) was developed from an empirical relationship between in situ measurements of total particulate carbon and B. For this model, Cphy in Equation (2) below is an upper bound on the total phytoplankton carbon:

$$C_{\text{phy}} = 65 \times B^{0.63}. \tag{2}$$

Algorithm B (Marañón et al., 2014) was also developed from an empirical relationship using in situ measurements of B, and was not originally designed as an algorithm for ocean color applications. In this case, the estimates of total phytoplankton carbon originated from applying a conversion factor to microscope (counting cells with diameter D > 5 µm) and flow cytometry (D < 10 µm) phytoplankton cell counts. This model is formulated in Equation (3) as:

$$C_{\text{phy}} = 62 \times B^{0.89}. \tag{3}$$

Because of the difference in their definition of Cphy, Algorithm A and Algorithm B have been considered separately in our analysis. A priori, the expectation is that both chlorophyll-based algorithms, using total chlorophyll concentration as input data, will overestimate the picophytoplankton carbon from our in situ match-up dataset, since they are both designed to calculate total phytoplankton carbon, rather than just the picophytoplankton in our dataset. Further, it is also worth noting that the conversion factors used to compute phytoplankton carbon from cell abundance in Algorithm B are different from the ones used in our in situ match-up dataset.
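The two chlorophyll-based relationships, Equations (2) and (3), are simple power laws and can be sketched in a few lines of Python. The function names are hypothetical; the coefficients are those given in the equations, and the input value is the median satellite chlorophyll of the match-up dataset reported in section 3.1.

```python
def cphy_algorithm_a(chl):
    """Equation (2): Cphy (mg C m^-3) from chlorophyll B (mg Chla m^-3)."""
    return 65.0 * chl ** 0.63


def cphy_algorithm_b(chl):
    """Equation (3): Cphy (mg C m^-3) from chlorophyll B (mg Chla m^-3)."""
    return 62.0 * chl ** 0.89


# Median satellite chlorophyll of the match-up dataset (mg Chla m^-3).
chl_median = 0.12
cphy_a = cphy_algorithm_a(chl_median)
cphy_b = cphy_algorithm_b(chl_median)
```

Because the exponent of Equation (2) is smaller than that of Equation (3), Algorithm A returns higher carbon than Algorithm B for any B below 1 mg Chla m<sup>−3</sup>, which is consistent with its interpretation as an upper bound.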

### 2.3.2. Backscattering-Based Algorithms

Some semi-empirical algorithms use the (wavelength-dependent) optical particulate backscattering coefficient, bbp (in m<sup>−1</sup>), to estimate Cphy. The backscattering coefficient in this study is obtained from the OC-CCI merged dataset by applying the Quasi-Analytical Algorithm (QAA) v5 (a modification to v4 in Lee et al., 2002, 2007). In essence, the QAA first computes bbp(555) by combining remote sensing reflectance at 555 nm with an empirical relationship between remote sensing reflectance ratios and the total absorption coefficient, and with the (modeled) backscattering of pure seawater. Then, to propagate bbp(555) to other wavelengths, the algorithm uses a band ratio (again blue to green bands) to compute the backscattering spectral slope. The same QAA bbp product is used for both backscattering-based Cphy algorithms, but at different wavelengths. Algorithm C (Behrenfeld et al., 2005) uses bbp(443) as an input:

$$C_{\text{phy}} = 13000 \times (b_{bp}(443) - 0.00035). \tag{4}$$

As this algorithm was developed from MODIS-Aqua (Moderate Resolution Imaging Spectroradiometer) ocean-color data, and 443 nm is a native OC-CCI band, no spectral adjustment is needed. However, it is worth noting that the algorithm was originally developed using the GSM algorithm (Garver and Siegel, 1997; Maritorena et al., 2002; Siegel et al., 2002), whereas in this test the bbp input comes from the QAA algorithm. The Cphy derived with this algorithm includes all the phytoplankton size ranges. Algorithm D (Martínez-Vicente et al., 2013) is another semi-empirical algorithm, developed from the relationship between in situ flow cytometry-based carbon and bbp(470), but it is included in the comparison with some changes. The first modification was to re-compute the coefficients in the original equation by using the same computation of picoplankton carbon as the one used in this work, which meant ignoring the contributions of nanoeukaryotes, cryptophytes and coccolithophorids and using the same carbon-to-cell conversion factors as in this study (i.e., those of MAREDAT; Buitenhuis et al., 2013). This recalculation led to (pico)phytoplankton carbon estimates that were, on average, 27% lower than the published values of phytoplankton carbon (from pico- and nano-plankton) used in Martínez-Vicente et al. (2013). When the new picophytoplankton Cphy estimate was used with the original in situ bbp data, the resulting fit was:

$$C_{\text{phy}} = 18000 \times (b_{bp}(470) - 43 \times 10^{-5}), \quad N = 70. \tag{5}$$

This equation explains considerably less variance in the observed data (r<sup>2</sup> = 0.4) than the r<sup>2</sup> of 0.89 reported in the original work. However, it makes the definition of Cphy by this model directly comparable to the in situ data. The second modification was to adjust the backscattering coefficient wavelength from the available value, 490 nm, to 470 nm. To do so, the spectral slope of bbp from the OC-CCI data was obtained from an ordinary least squares fit to the log<sub>10</sub>-transformed data and used to calculate the bbp(470) needed for Equation (5).
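The two backscattering-based relationships, Equations (4) and (5), together with the wavelength adjustment from 490 to 470 nm, can be sketched as below. This is a minimal illustration under stated assumptions: the slope is fitted by ordinary least squares on log<sub>10</sub>(bbp) against log<sub>10</sub>(wavelength), the shift is applied as a power law from the 490 nm value, and the band set and synthetic spectrum are invented for the example. The function names are hypothetical and the exact fitting details used in the study may differ.

```python
import math


def loglog_slope(wavelengths_nm, bbp):
    """Ordinary least-squares slope of log10(bbp) against log10(wavelength)."""
    x = [math.log10(w) for w in wavelengths_nm]
    y = [math.log10(b) for b in bbp]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def shift_bbp(bbp_ref, ref_nm, target_nm, slope):
    """Move bbp from ref_nm to target_nm assuming a power-law spectrum."""
    return bbp_ref * (target_nm / ref_nm) ** slope


def cphy_algorithm_c(bbp_443):
    """Equation (4): Cphy (mg C m^-3) from bbp(443) in m^-1."""
    return 13000.0 * (bbp_443 - 0.00035)


def cphy_algorithm_d(bbp_470):
    """Equation (5): Cphy (mg C m^-3) from bbp(470) in m^-1."""
    return 18000.0 * (bbp_470 - 43e-5)


# Synthetic spectrum (illustrative only): bbp = 0.002 m^-1 at 443 nm, decaying as 1/wavelength.
bands = [412, 443, 490, 510, 555, 670]
spectrum = [0.002 * 443 / w for w in bands]

slope = loglog_slope(bands, spectrum)
bbp_470 = shift_bbp(spectrum[bands.index(490)], 490, 470, slope)
cphy_c = cphy_algorithm_c(spectrum[bands.index(443)])
cphy_d = cphy_algorithm_d(bbp_470)
```

Both equations subtract a constant background from bbp before scaling, so they return negative carbon whenever bbp falls below that background (0.00035 and 0.00043 m<sup>−1</sup>, respectively); a practical implementation would need to decide how to treat such pixels.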

### 2.3.3. Allometric Type Algorithms

These algorithms belong to a family of algorithms that use optical properties to compute phytoplankton size structure and then convert it into biomass (Mouw et al., 2017). Algorithm E (Kostadinov et al., 2016) retrieves the absolute and fractional phytoplankton carbon biomass in three phytoplankton size classes (approximately equivalent to phytoplankton functional types): picophytoplankton (0.5–2 µm in diameter), nanophytoplankton (2–20 µm) and microphytoplankton (20–50 µm). The algorithm uses retrievals of the particle size distribution (PSD) to estimate particle volume. Note that the PSD is estimated for all particles in suspension in the water. Particle volume is then converted to carbon concentrations using a compilation of existing allometric relationships between size and carbon content of phytoplankton cells (Menden-Deuer and Lessard, 2000). The derived carbon concentration is then divided by 3 to estimate the living phytoplankton carbon fraction. The PSD retrievals themselves are based on a PSD algorithm (Kostadinov et al., 2009), which relates the spectral slope and magnitude of the backscattering coefficient spectrum to the underlying parameters of an assumed power-law PSD, via look-up tables (LUTs) constructed using Mie theory modeling. In the implementation used here, the input backscattering spectrum comes from the standard QAA products of the OC-CCI dataset, which are derived using the Lee et al. (2002) algorithm, as summarized above. This is different from the original implementations (Kostadinov et al., 2009, 2016), where the Loisel and Stramski (2000) algorithm was used to retrieve spectral bbp. The PSD parameters retrieved are the power-law slope (ξ) and the scaling parameter (i.e., the differential particle number concentration at a reference diameter of 2 µm, N<sub>o</sub>, in m<sup>−4</sup>). Kostadinov et al. (2016) applied an empirical correction to the PSD scaling parameter N<sub>o</sub> obtained from the model LUT, to improve absolute phytoplankton carbon concentration estimates.

A further allometry-based method, Algorithm F (Roy et al., 2017), uses chlorophyll concentration and the absorption coefficient of phytoplankton at 676 nm, aph(676), to compute phytoplankton carbon. In this algorithm, the exponent of the phytoplankton size spectrum (ξ) is first computed from the specific absorption coefficient of phytoplankton at 676 nm, a<sup>∗</sup>ph(676), using a method developed by Roy et al. (2013). This algorithm uses as input B from OC4v6 and aph(676) from QAA, both standard products in the OC-CCI dataset. The estimated exponent of the size spectrum ξ and the allometric relationships between the cellular content of phytoplankton carbon (Ccell) and cell volume (Vcell) reported by Menden-Deuer and Lessard (2000) are then used to compute the concentration of phytoplankton carbon (Ctotal, in mg C m<sup>−3</sup>) contained in the cells within any specified diameter range. To do so, the allometric parameters corresponding to mixed populations of phytoplankton are derived by linear regression from the allometric relationships found for individual groups of phytoplankton, as reported in Menden-Deuer and Lessard (2000). The derived allometric relationship is then used to compute the magnitude of the carbon-to-chlorophyll ratio (χ), using the derived allometric expressions for the concentrations of chlorophyll and phytoplankton carbon, Ctotal. Finally, phytoplankton carbon for the specified size range is computed as the product of χ and satellite-derived chlorophyll concentration (for more details see Roy et al., 2013, 2017). The allometry-based algorithms E and F were used to compute picophytoplankton concentration within a diameter range of 0.2–2.0 µm, which is directly comparable with the size range included in the matching in situ database.

In summary, each method is based on a satellite measurement (a reflectance ratio or an analytically derived backscatter) that provides the underlying variability in the resulting Cphy. This measurement is then combined with selected empirical relationships that scale it to Cphy, using either linear or non-linear relationships and sometimes including more than one step, such as via B. The strength of this study lies in the use of the OC-CCI satellite dataset as a common source of inputs for all the algorithms, which removes sources of uncertainty from other parts of the satellite processing. A limitation, however, comes from the differences in the definition of Cphy for each algorithm (**Table 2**). It is expected that the algorithms which compute total Cphy (i.e., Algorithms A, B, and C) will be most comparable to the in situ data when the contribution to the phytoplankton carbon by nano- and microplankton is not significant.

## 2.4. Statistical Metrics and Their Contribution to the Study

Ranking of algorithms according to their performance is a classic exercise for the ocean-color community, which has evolved from comparisons of chlorophyll algorithms (O'Reilly et al., 1998) to more complex and comprehensive approaches recently (Brewin et al., 2015; Kostadinov et al., 2017). Typically, a battery of statistical metrics is used to construct an index of overall performance against a set of matched data with in situ observations (Brewin et al., 2015). In this exercise, however, we do not use a scoring system to rank algorithms, since one of the aims of this work is to provide an overall idea of the current accuracy of the phytoplankton carbon product from a group of algorithms. The Kolmogorov-Smirnov test for normality of the in situ match-up data showed a significant deviation from normality for log<sub>10</sub>-transformed and untransformed data and the residuals (p < 0.001). Therefore, statistical metrics that assume normality would be less reliable. For completeness, the statistical tests were computed for log<sub>10</sub>-transformed data, using parametric tests, and for untransformed data, using non-parametric, rank-based statistics. Statistical metrics computed were:


To provide an indication of the stability of the statistics and to compute confidence intervals on them, bootstrapping (Efron, 1979; Efron and Tibshirani, 1993) with random re-sampling and replacement was used to construct 1,000 different datasets from which confidence intervals were computed for some of the statistical metrics above. These metrics were computed for all the algorithms tested against the match-up dataset as a whole and, in addition, they were also computed after segregating the match-up dataset according to the dominant water class at the central match-up pixel. Because of the nature of the OC-CCI satellite data and the Cphy algorithms, it is expected that algorithm performance will degrade toward more turbid environments (water classes 8 and over). Furthermore, the statistical results per water class provided a measure of dispersion of the phytoplankton carbon product among algorithms, describing in which optical environments the algorithms show greater agreement. The statistics per water class were used to produce uncertainty maps (RMSD and bias). To generate the uncertainty maps, the optical class memberships at each pixel and the per-class uncertainty values were used to produce a weighted average uncertainty value for the pixel, with the weighting function being provided by the class membership. This is the method followed by the OC-CCI and described fully in Jackson et al. (2017).
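The bootstrap and class-weighting steps described above can be sketched as follows. The metric definitions used here (bias as the mean difference, unbiased RMSD as the RMSD after removing the means, and MAPD as the median absolute percentage difference) are conventional forms assumed for illustration, and the percentile confidence interval, the normalization by the sum of memberships, the function names and the example numbers are likewise assumptions rather than the exact procedure of the study.

```python
import math
import random
import statistics


def bias(pred, obs):
    """Mean difference between predictions and observations."""
    return statistics.fmean([p - o for p, o in zip(pred, obs)])


def unbiased_rmsd(pred, obs):
    """Root mean squared difference after removing the mean of each series."""
    mp, mo = statistics.fmean(pred), statistics.fmean(obs)
    return math.sqrt(statistics.fmean([((p - mp) - (o - mo)) ** 2 for p, o in zip(pred, obs)]))


def mapd(pred, obs):
    """Median absolute percentage difference (%)."""
    return statistics.median([100.0 * abs(p - o) / o for p, o in zip(pred, obs)])


def bootstrap_ci(metric, pred, obs, n_boot=1000, alpha=0.05, seed=0):
    """Percentile confidence interval from n_boot resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(obs)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample match-up pairs
        estimates.append(metric([pred[i] for i in idx], [obs[i] for i in idx]))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


def class_weighted_uncertainty(memberships, per_class_values):
    """Membership-weighted average of per-class uncertainty for one pixel."""
    total = sum(memberships)
    return sum(m * v for m, v in zip(memberships, per_class_values)) / total


# Hypothetical match-ups (mg C m^-3), for illustration only.
obs = [10.0, 20.0, 30.0, 40.0]
pred = [12.0, 18.0, 33.0, 44.0]
ci_low, ci_high = bootstrap_ci(bias, pred, obs)
pixel_rmsd = class_weighted_uncertainty([0.75, 0.25], [4.0, 8.0])
```

Resampling whole match-up pairs, rather than predictions and observations separately, preserves the pairing on which all three metrics depend.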

# 3. RESULTS

## 3.1. Distribution of in Situ Data and Accuracy of Algorithms

The sources and geographic distribution of the in situ data, as well as the corresponding median values of the picophytoplankton carbon data, are summarized in **Table 1**. The spatial distribution of the match-ups (**Figure 1**) shows their limited coverage of the oceans, with most of the data (71%) located in the Northern Hemisphere and from the Atlantic Ocean. The overall median value of Cphy from the match-up database was 11.7 ± 5.3 mg C m<sup>−3</sup> (median ± IQR/2), with values ranging from 1.7 to 60.2 mg C m<sup>−3</sup>. As a comparison, chlorophyll concentration (B) from the coincident satellite data was 0.12 ± 0.08 mg Chla m<sup>−3</sup>, ranging from 0.01 to 3.53 mg Chla m<sup>−3</sup>. The median values of Cphy from the algorithms applied to the matching data (**Table 2**) were not significantly different from the in situ Cphy (Mann-Whitney test, N = 557, p < 0.05). Relative frequency histograms of these data (**Figure 2**), however, show some bias. The peaks of the histograms of the chlorophyll-based algorithms (Algorithms A and B) were closest to the in situ peak; those of the backscattering-based models (Algorithms C and D) were at double the in situ median, and those of the allometric algorithms (Algorithms E and F) were at half the in situ median value (**Figure 2**). The distribution spread of the chlorophyll-based algorithms was about the same as or wider than that of the in situ data (C.V. between 40 and 56%), with backscattering and allometric algorithms having narrower distributions (C.V. lower than 30%).

**Figure 3** shows the algorithm estimates of Cphy against in situ derived estimates of Cphy, and the relevant statistical metrics are in **Table 3**. All the algorithms showed different predictions of Cphy but, as expected, there were commonalities among models that shared the same formulation. For example, both chlorophyll-based algorithms (**Figures 3A,B**) presented an elongated cloud with a weak but positive (∼0.6) and significant correlation with in situ data, and stayed mostly along the 1:1 line, with slopes close to 1; whereas both backscattering-based algorithms (**Figures 3C,D**) had weaker correlations (∼0.4) and, although the slopes were also close to 1, the data cloud does not capture the lower Cphy measurements. Contrary to the other two groups of algorithms, the allometric algorithms do not share a formulation, hence their results (**Figures 3E,F**) differ significantly from each other.

The statistical metrics (**Table 3**) provide a range of values among the algorithms tested, showing that no single algorithm is clearly superior on all metrics. The bias (δ) ranged from 3.5 mg C m<sup>−3</sup> for Algorithm B (Marañón et al., 2014) to 15 mg C m<sup>−3</sup> for Algorithm D (Martínez-Vicente et al., 2013) as modified in this work. Chlorophyll- and backscattering-based algorithms had a positive bias, less than 15 mg C m<sup>−3</sup>, whereas allometric algorithms had a negative bias, of magnitude less than 7 mg C m<sup>−3</sup>. The unbiased RMSD (Δ), which gives an idea of the dispersion of the predictions, ranged from 8.9 mg C m<sup>−3</sup> for Algorithm E (Kostadinov et al., 2016) to 29 mg C m<sup>−3</sup> for Algorithm D (Martínez-Vicente et al., 2013) as modified in this work. The lowest median absolute percentage difference (MAPD), a measure of accuracy, for the untransformed data was for Algorithm B (Marañón et al., 2014), in agreement with the bias indicator for log- and non-log-transformed data. The inter-quartile range of the MAPD (IQR), a measure of precision of the algorithm, was lowest for Algorithm E (Kostadinov et al., 2016), and coincided with another indication of dispersion (Δ) in the non-log statistics, but differed for the log-transformed data.

Overall, chlorophyll-based algorithms had higher correlation and indicators of lower bias (i.e., δ, MAPD), whereas allometric algorithms had indicators of lower dispersion (i.e., Δ, IQR of MAPD). Between algorithms, there was a factor of 4 difference between the maximum and minimum predictions for all match-ups pooled together, expressed as a median (i.e., the median of the fractional difference between the minimum and the maximum predictions by the algorithms at each match-up point).

An additional way to assess model performance is to study their emergent properties (de Mora et al., 2016). Here we have compared the in situ and the algorithm-derived Cphy to the chlorophyll concentration, B (standard OC4v6 OC-CCI product) for the match-ups (**Figure 4** and **Table 4**), to investigate the behavior of the Cphy:B ratio. **Figure 4A** displays the positive correlation between the satellite-derived B (for the whole of the phytoplankton assemblage) and the in situ derived Cphy for the picophytoplankton fraction only, over more than two orders of magnitude of chlorophyll concentration. Because of the mismatch between the particle assemblage in B and Cphy, the overall values reported for the Cphy:B ratio are smaller than they would be if the ratio had been derived from B for picoplankton only. However, with a median value of 91 mg C mg Chla−<sup>1</sup>, the in situ Cphy-to-satellite-B ratio falls within or close to observed values in oligotrophic areas. For instance, Sathyendranath et al. (2009) reported average values of this ratio greater than 100 mg C mg Chla−<sup>1</sup> for prymnesiophytes, cyanobacteria and Prochlorococcus sp. Direct observations of pico and nano plankton carbon in the Northern and Southern Atlantic gyres produced carbon-to-chlorophyll ratio estimates, on average, of 106 and 190 mg C mg Chla−<sup>1</sup>, respectively (Graff et al., 2015). Marañón et al. (2014) also report values in the range of 80–117 mg C mg Chla−<sup>1</sup> for oligotrophic regions. Therefore, the Cphy:B ratio used as reference in this study compares well with existing observations reported in the literature, despite the mismatch between the phytoplankton assemblages considered. These data are repeated as the background of the other panels in **Figure 4** for reference, along with their corresponding regression line.

Algorithms A and B are chlorophyll-based, therefore their predictions fall on the line of the equations used, respectively Equations 2, 3 in **Figures 4B,C**. Their predictions are close to the in situ data cloud and the resulting median phytoplankton carbon-to-chlorophyll ratios encompass the in situ reference value, with Algorithm A providing an upper limit, as expected from the assumptions made in its construction. Backscattering-based algorithms (Algorithms C and D, in **Figures 4D,E**, respectively) overestimate the reference Cphy:B relationship, especially at the lower concentrations of chlorophyll, and produce median phytoplankton carbon-to-chlorophyll ratio values up to two times greater than the reference. However, these algorithms capture some of the variability around the prediction line, which is of the same order as that of the in situ data. Algorithms using inherent optical properties with allometric conversions (Algorithms E and F, in **Figures 4F,G**, respectively) underestimate the reference Cphy:B relationship, with Algorithm E showing a narrower distribution of data points than Algorithm F.

TABLE 3 | Summary of statistics of algorithm performance (algorithms A–F, columns) for log<sub>10</sub> and untransformed data (N = 557).

Statistics provided have uncertainty estimates (95% confidence interval), from 1,000 bootstrap realizations (see section 2.4). Bold numbers are the best results for each statistic: highest value for r<sub>s</sub> and r<sub>p</sub>, lowest for Ψ, δ, ∆, I and MAPD, and closest to one for S.

between chlorophyll and in situ for comparison with the regressions by the algorithms. For comparison, data in (A) are repeated (gray) in the other panels.

Cphy was computed from the algorithms and B from the standard OC-CCI chlorophyll product for the match-up dataset (N = 557). Pearson correlation (r<sub>p</sub>) was computed for log<sub>10</sub>-transformed data. Type II linear regression (Reduced Major Axis) was computed for untransformed data, with Cphy as the dependent variable and B as the independent variable.
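The Type II (Reduced Major Axis) regression used for **Table 4** can be sketched as follows. This is a minimal illustration of the standard RMA estimator, in which the slope is the ratio of the standard deviations carrying the sign of the correlation. It is not the authors' code, and the function name is hypothetical.

```python
import numpy as np


def reduced_major_axis(x, y):
    """Type II (Reduced Major Axis) regression of y on x.

    Returns (slope, intercept, pearson_r). Unlike ordinary least squares,
    the slope is symmetric with respect to errors in x and y.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r = np.corrcoef(x, y)[0, 1]                       # Pearson correlation
    slope = np.sign(r) * y.std(ddof=1) / x.std(ddof=1)
    intercept = y.mean() - slope * x.mean()           # line passes through the centroid
    return float(slope), float(intercept), float(r)
```

Here `x` would hold the chlorophyll concentration B and `y` the phytoplankton carbon Cphy, so that the slope has units of mg C mg Chla−<sup>1</sup>. A log-space correlation would be obtained by passing `np.log10(x)` and `np.log10(y)` instead.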

The in situ Cphy dataset is representative only of a fraction of the particle population (picophytoplankton). However, its geographical distribution, the median Cphy concentrations and the carbon-to-chlorophyll ratios derived from this dataset are in agreement with existing observations in oligotrophic oceanic conditions. Taking into account this characteristic of the dataset, the overall performance of the algorithms was on the low side, with chlorophyll-based algorithms producing slightly lower bias and allometric algorithms slightly lower dispersion. Among algorithms there was a median difference of a factor of 4 between minimum and maximum predictions. The algorithms were also tested on their ability to produce a realistic Cphy:B ratio, which again highlighted great dispersion in predictions among and within algorithm types. Arguably, part of the dispersion in the statistical results may have arisen from the fact that the in situ Cphy dataset is representative only of a fraction of the particle population (i.e., picophytoplankton). Therefore, if we limited the study to the optical cases where picophytoplankton is expected to dominate the phytoplankton carbon pool and the chlorophyll content, we would expect an improvement in the results from the algorithms. In the next part of this study we present results obtained from segregating the data into optical water types.

# 3.2. Algorithm Comparisons for Individual Optical Water Types

An optical water class, in this context, is defined by a mean remote sensing reflectance spectrum representative of particular optical characteristics, i.e., an end-member spectrum. Each extracted satellite pixel coinciding with an in situ match-up has contributions from the different end-members in different proportions. Each pixel is assigned to the water class that contributes the largest proportion to its class membership (Jackson et al., 2017). The geographic locations of the match-up points per water class and the number of observations per class are shown in **Figure 5**. There was a good correspondence between the geographic location of the optical water types (note that the classes are numbered such that i = 1 is the most oceanic type and i = 14 the most coastal type) in **Figure 5A** and the Cphy concentrations (**Figure 1**), such that the higher-numbered optical classes tend to be representative of higher concentrations of phytoplankton carbon. The majority of the data (63%), though, can be classed as representative of an oligotrophic environment (i = 1 to 6, B < 0.15 mg Chla m−<sup>3</sup>). A boxplot summary of the descriptive statistics per optical water class per algorithm is provided (**Figure 6**). **Table 5** summarizes the median Cphy values, highlighting a steady increase from oceanic to coastal waters, except for i = 13, which has only N = 5 observations and is hereafter discarded from the analysis. The magnitude and the increase of the picophytoplankton carbon are in agreement with oceanic and coastal data (Tarran et al., 2006). For instance, using the current carbon conversion parameters on existing abundance data (Tarran and Bruun, 2015), the median Cphy in the coastal area of the Western English Channel is 12.1 ± 6.1 mg C m−<sup>3</sup> (N = 68).
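The assignment of a match-up pixel to its dominant optical water class can be sketched as below. This is a minimal illustration that assumes the per-class memberships are already available as an array. Computing the memberships themselves (Jackson et al., 2017) is not shown, and the function names and array layout are hypothetical.

```python
import numpy as np


def dominant_water_class(memberships):
    """Return the 1-based index of the class with the largest membership.

    memberships: array of shape (n_pixels, n_classes), where column 0 holds
    the membership of class i = 1 (most oceanic). Missing values are not
    handled in this sketch.
    """
    m = np.asarray(memberships, dtype=float)
    return np.argmax(m, axis=1) + 1


def fraction_in_classes(classes, first=1, last=6):
    """Fraction of match-ups whose dominant class lies in [first, last]."""
    c = np.asarray(classes)
    return float(np.mean((c >= first) & (c <= last)))
```

With the class numbering used here, `fraction_in_classes(classes, 1, 6)` would give the share of match-ups classed as oligotrophic.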

Algorithm performance per water class was quantified by the signed bias (δ) and the center-pattern RMSD (∆) (see **Table 5** and **Figure 7**). As expected, these statistical indicators of performance improved when the analysis was limited to oligotrophic waters (i = 1 to 6): the bias, δ, was similar to or lower than that obtained when considering all data, for all the algorithms (section 3.1, **Table 3**). The center-pattern RMSD, ∆, an indicator of precision, was on average half that obtained when considering all data, indicating decreased noise in the retrieval for all algorithms (for non-log results, section 3.1). Chlorophyll-based algorithms had, on average, similar δ and ∆ to the allometric-based algorithms, with backscattering-based algorithms producing higher bias and higher ∆ values.

Mesotrophic waters (0.15 < B < 0.7 mg Chla m−<sup>3</sup>, or optical water classes 7–10) comprise 32% of the data. With respect to the results obtained for all data available (section 3.1), the chlorophyll-based algorithms had an increased δ, whereas ∆ was similar. Backscattering-based algorithms had higher uncertainties than chlorophyll-based algorithms in the more turbid waters (high optical class numbers) in the untransformed data (**Figure 7D**). Bias increased for all of the algorithms for these water classes except for Algorithm E, which remained negative and relatively constant at −60%. However, the results for mesotrophic and more turbid waters should be taken with caution, as the use of in situ Cphy data comprising only picophytoplankton is more problematic in these waters.

TABLE 5 | Median Cphy from in situ matchups and non-log average bias (δ) and un-biased RMSD (∆) of the different algorithms (algorithms A–F, column headers) per water class number (i) in mg C m−<sup>3</sup>, together with the number of observations per water class (N<sub>i</sub>).


The in situ median Cphy:B ratios by optical water class were also compared to the median Cphy:B ratios from the algorithms (**Figure 8**). The in situ data (solid gray line and error bars) are repeated as a reference throughout **Figure 8**, showing that for i from 1 to 6 (oligotrophic waters) the range of variation is narrow (106 to 165 mg C mg Chla−<sup>1</sup>, median 133 mg C mg Chla−<sup>1</sup>) and the error bars overlap among optical water classes 1 to 6. For mesotrophic waters (i from 7 to 10), the in situ Cphy:B ratio tended to decrease.
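The per-class summary of the Cphy:B ratio can be sketched as below. This is a minimal illustration assuming three aligned arrays, one value per match-up. Using the inter-quartile range as the spread is an assumption, since the source does not state what the error bars in **Figure 8** represent, and the function name is hypothetical.

```python
import numpy as np


def median_ratio_by_class(cphy, chl, water_class):
    """Median Cphy:B ratio per optical water class.

    Returns {class: (median, 25th percentile, 75th percentile, n)}.
    cphy in mg C m^-3 and chl in mg Chla m^-3 give ratios in mg C (mg Chla)^-1.
    """
    ratio = np.asarray(cphy, dtype=float) / np.asarray(chl, dtype=float)
    classes = np.asarray(water_class)
    summary = {}
    for i in np.unique(classes):
        selected = ratio[classes == i]
        q25, q50, q75 = np.percentile(selected, [25, 50, 75])
        summary[int(i)] = (float(q50), float(q25), float(q75), int(selected.size))
    return summary
```

Classes with very few observations (such as i = 13 in this study, with N = 5) would give unreliable quartiles, so the returned count can be used to filter them out.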

The analysis of the algorithm predictions of the Cphy:B ratio focuses on the optical water classes 1 to 6, where Cphy and B are expected to describe the same phytoplankton assemblage. Essentially, the behavior observed for Cphy is repeated here. Just as Algorithm A was an upper limit to Cphy, it is also an upper limit to the Cphy:B ratio, which decreases with increasing optical water class number (**Figure 8A**). Algorithm B shows relatively little variation of the median Cphy:B ratio for the optical water classes of interest and beyond (i > 6). Backscattering-based algorithms, Algorithm C (Behrenfeld et al., 2005) and Algorithm D (Martínez-Vicente et al., 2013), also showed a decreasing median Cphy:B ratio with increasing water class number, with large overestimations with respect to the in situ Cphy:B ratio in the clearest waters. The allometric-based algorithms, Algorithm E (Kostadinov et al., 2016) and Algorithm F (Roy et al., 2017), generally predicted lower than observed Cphy, and also predicted lower median Cphy:B ratios. However, both algorithms showed a decreasing tendency in the median values with increasing turbidity (or water class number, in this study), with Algorithm E being closest to observations for i from 1 to 3.

Finally, **Figure 9** shows an example of the Cphy product from algorithms A to D using the OC-CCI monthly product from May 2005. All algorithms reproduce the broad patterns that would be associated with Cphy, i.e., increased values in high-chlorophyll areas (upwelling sites and coastal regions) and lower concentrations in the gyres. However, the salient point of this figure is the large differences in predictions among the algorithms, as expected from the statistical results.

### 4. DISCUSSION

### 4.1. The Picophytoplankton C Match-up Dataset

This study has compiled a large in situ database of picophytoplankton carbon, building on a combination of a substantial pre-existing effort by the modeling community (the MAREDAT data) and long time series observation programmes in the open ocean (Atlantic Meridional Transect, AMT). The ambition is to see this dataset growing with time, as new data are incorporated.

There are a number of advantages and limitations for this dataset with respect to its use for algorithm testing and validation. An advantage is that only a small fraction of the data have been used for the development of any of the algorithms tested here. Algorithms A, B, C, and E are completely independent of the match-up dataset. Algorithm D, as implemented by Martínez-Vicente et al. (2013), was developed using a small subset of the new database, N < 70, but the subset included nanoplankton. Algorithm F used MAREDAT for testing algorithm performance, but not for its development. So the data for the validation presented here are mostly independent of the data used in the construction of the algorithms. The geographic distribution of the match-up database, though largely limited to the Atlantic, with some additional points from the Pacific, does cover a variety of oceanic regions. It is a purpose-built database for satellite validation studies, and therefore, for convenience, only an average of the matching data in the top 10 m has been selected.

However, the dataset also has limitations. One of them is that it is only composed of picophytoplankton: though they are important contributors to the open-ocean phytoplankton biomass, picophytoplankton form a decreasing proportion of the phytoplankton biomass in more productive waters, where larger cells tend to become more important (Marañón et al., 2012; Marañón, 2015). One interesting avenue would be to expand the database with other phytoplankton groups (Buitenhuis et al., 2013; Sal et al., 2013). Nanoplankton can also be counted using flow cytometry, and microphytoplankton groups counted using microscope or automated image processing (Sosik and Olson, 2007; Álvarez et al., 2012), but the relationship between carbon and abundance becomes more variable for larger and more irregularly-shaped phytoplankton (Moberg and Sosik, 2012; Saccà, 2016).

Because the data are confined to the surface layers, they may also be adversely affected by underestimation of Prochlorococcus sp. abundance by flow cytometry because of extremely low fluorescence per cell (Partensky et al., 1996; Heywood et al., 2006). Furthermore, by limiting our dataset to the top 10 m, we have precluded the possibility of testing any potential impact that the vertical structure in the first optical depth might have on algorithm performance. Examples exist in the literature where the depth variations in particulate organic carbon have been taken into account (Duforêt-Gaurier et al., 2010), and this may be an avenue worth exploring also for phytoplankton carbon algorithms.

Finally, the conversion of abundance to carbon using estimates from the laboratory could cause errors in the computed Cphy in the field, if the laboratory estimates do not hold under natural environmental conditions. These factors can vary with physiological state and with depth (Casey et al., 2013) and have been discussed previously (Buitenhuis et al., 2012; Martínez-Vicente et al., 2013). It is important to highlight that in this study we have used indirect estimates of phytoplankton carbon (through cell counts and cell size) only because of the lack of direct measurements. However, methods for direct quantification of Cphy have recently become available (Graff et al., 2012; Casey et al., 2013) and the expectation is that there will be more direct Cphy data available in the future.

When the study was limited to the optical water classes where the phytoplankton assemblage is expected to be dominated by picoplankton (i = 1 to 6, B up to 0.15 mg Chl m−<sup>3</sup>), median values of Cphy and of the Cphy:B ratio matching the literature (Marañón et al., 2001; Marañón, 2015) were obtained, and the algorithm results showed more stability and less dispersion. However, some algorithms displayed significant differences, which are discussed hereafter.

# 4.2. Algorithm Comparison by Type: Chlorophyll Based, Backscattering Based and Allometric

The algorithms presented here were broadly classified into three classes: chlorophyll-based, back-scattering based and allometric. In the results presented here, it is evident that algorithms based on similar approaches perform alike. So it is worth examining each of these approaches in some detail, to explain the differences observed in the results for oligotrophic waters.

gyres indicate values close or below 5 mg C m−<sup>3</sup>, light gray indicates invalid retrieval or unavailable input data.

The two chlorophyll-based algorithms were designed to consider all the phytoplankton groups: Algorithm A (Sathyendranath et al., 2009) provides an upper limit to the phytoplankton contribution to the total particulate carbon pool and Algorithm B (Marañón et al., 2014) is based on phytoplankton carbon computed from flow-cytometric data supplemented with microscopic counts for larger phytoplankton. Yet, when compared with only one fraction of the total phytoplankton pool (the picophytoplankton in this study), the results are similar, with Algorithm A slightly overestimating and Algorithm B slightly underestimating in situ Cphy. Algorithm A was designed as the upper limit to the phytoplankton carbon, hence this result is as expected. Algorithm B was computed using a similar conversion method between cell count and carbon concentration to the one used in this study, but with different conversion coefficients. We speculate that a possible reason for the difference observed between the predicted Cphy by Algorithm B and the in situ Cphy could be found in the differences in conversion between cell counts and carbon. Production of datasets with consistent conversion factors would help eliminate this source of discrepancy.

Differences in results are greater with the backscattering-based algorithms, which produced overestimation of Cphy and greater dispersion statistics than the other algorithms. Because it has been verified in situ that bbp increases with pico and nano plankton carbon (Martínez-Vicente et al., 2013; Graff et al., 2015), the degradation of results may be linked to the backscattering input to the algorithms. The Algorithm C equation was derived using the GSM algorithm to obtain a relationship between the backscattering coefficient and B. It has been found in situ that at low B there may be an overestimation of bbp when using this satellite-derived relationship (Huot et al., 2008; Antoine et al., 2011; Brewin et al., 2012), yet in situ data have also shown overestimation of backscattering by the QAA algorithm (Behrenfeld et al., 2013). It has been found recently that Raman scattering plays a role in the discrepancy of the retrievals of the backscattering coefficient in very clear waters from satellite (Westberry et al., 2013), causing an overestimation in backscattering. This effect of Raman scattering has been identified only recently (Lee and Huot, 2014), and was not incorporated in the QAA version used for the production of the OC-CCI version 2 data used for this study, but it can be analytically corrected (McKinna et al., 2016), and future versions of the OC-CCI dataset will address this issue.

While validation of the OC-CCI chlorophyll product has been performed intensively in Atlantic oligotrophic waters and showed low error statistics (Brewin et al., 2016), investigations to improve the validation of bbp and to understand better the relationships of bbp with particles in oligotrophic areas are still required (Brewin et al., 2015). At a fundamental level, backscattering is dependent on the refractive index of the cells, their abundance and size, whose interplay is not yet fully understood. But in addition to phytoplankton, other particles (e.g., detritus) are known to contribute to the backscattering signal, and their variability relative to phytoplankton could potentially contribute to the discrepancies observed in this study.

Algorithm E used in its original formulation an inversion model to retrieve the backscattering spectral slope (Loisel et al., 2006) that is different from the one used in this comparison as input (i.e., QAAv5). The QAA used here invoked a band ratio to solve for the backscattering spectral slope, and this may account for at least part of the observed tighter coupling between B and Algorithm E. Algorithm E provides consistently low ∆ values across the water classes, although it underestimates the in situ data systematically. This may be pointing to a need to re-adjust the size scaling parameter (like No), the empirical correction for which is now based on a rather limited in situ validation data set (Kostadinov et al., 2016). Ultimately, a better optical closure is needed between modeled and observed backscattering spectra, together with a better understanding of the underlying particle assemblages, their refractive indices, and their relative contributions to the backscattering coefficient. It remains to be validated whether, in more productive waters, the prediction of low Cphy (from the Algorithm E picophytoplankton fraction) remains valid, whereas the current in situ dataset showed an increase (**Table 5**).

Algorithm F was similar to algorithm E in all water classes except for water classes 8 to 11, where the uncertainty is almost double compared with the rest of the algorithms, pointing perhaps to a vulnerability to uncertainties in the aph(676) retrieval in these waters. More accurate estimates of phytoplankton carbon by Algorithm F would possibly require improving the retrievals of the input aph(676) values, especially in high-chlorophyll waters.

### 5. CONCLUSIONS AND FUTURE WORK

Further work is required to extend the in situ dataset to include additional phytoplankton size classes, to evaluate whether uncertainties in the product can be reduced by including larger phytoplankton and thereby capturing phytoplankton dynamics at wider scales. Despite the limitations of the in situ data used, it has been shown that where chlorophyll concentrations are less than 0.15 mg Chla m−<sup>3</sup>, chlorophyll-based algorithms provide the best estimates of Cphy, allometric-based algorithms consistently underestimate Cphy, and backscattering-based algorithms can produce large overestimations of Cphy, at least for the particular case of backscattering data provided by the QAA algorithm as implemented in OC-CCI version 2.0. To improve backscattering products from satellites, fundamental optical work on explaining the relationship between bbp and particles in oligotrophic areas is still needed. Satellite-based phytoplankton carbon products, once validated to a level that meets user requirements, and in situ datasets similar to the one presented here will be useful for validation of marine ecosystem and biogeochemical models at a wider scale (Dutkiewicz et al., 2015).

### DATA AVAILABLE

The Cphy data, computed from in situ phytoplankton counts, and the matching Cphy data, computed from different algorithms using OC-CCI version 2.0 satellite data, can be obtained from http://www.zenodo.org with doi: 10.5281/zenodo.1067229.

## AUTHOR CONTRIBUTIONS

VM-V: Lead writer, collation of in situ database, statistical analysis and production of code for match-up evaluation. HE-K: Production of code for satellite processing and matchup extraction, and writer. RB: Provision of code for statistical analysis based on OC-CCI methodology. TJ: Provision of code for calculation of per optical water class uncertainties, based on the OC-CCI methodology. GD: Provision of in situ data. AH: Perspectives on ecosystem modeling. SR: algorithm provider, input in statistical analysis/interpretation. RR and HK: Review of the in situ database. TK and EM: Algorithm providers. JG: Data provider. GT: Data provider. TP: Project leader, scientific advice. SS: Co-lead development of the concept, work plan and guidance. All authors reviewed and provided comments on the draft manuscript.

### FUNDING

Several authors (VM-V, SS, TP, SR, HE-K, GD, AH, RR, and HK) were supported by POCO, which is a project funded by the European Space Agency (ESA) under the program of Science Exploitation of Operational Missions (SEOM) following Contract: 4000113692/15/I-LG. This study is a contribution to the international IMBeR project and was supported by the UK Natural Environment Research Council National Capability funding to Plymouth Marine Laboratory and the National Oceanography Centre, Southampton. This is contribution number 319 of the AMT programme. TK was supported on NASA grant # NNX13AC92G and by the Division of Hydrologic Sciences, Desert Research Institute. JG was supported through NASA Grant # NNX10AT70G. Thanks to ESA for sponsoring this publication.

### ACKNOWLEDGMENTS

The authors would like to thank Peter Regner at ESA for his support and management of the POCO project. The authors would like to thank Oliver Fisher for his contributions to the project during his student internship. The authors would like to thank the participants of the ESA sponsored Color and Light in the Ocean (CLEO) Workshop for constructive discussions and comments. Also thanks to the providers of the Ocean Color Climate Change Initiative dataset, Version 2.0, European Space Agency, available online at http://www.esa-oceancolour-cci.org and to British Oceanographic Data Centre (BODC) for the provision of the archived AMT and WCO flow cytometry data. VM-V thanks Prof. A. Bracher for directing us to additional data sources and Dr. L. Polimene for useful discussions on the variations of the Cphy:B ratio. We thank the two reviewers for their time and comments, which helped to improve significantly the quality of the work.

### REFERENCES

Zubkov, M. V., Sleigh, M. A., Tarran, G. A., Burkill, P. H., and Leakey, J. G. (1998). Picoplanktonic community structure on an Atlantic transect from 50 N to 50 S. Deep Sea Res. I 45, 1339–1355. doi: 10.1016/S0967-0637(98)00015-6

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The reviewer SA and handling Editor declared their shared affiliation.

Copyright © 2017 Martínez-Vicente, Evers-King, Roy, Kostadinov, Tarran, Graff, Brewin, Dall'Olmo, Jackson, Hickman, Röttgers, Krasemann, Marañón, Platt and Sathyendranath. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Comparison of Seasonal Cycles of Phytoplankton Chlorophyll, Aerosols, Winds and Sea-Surface Temperature off Somalia

Muhammad Shafeeque<sup>1,2</sup>, Shubha Sathyendranath<sup>3</sup>\*, Grinson George<sup>1</sup>, Alungal N. Balchand<sup>2</sup> and Trevor Platt<sup>1,4</sup>

<sup>1</sup> Fishery Resources Assessment Division, Central Marine Fisheries Research Institute, Kochi, India, <sup>2</sup> School of Marine Sciences, Cochin University of Science and Technology, Kochi, India, <sup>3</sup> National Centre for Earth Observation, Plymouth Marine Laboratory, Plymouth, United Kingdom, <sup>4</sup> Plymouth Marine Laboratory, Plymouth, United Kingdom

### Edited by:

Catherine Jeandel, Centre National de la Recherche Scientifique (CNRS), France

### Reviewed by:

Hubert Loisel, Université du Littoral Côte d'Opale, France David Antoine, Curtin University, Australia

> \*Correspondence: Shubha Sathyendranath ssat@pml.ac.uk

### Specialty section:

This article was submitted to Ocean Observation, a section of the journal Frontiers in Marine Science

Received: 30 March 2017 Accepted: 15 November 2017 Published: 12 December 2017

### Citation:

Shafeeque M, Sathyendranath S, George G, Balchand AN and Platt T (2017) Comparison of Seasonal Cycles of Phytoplankton Chlorophyll, Aerosols, Winds and Sea-Surface Temperature off Somalia. Front. Mar. Sci. 4:386. doi: 10.3389/fmars.2017.00386

In climate research, an important task is to characterize the relationships between Essential Climate Variables (ECVs). Here, satellite-derived data sets have been used to examine the seasonal cycle of phytoplankton (chlorophyll concentration) in the waters off Somalia, and its relationship to aerosols, winds and Sea Surface Temperature (SST). Chlorophyll-a (Chl-a) concentration, Aerosol Optical Thickness (AOT), Ångström Exponent (AE), Dust Optical Thickness (DOT), SST and sea-surface wind data for a 16-year period were assembled from various sources. The data were used to explore whether there is evidence to show that dust aerosols enhance Chl-a concentration in the study area. The Cross Correlation Function (CCF) showed highest positive correlation (r<sup>2</sup> = 0.3) in the western Arabian Sea when AOT led Chl-a by 1–2 time steps (here, 1 time step is 8 days). A 2° × 2° box off Somalia was selected for further investigations. The correlations of alongshore wind speed, Ekman Mass Transport (EMT) and SST with Chl-a were higher than that of AOT, for a lag of 8 days. When all four variables were considered together in a multiple linear regression, the increase in r<sup>2</sup> associated with the AOT is only about 0.02, a consequence of covariance among AOT, SST, EMT and alongshore wind speed. The AOT data show presence of dust aerosols most frequently during the summer monsoon season (June–September). When the analyses were repeated for the dust aerosol events, the correlations were generally lower, but still significant. Again, the inclusion of DOT in the multiple linear regression increased the correlation coefficient by only 2%, indicating minor enhancement in Chl-a concentration. Interestingly, during the summer monsoon season, there is a higher probability of finding more instances of positive changes in Chl-a after one time step, regardless of whether there is dust aerosol or not. On the other hand, during the winter monsoon season (November–December) and the rest of the year, the probability of Chl-a enhancement is higher when dust aerosol is present than when it is absent. The phase relationship in the 8-day climatologies of Chl-a and AOT (derived from NASA's SeaWiFS and MODIS-A ocean colour processing chain) showed that AOT led Chl-a for most of the summer monsoon season, except when Chl-a was very high, during which time Chl-a led AOT. The phase shift in the Chl-a and AOT climatological relationship at the Chl-a peak was not observed when AOT from the Aerosol Climate Change Initiative (Aerosol-CCI) was used.

Keywords: essential climate variables, aerosol optical thickness, Ångström exponent, chlorophyll-a, ocean colour climate change initiative, climate change, remote sensing, dust aerosols

# 1. INTRODUCTION

Phytoplankton, Sea-Surface Temperature (SST), sea-surface winds and aerosols are all Essential Climate Variables (ECVs) identified by the Global Climate Observation System (GCOS, 2011) as being worthy of sustained global observations at high spatial resolution and over long time scales, to aid studies of Earth's climate and climate change. As we strive to understand how the Earth system might respond holistically to climate change, it is important to explore not only the behavior of individual ECVs, but also their inter-relationships and the feedbacks between them. In the western Arabian Sea, the relationships between phytoplankton, winds and SST are better understood than that between phytoplankton and aerosols.

Yet, there are known functional links between marine aerosols and phytoplankton. For example, dust aerosols, transported by winds over the ocean, can be an important source of micronutrients such as iron, essential for phytoplankton growth (Duce and Tindale, 1991; Martin et al., 1991, 1994; Prospero et al., 2002; Cropp et al., 2005; Jickells et al., 2005; Mahowald et al., 2005; Meskhidze et al., 2005; Gallisai et al., 2014), with the proviso that not all the iron contained in dust particles is usable by phytoplankton. Winds over the ocean are also responsible for the formation of aerosols through generation of sea salt sprays (O'Dowd et al., 1997; Smirnov et al., 2003; Satheesh et al., 2006; Mulcahy et al., 2008; Glantz et al., 2009; Huang et al., 2010; Meskhidze and Nenes, 2010) and the same winds also mix the surface layer of the ocean, dictating the entrainment of nutrients from the deeper waters into the surface layer and controlling the average light available for phytoplankton growth in the layer. In addition to sea salt sprays, biological particles (for example, fragments of phytoplankton) contained in sea spray can also aid aerosol formation (Leck and Bigg, 2005; Facchini et al., 2008; Hawkins and Russell, 2010; Quinn and Bates, 2011). Feedback mechanisms (both positive and negative) have been proposed between dimethyl sulphide in the atmosphere of phytoplanktonic origin and the Earth's radiation budget, via aerosols (Charlson et al., 1987; Lovelock, 2006).

Positive (Martin et al., 1994; Jickells et al., 2005; Patra et al., 2007; Banerjee and Prasanna Kumar, 2014) and negative (Mallet et al., 2009; Paytan et al., 2009; Jordi et al., 2012) correlations between marine aerosols and phytoplankton concentration have been reported for different parts of the world ocean. Some studies have also identified regions where no relationship exists between the two (Cropp et al., 2005; Gallisai et al., 2014). Possible explanations for the positive correlations include the fertilizing role of iron contained in dust aerosols, or phytoplankton themselves, acting as a source of marine aerosols. Negative correlations might arise from high winds causing production of wind-spray aerosols, while at the same time forming deep mixed layers that may be able to support only low concentrations of phytoplankton, because of low average light levels available in the layer.

Satellite-based measurements provide a valuable tool for studies of aerosols and phytoplankton. Aerosol Optical Thickness (AOT), amenable to remote sensing, is an often-used measure of aerosol concentration. The Ångström Exponent (AE), which defines the wavelength dependence of AOT, is indicative of the type of aerosols present, and is also available through remote sensing. Dust Optical Thickness (DOT) can be inferred from AOT and the AE. Satellite data have been used to track dust aerosols for thousands of kilometers from their source (Myhre et al., 2005). Likewise, ocean colour measured from space provides information on the concentration of chlorophyll-a (Chl-a), which is a major photosynthetic pigment contained in phytoplankton. Furthermore, estimates of winds (speed and direction) and SST, essential for understanding phytoplankton dynamics, are also available through remote sensing. An advantage of remote sensing is that it provides data at large scales and over many years, allowing studies of time-series at multiple locations in a systematic manner. But some caution should be exercised when using ocean-colour-derived Chl-a concentration, AOT and AE. Sometimes they are all produced from the same processing chain, and one might argue that, in the extreme case, any relationships observed between the three are purely artifacts of the processing algorithm. Furthermore, the effects of clouds on satellite retrievals are significant and sometimes bias aerosol data through overestimation or underestimation, particularly for dust aerosols (Levy et al., 2007; Torres et al., 2007; Baddock et al., 2009; Kahn et al., 2010). However, some authors have used cloud-screening techniques to reduce such errors (Kaufman et al., 2005). Therefore, possible processing-chain effects should be checked before drawing firm conclusions.

In this paper, we examine the relationships of Chl-a with winds, SST, AOT and dust aerosols in the western Arabian Sea, at a selected site off Somalia. The region is characterized by a high dynamic range in Chl-a values that vary seasonally, in response to the reversing wind patterns and associated upwelling (Prasanna Kumar et al., 2001; Schott and McCreary, 2001; Schott et al., 2002; Shankar et al., 2002; Wiggert et al., 2005; Lévy et al., 2007; Wiggert and Murtugudde, 2007; Prakash et al., 2012). Diverse physical forcings of both oceanic and atmospheric origins drive biological production in the region off Somalia. During the summer monsoon season, the Somalia coastal region is characterized by strong upwelling with high primary productivity due to the swift Somali current caused by strong south-westerlies along the coast (Smith and Codispoti, 1980; Schott, 1983; Hitchcock and Olson, 1992; Brock et al., 1994; Schott et al., 2002; deCastro et al., 2016). The anti-cyclonic eddies associated with the Somali current during the same season further enhance production by transporting and mixing upwelled water (Fischer et al., 1996; McCreary et al., 1996; Schott et al., 1997; Koning et al., 2001; Schott et al., 2002; Santos et al., 2015). The consequent nutrient enrichment in the mixed layer of the ocean leads to high phytoplankton production during the summer monsoon season (Banse, 1987; Owens et al., 1993). Because of its proximity to the Arabian Peninsula, the region also receives seasonally-varying dust deposition (Pease et al., 1998; Li and Ramanathan, 2002; Prospero et al., 2002; Léon and Legrand, 2003; Zhu et al., 2007; Prasanna Kumar et al., 2010). Thus, the same winds that transport dust aerosols to the western Arabian Sea during the summer monsoon season also induce upwelling, favoring phytoplankton blooms. Hence any study of the relationship between Chl-a and aerosols in this region would be incomplete unless it also examined the effect of winds on phytoplankton dynamics.
Here, we use 16 years of satellite data (1998–2013) to make a systematic study of the relationship of Chl-a with AOT, winds and SST in the waters off Somalia.

### 2. MATERIALS AND METHODS

### 2.1. Data

Level-3 8-day composite Aerosol Optical Thickness (AOT) at 865 nm and Ångström Exponent (AE) from the Sea-viewing Wide Field-of-view Sensor (SeaWiFS) for January 1998–December 2010 and from the Moderate Resolution Imaging Spectroradiometer (MODIS) Aqua for January 2011–December 2013, downloaded from the National Aeronautics and Space Administration's (NASA's) ocean colour website (https://oceancolor.gsfc.nasa.gov), were used in this work. The AOT and AE data from NASA are referred to here as NASA-AOT and NASA-AE respectively. The daily AOT at 550 nm and AE data from the European Space Agency's (ESA's) Aerosol Climate Change Initiative (Aerosol-CCI) programme (de Leeuw et al., 2015; Popp et al., 2016, see also http://www.esa-aerosol-cci.org) were also used in this study, as an independent source of aerosol data, unconnected with ocean colour atmospheric correction routines. The AOT and AE data from the Aerosol-CCI website are referred to here as CCI-AOT and CCI-AE respectively. The CCI data are available at 1° spatial resolution for the period January 1998–December 2010.

The relationship between the AOT (τ) at any given wavelength λ<sub>0</sub> and that at any other wavelength λ depends on the AE (α) through the equation:

$$
\frac{\tau_{\lambda}}{\tau_{\lambda_0}} = \left(\frac{\lambda}{\lambda_0}\right)^{-\alpha}. \tag{1}
$$

In principle, if the optical thickness at one wavelength and the AE are known, the optical thickness can be computed at any other wavelength using Equation (1).
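As a minimal sketch of Equation (1), the function below converts an AOT value from a reference wavelength to another wavelength. The function name and the numerical values are illustrative assumptions, not taken from the data sets used in this study.

```python
def aot_at_wavelength(aot_ref, wavelength_ref_nm, wavelength_nm, angstrom_exponent):
    """Equation (1): tau(lambda) = tau(lambda_0) * (lambda / lambda_0) ** (-alpha)."""
    return aot_ref * (wavelength_nm / wavelength_ref_nm) ** (-angstrom_exponent)


# Illustrative (made-up) values: AOT of 0.30 at 550 nm with an AE of 1.0.
tau_550 = 0.30
alpha = 1.0
tau_865 = aot_at_wavelength(tau_550, 550.0, 865.0, alpha)
print(f"AOT at 865 nm: {tau_865:.4f}")
```

For a positive AE the optical thickness decreases toward longer wavelengths, so the 865 nm value is smaller than the 550 nm value; with an AE of zero the AOT is the same at all wavelengths.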

Chlorophyll-a (Chl-a) concentration, for the period January 1998–December 2013, was obtained from ESA's Ocean Colour-Climate Change Initiative (OC-CCI) website (Sathyendranath et al., 2016, see also https://www.oceancolour.org). One of the major reasons for the choice of the Chl-a data was the improved coverage provided by the OC-CCI data in the Arabian Sea, especially during the summer monsoon season. The 8-day composite AOT data from SeaWiFS are available at only 9 km resolution, so we used MODIS Aqua data at the same resolution (9 km) even though they are available at 4 km resolution. The Chl-a concentration from OC-CCI (version-2), which is available at 4 km resolution, was also re-gridded to 9 km resolution. Since the CCI-AOT data are available at 1° spatial resolution, the Chl-a concentration from OC-CCI was also re-gridded to 1° resolution to analyse the correlation between them. The daily value of AOT at 865 nm was calculated from daily CCI-AOT at 550 nm and CCI-AE using Equation (1). The data were merged to generate 8-day composites and extracted for the region off Somalia. The daily 1° gridded Sea Surface Temperature (SST) data were obtained for the period January 1998–December 2013 from the Woods Hole Oceanographic Institution's (WHOI's) objectively-analyzed air-sea heat fluxes, available at the Asia-Pacific Data-Research Centre (APDRC) website (http://apdrc.soest.hawaii.edu). The SST anomaly was calculated from these data after merging them into 8-day composites. In addition, the daily NCEP/NCAR reanalysis U-wind (zonal velocity) and V-wind (meridional velocity) data with 2.5° × 2.5° spatial resolution at 10 m above the sea surface were obtained for the same period from their official website (https://www.esrl.noaa.gov/psd). The data were merged to generate 8-day composites and used to derive the south-westerly wind component along the Somalia coast. All of the above information is summarized in **Table 1**.
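The text does not spell out how daily fields were merged into 8-day composites. The sketch below shows one plausible approach, assuming a simple mean over consecutive 8-day bins that ignores missing days, with 46 bins per year (the last bin being shorter). The function name and the synthetic input are hypothetical.

```python
import numpy as np


def eight_day_composite(daily, day_of_year):
    """Average one year of daily values into 46 consecutive 8-day bins.

    Missing days (NaN) are ignored; a bin with no valid data stays NaN.
    Bin 0 covers days 1-8, bin 45 covers day 361 to the end of the year.
    """
    daily = np.asarray(daily, dtype=float)
    bins = (np.asarray(day_of_year) - 1) // 8
    composites = np.full(46, np.nan)
    for b in range(46):
        values = daily[bins == b]
        values = values[~np.isnan(values)]
        if values.size:
            composites[b] = values.mean()
    return composites


# Synthetic example: the "measurement" on each day is simply the day number.
days = np.arange(1, 366)
composites = eight_day_composite(days.astype(float), days)
print(composites[0], composites[45])
```

Applying the same binning to every variable keeps the composites aligned in time, which matters for the lagged analyses that follow.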

### 2.2. Methods

The methods used in this study are shown schematically in **Figure 1**, and described below.

### 2.2.1. Correlation between Chl-a and AOT in the Arabian Sea

Correlation between Chl-a and AOT for the 1998–2013 period over the Arabian Sea was studied using the 8-day composites. The results showed areas of both positive and negative correlation. The western Arabian Sea showed strong positive correlation. A 2° × 2° box (54–56° E longitude and 10–12° N latitude) off the Somalia coast, with high positive correlation, was chosen for further analyses.

### 2.2.2. CCF Analysis and Lagged Correlation

We studied the lags in the correlation between Chl-a and AOT using the Cross Correlation Function (CCF). CCF analysis produces cross correlations in which the observations of one time series are correlated with the observations of another time series at different lags and leads, to identify the variables which are leading or lagging indicators of other variables. The basic premise is that, if the relationships between the variables were merely a processing artifact, the correlations would peak at zero lag. In instances where phytoplankton might be contributing biological material for aerosol formation, the correlation would be maximum when AOT lagged behind Chl-a concentration. On the other hand, if the oceans were fertilized by aerosols, then Chl-a would lag behind AOT.

### TABLE 1 | Summary of data sets analyzed in the study.

The CCF analysis was also carried out between Chl-a concentration and the alongshore component of wind speed. If wind-induced upwelling were a causative factor for the increase in Chl-a concentration off the Somalia coast, then we anticipate that the correlation between them would peak when Chl-a lagged behind wind (because of the finite time it takes for phytoplankton to bloom in response to the nutrients brought to the surface by upwelling). Though the alongshore wind speed over the Somalia coast is a fairly good indicator of upwelling strength, we have also calculated the Ekman Mass Transport (EMT) as an upwelling index for the analysis. Since a surface signature of upwelling is a decrease of SST in the upwelling zone, we have also taken SST as another proxy for upwelling.
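The lagged-correlation idea can be sketched as follows. This is a generic Pearson cross-correlation written for illustration, not the authors' implementation; the function name, the sign convention for the lag, and the synthetic series are assumptions.

```python
import numpy as np


def cross_correlation(x, y, max_lag):
    """Pearson correlation between x(t) and y(t + lag), lag = -max_lag..max_lag.

    A peak at a positive lag means that y lags behind x by that many time steps.
    Pairs containing a missing value (NaN) are dropped.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[: len(x) - lag], y[lag:]
        else:
            a, b = x[-lag:], y[: len(y) + lag]
        valid = ~(np.isnan(a) | np.isnan(b))
        result[lag] = np.corrcoef(a[valid], b[valid])[0, 1]
    return result


# Synthetic example: "chl" is a noisy copy of "aot" delayed by 2 time steps.
rng = np.random.default_rng(0)
aot = rng.normal(size=200)
chl = np.roll(aot, 2) + 0.1 * rng.normal(size=200)

ccf = cross_correlation(aot, chl, max_lag=5)
best_lag = max(ccf, key=ccf.get)
print(best_lag, round(ccf[best_lag], 2))
```

With 8-day composites, one time step in this convention corresponds to 8 days, so a peak at lag 2 would mean that the second series responds roughly 16 days after the first.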

### 2.2.3. Ekman Mass Transport

For the Somalia region, the alongshore component of the wind stress is favorable for upwelling during summer monsoon season. A positive value for the EMT represents upwelling along the coast of Somalia. The alongshore wind stress for Somalia coast was calculated by the bulk aerodynamic formula from Koracin et al. (2004) as shown in Equation (2):

$$
\tau_y = \rho_a \times C_d \times w \times v. \tag{2}
$$

where τ<sub>y</sub> is the alongshore wind stress; ρ<sub>a</sub> is the density of air, which was taken to be 1.2 kg/m<sup>3</sup>; w is the magnitude of the wind speed; v is the alongshore component of wind speed in m/s; and C<sub>d</sub> is the nonlinear drag coefficient based on Large and Pond (1981) and Trenberth et al. (1990) for low wind speeds. So, the EMT along the Somalia coast can be calculated using Equation (3):

$$M_{\rm ev} = \frac{\tau_y}{f}. \tag{3}$$

where M<sub>ev</sub> is the mass transport by the alongshore wind, f is the Coriolis parameter (2 × Ω × sin φ), Ω is the angular frequency of the Earth's rotation and φ is the latitude.
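A minimal sketch of Equations (2) and (3) is given below. Several details are assumptions made for illustration, because the text does not list them: the piecewise drag-coefficient values (a commonly quoted form attributed to Large and Pond, 1981, with the Trenberth et al., 1990, low-wind modification), the 45° orientation used to project the wind onto the alongshore direction, the value of Ω, and all function names.

```python
import math

RHO_AIR = 1.2      # air density in kg m^-3 (value given in the text)
OMEGA = 7.2921e-5  # Earth's angular frequency in rad s^-1 (assumed value)


def drag_coefficient(w):
    """Assumed piecewise drag coefficient as a function of wind speed w (m/s)."""
    if w <= 1.0:
        return 2.18e-3
    if w < 3.0:
        return (0.62 + 1.56 / w) * 1e-3
    if w < 10.0:
        return 1.14e-3
    return (0.49 + 0.065 * w) * 1e-3


def alongshore_component(u, v_wind, coast_angle_deg=45.0):
    """Project the (u, v) wind onto an assumed alongshore direction.

    coast_angle_deg is measured anticlockwise from east; 45 degrees (a
    hypothetical choice) makes south-westerly winds positive.
    """
    angle = math.radians(coast_angle_deg)
    return u * math.cos(angle) + v_wind * math.sin(angle)


def alongshore_wind_stress(w, v):
    """Equation (2): tau_y = rho_a * C_d * w * v, in N m^-2."""
    return RHO_AIR * drag_coefficient(w) * w * v


def ekman_mass_transport(tau_y, latitude_deg):
    """Equation (3): M_ev = tau_y / f, in kg m^-1 s^-1."""
    f = 2.0 * OMEGA * math.sin(math.radians(latitude_deg))
    return tau_y / f


# Illustrative case: a 10 m/s south-westerly wind at 11 degrees N.
u = v_wind = 10.0 / math.sqrt(2.0)
w = math.hypot(u, v_wind)
v = alongshore_component(u, v_wind)
tau_y = alongshore_wind_stress(w, v)
emt = ekman_mass_transport(tau_y, 11.0)
print(f"tau_y = {tau_y:.4f} N m^-2, EMT = {emt:.0f} kg m^-1 s^-1")
```

Because f is small at the low latitudes of the Somali coast, a moderate wind stress translates into a large Ekman transport; a positive value corresponds to upwelling-favorable conditions in the convention used here.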

Multiple linear regression analysis with Chl-a as the dependent variable and NASA-AOT (or DOT), alongshore wind speed, EMT and SST as independent variables was carried out. We used 8-day composites with lags of 1–2 time steps for this analysis (these lags correspond to the maximum correlation between Chl-a and NASA-AOT data). The analysis was repeated by replacing NASA-AOT with CCI-AOT (or DOT) with a lag of 3 time steps, corresponding to the maximum correlation between Chl-a and CCI-AOT. We have also calculated the 8-day climatologies of all these variables, and plotted them against time of year, to study their phase relationships.
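The lagged multiple regression can be sketched with an ordinary least-squares fit. The code below is an illustrative assumption about how such a fit could be set up (Chl-a at time t + lag regressed on the predictors at time t), using synthetic data; it is not the authors' code, and the function name is hypothetical.

```python
import numpy as np


def lagged_regression(chl, predictors, lag):
    """Ordinary least squares of chl(t + lag) on predictors(t).

    Returns (coefficients, r2, adjusted_r2); coefficients[0] is the intercept
    and the remaining entries follow the order of `predictors`.
    """
    if lag < 0:
        raise ValueError("lag must be non-negative")
    y = np.asarray(chl, dtype=float)
    X = np.column_stack([np.asarray(p, dtype=float) for p in predictors])
    if lag > 0:
        y, X = y[lag:], X[:-lag]
    valid = ~(np.isnan(y) | np.isnan(X).any(axis=1))
    y, X = y[valid], X[valid]
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ coef
    r2 = 1.0 - (residuals @ residuals) / np.sum((y - y.mean()) ** 2)
    n, k = len(y), X.shape[1]
    adjusted_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    return coef, r2, adjusted_r2


# Synthetic example: Chl-a responds to wind and SST one time step later,
# while the synthetic AOT series carries no information about Chl-a.
rng = np.random.default_rng(0)
n = 300
wind, sst, aot = rng.normal(size=n), rng.normal(size=n), rng.normal(size=n)
chl = np.zeros(n)
chl[1:] = 0.5 * wind[:-1] - 0.2 * sst[:-1] + 0.05 * rng.normal(size=n - 1)

coef, r2, adj_r2 = lagged_regression(chl, [aot, wind, sst], lag=1)
print(np.round(coef, 2), round(r2, 3), round(adj_r2, 3))
```

Comparing r<sup>2</sup> between fits with and without a given predictor, as done in the pair-wise comparisons of this study, shows how much additional variance that predictor explains.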

### 2.2.4. Derivation of Dust Optical Thickness (DOT)

The desert dust transported by winds over the ocean contains micronutrients such as iron, which can regulate phytoplankton activity (Martin et al., 1994; Lenes et al., 2001; Muhs et al., 2007; Donaghay et al., 2015). Since the seasonal monsoon winds bring large quantities of iron-containing dust aerosols to the study area (Li and Ramanathan, 2002; Banerjee and Prasanna Kumar, 2014), we investigated the effect of dust aerosols on Chl-a concentrations. The AE, which is often used as a qualitative indicator of aerosol particle size, and AOT, which indicates the aerosol load, can be used to differentiate dust aerosol from other types of aerosol. Generally, a higher value of AE (α > 1) is indicative of fine, submicron aerosols, whereas lower values (α < 1) are representative of coarse, super-micron particles (Kaufman, 1993; Gobbi et al., 2007; Yoon et al., 2012). The AOT values are lower for fine aerosols and higher for coarse aerosols. In the literature, different criteria have been proposed to identify dust aerosols at different locations: for example, α < 0.6 (Dubovik et al., 2002; Brindley et al., 2015), α < 0.8 (Eck et al., 2005; Che et al., 2013), α < 1 (Eck et al., 1999; Schuster et al., 2006; Papaynannis et al., 2007; Yoon et al., 2012; Valenzuela et al., 2014; Zu et al., 2014; Pakszys et al., 2015) and α < 1.4 (Gobbi et al., 2007; Pereira et al., 2011; Shinozuka et al., 2011); similarly, AOT > 0.11 (Toledano et al., 2007; Balarabe et al., 2016), AOT > 0.2 (Salinas et al., 2009; Pakszys et al., 2015) and AOT > 0.25 (Guleria et al., 2012) have been recommended to identify dust aerosols. After considering all these studies, we adopted the following criteria for the region off Somalia: observations with AE less than 1 and AOT at 440 nm (τ<sub>440</sub>) greater than 0.2 (i.e., α < 1 and τ<sub>440</sub> > 0.2) are designated as dust aerosols, and their optical thickness as DOT. The NASA-AOT at 865 nm (τ<sub>865</sub>) and NASA-AE were used to calculate NASA-AOT at 440 nm (τ<sub>440</sub>), using Equation (1). Similarly, CCI-AOT at 550 nm (τ<sub>550</sub>) and CCI-AE were used to calculate CCI-AOT at 440 nm (τ<sub>440</sub>), using Equation (1).
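The dust criteria above (α < 1 and τ<sub>440</sub> > 0.2) can be expressed as a short classification step. The sketch below is illustrative only: the function name and the sample values are made up, and Equation (1) is used to shift the AOT to 440 nm.

```python
import numpy as np


def flag_dust(aot_ref, wavelength_ref_nm, angstrom_exponent,
              ae_max=1.0, aot440_min=0.2):
    """Return (tau_440, is_dust) for arrays of AOT and AE.

    tau_440 follows Equation (1); an observation is flagged as dust when
    AE < ae_max and tau_440 > aot440_min (the thresholds adopted in the text).
    """
    aot_ref = np.asarray(aot_ref, dtype=float)
    ae = np.asarray(angstrom_exponent, dtype=float)
    tau_440 = aot_ref * (440.0 / wavelength_ref_nm) ** (-ae)
    is_dust = (ae < ae_max) & (tau_440 > aot440_min)
    return tau_440, is_dust


# Made-up example values of AOT at 865 nm and the corresponding AE.
tau_865 = [0.15, 0.30, 0.10]
ae = [0.5, 1.3, 0.2]
tau_440, is_dust = flag_dust(tau_865, 865.0, ae)
print(np.round(tau_440, 3), is_dust)
```

In this example only the first observation satisfies both criteria: the second has a high AOT but an AE typical of fine particles, and the third has coarse particles but too low an aerosol load.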

### 3. RESULTS

### 3.1. Relationship between Chl-a and AOT in the Arabian Sea

The correlation between Chl-a and NASA-AOT, computed from the 8-day time series for 1998–2013 for the Arabian Sea, is mapped in **Figure 2**. The results are based on data for all the seasons rather than for specific seasons as in Patra et al. (2007) or in Banerjee and Prasanna Kumar (2014). The western Arabian Sea exhibits high positive correlations, whereas the south eastern Arabian Sea shows low to moderate positive correlations. There are also regions (south central) where no statistically-significant correlation is evident and extensive regions (north-central and north-eastern) of significant negative correlations. The region off Somalia shows high positive correlation between Chl-a and NASA-AOT and it is located along the path of winds carrying dust aerosols emanating from South Asia, South-West Asia, North Africa (Sahara) and the eastern Horn of Africa (Pease et al., 1998; Ginoux et al., 2001; Goudie and Middleton, 2001; Prospero et al., 2002; Léon and Legrand, 2003). Although several studies (Banzon et al., 2004; Kayetha et al., 2007; Patra et al., 2007; Singh et al., 2008; Nezlin et al., 2010; Banerjee and Prasanna Kumar, 2014) have examined the relationship between Chl-a and AOT in various parts of the Arabian Sea, the region off Somalia has not yet been explored in detail, and it is the region selected for our investigation.

### 3.2. Climatologies of Chl-a, Aerosols, Winds and SST off Somalia

The 16-year 8-day climatological seasonal cycles of Chl-a concentration, NASA-AOT, CCI-AOT, SST and alongshore wind speed are shown in **Figure 3A**, for the selected study area off Somalia. SST data are reported as anomalies from the 8-day average. When the aerosols are identified as dust aerosols, they are indicated in the plot using black and purple filled circles. Out of the 46 observations in the 8-day climatology of each AOT data set, 24 were identified as dust aerosols for the CCI-AOT data, whereas 14 were identified as dust aerosols for the NASA-AOT data. The CCI-AOT data showed the presence of dust aerosols not only during the summer monsoon season, but also during the winter monsoon season. The corresponding climatological wind vectors are shown in **Figure 3B**. During the first 100 days of the year, winds are north-easterly, the wind speed decreasing with time. These conditions are unfavorable for upwelling off Somalia. During this period, SST increases steadily by some 3°C. At the same time, the Chl-a concentrations decrease, and AOT also remains low. After this, the winds reverse direction and intensify, resulting in upwelling (indicated by decreasing SST) that favors phytoplankton growth. We note that the initial response of phytoplankton to the intense south-westerly winds is a decrease in concentration, perhaps a consequence of the phytoplankton being mixed into deeper layers. After this, the Chl-a increases, with a lag of a couple of time steps behind the increasing wind speed. Both AOT and Chl-a reach their respective maxima during the summer monsoon season.

In **Figure 3A**, the NASA-AOT and Chl-a reach their respective maxima during the summer monsoon season. Although the seasonality of CCI-AOT is more or less similar to that of NASA-AOT, the timing of the peak values differs. The maximum value for CCI-AOT occurred during the early summer monsoon season (Day of Year, DoY 170), while the Chl-a is still increasing, whereas the NASA-AOT peak occurred at DoY 224, during the waning phase of the summer monsoon season and after the Chl-a peak at DoY 216. An interesting feature in the figure is that, towards the peak of the summer monsoon (around DoY 180), when Chl-a concentration reaches ≈ 0.8 mg m<sup>−3</sup>, there is a brief period when Chl-a continues to increase and leads NASA-AOT by up to 3 time steps until DoY ≈ 220. However, this feature was not found in the CCI-AOT data. Just before the Chl-a peak is reached, the wind speed starts to drop, followed by Chl-a and NASA-AOT, until all variables reach minima toward DoY 300, at which point the wind direction again reverses. SST starts to increase when the south-westerly winds drop, reaching a secondary peak at around DoY 310.

The seasonal patterns are consistent with the known geography of the area. However, there is a tantalizing suggestion in **Figure 3** that when the wind speeds are at their highest, and Chl-a levels are high, the NASA-AOT values may be enhanced by maritime aerosols, in addition to the dust aerosols, and that some of these aerosols may have a biological origin, as indicated by Chl-a leading NASA-AOT during this period. However, this observation is not supported by CCI-AOT, and in the absence of additional information, it would be premature to conclude that such is the case. But it would be a point worthy of further investigation.

### 3.3. The Relationship between Chl-a, AOT and DOT off Somalia

Since **Figure 3** indicates that there is a lag in the relationships between Chl-a and the other variables studied here, further analysis has been made for the 2° × 2° box using the Cross Correlation Function (CCF) between Chl-a and AOT. The result (**Figure 4A**) shows that the highest significant positive correlation (r = 0.55) between Chl-a and NASA-AOT in the study region occurred for Chl-a lagging NASA-AOT by 1 to 2 time steps (1 time step is 8 days). The CCF analysis was also carried out between CCI-AOT and Chl-a and shows a significant positive correlation. Further, the maximum correlation (r = 0.54) occurred when Chl-a lagged behind CCI-AOT by 3 time steps (**Figure 4B**). So the analysis using CCI-AOT data confirmed the results obtained using NASA-AOT on the existence of a significant correlation between Chl-a and AOT in the region off Somalia, the magnitude of the correlation and also the sign of the lag.

The relationship between Chl-a and AOT (or DOT) with a lag of 8 days is explored further in **Figure 5** using NASA-AOT (or DOT). The scatter plot of Chl-a against AOT is shown in **Figure 5A**, with the fitted line and the r value of 0.55 for the fit, consistent with the CCF. However, we recognize that the relationships of maritime and dust aerosols with Chl-a would be functionally different (for example, we do not anticipate that maritime aerosols could fertilize the oceans, whereas it would be plausible with dust aerosols). Dust aerosols are present more frequently during the summer monsoon season than in other seasons, because of the favorable wind from adjacent land masses. Out of 736 observations over 16 years, 203 observations were identified as dust aerosols. **Figure 5B** shows the relationship between Chl-a and DOT. We see that there is a general tendency for Chl-a to increase with DOT.

We checked further whether the presence of dust aerosols enhances the Chl-a concentration in the subsequent time steps by calculating the difference in Chl-a (ΔChl-a) one time step after a dust event, and plotting it against NASA-AOT (**Figure 5C**). The presence of more positive ΔChl-a following high aerosol events would be indicative of a positive effect of aerosols on phytoplankton concentration. The data (**Figure 5C**) show no obvious relationship between aerosols and ΔChl-a either for all aerosols taken together or for dust aerosol events (circles in red colour) by themselves. However, for all the DOT events considered by themselves, the frequency of ΔChl-a is slightly skewed toward positive numbers, with some 114 values being positive out of 203 events (see histogram of ΔChl-a, **Figure 5D**). So the probability that Chl-a enhancement is associated with the presence of dust aerosols throughout the year is 56% (114 out of 203), compared with 238 out of 532 in the absence of dust aerosols (45%). The higher number of positive ΔChl-a observations is significant (p < 0.05) according to a binomial test. For the non-dust events, there is a higher number of negative values (294) compared with positive values (238) of ΔChl-a. These results are summarized in **Table 2**.
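The counts quoted above can be checked with an exact binomial calculation. The text does not state whether the test was one-sided or two-sided; the sketch below assumes a one-sided test against a null probability of 0.5 (equal chance of a positive or negative change), and the function name is hypothetical.

```python
from math import comb


def binomial_upper_tail(k, n, p=0.5):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


# Counts reported in the text for the NASA-AOT data set.
dust_positive, dust_total = 114, 203
nondust_positive, nondust_total = 238, 532

fraction_dust = dust_positive / dust_total
fraction_nondust = nondust_positive / nondust_total
p_dust = binomial_upper_tail(dust_positive, dust_total)

print(f"dust: {fraction_dust:.0%} positive, non-dust: {fraction_nondust:.0%} positive")
print(f"one-sided p-value for dust events: {p_dust:.3f}")
```

Under this one-sided assumption the p-value for the dust events falls below 0.05; a two-sided test would roughly double it, so the choice of alternative hypothesis matters for a result this close to the threshold.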

We supplemented these calculations after splitting the data according to monsoon (summer monsoon) and non-monsoon seasons, recognizing the differences in oceanographic and meteorological conditions during these two parts of the year (**Table 2**). Out of 224 observations during the summer monsoon season, 140 are dust aerosol events and 84 are non-dust events. Within these 140 dust events, the number of positive ΔChl-a values is 82 (59%), compared with 58 (41%) negative values. However, for non-dust events during this season, we also find more positive ΔChl-a values (58 events, or 69%) than negative ones (26 events, or 31%). For the non-monsoon season, out of 511 total observations, 448 are non-dust aerosol events and 63 are dust events. Within these non-dust observations, there is a higher number of negative values (268, or 60%) when compared with positive values (180, or 40%). But during dust events, the number of positive observations is slightly higher, with 32 (51%) positive values compared with 31 (49%) negative ones. We conclude from all of the above that the probability of Chl-a enhancement during the summer monsoon season does not depend much on the presence or absence of dust aerosols. In other words, during the summer monsoon season, there is a higher probability of finding positive ΔChl-a values, regardless of whether there is a dust event or not. On the other hand, during the rest of the year, the probability of chlorophyll enhancement is a little higher during dust events than during non-dust events.

The analysis was also repeated for CCI-AOT data to verify the above results and is presented in **Table 3**. For this dataset, the probability of Chl-a enhancement is again higher in the presence of dust aerosols when the whole year is considered, at 53% (134 out of 251), compared with 142 out of 344 in the absence of dust aerosols (41%). Out of 208 observations during the summer monsoon season, 154 are dust aerosol events and 54 are non-dust events. Within these 154 dust events, the number of positive ΔChl-a values is 83 (54%), compared with 71 (46%) negative values. However, for non-dust events during this season, we also find more positive ΔChl-a values (31, or 57%) than negative ones (23, or 43%). The results from the winter monsoon season indicate that, though the dust aerosol events are fewer in number (49) compared with non-dust events (93) within 142 observations, there were more positive ΔChl-a values (28, or 57%) than negative values (21, or 43%) when dust aerosols were present in the region. But, in the absence of dust aerosols, there is a higher number of negative values (48, or 52%) compared with positive values (45, or 48%).

Thus both NASA-AOT and CCI-AOT lead to the conclusion that the probability of Chl-a enhancement during the summer monsoon season does not depend on the presence or absence of dust aerosols. In other words, during the summer monsoon season, there is a higher probability of finding positive ΔChl-a values, regardless of whether there is a dust event or not. On the other hand, during the winter monsoon season and the rest of the year, the probability that dust events may be associated with chlorophyll enhancement is higher than that during non-dust periods.

### 3.4. Relationship of Chl-a with Winds, SST, AOT, and DOT

To elucidate further the relationship between Chl-a and environmental conditions, we next examined the CCF between Chl-a and alongshore wind speed, since it is known that the alongshore winds determine upwelling, and hence influence phytoplankton dynamics in the area (Goes et al., 2005; Gregg et al., 2005; Wiggert et al., 2005; Prasanna Kumar et al., 2010; see also **Figure 3**). The result (**Figure 6**) shows, similar to the CCF between Chl-a and NASA-AOT, that the correlation peaks with a lag of 1–2 time steps, with wind speed leading Chl-a, but with a higher correlation coefficient (r = 0.69, p < 0.05).

Since the correlation coefficients of Chl-a with both aerosols and wind speed peak with a lag of 1–2 time steps, we chose a lag of 1 time step for a linear step-wise multiple regression study with Chl-a as the dependent variable, and NASA-AOT (or DOT), Ekman Mass Transport (EMT), alongshore wind speed and SST as independent variables. The two upwelling indices, wind speed and EMT, show more or less similar correlations with Chl-a. So, we excluded the EMT from the multiple linear regression analysis (but the results from the multiple linear regression including EMT are presented as Table S1). When the correlations with each of the independent variables are considered individually, the highest r<sup>2</sup> values were found for alongshore wind speed (r<sup>2</sup> = 0.47) for the ensemble of year-round data, with the corresponding r<sup>2</sup> dropping to 0.17 when dust aerosol events are considered separately (140 dust events during the summer monsoon, and 63 outside of it, totalling 203), followed by SST (r<sup>2</sup> = 0.33 and r<sup>2</sup> = 0.20 for the same two cases respectively), and then by NASA-AOT (r<sup>2</sup> = 0.30 and r<sup>2</sup> = 0.08 for the corresponding cases). From the results of pair-wise regression analysis, we see that the addition of NASA-AOT (or DOT) as an independent variable, in addition to wind speed, increases r<sup>2</sup> values by a modest 0.02. With all three variables taken together as independent variables, the explained variance (r<sup>2</sup>) is 0.52 for all data, and 0.25 for DOT events (**Table 4**). The results for a lag of 2 time steps (not shown) are similar to those for a lag of 1 time step, but with lower correlation coefficients.

The multiple regression analysis was also repeated for CCI-AOT data with Chl-a as the dependent variable and CCI-AOT (or DOT), alongshore wind speed and SST as independent variables (**Table 5**). Since the correlation coefficients of Chl-a with CCI-AOT peak with a lag of 3 time steps, we chose a lag of 3 time steps for this analysis. When the correlations with each of the independent variables are considered individually, the highest r<sup>2</sup> values were again found for alongshore wind speed (r<sup>2</sup> = 0.49 for year-round data and r<sup>2</sup> = 0.38 for dust aerosol events considered separately; 154 dust events during the summer monsoon, and 97 outside of it, totalling 251), followed by SST (r<sup>2</sup> = 0.32 and r<sup>2</sup> = 0.15 for the same two cases respectively), and then by CCI-AOT (r<sup>2</sup> = 0.29 and r<sup>2</sup> = 0.09 for the corresponding cases). From the results of pair-wise regression analysis, adding CCI-AOT (or DOT) to wind speed as an independent variable did not improve the r<sup>2</sup> value of 0.49. However, a small increase in r<sup>2</sup> of 0.06 or 0.08 was found when adding CCI-AOT or DOT, respectively, to SST.

### 4. DISCUSSION

### 4.1. The Satellite Data Used

Much of the interpretation of results for the region off Somalia depends on the quality of the satellite data used for the analysis, especially during the summer monsoon season, since this is a highly dynamic season, with high winds, high AOT and high Chl-a concentrations. The OC-CCI Chl-a dataset (Sathyendranath et al., 2016) was selected because of the significantly-improved seasonal coverage that the data provide in the study area, compared with other datasets, especially during the summer monsoon season. But it is important to reassure ourselves that the data are of sufficient quality for the analysis presented. Though the OC-CCI data have been validated using a global dataset as part of the project, and also for the neighboring Red Sea (Brewin et al., 2015) and the Gulf of Aden (Gittings et al., 2016), we do not have in situ data from the region off Somalia for local validation. However, the data are reassuring in some respects: the first one is that, if the relationship between Chl-a and AOT were an artifact of the processing, then one would anticipate that the relationship would peak at zero lag. In fact, we see that, typically, the maximum correlation occurred with a lag, suggesting a functional relationship between the two variables, rather than an artifact. The second is that the seasonal patterns in Chl-a are consistent with the known oceanography of the area, and appear as a consequence of the seasonal changes in the oceanographic conditions, as indicated by the winds and SST. The AOT and AE data from both NASA and CCI also show seasonal changes with high AOT values and low AE during the summer monsoon season and vice versa for the rest of the year.

FIGURE 5 | (A) The scatter plot between 8-day composite AOT (NASA-AOT) and Chl-a with a lag of one time step, for the study area. The straight line in red colour indicates the regression equation and the correlation coefficient (r) is shown in the top right side. (B) Scatter plot between DOT and Chl-a with a lag of one time step (a subset of the data points in (A)). (C) Scatter plot between AOT and ΔChl-a with a lag of one time step. The circles in red colour indicate dust aerosols. (D) Histogram for ΔChl-a during the presence of dust aerosols with a lag of one time step.

TABLE 2 | The number of observations with enhancements in Chl-a (+ve ΔChl-a) or reductions in Chl-a (−ve ΔChl-a), for all data, and for the summer monsoon, for the non-monsoon and sorted according to whether the aerosols were identified as dust or not (here, dust aerosols were derived from NASA-AOT data).

TABLE 3 | The number of observations with enhancements in Chl-a (+ve ΔChl-a) or reductions in Chl-a (−ve ΔChl-a), for all data, and for the summer monsoon, for the winter monsoon and sorted according to whether the aerosols were identified as dust or not (here, dust aerosols were derived from CCI-AOT data).

We have used aerosol data from the NASA ocean colour web site, partly to reassure ourselves that the aerosol and Chl-a products that are outputs of the same processing chain do not show inter-dependencies associated with the assumptions that underlie the processing. In the OC-CCI processing version-2 used here, SeaWiFS and MODIS-Aqua data were processed using NASA's SeaDAS software, consistent with the processing chain that generated the aerosol products at the NASA ocean colour website. That the analysis presented here has indicated that the patterns in Chl-a and in the aerosol properties are consistent with the known oceanography of the study area, and that the correlations vary with region (**Figure 2**) as oceanographic and meteorological conditions change, lends some confidence to the quality of the data, in the absence of direct validation data. To further substantiate the application of satellite data to studies of relationship between aerosol and phytoplankton, Aerosol-CCI data sets were also subjected to identical analysis and the data confirmed our findings.

## 4.2. Aerosols and Phytoplankton in the Western Arabian Sea off Somalia

There have been a few previous studies that dealt with the influence of aerosols on phytoplankton dynamics in the Arabian Sea. A recent study (Banerjee and Prasanna Kumar, 2014) has shown that episodic dust storms could generate phytoplankton blooms in the central Arabian Sea during the winter monsoon. Nezlin et al. (2010) reported a correlation between Chl-a and aerosols when studying inter-annual variations in the Persian Gulf area. Prasanna Kumar et al. (2010) reported an increasing trend in phytoplankton in the central Arabian Sea during winter months of 1997–2007, and attributed it to increasing supply of iron by dust aerosols. Singh et al. (2008) studied a series of dust storms in the northern Arabian Sea during a 3-year period, and reported chlorophyll enhancement within 1–4 days of dust events, but also pointed out other mechanisms that might be responsible for the relationship observed.

TABLE 4 | Results of multiple linear regression analysis with Chl-a as the dependent variable and NASA-AOT or DOT, alongshore wind speed and SST as independent variables for 1 time step lag.

Number of observations (N), r<sup>2</sup>, adjusted r<sup>2</sup> and r are shown for each of the analyses, and they are statistically significant (p < 0.05). The first set of calculations with 735 observations is for the whole year. The second set, with 203 observations, is for the dust aerosol events. Plus signs indicate variables that were used, and minus signs indicate variables that were excluded in each analysis.

Number of observations (N), r<sup>2</sup>, adjusted r<sup>2</sup> and r are shown for each of the analyses, and they are statistically significant (p < 0.05). The first set of calculations with 595 observations is for the whole year. The second set, with 251 observations, is for the dust aerosol events. Plus signs indicate variables that were used, and minus signs indicate variables that were excluded in each analysis.

Our results for the western Arabian Sea off Somalia indicate only a possible minor role for dust aerosols enhancing Chl-a concentration during the summer monsoon, supplementing the major role of alongshore winds inducing upwelling favorable for phytoplankton growth. The upwelling component of winds off Somalia during the summer monsoon season appears to be far stronger than in the classic eastern coastal upwelling zones in the world ocean (Bakun et al., 1998). In the data used here, the wind speed was greater than 15 m/s during the summer monsoon season over the Somalia coast. Recently, deCastro et al. (2016) studied the evolution of Somali coastal upwelling under future warming scenarios using models. When the intensity of Somali coastal upwelling during the summer monsoon season was projected for the twenty-first century, the trends showed that changes in coastal upwelling were mainly related to the wind-induced Ekman transport. Further, our findings are consistent with those of Gallisai et al. (2014) for the Mediterranean: they concluded that the main driver of phytoplankton dynamics is the supply of nutrients from the deep water to the surface layers through vertical mixing. However, the results of the multiple regression presented here do not necessarily imply that the effect of aerosols on Chl-a is only 2%, but only that, because AOT covaries with the other variables, especially wind speed, it is difficult to disentangle their individual effects on Chl-a concentration. Perhaps more interesting is the possibility that the effect of dust events on Chl-a enhancement might be a little stronger during the winter monsoon season and the rest of the year than during the summer monsoon season (**Tables 2**, **3**), consistent with the results of Prasanna Kumar et al. (2010) for the central Arabian Sea during the winter monsoon season. The direction of the winds during the winter monsoon would suggest an origin in the Asian subcontinent for these dust aerosols, rather than the Arabian peninsula.
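The difficulty of separating covarying predictors can be illustrated with a minimal sketch. It is not the analysis used in this study: the data are synthetic, the coefficients are arbitrary assumptions, and the helper name `fit_ols` is hypothetical. Only the sample size (735) is borrowed from the whole-year case. The sketch fits ordinary least squares models with NumPy and compares r<sup>2</sup> when the aerosol term is used alone, and when it is added to a model that already contains wind.

```python
import numpy as np

def fit_ols(predictors, y):
    """Ordinary least squares with an intercept; returns (coefficients, r2, adjusted r2)."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    n, p = X.shape[0], X.shape[1] - 1  # p excludes the intercept
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    return coef, r2, adj_r2

# Synthetic, standardized series (assumed values, not the satellite data).
rng = np.random.default_rng(1)
n = 735
wind = rng.normal(size=n)
aot = 0.8 * wind + 0.6 * rng.normal(size=n)   # AOT covaries with wind
chl = 1.0 * wind + 0.3 * aot + 0.5 * rng.normal(size=n)

_, r2_wind, _ = fit_ols([wind], chl)
_, r2_aot, _ = fit_ols([aot], chl)
_, r2_both, adj_both = fit_ols([wind, aot], chl)
gain_from_aot = r2_both - r2_wind  # extra variance explained once wind is in the model
```

In this synthetic case the aerosol term alone explains a large share of the variance, yet adds only a small increment once wind is included, because most of its signal is shared with wind. A small increment in r<sup>2</sup> therefore bounds the unique contribution of a predictor, not its total effect.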

We used the cross-correlation function (CCF) to study the phase relationship between aerosol (AOT) and phytoplankton (Chl-a) dynamics. The correlation between the two variables peaked at a lag of 1–2 time steps, with AOT leading. However, since a similar lag was found in the CCF between Chl-a and alongshore winds, it is difficult to attribute a causal relationship to the aerosols by themselves. The phase relationship also throws light on whether or not the biological particles might be enhancing the production of aerosols in the study area. If such events were commonplace, then one would expect that Chl-a enhancement might occur prior to increase in aerosol concentration. The CCF results do not support this in general, but the climatologies of the studied variables (**Figure 3A**) do show that there is a reversal in the phase relationship for a brief period, with Chl-a leading NASA-AOT when Chl-a concentration approaches its peak during the summer monsoon season. However, this result is not confirmed by CCI-AOT data. Thus, conclusive evidence for biological enhancement of aerosols remains elusive. The intriguing result with the NASA-AOT certainly merits further investigation.
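The lag analysis described above can be sketched as follows. This is an illustrative implementation only, assuming evenly spaced, gap-free series; the function name `lagged_correlation` and the synthetic series are hypothetical and do not reproduce the processing applied to the satellite data. A positive lag at the correlation maximum means that the first series leads the second.

```python
import numpy as np

def lagged_correlation(x, y, max_lag):
    """Pearson correlation between x(t) and y(t + lag) for lag = -max_lag..max_lag.

    A positive lag means x leads y (y responds `lag` time steps after x).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[:n - lag], y[lag:]
        else:
            a, b = x[-lag:], y[:n + lag]
        result[lag] = float(np.corrcoef(a, b)[0, 1])
    return result

# Synthetic illustration: "chl" is "aot" delayed by two time steps, plus noise.
rng = np.random.default_rng(0)
aot = rng.normal(size=300)
chl = np.roll(aot, 2) + 0.1 * rng.normal(size=300)

ccf = lagged_correlation(aot, chl, max_lag=5)
best_lag = max(ccf, key=ccf.get)  # lag at which the correlation is largest
```

If a correlation were produced by a shared processing step acting on both variables at the same time step, the maximum would be expected at zero lag; a maximum at a non-zero lag, as in this synthetic case, points instead to a delayed response.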

## 5. CONCLUDING REMARKS

Essential Climate Variables, or ECVs, are our sentinels for observation of climate change. However, to understand climate change, it is not sufficient to study individual ECVs in isolation. Instead, it is also important to study how they interact with each other, and to understand how these interactions might change in the future. Of the marine ECVs, Chl-a concentration is the only biological ECV that is currently amenable to routine observations by remote sensing.

In this paper, we have examined one piece of the puzzle, by studying how the variability of Chl-a in the western Arabian Sea is related to that of three other ECVs: aerosols, winds and SST, focussing more on aerosol-Chl-a interactions, using 16 years of satellite data. What emerges is a complex pattern of relationships, in an area where many ECVs co-vary with each other. While it is difficult to elucidate causal relationships from simple correlations, the phase relationships between the variables can throw some light on the underlying causes.

A question that had to be addressed first, when using satellite data for the analysis, was whether there were artifacts in the patterns in Chl-a, introduced by the atmospheric correction process, which depends to some extent on aerosol optical properties. The correlation between Chl-a and NASA-AOT (a byproduct of ocean colour processing) peaking with a lag provided reassurance on this point, since the peak should have been observed at zero lag had processing artifacts been the cause of the correlation. This point was reinforced by repeating the analysis with data from Aerosol-CCI products, which are derived independently of the ocean colour processing chain.

Though the NASA aerosol properties and the CCI aerosol properties are generally consistent with each other, there is a significant phase shift in the time when they peak during the summer monsoon season. The underlying causes for this difference deserve to be investigated further, but fall outside the scope of this paper. In the Somali region, under upwelling regimes, the Chl-a concentration is strongly correlated with wind. Analysis of Ekman Mass Transport supports the hypothesis that wind-induced upwelling is the underlying cause of the high correlation between wind and Chl-a. According to the multiple linear regression analysis, aerosols have a modest effect on Chl-a, at best, with a lag of one to two time steps during this period. An unexpected outcome from this study is related to the importance of dust aerosols in stimulating Chl-a enhancement during the winter monsoon season, suggesting that the abundance of dust aerosols might enhance Chl-a in the absence of wind-induced upwelling.

# AUTHOR CONTRIBUTIONS

MS carried out all the data analyses and produced the figures. MS and SS wrote the manuscript. SS conceived the scientific plan, with help from TP. TP provided scientific advice, and led the project. GG contributed to the planning and discussions, and along with AB, provided supervision.

# ACKNOWLEDGMENTS

The authors acknowledge the Department of Science and Technology (DST), India for the Jawaharlal Nehru Science Fellowship (JNSF) awarded to TP. The authors thank the Director, CMFRI, Kochi for all support and encouragement. This work is a contribution to the Ocean Colour Climate Change Initiative of the European Space Agency, and to the activities of the National Centre for Earth Observations of the UK. We also thank the two reviewers for their helpful comments, which have improved the manuscript.

## SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fmars.2017.00386/full#supplementary-material

## REFERENCES


records from European satellite observations (Aerosol-CCI). Remote Sens. 8:421. doi: 10.3390/rs8050421


Instrument observations: an overview. J. Geophys. Res. Atmos. 112:D24S47. doi: 10.1029/2007JD008809


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Shafeeque, Sathyendranath, George, Balchand and Platt. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Satellite Radiation Products for Ocean Biology and Biogeochemistry: Needs, State-of-the-Art, Gaps, Development Priorities, and Opportunities

Robert Frouin<sup>1</sup>\*, Didier Ramon<sup>2</sup>, Emmanuel Boss<sup>3</sup>, Dominique Jolivet<sup>2</sup>, Mathieu Compiègne<sup>2</sup>, Jing Tan<sup>1</sup>, Heather Bouman<sup>4</sup>, Thomas Jackson<sup>5</sup>, Bryan Franz<sup>6</sup>, Trevor Platt<sup>5</sup> and Shubha Sathyendranath<sup>5</sup>

<sup>1</sup> Scripps Institution of Oceanography, La Jolla, CA, United States, <sup>2</sup> HYGEOS, Euratechnologies, Lille, France, <sup>3</sup> School of Marine Sciences, University of Maine, Orono, ME, United States, <sup>4</sup> Department of Earth Sciences, University of Oxford, Oxford, United Kingdom, <sup>5</sup> Plymouth Marine Laboratory, Plymouth, United Kingdom, <sup>6</sup> NASA Goddard Space Flight Center, Greenbelt, MD, United States

### Edited by:

Laura Lorenzoni, University of South Florida, United States

### Reviewed by:

Andrew Clive Banks, National Physical Laboratory, United Kingdom

Giuseppe Zibordi, Joint Research Centre, Italy

> \*Correspondence: Robert Frouin rfrouin@ucsd.edu

### Specialty section:

This article was submitted to Ocean Observation, a section of the journal Frontiers in Marine Science

Received: 24 April 2017 Accepted: 10 January 2018 Published: 14 February 2018

### Citation:

Frouin R, Ramon D, Boss E, Jolivet D, Compiègne M, Tan J, Bouman H, Jackson T, Franz B, Platt T and Sathyendranath S (2018) Satellite Radiation Products for Ocean Biology and Biogeochemistry: Needs, State-of-the-Art, Gaps, Development Priorities, and Opportunities. Front. Mar. Sci. 5:3. doi: 10.3389/fmars.2018.00003

Knowing the spatial and temporal distribution of the underwater light field, i.e., the spectral and angular structure of the radiant intensity at any point in the water column, is essential to understanding the biogeochemical processes that control the composition and evolution of aquatic ecosystems and their impact on climate and reaction to climate change. At present, only a few properties are reliably retrieved from space, either directly or via water-leaving radiance. Existing satellite products are limited to planar photosynthetically available radiation (PAR) and ultraviolet (UV) irradiance above the surface and diffuse attenuation coefficient. Examples of operational products are provided, and their advantages and drawbacks are examined. The usefulness and convenience of these products notwithstanding, there is a need, as expressed by the user community, for other products, i.e., sub-surface planar and scalar fluxes, average cosine, spectral fluxes (UV to visible), diurnal fluxes, absorbed fraction of PAR by live algae (APAR), surface albedo, vertical attenuation, and heating rate, and for associating uncertainties to any product on a pixel-by-pixel basis. Methodologies to obtain the new products are qualitatively discussed in view of most recent scientific knowledge and current and future satellite missions, and specific algorithms are presented for some new products, namely sub-surface fluxes and average cosine. A strategy and roadmap (short, medium, and long term) for usage and development priorities is provided, taking into account needs and readiness level.
Combining observations from satellites overpassing at different times and geostationary satellites should be pursued to improve the quality of daily-integrated radiation fields, and products should be generated without gaps to provide boundary conditions for general circulation and biogeochemical models. Examples of new products, i.e., daily scalar PAR below the surface, daily average cosine for PAR, and sub-surface spectral scalar fluxes are presented. A procedure to estimate algorithm uncertainties in the total uncertainty budget for above-surface daily PAR, based on radiative simulations for expected situations, is described. In the future, space-borne lidars with ocean profiling capability offer the best hope for improving our knowledge of sub-surface fields. To maximize temporal coverage, space agencies should consider placing ocean-color instruments in L1 orbit, where the sunlit part of the Earth can be frequently observed.

Keywords: photosynthetically available radiation, average cosine, attenuation coefficient, ocean color, remote sensing

# INTRODUCTION

From the point of view of biology and chemistry, solar radiation in the photosynthetically active range (roughly 400–700 nm), referred to as PAR, controls the growth of aquatic plants (e.g., Ryther, 1956; Platt et al., 1977; Kirk, 1994; Falkowski and Raven, 1997). It ultimately regulates the composition and dynamics of marine ecosystems. Solar radiation in the ultraviolet (UV) (280–400 nm), by damaging cellular constituents, may stress phytoplankton and inhibit their growth (e.g., Cullen and Neale, 1994; Häder et al., 2011). UV light, via photooxidation of colored dissolved organic matter (CDOM), may increase the bioavailability of nutrients (Sulzberger and Durisch-Kaiser, 2009). In the process, absorption by CDOM is reduced, increasing light penetration. Knowing the distribution (spectral, spatial, and temporal) of UV and visible solar radiation in the upper ocean is critical to understanding biogeochemical cycles of carbon, nutrients, and oxygen, and to addressing climate and global change issues, such as the fate of anthropogenic atmospheric carbon dioxide (CO2).

From the point of view of physics, sunlight absorbed by phytoplankton and other water constituents (CDOM, mineral particles, etc.) heats the upper ocean and distributes heat horizontally and vertically, affecting mixed-layer dynamics and oceanic circulation (e.g., Nakamoto et al., 2000, 2001; Ballabrera-Poy et al., 2007). These changes in turn influence atmospheric temperature and circulation, with remote effects (Miller et al., 2003; Shell et al., 2003). Solar radiation diffusely reflected by the ocean also affects the outgoing radiative flux from the planet (planetary albedo), with climate consequences (Frouin and Iacobellis, 2002). In order to make predictions for future conditions, we need to get some idea of how the phytoplankton concentrations and optical properties will evolve with changing conditions. Many processes and feedbacks in which solar radiation absorption plays a role are involved and difficult to untangle, and a large fraction of the uncertainties in projections of future climate is associated with physical-biological interactions (Friedlingstein et al., 2006).

This article reviews operational satellite radiation products, the user needs and gaps, and it provides a scientific roadmap for the use of, and priorities for improving, existing products and developing new products, i.e., for closing the gaps in ocean biology and biogeochemistry, including studies of biological-physical interactions and feedbacks. This roadmap emerged from the presentations (oral and poster) and discussions during the Color and Light in the Ocean from Earth Observation (CLEO) workshop at ESRIN, Frascati, Italy on 6–8 September 2016. The following questions are addressed: (1) Do existing shortwave downward flux products meet the requirements of the dynamics and bio-geochemical communities? What can be done to better serve the needs of the user community in general and the modeling community in particular? (2) What additional products should be added to the processing streams to increase their usefulness? What should be the characteristics of these products in terms of temporal, spatial, and spectral resolution, spectral range, and accuracy? (3) What are the needs in terms of harmonization between sensors, methodologies, ancillary data and radiative transfer tools?

# Current Products

The underwater light field is defined at any point in space by the spectral radiance (W/m<sup>2</sup> /sr/nm) from all directions. Useful properties can be derived from radiance, i.e., planar and scalar irradiance, average cosine, reflectance, and vertical attenuation coefficient (see, e.g., Mobley, 1994 or Kirk, 1994 for definitions). Only a few of these properties are presently inferred reliably and operationally from space, namely daily above-surface downward planar irradiance integrated from 400 to 700 nm (known as "daily PAR product"), above-surface planar spectral UV irradiance at noon, spectral reflectance of the water body, and diffuse attenuation coefficient at 490 nm (derived from water reflectance). These products are generated for each ocean color mission individually (note that UV products are not available in standard ocean-color missions).

Examples of level-3 daily, weekly, and monthly above-surface PAR products at 9 km resolution from MODIS-Aqua data are displayed in **Figure 1**. Dates are March 22, March 22–29, and March 1–31, 2010, respectively. The NASA Ocean Biology Processing Group (OBPG) in Greenbelt, Maryland generates and archives these products operationally. Typical uncertainty (RMS) is ±6.5 (19%), ±4.2 (12%), and ±2.6 E m<sup>−2</sup> d<sup>−1</sup> (7%) for daily, weekly, and monthly estimates, respectively (Frouin et al., 2012). The daily maps exhibit missing values, especially at low latitudes, due to the limited spatial coverage of the instruments, but the weekly and, a fortiori, monthly maps are completely filled, except at latitudes where the Sun zenith angle at the time of satellite overpass is above 75 degrees, since such data are discarded in the OBPG ocean-color processing line. The weekly and monthly PAR fields exhibit similar patterns, but the monthly product is smoother (lower variability at small scales), which is expected when the averaging period is longer.

The corresponding maps of level-3 daily, weekly, and monthly diffuse attenuation coefficient (K<sub>d</sub>) at 490 nm, also produced by the OBPG, are displayed in **Figure 2**. RMS uncertainty on log-transformed instantaneous estimates is about ±0.1 (e.g., Morel et al., 2007). Since K<sub>d</sub> is only retrieved in clear sky conditions, the daily map has many gaps (only about 10–15% of the observed pixels typically pass through the strict glint and cloud filters), and the weekly product shows many areas with no information. On a monthly time scale, information is still missing at low latitudes. The spatial gaps considerably limit the utility of the K<sub>d</sub> retrievals for propagating light below the surface. In the open ocean, global coverage every 3–5 days is necessary to resolve variability associated with seasonal biological phenomena such as phytoplankton blooms. In coastal waters, wind forcing creates "events" (e.g., upwelling) that occur every 2–10 days, and 1 day coverage is the requirement for resolving the event time scale.

**Figure 3** provides examples of level-3 daily, weekly, and monthly maps of noon surface UV irradiance at 1° resolution and 324 nm from OMI-Aura. Irradiance at 305, 310, and 380 nm as well as erythemally weighted daily dose and erythemal dose rate are also available. Dates are the same as those of **Figure 1**. The data are routinely processed at the OMI Science Investigator-led Processing System (SIPS) Facility in Greenbelt, Maryland, and are archived at the NASA Goddard Earth Sciences Data and Information Services Center (GES DISC). Overall uncertainty of UV irradiance estimates ranges from ±5 to over ±30%, depending on atmospheric conditions and geolocation (e.g., Arola et al., 2009). The UV irradiance and PAR fields have similar spatial coverage (**Figures 1**, **3**), with missing values due to instrument swath on a daily time scale, except that UV irradiance estimates are obtained at high latitudes. Variability patterns are also similar, since UV irradiance variability is also governed by Sun zenith angle and cloudiness, although ozone absorption plays a bigger role in modulating the surface values (but gradients of total ozone content remain mostly latitudinal). Spatial resolution is coarser than for the MODIS products (OMI sensor footprint is 13 × 24 km<sup>2</sup>), and no daily-averaged values (only daily values at noon or overpass time) are generated.

The situation regarding spatial coverage is summarized in **Figure 4**, which displays the percentage of the ocean surface covered by PAR, K<sub>d</sub>, and UV irradiance products on daily, weekly, and monthly time scales (imagery of **Figures 1**–**3**). In the equatorial region, percent coverage is 65, 80, and 5% for daily PAR, UV irradiance at 324 nm, and K<sub>d</sub>, respectively. It is increased to 100% for weekly and monthly PAR and UV irradiance, and to 30 and 70% for weekly and monthly K<sub>d</sub>. In middle latitude regions, daily coverage is almost 100% for PAR, about 80% for UV irradiance, and 15–20% for K<sub>d</sub>. Weekly and monthly coverage is 100% for PAR and UV irradiance, and reaches 70–75 and 100% in sub-tropical regions for K<sub>d</sub>. At high latitudes (>70°), monthly coverage is <40% for PAR and K<sub>d</sub>. This lack of coverage is limiting in view of the large productivity of high latitude marine ecosystems (Southern Ocean, Arctic Ocean). Furthermore, monthly K<sub>d</sub> products may only contain estimates during a few days, and therefore may not represent accurately actual monthly values in dynamic regions (IOCCG, 2015).
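Coverage statistics of this kind can be computed from gridded validity masks. The sketch below is a hypothetical illustration, not the procedure used to produce **Figure 4**: it assumes boolean masks on an equal-angle grid (so that pixels within one latitude row have equal area and need no weighting), and the function name `coverage_by_latitude` and the toy masks are invented for the example. A multi-day composite is represented as the union of the daily masks.

```python
import numpy as np

def coverage_by_latitude(valid, is_ocean):
    """Percent of ocean pixels with a valid retrieval in each latitude row.

    valid, is_ocean: boolean arrays of shape (n_lat, n_lon) on an equal-angle grid.
    Rows that contain no ocean pixels return NaN.
    """
    is_ocean = np.asarray(is_ocean, dtype=bool)
    valid = np.asarray(valid, dtype=bool) & is_ocean  # ignore retrievals over land
    n_ocean = is_ocean.sum(axis=1).astype(float)
    n_valid = valid.sum(axis=1).astype(float)
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(n_ocean > 0, 100.0 * n_valid / n_ocean, np.nan)

# Toy 3 x 4 grid: the last row is entirely land.
is_ocean = np.array([[True, True, True, True],
                     [True, True, False, False],
                     [False, False, False, False]])
day1 = np.array([[True, False, False, False],
                 [True, False, True, False],
                 [False, False, False, False]])
day2 = np.array([[False, True, False, False],
                 [True, False, False, False],
                 [False, False, False, False]])

daily = coverage_by_latitude(day1, is_ocean)
composite = coverage_by_latitude(day1 | day2, is_ocean)  # union of the two days
```

The same union operation applies when combining masks from different sensors observing on the same day.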

Multiple satellites can improve the daily ocean coverage, especially for K<sub>d</sub>, which can only be retrieved in clear sky conditions. For example, three instruments of MODIS type, flying in a constellation on satellite orbits differing by the mean anomaly (angular distance from pericenter), would increase the daily spatial coverage of water reflectance, and therefore of K<sub>d</sub>, from 15 to 25% over 1 day and from 40 to 60% over 4 days (Gregg et al., 1998). **Figure 5** shows, for March 22, 2010, the increase in daily ocean coverage obtained by combining estimates from MODIS-Aqua (overpass at 13:30 local time) and MODIS-Terra (overpass at 10:30 local time) instead of using MODIS-Aqua only. For PAR, the increase is from 60 to 80% in equatorial regions, and complete coverage is reached at sub-tropical latitudes. For K<sub>d</sub>, the ocean coverage is more than doubled at most latitudes (e.g., 35–50%, instead of 15–20% in the sub-tropics). Combining PAR estimates from instruments orbiting at different times not only increases spatial coverage, but perhaps more importantly, also takes into account cloud diurnal variability, yielding more accurate estimates.

Satellite instruments in geostationary orbit, by observing the same target multiple times during the day (e.g., every 30 min for GOCI onboard COMS and 10 min for AHI onboard Himawari-8), offer an efficient way to account for diurnal changes in cloudiness in daily PAR products. **Figure 6** displays daily PAR imagery obtained with GOCI data acquired on April 5, 2011 at 03:16 GMT (one observation) and at 00:16, 01:16, 02:16, 03:16, 04:16, 05:16, 06:16, and 07:16 GMT (eight observations). In clear-sky regions (north of Japan), the values are close using one or eight observations, which is expected since the governing parameter is the Sun zenith angle. In cloudy regions (south of Japan), the PAR spatial field is smoother and the range of values smaller, since cloudiness changes are accounted for in the daily average. The lowest value is about 12 E m<sup>−2</sup> d<sup>−1</sup> with eight observations instead of 5 E m<sup>−2</sup> d<sup>−1</sup> with one observation. The two types of estimates compare well, with a bias of 0.11 E m<sup>−2</sup> d<sup>−1</sup> (0.2%), i.e., slightly higher values using one observation, and a root-mean-squared (RMS) difference of 5.92 E m<sup>−2</sup> d<sup>−1</sup> (13.5%), largely due to differences in cloudy situations. **Figure 7** displays an example of AHI daily PAR product (July 20, 2011) and the corresponding MODIS-Aqua product. Observations every 10 min were used to estimate daily PAR from AHI data. The patterns of variability are similar in both products, but as for GOCI, the AHI PAR imagery is smoother and contains fewer extreme values. The AHI product, unlike the MODIS product, does not exhibit spatial gaps. In terms of comparison statistics, the AHI values are lower by 1.38 E m<sup>−2</sup> d<sup>−1</sup> (4.8%) on average, and the RMS difference is 6.46 E m<sup>−2</sup> d<sup>−1</sup> (22.7%).
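Comparison statistics such as the bias and RMS difference quoted above can be obtained along the following lines. This is a minimal sketch under stated assumptions: the function name `compare_fields` is hypothetical, the input values are invented, and the percentages are expressed relative to the mean of the reference field over pixels valid in both products, a normalization that is assumed here because the text does not specify it.

```python
import numpy as np

def compare_fields(estimate, reference):
    """Bias and RMS difference of `estimate` relative to `reference`, ignoring NaNs.

    Returns (bias, rmsd, bias_percent, rmsd_percent). Percentages are relative
    to the mean of the reference over the pixels valid in both fields.
    """
    est = np.asarray(estimate, dtype=float)
    ref = np.asarray(reference, dtype=float)
    ok = np.isfinite(est) & np.isfinite(ref)   # pixels present in both products
    diff = est[ok] - ref[ok]
    bias = float(diff.mean())
    rmsd = float(np.sqrt((diff ** 2).mean()))
    ref_mean = float(ref[ok].mean())
    return bias, rmsd, 100.0 * bias / ref_mean, 100.0 * rmsd / ref_mean

# Invented daily PAR values (E m-2 d-1); NaN marks a gap in the first product.
one_obs = np.array([40.0, 52.0, np.nan, 30.0])
eight_obs = np.array([42.0, 50.0, 45.0, 28.0])

bias, rmsd, bias_pct, rmsd_pct = compare_fields(one_obs, eight_obs)
```

Restricting the statistics to pixels valid in both fields matters when one product has spatial gaps and the other does not, as for the polar-orbiting and geostationary products compared here.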

In summary, existing satellite products generally do not cover the global open oceans (e.g., retrievals limited to Sun zenith angles < 75°), except for UV irradiance, and they do not provide information below sea ice, where significant blooms may develop (e.g., Arrigo et al., 2012). In addition, cloud diurnal variability is not accounted for in daily PAR calculations when using polar orbiting satellites. Geostationary satellites observing frequently during the day account properly for cloudiness changes, but the drawback is a decreased spatial resolution at high latitudes and potentially large uncertainties for slanted viewing geometries. Propagation of surface radiation to depth currently assumes that the ocean is homogeneous, neglecting potentially important effects of stratification on the absorption of solar radiation. In sum, our view of the underwater light field from space is limited. Nevertheless, the operational radiation products have been used to address a variety of topics related to aquatic photosynthesis, for example biosphere productivity during an El Niño transition (Behrenfeld et al., 2001), phytoplankton class-specific productivity (Uitz et al., 2010), chlorophyll and carbon-based ocean productivity modeling (Behrenfeld et al., 2005; Platt et al., 2008), climate-driven trends in productivity (Behrenfeld et al., 2006; Kahru et al., 2009; Henson et al., 2010), and inter-comparison of productivity algorithms (Carr et al., 2006; Lee et al., 2015). They have also been used to check the stability of CERES measurements (Loeb et al., 2006).

FIGURE 5 | Percentage of ocean coverage by daily PAR and K<sub>d</sub> products when using MODIS-Aqua data only (red curves) and combining MODIS-Aqua and -Terra data (black curves). Date is March 22, 2010. Coverage is almost total (>80%) in low to middle latitude regions for PAR (left) and at least doubled at most latitudes for K<sub>d</sub> (right).

# User Needs

User needs vary widely in terms of products, spectral, spatial, and temporal resolution, and acceptable uncertainties, depending on the scientific or societal subject of interest. Applications requiring knowledge of radiometric quantities and apparent properties (radiance, irradiance, average cosine, attenuation coefficients) are multiple and diverse, including phytoplankton phenology, carbon inventory, heat budget and ocean dynamics, fisheries and ecosystem management, toxic algal blooms, and eutrophication (see National Research Council, 2011 for a comprehensive list). Observational and uncertainty requirements (satellite products) generally range from 1 h to 1 day, 0.1 to 50 km, and ±5 to ±20% for PAR and 0.1 to 10 km, 1 to 7 days, and ±10 to ±25% for spectral K<sub>d</sub> (Malenovsky and Schaepman, 2011), but higher resolution may be needed in some cases, for example for rapidly changing phenomena occurring in small water bodies. For applications that require analysis of long-term records (e.g., associated with climate), the products need to be sensor independent, consistent, and continuous across satellite missions.

The satellite products should be defined unambiguously and completely, they should be easily accessible, and they should have associated Algorithm Theoretical Basis Documents (ATBDs) with detailed protocols, including a description of all ancillary data used and their sources. For example, defining a PAR product merely as downward quantum flux at the surface in the 400–700 nm spectral range is insufficient. One needs to specify whether the flux is just above or just below the surface and whether it is instantaneous or time-averaged (e.g., over 24 h for "daily PAR"), and to indicate spatial resolution. Advantages and limitations should be specified (e.g., product not valid at Sun zenith angles above 75°, or over sea ice, or in the presence of Sun glint), and uncertainties assessed, preferentially provided on a pixel-by-pixel basis, to make sure that observed changes are interpreted correctly. Computer codes used to derive the products should be available to users with proper documentation, as well as standardized processing tools (e.g., open-source toolboxes).

A survey about satellite PAR observations was conducted in 2015 by Plymouth Marine Laboratory (PML), requesting feedback from the user community on adequacy of available products, including importance, usage, and accuracy, and additional features one would like to see. **Figure 8** summarizes the results about desired PAR attributes and new products and acceptable PAR uncertainties. Most respondents answered it was very or extremely important to have uncertainties associated with PAR products. About 50% of the respondents indicated that ±10–25% uncertainty was acceptable, and about 40% wanted uncertainty better than ±10%. A substantial majority of respondents ranked PAR below the surface and the fraction of PAR absorbed by phytoplankton about equally as the top two new products to generate. Based on this survey and in view of the extensive list of research and societal applications, current and potential, a list of required (new) products was compiled, including products that may be challenging to generate from space:


# Gap Analysis

Some new products (such as below surface planar and scalar PAR) can be easily implemented while others require development (e.g., APAR, vertical attenuation of PAR). For some products (e.g., under-ice light fields), the readiness level, in terms of methodology, is low. The state-of-the-art, however, is such that the strategy to obtain the new products described above is known and summarized below.

Sub-surface planar PAR and scalar PAR depend essentially on the sunlight transmission across the air-water interface, and therefore on the angular distribution of radiance incident at the surface and on surface roughness (Mobley and Boss, 2012). The 24 h-averaged quantities, as well as the average cosine for total light, can be parameterized as a function of latitude, daily cloud factor (i.e., the ratio of actual PAR to clear-sky PAR), and wind speed. This may require look-up tables for clear sky and overcast quantities. Details about procedures are provided in section Examples of New Products, where examples of sub-surface products are presented. Note that in the OBPG approach to estimating above-surface daily planar PAR, it is actually easier (more direct) to compute the downward flux below the surface. This "penetrative" flux is obtained by subtracting from the incident extraterrestrial solar irradiance the reflected flux and the flux absorbed by the surface/atmosphere system. The "penetrative" flux is then corrected by 1/(1 − A<sub>s</sub>), where A<sub>s</sub> is the surface albedo, to yield the incident flux onto the surface (e.g., Frouin et al., 2012). This second step introduces uncertainty, but allows for extensive evaluation at PAR measuring sites, an activity that cannot be performed easily for sub-surface fluxes (lack of data and difficulty in measuring sub-surface fluxes accurately).
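The two simple relations mentioned above, the 1/(1 − A<sub>s</sub>) correction and the daily cloud factor, can be written out as a short sketch. It is illustrative only and is not the OBPG code: the function names `above_surface_flux` and `cloud_factor` are hypothetical, and the numerical values (flux, albedo, clear-sky PAR) are arbitrary assumptions chosen to make the arithmetic easy to follow.

```python
def above_surface_flux(penetrative_flux, surface_albedo):
    """Incident flux just above the surface from the flux entering the water.

    Applies E_above = E_penetrative / (1 - A_s), with A_s the surface albedo.
    """
    if not 0.0 <= surface_albedo < 1.0:
        raise ValueError("surface albedo must lie in [0, 1)")
    return penetrative_flux / (1.0 - surface_albedo)

def cloud_factor(actual_par, clear_sky_par):
    """Daily cloud factor: ratio of actual PAR to clear-sky PAR."""
    if clear_sky_par <= 0.0:
        raise ValueError("clear-sky PAR must be positive")
    return actual_par / clear_sky_par

# Assumed example values (E m-2 d-1 for fluxes; albedo is dimensionless).
e_above = above_surface_flux(47.0, 0.06)   # 47 / 0.94, about 50
cf = cloud_factor(e_above, 62.5)           # about 0.8
```

Because A<sub>s</sub> is small over the ocean, the correction changes the flux by only a few percent, but any error in the assumed albedo propagates directly into the above-surface estimate.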

The above variables, integrated over the PAR spectral range, can be calculated without difficulty for the spectral bands of the ocean color sensors (i.e., visible to near infrared), in which cloud absorption is negligible. Providing the information at regular spectral intervals (e.g., every 5 or 10 nm), a requirement of some primary production models (e.g., Sathyendranath et al., 1989; Antoine et al., 1996), is straightforward since cloud optical properties (extinction coefficient, asymmetry factor) are similar, i.e., cloud albedo can be assumed constant in the entire spectral range, and the coupling between molecules and cloud droplets/crystals is relatively small (i.e., fairly unique relation between cloud transmittance at different wavelengths). In spectral regions of strong gaseous absorption, however, uncertainties may be introduced due to the coupling between absorption and scattering processes (depends on the unknown vertical distribution of the absorbers and scatterers). Extending the calculations to the ultraviolet (e.g., UV-A, UV-B) from measurements in the visible is more complicated, but definitely feasible. The complication is not due to cloud optical properties (they remain similar to those in the visible), but to the coupling between molecules and cloud droplets/crystals, which is effective. In other words, the relation between cloud transmittance (in the presence of molecules and aerosols) in the ultraviolet and visible depends on the type of clouds and their location in the vertical. This indicates that de-coupling the clear atmosphere from clouds, as it is done in the OBPG PAR algorithm, may introduce significant errors in the estimation of ultraviolet irradiance. 
Suitable modeling of the relation between atmospheric transmittance at visible and ultraviolet wavelengths is therefore required, which can be accomplished via radiative transfer calculations to various levels of accuracy depending on user needs (may require additional information on clouds).

From the spectral downward irradiance at the surface, minimally affected by photons reflected by the surface and backscattered by the water body (spherical albedo of the atmosphere is small, i.e., about 0.15 in the visible) and the spectral upward irradiance at the surface, one may compute the spectral surface albedo. The computation, however, can only be done in clear-sky conditions since water optical properties are necessary but generally not retrieved in cloudy conditions from ocean-color sensors. Estimating the spectral upward irradiance at the surface requires retrieving spectral water reflectance (i.e., the signal backscattered by the water body), which is routinely achieved by satellite project offices, and modeling the bidirectional reflectance function of the surface (e.g., using Cox and Munk, 1954) and the water body (e.g., using Morel and Gentili, 1996 or Park and Ruddick, 2005).

Although daily averaged quantities are often used (therefore required) in applications, instantaneous quantities (i.e., determined at time of satellite overpass) can be easily provided. In fact, instantaneous products, unlike 24 h-averaged products, do not require assumptions about diurnal changes in atmospheric and oceanic properties. Diurnal variability can be described well using sensors onboard geostationary satellites, such as GOCI and AHI (see section Current Products), all the more as these sensors have ocean-color capabilities. Limitations, however, are the reduced spatial resolution at high latitudes and the need to manage data from different instruments, which may be operated by different space agencies, to achieve global coverage. Observations from the Earth-Sun Lagrangian-1 (L1) point, 1.5 million kilometers from Earth, such as those made by the EPIC camera onboard DSCOVR (1–2 h temporal resolution, 21 km spatial resolution), provide a suitable alternative to several sensors on geostationary orbit. Because of the high orbit, spatial resolution at high latitudes is much less of an issue. Diurnal information may also be obtained from a constellation of Sun-synchronous instruments with local overpass times spread during the day, for example MODIS-Terra at 10:30 am, SeaWiFS-SeaStar observing at noon, and MODIS-Aqua observing at 1:30 pm. From a single instrument in Sun-synchronous orbit, ancillary data about variability of clouds and aerosols is necessary, for example Modern-Era Retrospective analysis for Research and Applications version 2 (MERRA-2) products, available at a 1/2 × 2/3 degree resolution every hour for the day of the satellite observation.

Propagating fluxes vertically below the surface requires knowledge of the vertical profile of the diffuse attenuation coefficient, K<sub>d</sub>. From space, this can only be achieved in clear sky conditions with passive optical sensors, since K<sub>d</sub> is deduced from water reflectance (empirical algorithms) or inherent optical properties (e.g., IOCCG, 2006). The K<sub>d</sub> estimates are actually weighted averages over one optical depth (from which most of the photons from the water body reaching the satellite sensor originate); no vertical information is obtained. In first approximation, one may propagate light below the first optical depth (shallower than the depth of the euphotic zone) by assuming no vertical variation in spectral diffuse attenuation. However, this is generally inaccurate, as many oceanic regions (e.g., oligotrophic provinces) exhibit maximum chlorophyll concentration well below the first optical depth. One has therefore to rely on statistical relations between concentrations of oceanic constituents at the surface and below (e.g., Morel and Berthon, 1989) to estimate the K<sub>d</sub> depth profile, or to use outputs of predictive coupled physical-biogeochemical numerical models. In the future, with the advent of space-borne polarization lidars such as CALIOP onboard CALIPSO or the Aerosol/Cloud/Ecosystems (ACE) lidar (being designed), one will be able to profile K<sub>d</sub> in both clear and cloudy conditions, day and night, up to 3 optical depths in the green (532 nm) at a vertical resolution of 3 to 30 m (e.g., Lu et al., 2014; Behrenfeld et al., 2016). The sub-surface fluxes and the vertical profile of diffuse attenuation coefficient give access to fluxes at the bottom (important for studies of shallow coastal ecosystems), average fluxes in the mixed layer (requires knowledge of mixed-layer depth, e.g., from ocean circulation models), and the upper-ocean heating rate profile. 
Knowing mixed-layer fluxes and vertical heat distribution is especially useful to characterize the role of solar penetration and biological-physical interactions on ocean circulation and climate (Ohlmann et al., 1996; Shell et al., 2003; see section Introduction).
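The first-approximation case mentioned above (no vertical variation of the diffuse attenuation coefficient) can be sketched with a simple exponential decay. This is only an illustration under that stated assumption; the function names, the K<sub>d</sub> value, the surface flux, and the mixed-layer depth are hypothetical.

```python
import numpy as np


def irradiance_profile(e0_minus, kd, depths):
    """E(z) = E(0-) * exp(-Kd * z), assuming a depth-independent Kd (m^-1)."""
    depths = np.asarray(depths, dtype=float)
    return e0_minus * np.exp(-kd * depths)


def mixed_layer_mean(e0_minus, kd, mld):
    """Analytic depth average of E(z) between the surface and depth mld (m)."""
    return e0_minus * (1.0 - np.exp(-kd * mld)) / (kd * mld)


e0 = 30.0  # hypothetical sub-surface flux
kd = 0.1   # hypothetical diffuse attenuation coefficient, m^-1
profile = irradiance_profile(e0, kd, [0.0, 10.0, 20.0])
mean_ml = mixed_layer_mean(e0, kd, 20.0)
```

With a depth-dependent K<sub>d</sub> profile (statistical or model-derived), the exponent would instead be the integral of K<sub>d</sub> from the surface to the depth of interest.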

In a homogeneous ocean, APAR depends on the ratio of the spectral absorption coefficient by live phytoplankton, a<sub>ph</sub>, and total absorption (water, yellow substances, non-algal particles), a<sub>tot</sub>, in the PAR spectral range and the spectral planar irradiance just below the surface. In a vertically heterogeneous ocean, the vertical distribution of those quantities plays a role, as well as the vertical distribution of the diffuse attenuation coefficient for downward irradiance, K<sub>d</sub>. Computing APAR from space, therefore, requires estimates of spectral planar irradiance below the surface, and vertical profiles of a<sub>ph</sub>, a<sub>tot</sub>, and K<sub>d</sub>. Subsurface spectral irradiance and its vertical attenuation can be obtained as discussed above, and absorption coefficients using various techniques (e.g., IOCCG, 2006). Some of these variables are difficult to retrieve with good accuracy, in particular a<sub>ph</sub> (requires partitioning a<sub>tot</sub> into its components), and vertical information in the euphotic zone is generally not directly available (except from future space-borne lidars, see above). Consequently, uncertainties on APAR computations based on satellite estimates of individual variables may be large. Since APAR strongly depends on the spectral ratio of sub-surface reflectance, R(0−), and pure seawater reflectance, R<sub>w</sub>(0−), one may envision approximating APAR by a linear combination of R(0−)/R<sub>w</sub>(0−) in the PAR spectral range (Frouin et al., 2014). Since APAR is expressed linearly in water reflectance, the method is applicable to average values of water reflectance (e.g., spatially averaged), which may reduce the impact of water reflectance noise on the APAR estimate.

Primary production under sea ice is considerable, as evidenced from in situ measurements of phytoplankton concentrations and suggested by numerical model simulations. In the Arctic Ocean, it may constitute more than 30% of the total production (Popova et al., 2010). As a consequence, the seasonal cycle is shifted, with maximum primary production occurring in July, not in August-September (case of open waters). There is a need, for primary production modeling and studies of under-ice phytoplankton blooms, to determine light fields under the ice (Laliberté et al., 2016). This is quite difficult to realize from space, since knowledge of sunlight transmission through sea ice is required, and this parameter is quite variable depending on ice type and thickness, and the presence of snow and melt ponds. Transmission models are now becoming available (Arndt and Nicolaus, 2014), and they can make use of sea-ice thickness and age, snow depth, and melt pond fraction observations from microwave and optical sensors (SIRAL on CryoSat-2, AMSR-2 on GCOM-W1, ATLAS on ICESat-2, MODIS on Terra and Aqua). One issue is separating the contribution of clouds to the planetary albedo, in order to access the surface albedo. Algorithms are not yet mature for operationally generating shortwave fluxes under sea ice.

The possible methodologies, difficulties, opportunities, and readiness level for developing and creating the new products were discussed above. The following recommendations regarding these products are made:


community will, typically, want to access data using FTP. Most users do not care about the satellite mission from which a product was derived, but rather care about the products being continuous in time and consistent across missions.

6) Cross-agency efforts should be made to homogenize their respective products so it is easy for users to use these products (e.g., the definition of a PAR product should be the same). For climate relevant products, it is critical to merge them (and debias) across missions so that models do not experience secular jumps as they assimilate such data.

# EXAMPLES OF NEW PRODUCTS

# PAR Simulator Radiative Transfer Code

To develop new shortwave radiation products from satellite data (such as those discussed above) and assess accuracies, one needs to simulate the TOA radiance measured by a given sensor and the variable to retrieve (e.g., planar or scalar spectral irradiance below the surface, average cosine). For this, we use the Speed up Monte-Carlo Advanced Radiative Transfer using GPU (SMART-G) radiative transfer code (Ramon et al., 2017). This code, based on the Monte-Carlo method, is fast and massively parallel. It computes the complete light field (i.e., radiance, including polarization) in the ocean and atmosphere.

The code simulates the transfer/propagation of solar radiation in a 1-dimensional coupled ocean-atmosphere system with a wavy interface. It accounts for absorption and scattering by molecules, aerosols, and hydrosols, and Fresnel reflection/refraction at the interface. Polarization properties of the various atmospheric and oceanic constituents and the surface are explicitly considered. Inelastic processes (i.e., Raman scattering, fluorescence) are omitted in the current version. The ocean can be infinitely deep or bounded by a reflective bottom at finite depth. The computations are made in either plane-parallel or spherical geometry. The four components of the Stokes vector can be obtained at any wavelength of the solar spectrum and any level of the coupled ocean-atmosphere system.

Gaseous absorption is treated either by correlated k-distribution (Kato et al., 1999) or an equivalent like REPTRAN (Gasteiger et al., 2014; Emde et al., 2016). In the multispectral mode, each photon is assigned a wavelength. All optical properties of the medium are pre-calculated for these wavelengths. The spectrum is computed in one pass but is under the influence of Monte-Carlo noise. For higher spectral resolution computations without spectral noise, for example in order to handle line-by-line (LBL) gas absorption, the ALIS method (Emde et al., 2011) is also implemented in SMART-G. This allows for the calculation of spectra by tracing the photon paths once for all wavelengths (by absorption/scattering decoupling).

### Simulations

In the Monte-Carlo code, photon packets carry planar irradiance (W m<sup>−2</sup>) perpendicular to their direction of propagation. Therefore, downward or upward spectral planar irradiances are obtained by performing the weighted sum of irradiances crossing a unit area of horizontal surface located above or below the air-sea interface. For spherical irradiances, each photon is assigned an additional weight of 1/cos(θ<sub>i</sub>), where θ<sub>i</sub> is the zenith angle of the photon arriving at the detector. The runs were optimized for the calculation of spectral daily fluxes above and below the ocean surface. For 1 day, one run was executed by injecting a large number of photons whose wavelengths, as well as injection angles at TOA and in the atmosphere (which depend on the hour of the day), were chosen randomly. Typical runtime is 1 s to reach ±1% uncertainty for clear sky conditions and for 1 day. It becomes 30 s for a totally overcast situation with a cloud optical thickness of 50.
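The tallying rule described above can be illustrated with a toy detector. This is a simplified sketch and not the SMART-G implementation: the function name, the sign convention (positive cosine for downward-travelling photons), and the photon weights are assumptions made for the example.

```python
import numpy as np


def tally_irradiances(weights, cos_zenith):
    """Accumulate photon weights crossing a horizontal detector.

    weights    : weight carried by each photon packet at the detector
    cos_zenith : cosine of each photon's zenith angle (assumed > 0 when
                 travelling downward, < 0 upward, never exactly 0)
    """
    weights = np.asarray(weights, dtype=float)
    mu = np.asarray(cos_zenith, dtype=float)
    planar_down = weights[mu > 0].sum()
    planar_up = weights[mu < 0].sum()
    # Spherical (scalar) tally: extra weight of 1/|cos(theta_i)| per photon
    scalar = (weights / np.abs(mu)).sum()
    return planar_down, planar_up, scalar


# Three hypothetical photon packets
e_down, e_up, e_scalar = tally_irradiances([1.0, 1.0, 0.5], [1.0, 0.5, -0.5])
average_cosine = (e_down - e_up) / e_scalar
```

In a real run the tallies would also be normalized by the number of injected photons and by the incident solar flux; that step is omitted here.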

The aerosol and cloud optical properties are taken from the OPAC database (Hess et al., 1998), as distributed within the libRadtran software package (www.libradtran.org). Aerosols are assumed to be spherical. The clouds are assumed to be composed of liquid water droplets with a varying effective radius. Rayleigh optical depth is computed according to Bodhaine et al. (1999). The gaseous absorption is parameterized as a correlated k-distribution with spectral intervals of 10 nm from line-by-line calculations using the Py4Cats code (Schreier and Gimeno Garcia, 2013) and HITRAN 2012 (Rothman et al., 2013) absorption parameters for H<sub>2</sub>O and O<sub>2</sub>. Ozone and NO<sub>2</sub> smooth absorption coefficients are taken from Bogumil et al. (2003). The wind-roughened sea surface is modeled as an ensemble of uncorrelated facets with a slope distribution (Cox and Munk, 1954) and a uniform azimuth distribution. The reflection and transmission coefficients are then computed using Fresnel's laws. Ocean bulk optical properties correspond to Case-I waters with a chlorophyll-a concentration of 0.5 g m<sup>−3</sup>. The ocean's bottom is black and is located at a depth of 50 m. The ocean phase function is represented by a Fournier-Forand function with a varying truncation angle.

An example of outputs of the SMART-G code in the context of PAR simulations (both spectral irradiance and reflectance) is given in **Figure 9** for various levels in the atmosphere-ocean system (top, just above the surface, just below the surface, and at the black bottom of the water column) and several atmospheric conditions (clear and cloudy situations with different Sun zenith angles). The TOA reflectance in the MERIS bands is also indicated. The various graphs show the importance of Sun zenith angle and cloud optical thickness in controlling the downward PAR above and below the surface. They also illustrate the usefulness of the SMART-G tool for algorithm development.

For the year 2011, and for 14 latitudes ranging from −64.5° to 64.5°, the various spectral irradiances listed in the Annex were computed at 11 times regularly distributed throughout the day. The typical spectral TOA reflectance for a MERIS-like instrument measuring at 10:30 local time was also computed. The atmospheric content was changing according to the MERRA-2 hourly gridded datasets of water vapor and ozone contents, aerosol optical depths of black carbon, dust, organic carbon, sea salt, and sulfate aerosols at 550 nm, and cloud optical thicknesses. Individual clear sky and overcast plane-parallel radiative transfer calculations were then mixed using the Independent Pixel Approximation (IPA), with the MERRA-2 cloud cover variable as the mixing weight.
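The IPA mixing step amounts to a cloud-cover-weighted average of the clear and overcast results. A minimal sketch follows, in which the function name and the hourly values are hypothetical and stand in for the simulated fluxes and the MERRA-2 cloud cover.

```python
import numpy as np


def ipa_mix(clear_value, overcast_value, cloud_fraction):
    """Independent Pixel Approximation: weight the clear and overcast
    plane-parallel results by (1 - f) and f, f being the cloud cover."""
    f = np.clip(np.asarray(cloud_fraction, dtype=float), 0.0, 1.0)
    clear_value = np.asarray(clear_value, dtype=float)
    overcast_value = np.asarray(overcast_value, dtype=float)
    return (1.0 - f) * clear_value + f * overcast_value


# Hypothetical hourly fluxes (arbitrary units) and cloud cover
hourly_clear = np.array([100.0, 300.0, 400.0])
hourly_overcast = np.array([20.0, 60.0, 80.0])
cloud_cover = np.array([0.0, 0.5, 1.0])
mixed = ipa_mix(hourly_clear, hourly_overcast, cloud_cover)
```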

The following quantities were then computed: $PAR_o^{0-}(t)$ and $PAR_d^{0+}(t)$ (to describe diurnal variation of sub-surface fluxes), $\langle PAR_d^{0+} \rangle_{24h}$ (the main product), $\langle PAR_o^{0-} \rangle_{24h}$ (the key product for primary production and photo-chemical processes), $\langle PAR_o^{0-} \rangle_{24h}(\lambda_i)$ (spectral scalar flux, as requested by modelers, at a resolution of 10 nm), and $\langle \mu_0^{0-} \rangle_{24h}$ (to characterize the angular structure of the light field). Definitions are provided in the Annex.

### Algorithm for $\langle PAR_o^{0-} \rangle_{24h}$ and $\langle \mu_0^{0-} \rangle_{24h}$

### Rationale

We define a Cloud Factor (CF) as the ratio of the daily averaged PAR above the surface to its value for a pure clear sky. It is a "measure" of the influence of clouds on the daily PAR:

$$
\langle CF \rangle_{24h} = \frac{\langle PAR_d^{0+} \rangle_{24h}}{\langle PAR_d^{0+} \rangle_{24h}^{clear}}.
$$

One typical day of simulations is displayed in **Figure 10** for a latitude of 55.5°N. The Sun zenith angle and the length of the day mainly drive the diurnal cycle of both clear sky PAR and partly cloudy PAR. For that particular day, the influence of cloudiness is important (and fairly stable), reducing the daily PAR by a factor $\langle CF \rangle_{24h}$ = 0.34. The average cosine $\mu_0^{0-}$ is very stable for cloudy conditions, with a value around 0.82 throughout the day. A clear sky has a variable $\mu_0^{0-}$ culminating at noon. In both cases the spectral variation of $\mu_0^{0-}$, not shown here, is weak. The 24 h-averaged value of $\mu_0^{0-}$ should be representative of the direction of propagation of the largest fraction of the daily PAR. That is why we define the 24 h-averaged $\langle \mu_0^{0-} \rangle_{24h}$ as the ratio of the 24 h-averaged net and scalar PAR (and not the 24 h average of their instantaneous ratios):

$$
\langle \mu_0^{0-} \rangle_{24h} = \frac{\langle E_{net}^{0-} \rangle_{24h}}{\langle E_o^{0-} \rangle_{24h}}.
$$

The normalized spectral PAR, obtained by dividing the spectral PAR by its value at 675 nm, is defined in the same manner as:

$$\left\langle \widetilde{PAR_o^{0-}} \right\rangle_{24h} (\lambda_i) = \frac{\left\langle PAR_o^{0-} \right\rangle_{24h} (\lambda_i)}{\left\langle PAR_o^{0-} \right\rangle_{24h} (675\ \text{nm})}$$

It gives the spectral shape of the PAR and it is very stable and close to the TOA solar irradiance spectrum in most cases. The spectral shape of the clear sky PAR is slightly influenced by the mean Sun zenith angle because the absorption of ozone in Chappuis bands becomes more and more effective. The spectral dependence of the cloudy PAR is smaller. It increases in the blue part of the spectrum as cloud influence increases.
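The two 24 h-averaged definitions above (average cosine as a ratio of averages, and spectral PAR normalized by its value at 675 nm) can be illustrated with a short sketch. All numbers are hypothetical and assume regularly spaced samples over the day, so that a simple mean is proportional to the daily integral.

```python
import numpy as np

# Hypothetical instantaneous values just below the surface (arbitrary units)
e_net = np.array([0.0, 50.0, 200.0, 50.0, 0.0])     # net irradiance
e_scalar = np.array([0.0, 80.0, 240.0, 80.0, 0.0])  # scalar irradiance

# Definition used in the text: ratio of the 24 h averages
mu_24h = e_net.mean() / e_scalar.mean()

# Alternative that the text rejects: average of instantaneous ratios
lit = e_scalar > 0
mu_mean_of_ratios = (e_net[lit] / e_scalar[lit]).mean()

# Normalized spectral PAR: divide the 24 h spectrum by its 675 nm value
wavelengths_nm = np.array([405.0, 555.0, 675.0])
par_spectral_24h = np.array([1.2, 1.6, 2.0])
ref = par_spectral_24h[wavelengths_nm == 675.0][0]
par_normalized = par_spectral_24h / ref
```

In this toy case the ratio of averages is weighted toward the hours carrying most of the flux, which is why it differs from the plain average of instantaneous ratios.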

Following the approach of Mobley and Boss (2012), 24 h-averaged secondary radiative quantities may be obtained from a reduced set of parameters, the most important ones being the location and date which control the day length and mean Sun zenith angle, then the influence of the clouds which is between null (clear sky) and maximum (100% cloud cover), and finally the wind speed. The chlorophyll content

just above (0+) or below (0−) the surface, or at the black bottom (superscript B) of the ocean located here at a depth of 50 m. On the right axis is plotted the TOA spectral reflectance at the same resolution, and at the center of MERIS wavebands. The top two graphs are for a clear-sky situation for two Sun zenith angles (SZA), and the bottom two graphs are for the case in which a liquid water cloud, located between 2 and 4 km, with droplets of effective radius 11 µm and an optical thickness of 10, is added. The clear atmosphere model is the US62 standard atmosphere, with maritime polluted aerosols as described in the OPAC database with an AOT at 550 nm equal to 0.1. The ozone column is 300 DU and the precipitable water quantity is 2 g/cm<sup>2</sup>.

of the water is of minor importance for calculating the scalar and net PAR just below the surface. **Figure 11** displays the coefficients to be applied to $\langle PAR_d^{0+} \rangle_{24h}$ in order to obtain $\langle PAR_o^{0-} \rangle_{24h}$ or $\langle PAR_d^{0-} \rangle_{24h}$, and also shows how $\langle \mu_0^{0-} \rangle_{24h}$ and $\langle \widetilde{PAR_o^{0-}} \rangle_{24h}(\lambda_i)$ vary for various latitudes and wind speeds and for the two extreme cases, i.e., clear and totally overcast. **Figure 12** displays the result of applying the coefficients described above to estimate the scalar PAR below the surface. We processed one full year of global simulated data (2011, see section PAR Simulator) for 3 wind speeds (0, 7, and 15 m s<sup>−1</sup>) and checked the quality of the regression vs. the "actual" (or prescribed) values, i.e., the values obtained by running the Monte Carlo code with the various input variables (MERRA-2 hourly data, see above). These values provide the reference in calculating algorithm performance statistics. The regression is excellent, with a residual bias and an R.M.S. difference of about 0.5 mol photons m<sup>−2</sup> day<sup>−1</sup>. Neither wind speed nor cloudiness appears to affect the quality of the regression.

**Figures 13**, **14** depict the normalized spectral PAR and average cosine for the year 2011, for a latitude of 55.5°N and a wind speed of 7 m s<sup>−1</sup>. For both parameters, the clear sky and totally overcast situations are also plotted. They constitute an envelope within which the actual values are included, and the deviation from the clear sky value seems proportional to the actual cloud factor $\langle CF \rangle_{24h}$. This suggests a method to derive $\langle \widetilde{PAR_o^{0-}} \rangle_{24h}(\lambda_i)$ and $\langle \mu_0^{0-} \rangle_{24h}$ from look-up tables of the clear sky and overcast situations and from an estimation of the actual cloud factor.

### Equations

We propose to use the observed cloud factor $\langle CF \rangle^{\text{obs}}$ as a proxy of $\langle CF \rangle_{24h}$ and then linearly interpolate between clear sky and overcast look-up tables as a function of $\langle CF \rangle^{\text{obs}}$. We have:

$$\begin{aligned}
\langle CF \rangle^{\text{obs}} &= \frac{\langle PAR_d^{0+} \rangle_{24h}^{\text{obs}}}{\langle PAR_d^{0+} \rangle_{24h}^{\text{clear}}} \\
S_1 &= \frac{\langle \widetilde{PAR_o^{0-}} \rangle_{24h}^{\text{clear}} (\lambda_i) - \langle \widetilde{PAR_o^{0-}} \rangle_{24h}^{\text{overcast}} (\lambda_i)}{1 - \langle CF \rangle_{24h}^{\text{overcast}}} \\
\langle \widetilde{PAR_o^{0-}} \rangle_{24h} (\lambda_i) &= S_1 \cdot \left( \langle CF \rangle^{\text{obs}} - \langle CF \rangle_{24h}^{\text{overcast}} \right) + \langle \widetilde{PAR_o^{0-}} \rangle_{24h}^{\text{overcast}} (\lambda_i) \\
S_2 &= \frac{\langle \mu_0^{0-} \rangle_{24h}^{\text{clear}} - \langle \mu_0^{0-} \rangle_{24h}^{\text{overcast}}}{1 - \langle CF \rangle_{24h}^{\text{overcast}}} \\
\langle \mu_0^{0-} \rangle_{24h} &= S_2 \cdot \left( \langle CF \rangle^{\text{obs}} - \langle CF \rangle_{24h}^{\text{overcast}} \right) + \langle \mu_0^{0-} \rangle_{24h}^{\text{overcast}}
\end{aligned}$$

The error will be large when cloudiness changes a lot during the day, so that $\langle CF \rangle^{\text{obs}}$ may deviate substantially from $\langle CF \rangle_{24h}$, and when the values of the parameters in clear and totally overcast conditions are substantially different.
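The linear interpolation between the clear-sky and overcast look-up table values, as a function of the observed cloud factor, can be sketched as follows. The function names and the numerical values are illustrative assumptions; real inputs would come from the look-up tables for the relevant latitude, date, and wind speed.

```python
def cloud_factor(par_observed, par_clear):
    """Observed cloud factor: daily PAR above the surface divided by
    its clear-sky value."""
    return par_observed / par_clear


def interpolate_lut(cf_obs, cf_overcast, value_clear, value_overcast):
    """Linear interpolation between the overcast value (reached at
    cf_obs = cf_overcast) and the clear-sky value (reached at cf_obs = 1)."""
    slope = (value_clear - value_overcast) / (1.0 - cf_overcast)
    return slope * (cf_obs - cf_overcast) + value_overcast


# Hypothetical case: daily PAR of 20 against a clear-sky value of 50
cf_obs = cloud_factor(20.0, 50.0)
mu_24h = interpolate_lut(cf_obs, cf_overcast=0.2,
                         value_clear=0.70, value_overcast=0.82)
```

The same function applies to the normalized spectral PAR, evaluated once per wavelength band.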

### Look-Up Tables for Clear Sky and Overcast Situations

### **Models**

The clear sky model is based on the AFGL US 62 standard atmosphere with a surface pressure of 1012.15 hPa, an O<sub>3</sub> vertically integrated content of 300 DU, and a H<sub>2</sub>O vertically integrated content of 2 g cm<sup>−2</sup>. The aerosol model is the maritime clean model from the OPAC database with an AOT of 0.1 at 550 nm. The air-sea interface is a wind-roughened surface. The ocean bulk optical properties correspond to Case-I waters with a chlorophyll-a concentration of 0.5 g m<sup>−3</sup>. The ocean bottom is black and is located at a depth of 50 m. For the totally overcast model we added a permanent cloud layer between 5 and 10 km consisting of water droplets with r<sub>eff</sub> = 11 µm and a cloud optical thickness of 50. The aerosol, hydrosol, and cloud phase matrices are computed at 550 nm and are assumed spectrally invariant between 400 and 700 nm.

### **Computations**

For latitudes between −90° and 90° in steps of 10°, for every 30 days along the year, and for 3 wind speeds (0, 7, and 15 m s<sup>−1</sup>), we computed the following quantities:

$$
\langle CF \rangle_{24h}^{\text{overcast}},\quad \langle PAR_o^{0-} \rangle_{24h}^{\text{overcast}} (\lambda_i),\quad \langle PAR_o^{0-} \rangle_{24h}^{\text{clear}} (\lambda_i),\quad \langle \mu_0^{0-} \rangle_{24h}^{\text{overcast}},\quad \text{and } \langle \mu_0^{0-} \rangle_{24h}^{\text{clear}}.
$$

### First Results

For the MERIS sensor, first examples of the new parameters (daily and monthly global products for May 15 and May 1–31, 2011) are displayed in **Figures 15**–**17**. Data at high latitudes were masked using ESA CCI Sea Ice Concentration products v2.0 (cci.esa.int). The scalar PAR below the surface (**Figure 15**) follows the planar PAR above the surface (not shown here), but the values are somewhat higher, as expected. The spatial coverage of the daily MERIS product is less than for the MODIS products (**Figure 1**), because of the narrower swath of MERIS and the glitter mask (more glint in the MERIS imagery). The average cosine product (**Figure 16**) is smoother because it is influenced mainly by the average solar elevation. The latitudinal variation is important, with the highest values (0.85) in the tropics whatever the cloudiness. At high latitudes the contrast between clear and cloudy sky conditions becomes more marked, with an average cosine closer to the tropical values in cloudy sky conditions and the lowest values (0.65) under clear skies. For monthly averages, the mixture between cloudy and clear skies tends to further

= 50, constant along the day) and for several wind speeds as a function of latitude. (Top left) Ratio of planar PAR below and above surface. (Top right) Ratio of scalar PAR below the surface to planar PAR above surface. (Bottom left) Average cosine below the surface. (Bottom right) Ratio of spectral scalar PAR below the surface at 405 and 765 nm. The date is 21st of June.

smooth the product, which becomes a simple function of latitude. The normalized spectral PAR at 405 nm (**Figure 17**), like the mean cosine, exhibits a contrast between clear and cloudy sky situations that is increasing toward the high latitudes. However on a monthly time scale the latitudinal gradient is very weak and the dispersion of the product is also very weak with a mean of 0.63 and a standard deviation of 0.024 over the global ocean.

### Algorithm Uncertainties for $\langle PAR_d^{0+} \rangle_{24h}$

Associating uncertainties with the satellite radiation products, preferably on a pixel-by-pixel basis, is necessary to quantify their quality. This is important to ensure that variability or trends detected in scientific analyses are of geophysical nature, i.e., that the data are interpreted properly in view of their strengths and limitations, and to merge different data sets. This is also essential for data assimilation, a primary application of the products (large uncertainties will have little impact on model runs, small uncertainties will constrain the model to behave like the data). Expressing uncertainties requires modeling the measurement, identifying all possible error sources (e.g., noise in the input variables, imperfect/incomplete mathematical model), and determining the combined uncertainty, as described in JCGM-100 (2008) and subsequent publications.

In the following, algorithm uncertainties associated with $\langle PAR_d^{0+} \rangle_{24h}$ are considered, i.e., those due to model approximations and parameter errors (e.g., decoupling effects of clouds and clear atmosphere, neglecting diurnal variability of clouds, using aerosol climatology), assuming that the input variables (TOA reflectance at wavelengths in the PAR spectral range) are known perfectly. A procedure is provided to estimate and provide, for each pixel of a product, this uncertainty component of the total uncertainty budget, which is expected to dominate. The uncertainty characterization has been done using an extended simulation dataset covering the 2003–2012 time period, still using hourly MERRA-2 input data. The large number of data points allows one to sample well the atmospheric variability and in particular many variations of daytime cloudiness, for all latitudes.

**Figure 18** displays the result of the uncertainty analysis for the whole dataset. The bias and the standard deviation of the daily PAR estimates are plotted as a function of the clear-sky PAR (which depends itself mainly on latitude and date) and the actual cloud factor for that day. The Monte Carlo calculations, assumed accurate, provide the reference. This is justified in view of the bias and standard deviation values. The bias exhibits a slight dependence on the clear sky PAR, suggesting an overestimation reaching 2.5 E m<sup>−2</sup> day<sup>−1</sup> (∼4%) for the maximum clear sky PAR values, and a slight underestimation of 1 E m<sup>−2</sup> day<sup>−1</sup> (∼6.5%) when the clear sky PAR reaches a low value of 15 E m<sup>−2</sup> day<sup>−1</sup>. When looking at the dependence upon $\langle CF \rangle_{24h}$, a slight overestimation of between 1 and 1.5 E m<sup>−2</sup> day<sup>−1</sup> exists. This overestimation is independent of the cloudiness of the day, with the exception of totally clear days ($\langle CF \rangle_{24h}$ = 1), for which the bias drops to 0.5 E m<sup>−2</sup> day<sup>−1</sup>. The bias is also quite small for totally overcast situations. A bias correction can be considered, based on the clear-sky PAR value.
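Statistics of this kind (bias and standard deviation of the estimate relative to a reference, as a function of cloud factor) can be computed by simple binning. The sketch below is one possible way to do so and is not the procedure actually used for the figure; the function name, the bin edges, and the sample values are hypothetical.

```python
import numpy as np


def binned_bias_sd(estimate, reference, cloud_factor, bin_edges):
    """Bias (mean error) and standard deviation of (estimate - reference)
    within cloud-factor bins; empty bins are returned as NaN."""
    err = np.asarray(estimate, dtype=float) - np.asarray(reference, dtype=float)
    idx = np.digitize(cloud_factor, bin_edges) - 1
    n_bins = len(bin_edges) - 1
    bias = np.full(n_bins, np.nan)
    sd = np.full(n_bins, np.nan)
    for b in range(n_bins):
        sel = idx == b
        if sel.any():
            bias[b] = err[sel].mean()
            sd[b] = err[sel].std()
    return bias, sd


# Hypothetical daily PAR estimates, reference values, and cloud factors
est = np.array([10.0, 12.0, 31.0, 29.0, 51.0])
ref = np.array([9.0, 11.0, 30.0, 30.0, 50.0])
cf = np.array([0.1, 0.2, 0.5, 0.6, 0.9])
# Last edge slightly above 1 so that totally clear days fall in the last bin
bias, sd = binned_bias_sd(est, ref, cf, bin_edges=[0.0, 0.4, 0.8, 1.0001])
```

The same binning could be applied along the clear-sky PAR axis to obtain the second dependence discussed in the text.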

The standard deviation (SD) peaks at intermediate cloud factors, where the risk of a difference between the cloudiness at the time of the satellite measurement and the mean cloudiness of the day is maximum. When the clear sky PAR is large and $\langle CF \rangle_{24h}$ is about 0.5, SD reaches 8 E m<sup>−2</sup> day<sup>−1</sup> (∼11%). But SD drops to 1 E m<sup>−2</sup> day<sup>−1</sup> (1.5%) for clear sky situations ($\langle CF \rangle_{24h}$ = 1). SD is drastically reduced when dealing with monthly PAR estimates. Whatever the cloudiness, SD is lower than 2 E m<sup>−2</sup> day<sup>−1</sup>. The main feature is that SD seems to be proportional to the clear-sky PAR value, and thus we can consider associating an uncertainty to each pixel of the product from an estimate of the clear-sky value, in addition to the cloud factor. This model uncertainty component could be extended to situations with multiple satellite

spectral band (400–410 nm) for all the days of 2011 (dots), for a particular latitude and wind speed, as well as predicted values for clear sky (solid line) and totally overcast situations (dashed lines). (Bottom) Same as top but normalized by the band integrated scalar PAR below the surface between 670 and 680 nm. The color of the dots is a function of the cloud factor.

measurements per day, as is often the case for high latitudes with polar orbiting sensors like MERIS or VIIRS. In that case we anticipate a reduction of the standard deviation of the daily PAR product. For a complete per-pixel uncertainty budget, the uncertainty associated with TOA reflectance noise (radiometric and due to vicarious calibration) should be included, which may require evaluating the sensitivity of the daily PAR to the TOA reflectance and the covariance of the input reflectance in the various spectral bands (since the measurements are correlated).

### SUMMARY AND RECOMMENDATIONS

Studying and understanding the chemical, physical, geological, and biological processes that govern the composition of the marine environment requires knowledge of the underwater light field. Ideally, one wants to determine and monitor the spectral and angular structure of the radiant intensity at any point in the water body. From space, by means of remote sensing, only limited information about the radiative properties of a water body can be obtained, but the advantage is global and repetitive coverage. Existing satellite products are restricted to planar PAR and UV irradiance above the surface, and the diffuse attenuation coefficient (average in the upper layer). These products, despite their drawbacks (e.g., no information at high Sun zenith angles and diurnal variability poorly described for PAR, no retrieval in cloudy conditions for the diffuse attenuation coefficient), have been useful to many studies of aquatic photosynthesis, heat budget, and chemical effects of light. There is a need, however, for other products, i.e., sub-surface planar and scalar fluxes, average cosine, spectral fluxes (UV to visible), diurnal fluxes, APAR, surface albedo, vertical attenuation, and heating rate, and for associating uncertainties to any product on a pixel-by-pixel basis. Possible approaches and methodologies to generate these new products, in view of state-of-the-art knowledge and current and planned satellite missions, were discussed, including difficulties, assumptions, and readiness level. A strategy and a roadmap, with development priorities and opportunities to obtain the new products, could be established. Examples of new products, i.e., daily scalar PAR below the surface, daily average cosine for PAR, and spectral scalar fluxes below the surface, and their algorithms, were presented. A statistical way to estimate uncertainties for each pixel, based on radiative transfer simulations for expected clear and cloudy situations, was proposed.

FIGURE 14 | Mean cosine PAR for all the days of 2011 (dots), for a particular latitude and wind speed, as well as predicted values for clear sky (solid line) and totally overcast situations (dashed lines). The color of the dots is a function of the cloud factor.

In the short term, the focus should be on improving/completing existing products from satellite ocean-color sensors, i.e., extending calculations of above-surface fluxes to high Sun zenith angles and accounting, at least statistically, for diurnal variability of clouds. One should also work on how to compute similar products from different missions and their likely uncertainties, to obtain platform-independent global products. On the other hand, with existing scientific knowledge, the following new products (see sections Current Products and Users Needs) should be derived from past and current missions: planar and scalar PAR below the surface, average cosine for PAR below the surface, diffuse attenuation coefficient for PAR, and respective spectral quantities (visible to near infrared), spectral albedo, and APAR. These products should be in the form of daily averages, except for the diffuse attenuation coefficient. New processing lines will require links with other ocean and atmosphere products and/or ancillary data (e.g., reanalysis products from observations and models, such as MERRA-2). A calibration/validation program should be planned to evaluate the new products and their uncertainty over a representative set of conditions.

In the medium term (longer-term effort, not lower priority), with specific efforts and new avenues in algorithm development, the aim should be to generate the following products, not only from past and current missions, but also from future missions: spectral fluxes below the surface and diffuse attenuation in the UV (or integrated over UV-A and UV-B ranges), in both photon and energy units, especially using TROPOMI on Sentinel 5P/5, average mixed-layer PAR (will require mixed-layer depth fields from Argo-assimilated circulation models), upper-ocean heating profile, and under-ice light fields (in conjunction with cryosphere missions and modeling). Diurnal-cycle resolving measurements, combining different satellites over-passing at different times with geostationary satellites, should be pursued to describe hourly changes in radiation fields and improve daily-integrated values (e.g., due to clouds). The products should also be generated without gaps (applying gap-filling techniques) to provide boundary conditions for general circulation models. At this stage, evaluation of the products and their uncertainty should be ongoing on a continuous basis.

FIGURE 18 | Error budget of the daily "MERIS" PAR product above the surface (estimated PAR–actual PAR) from simulations for the period 2003–2012 using 1-hourly resolved MERRA-2 input data. For each day, a set of MERIS spectral reflectance data is simulated for a typical observation at 10:30 UT local time and several viewing geometries (nadir and 20° view zenith angle (VZA) with relative azimuth of 0, 90, and 180°, with sun glint avoidance). (Left) 2D plot of the error bias as a function of the clear-sky daily PAR above surface (x axis) and cloud factor (y axis), along with 1D (marginal) distribution. (Right) Same as left but for the error standard deviation. The monthly PAR error bias and standard deviation are also reported, in magenta, in the 1D marginal distribution as a function of the clear-sky PAR.

In the long term (future vision), for significant improvement of sub-surface light fields, space lidars could be used (e.g., the CNES MESCAL), as they can resolve the vertical distribution of material in the ocean (while all the products described above, derived from passive optical sensors, assume a homogeneous upper ocean). This is particularly critical in high-latitude regions (near the ice) and near land. Approaches using hyper-spectrally resolved sensors such as SCIAMACHY to retrieve the availability of light in the ocean (depth-integrated scalar irradiance) from the vibrational Raman scattering effect of water molecules (Dinter et al., 2015) should be explored. Finally, satellite missions with instruments in L1 orbit (such as the NASA EPIC onboard DSCOVR) would offer the opportunity to continuously observe the sun-lit part of the ocean, maximizing the temporal coverage. Space agencies should consider exploiting such an orbit for an ocean-color satellite mission.

### AUTHOR CONTRIBUTIONS

All the authors contributed to the conception and organization of the study, to the analysis of the various products and results, and to the writing of the findings. RF, DR, and EB arranged the contents of the various sections, and wrote the first version of the manuscript. HB provided insight into light and biogeochemical processes. JT and BF carried out the work related to current MODIS and OMI products. DR, DJ, MC, and RF developed algorithms for new products and devised ways to associate uncertainties. DR, DJ, and MC generated examples of new products. TJ, TP, and SS organized the survey about satellite PAR observations and compiled and synthesized user responses. All the authors critically revised the manuscript and helped to bring about its final version.

# FUNDING

The National Aeronautics and Space Administration (NASA) and the European Space Agency (ESA) provided funding for the study under Grant NNX14AL91G and Contract 4000113690/14/I-LG, respectively.

## ACKNOWLEDGMENTS

The authors gratefully acknowledge the NASA Ocean Biology Processing Group (OBPG) and the OMI Science Investigator-led Processing System (SIPS) Facility at Goddard Space Flight Center, Greenbelt, Maryland, for generating and maintaining the ocean-color and radiation products used in the study, and Mr. John McPherson from Scripps Institution of Oceanography and Mr. Monstreleet and Mr. Laurent Wandrebeck from HYGEOS for providing technical support. The survey about PAR products, which involved human participants, did not require ethics approval as per institutional and national guidelines. The participants consented to release the results by virtue of survey completion.

## REFERENCES

Falkowski, P. G., and Raven, J. A. (1997). Aquatic Photosynthesis. Oxford: Blackwell.


Ocean from satellite ocean color/in situ chlorophyll-a based models. J. Geophys. Res. 120, 6508–6541, doi: 10.1002/2015JC011018


changes of DOM absorption properties and bioavailability. Aquat. Sci. 71, 104–126. doi: 10.1007/s00027-008-8082-5

Uitz, J., Claustre, H., Gentili, B., and Stramski, D. (2010). Phytoplankton class-specific primary production in the world's oceans: seasonal and interannual variability from satellite observations. Glob. Biogeochem. Cycles 24:GB3016. doi: 10.1029/2009GB003680

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Frouin, Ramon, Boss, Jolivet, Compiègne, Tan, Bouman, Jackson, Franz, Platt and Sathyendranath. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# ANNEX: NOTATIONS AND DEFINITIONS

### Irradiances (W.m−<sup>2</sup> .nm−<sup>1</sup> )


$$E\_o^{0\pm} \left(\lambda, t\right) = \int\_{\phi=0}^{2\pi} \int\_{\theta=-\pi/2}^{\pi/2} L^{0\pm} \left(\lambda, t, \theta, \phi\right) \sin\left(\theta\right) d\theta d\phi$$

where L <sup>0</sup><sup>±</sup> is the radiance just above or just below the surface (superscript 0+ or 0−), λ is wavelength, t is time, and θ and φ are zenith and azimuth angles.


$$E\_d^{0\pm} \left(\lambda, t\right) = \int\_{\phi=0}^{2\pi} \int\_{\theta=0}^{\pi/2} \cos\left(\theta\right) L^{0\pm} \left(\lambda, t, \theta, \phi\right) \sin\left(\theta\right) d\theta d\phi$$


$$E\_{u}^{0\pm} \left(\lambda, t\right) = \int\_{\phi=0}^{2\pi} \int\_{\theta=-\pi/2}^{0} \cos\left(\theta\right) L^{0\pm} \left(\lambda, t, \theta, \phi\right) \sin\left(\theta\right) d\theta d\phi$$


$$E\_{net}^{0\pm}(\lambda, t) = E\_d^{0\pm}(\lambda, t) - E\_u^{0\pm}(\lambda, t)$$

### Band-Integrated Irradiances (W.m−<sup>2</sup> )


$$E\_o^{0\pm}(t) = \int\_{\lambda = 400nm}^{700nm} E\_o^{0\pm}(t, \lambda) \, d\lambda$$


$$E\_d^{0\pm} \ (t) = \int\_{\lambda = 400nm}^{700nm} E\_d^{0\pm} \ (t, \lambda) \, d\lambda$$

### Band-Integrated Quanta Fluxes (mol.ph.m−<sup>2</sup> .s−<sup>1</sup> )


$$PAR\_o^{0\pm}(t) = \int\_{\lambda = 400nm}^{700nm} \frac{E\_o^{0\pm}(t,\lambda)\cdot\lambda}{hcN\_A} d\lambda$$


$$PAR\_d^{0\pm} \left( t \right) = \int\_{\lambda = 400 nm}^{700 nm} \frac{E\_d^{0\pm} \left( t, \lambda \right) \cdot \lambda}{hcN\_A} d\lambda$$


$$PAR\_o^{0\pm} \ (t, \lambda\_i) = \int\_{\lambda\_i - \Delta\lambda/2}^{\lambda\_i + \Delta\lambda/2} \frac{E\_o^{0\pm} \ (t, \lambda) \cdot \lambda}{hcN\_A} d\lambda$$

where λ<sub>i</sub> is the central wavelength of any spectral interval Δλ in the PAR spectral range, h is Planck's constant, c is the speed of light, and N<sub>A</sub> is the Avogadro number.
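As a rough numerical illustration of the band-integrated quanta flux defined above, the sketch below converts a spectral irradiance in W.m−2.nm−1 into mol photons m−2 s−1 by trapezoidal integration of E·λ/(h c N_A). The function name and the input layout (wavelengths in nm, any sampling) are assumptions made for the example only.

```python
H = 6.62607015e-34    # Planck's constant, J s
C = 2.99792458e8      # speed of light, m s-1
N_A = 6.02214076e23   # Avogadro number, mol-1


def par_quanta(wavelengths_nm, irradiance_w_m2_nm):
    """Trapezoidal integral of E(lambda) * lambda / (h c N_A).

    Wavelengths are in nm and irradiance in W m-2 nm-1; the result is in
    mol photons m-2 s-1.
    """
    def photon_flux(lam_nm, e):
        # Convert the wavelength to metres so that E * lambda / (h c) is a
        # photon count per second, per square metre, per nanometre.
        return e * (lam_nm * 1e-9) / (H * C * N_A)

    total = 0.0
    points = list(zip(wavelengths_nm, irradiance_w_m2_nm))
    for (l0, e0), (l1, e1) in zip(points, points[1:]):
        total += 0.5 * (photon_flux(l0, e0) + photon_flux(l1, e1)) * (l1 - l0)
    return total
```

For a spectrally flat irradiance of 1 W.m−2.nm−1 between 400 and 700 nm (300 W.m−2 in total), this gives about 1.38 × 10−3 mol.ph.m−2.s−1, i.e., roughly 4.6 µmol of photons per joule.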

### Average Cosines (Dimensionless)


$$
\mu\_0^{0\pm} \ (\lambda, t) = \frac{E\_{net}^{0\pm} \ (\lambda, t)}{E\_o^{0\pm} \ (\lambda, t)}
$$


$$
\mu\_0^{0\pm} \ (t) = \frac{E\_{net}^{0\pm} \ (t)}{E\_o^{0\pm} \ (t)}
$$

### 24h-Averaged Quantities


$$\langle PAR\_o^{0\pm} \rangle\_{24h} = N\_{\text{sec}} \cdot \int\_0^{1day} PAR\_o^{0\pm} \ (t) dt$$

where N<sub>sec</sub> = 86400 s.day−<sup>1</sup> and t is expressed in days.

- Daily downward planar PAR-integrated quanta flux (mol.ph.m−<sup>2</sup>.day−<sup>1</sup>)

$$\langle PAR\_d^{0\pm} \rangle\_{24h} = N\_{\text{sec}} \cdot \int\_0^{1day} PAR\_d^{0\pm} \ (t) dt$$


$$
\langle \mu\_0^{0 \pm} \rangle\_{24h} = \frac{\langle E\_{net}^{0 \pm} \rangle\_{24h}}{\langle E\_o^{0 \pm} \rangle\_{24h}}.
$$
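The daily integrals above can be illustrated with a small sketch that integrates an instantaneous PAR time series over one day, with time expressed in days and the result scaled by N<sub>sec</sub>. The function name and the sampling of the time series are assumptions for the example.

```python
N_SEC = 86400.0  # seconds per day


def daily_par(times_day, par_mol_m2_s):
    """Daily PAR (mol photons m-2 day-1) from instantaneous values.

    times_day: sample times as fractions of a day, from 0 to 1.
    par_mol_m2_s: instantaneous PAR in mol photons m-2 s-1.
    Uses the trapezoidal rule, then multiplies by N_SEC.
    """
    total = 0.0
    for k in range(len(times_day) - 1):
        dt = times_day[k + 1] - times_day[k]
        total += 0.5 * (par_mol_m2_s[k] + par_mol_m2_s[k + 1]) * dt
    return N_SEC * total
```

A constant flux of 10−3 mol.ph.m−2.s−1 sustained for a full day integrates to 86.4 mol.ph.m−2.day−1, which gives a feel for the magnitudes of the daily products.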

# Optical Classification of the Coastal Waters of the Northern Indian Ocean

S. Monolisha<sup>1</sup> , Trevor Platt 1,2, Shubha Sathyendranath<sup>3</sup> , J. Jayasankar <sup>1</sup> , Grinson George<sup>1</sup> \* and Thomas Jackson<sup>2</sup>

<sup>1</sup> Fishery Resources Assessment Division, Central Marine Fisheries Research Institute, Kochi, India, <sup>2</sup> Plymouth Marine Laboratory, Plymouth, United Kingdom, <sup>3</sup> Plymouth Marine Laboratory, National Centre for Earth Observation, Plymouth, United Kingdom

Coastal waters are optically diverse; studying their optical characteristics is an important application of satellite oceanography. In coastal ecosystems of the northern Indian Ocean, optical diversity has been little studied, except for the global analysis by Mélin and Vantrepotte (2015). This paper is a contribution toward identification and characterization of optical classes in the coastal regions of the northern Indian Ocean. The study identified eight optical classes using the monthly climatological datasets of remote sensing reflectance for the 1998–2013 period from the Ocean Color Climate Change Initiative (OC-CCI, www.oceancolour.org). The optical classification we adopted uses the fuzzy logic method, based on Moore et al. (2009). The seasonal variations of the eight resultant optical classes of the coastal waters of the northern Indian Ocean were explored. From the mean reflectance spectral signals obtained, it appears that classes 1–6 belong to Case-1 waters and classes 7 and 8 correspond to Case-2 waters. Classes 1 and 2 appear in deeper oligotrophic waters; classes 3–6 are present at intermediate depths; classes 7 and 8 are mostly found within inshore eutrophic regions with high chlorophyll concentrations and sediments from river plumes and land runoff. The optical variability between seasons (the summer and winter monsoons and the intermonsoon seasons) is influenced by variations in physical forcing, such as surface winds, ocean currents, precipitation, and sediment influx from rivers and land runoff. The optical diversity index ranged from around 0.3 to 1.36. High diversity indices, ranging between 1 and 1.36, were found in areas dominated by classes 1–4, whereas low diversity indices (around 0.3) occurred in areas where classes 7 and 8 dominated. The variations in the dominant optical classes are shown to be related to changes in chlorophyll concentration and suspended sediment load, as indicated by remote sensing reflectance at 670 nm. On the other hand, optical diversity appears to be high in zones of transition between dominant optical classes.

Keywords: coastal ecosystems, satellite ocean color, classification, remote sensing reflectance, ecosystem management

### Edited by:

Hervé Claustre, Centre National de la Recherche Scientifique (CNRS), France

### Reviewed by:

Pierre Gernez, University of Nantes, France

Aneesh Lotliker, Indian National Centre for Ocean Information Services, India

> \*Correspondence: Grinson George grinsongeorge@gmail.com

### Specialty section:

This article was submitted to Ocean Observation, a section of the journal Frontiers in Marine Science

Received: 30 March 2017 Accepted: 02 March 2018 Published: 20 March 2018

### Citation:

Monolisha S, Platt T, Sathyendranath S, Jayasankar J, George G and Jackson T (2018) Optical Classification of the Coastal Waters of the Northern Indian Ocean. Front. Mar. Sci. 5:87. doi: 10.3389/fmars.2018.00087

# 1. INTRODUCTION

In an ocean under rapid modification by climate change, the boundaries between marine ecological provinces will move, but in ways that are difficult to predict (Karl et al., 1995; Platt and Sathyendranath, 1999). However, there is a premium on knowing the large-scale structure of the ocean ecosystem as it changes through time, in other words on developing and maintaining a biogeography of the ocean basin. Conventional biogeography relies on collecting and identifying individual specimens through samplings and survey techniques. At large geographical scales, it is a costly and time-consuming procedure to make even a single survey at one time point; making serial surveys to detect possible changes may be prohibitive on the grounds of expense. An alternative approach would be to use data streams from sensors carried on satellites in Earth orbit. Such data have the advantages of high resolution at the ocean surface, high frequency of coverage, cost-effectiveness and synoptic coverage (Platt and Sathyendranath, 1999, 2008). Potentially, their use could yield a different kind of biogeography, based on data free from the limitations of coarse resolution in space and time. Visible spectral radiometry of the ocean provides a data stream that is particularly useful for ecosystem analysis: the visible spectrum carries information on the pigments and size of phytoplankton cells, as well as on the optical properties of the other constituents (such as suspended sediments and colored or chromophoric dissolved organic matter) in the surface waters of the ocean (Guzman et al., 1995; Babin et al., 2003; Dowell and Platt, 2009; Garaba and Zielinski, 2013). Mélin and Vantrepotte (2015) have pioneered the classification of coastal waters at global scale using annually-averaged fields of optical radiances from satellite data.

The Northern Indian Ocean is landlocked toward the north and bifurcates into two intra-continental seas: the Arabian Sea and the Bay of Bengal. Seasonally reversing monsoons and reversal of ocean currents are the major distinguishing features of the Indian Ocean basin (Shetye, 1998; Qasim, 1999). The monsoonal cycle, including southwest or summer monsoon and northeast or winter monsoon, determines the climate of the region. Southwest monsoon is the continuation of the southern hemisphere trade winds that bring monsoon rains and floods to the Asian landmass (Tomczak and Godfrey, 2001). Northeast monsoon is characterized by high pressure over the Asian land mass and northeasterly winds over the tropics and northern subtropics (Shetye and Shenoi, 1988). A strong coastal upwelling occurs along the western coast during the southwest monsoon season, whereas during the northeast monsoon season, cold continental winds cause convective mixing and winter cooling along the north Indian coast (Tomczak and Godfrey, 2001). Other oceanographic features of interest in this region include the Indian Ocean warm pool (Vinayachandran and Shetye, 1991) and monsoon depressions and cyclones (Schott and McCreary, 2001; Schott et al., 2009).

In the Northern Indian Ocean, biogeographical analysis has so far been restricted to what can be found using conventional methods (Krishnamurthy et al., 1978; Schills and Wilson, 2006; Obura, 2012, 2016; Jeffries et al., 2015). A few notable studies on global ocean biogeographic partitions using satellite datasets include the Longhurst province classification (Longhurst, 1998), based on regional oceanography of major oceanic basins and a global database of chlorophyll profiles, and the 56 biogeochemical provinces proposed by Reygondeau et al. (2013) using datasets of Sea Surface Temperature (SST), Chlorophyll and Sea Surface Salinity (SSS). The current study region includes at least parts of four provinces proposed by Longhurst (1998): the Red Sea and Persian Gulf province (REDS), Northwest Arabian Sea upwelling province (ARAB), Western India coastal province (INDW), and Eastern India coastal province (INDE). Studies on biogeographic partitioning of the Indian Ocean region using remotely-sensed datasets are relatively few. Here, we follow the lead of Mélin and Vantrepotte (2015) through a detailed implementation of their optical remote-sensing method for the Indian Ocean region. We extend the temporal resolution to reveal seasonal changes in the optical classification of the coastal waters of the region. We interpret the results in the context of the seasonally-reversing wind and ocean current system that is the unique oceanographic characteristic of the region.

# 2. DATA AND METHODS

# 2.1. Study Area

The northern Indian Ocean is subdivided by landmasses into the Arabian Sea in the west and the Bay of Bengal in the east, and it opens into the equatorial Indian Ocean to the south. The Bay of Bengal coast is shared among India, Bangladesh, Myanmar, Sri Lanka, and the western part of Thailand. The Arabian Sea coast is shared among India, Yemen, Oman, Iran, Pakistan, Sri Lanka, Maldives, and Somalia. The area of interest is the coastal waters of the northern Indian Ocean within the 2,000 m isobath (**Figure 1**) (extending from 0 to 30° N latitude and 50 to 100° E longitude). Rather than using a shallower depth (100–200 m) as the outer limit of the coastal zone, we have opted to use the 2,000 m isobath. This was to explore whether optical signatures of offshore waters appeared close to shore, and vice versa. In this choice, we were guided by Antony et al. (2002), who suggested that the offshore influence of coastal waters could extend as far out as 400 km from the shore. This region is well-known for the alternating upwelling and downwelling processes occurring during the contrasting seasons of the southwest and northeast monsoons.

Here, we use satellite remote-sensing reflectances (Rrs) at six wavelengths (412, 443, 490, 510, 555, and 670 nm) to identify optically-distinct regions of the coast. Figure S1 provides a schematic diagram of the methods used in the current study.

# 2.2. Satellite Dataset

Remote sensing reflectance (Rrs) at six wavelengths and chlorophyll datasets were obtained from Version 2 of the Ocean Color Climate Change Initiative (OC-CCI, see www.oceancolour.org) (Sathyendranath et al., 2016, 2017) with a spatial resolution of 4 km. Chlorophyll concentration was calculated from the remote-sensing reflectance, using the National Aeronautics and Space Administration (NASA) Ocean Color Chlorophyll Version 4 (OC4) algorithm (O'Reilly et al., 1998). This algorithm performed best in an algorithm comparison carried out as part of OC-CCI activities (Brewin et al., 2015). The OC-CCI satellite dataset affords superior coverage for the area of interest, compared with previously available data. These data products are band-shifted, bias-corrected and merged data archives obtained from three sensors: SeaWiFS (Sea-Viewing Wide Field-of-View Sensor), MODIS-Aqua (Moderate Resolution Imaging Spectroradiometer of the Aqua Earth Observing System), and MERIS (Medium Resolution Imaging Spectrometer). OC-CCI datasets were validated using the in-situ datasets from Teledyne/Webb APEX-Argo floats deployed in the Arabian Sea (Roxy et al., 2016). The OC-CCI dataset is limited to the six SeaWiFS wavebands in the visible. We recognize the limitations of the bandset that were identified by Mélin and Vantrepotte (2015) for coastal optical classification. Therefore, the analyses and interpretation are restricted to the optical differences that are amenable to identification by the available dataset. All grid points of the selected region (depth range of 0–2,000 m) were used in the classification; grid points outside the 2,000 m depth range were excluded. Isobaths of the region were taken from the General Bathymetric Chart of the Oceans (GEBCO) 1-min gridded data set (**Figure 1**).

### 2.3. Normalization of Dataset

The remote-sensing reflectances (Rrs) at six wavelengths (412, 443, 490, 510, 555, and 670 nm) for the years 1998–2013 were used for the study. The remote-sensing reflectance values were skewed in their distribution; to minimize skewness, each Rrs spectrum was transformed to its log<sub>10</sub> values. Each log-transformed spectrum was then normalized by its integral from λ<sub>1</sub> (412 nm) to λ<sub>2</sub> (670 nm), where λ is the wavelength:

$$\vec{x} = \log\_{10} R\_{rs}(\lambda) \Big/ \int\_{\lambda\_1}^{\lambda\_2} \log\_{10} R\_{rs}(\lambda) \, d\lambda, \tag{1}$$

where $\vec{x}$ (in units of nm−<sup>1</sup>) indicates the normalized spectrum. The denominator was computed by trapezoidal integration. The normalization allows analysis of changes in the shape of the Rrs spectra, rather than in their magnitudes. Typically, changes in the shape of the spectra would be more affected by the composition of the materials present in the water, whereas the magnitude of the spectra is likely to be more indicative of the concentration of the substances, especially of highly-scattering substances. In this work, the vector $\vec{x}\_j$ of six log-transformed and normalized reflectance values from a particular location and time (pixel, here indexed by subscript j) is referred to as an object. The total number of objects in a classification is N. Notation and definitions used in this study are presented in **Table 1**.
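The transformation in Equation (1) can be sketched in a few lines. This is a minimal illustration, assuming strictly positive reflectances at the six bands named in the text; the function and variable names are hypothetical.

```python
import math

BANDS_NM = [412, 443, 490, 510, 555, 670]  # the six wavebands used in the study


def normalize_spectrum(rrs, bands_nm=BANDS_NM):
    """Log10-transform an Rrs spectrum and divide by its trapezoidal integral.

    rrs: remote-sensing reflectances (sr-1), one per band, all > 0.
    Returns the normalized spectrum (nm-1), following Equation (1).
    """
    logs = [math.log10(r) for r in rrs]
    # Trapezoidal integration of log10(Rrs) over wavelength.
    integral = sum(
        0.5 * (logs[k] + logs[k + 1]) * (bands_nm[k + 1] - bands_nm[k])
        for k in range(len(bands_nm) - 1)
    )
    return [v / integral for v in logs]
```

Because Rrs values are much smaller than 1, both the log-transformed values and their integral are negative, so the normalized values come out positive, and the normalized spectrum integrates to 1 over 412–670 nm by construction.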

# 2.4. Fuzzy C Mean Algorithm

Fuzzy classification evolved from classical set theory. The classical clustering approach determines whether an object is a member or non-member of a given set of any system. Only these two options are possible. In contrast, fuzzy logic allows that an object may have partial memberships in more than one set. The classification algorithms based on fuzzy logic are often used in classifying data from natural systems. The method allows for overlap between boundaries of particular classes or sets, and recognizes that more than one class may be represented at a particular location at any given time.

The membership $F\_{ij}$ of a cluster i in the object j is given by $1 - Q(\vec{Z}\_{ij}^{2})$, where $\vec{Z}\_{ij}$ is the Mahalanobis distance given by $(\vec{x}\_j - \vec{M}\_i)/\vec{S}\_i$, $\vec{M}\_i$ is the mean and $\vec{S}\_i$ the standard deviation of cluster i, and Q is a cumulative χ<sup>2</sup> distribution (Zadeh, 1965).
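The membership calculation can be sketched as below. Two simplifying assumptions are made and should be read as such: the class covariance is treated as diagonal (per-band standard deviations, as in the expression above, rather than a full covariance matrix), and the number of degrees of freedom of the χ<sup>2</sup> distribution is taken equal to the number of bands. With an even number of degrees of freedom, 1 − Q has a closed form, so no external library is needed. All names are hypothetical.

```python
import math


def chi2_survival_even(x, dof):
    """1 - Q(x) for a chi-square distribution with an even number of
    degrees of freedom: exp(-x/2) * sum_{i < dof/2} (x/2)^i / i!."""
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("this closed form requires a positive even dof")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, dof // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def membership(x, class_mean, class_sd):
    """Fuzzy membership of one object in one class.

    x, class_mean, class_sd: sequences with one value per band.
    Assumes a diagonal covariance and dof equal to the number of bands.
    """
    z2 = sum(((xv - m) / s) ** 2 for xv, m, s in zip(x, class_mean, class_sd))
    return chi2_survival_even(z2, len(x))
```

An object that coincides with the class mean gets a membership of 1, and the membership decays toward 0 as the squared distance grows. Because each class is evaluated independently, the memberships of one object across classes need not sum to 1.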

In this study, the log-transformed, normalized reflectance spectra (xE) were analyzed using the Fuzzy C-mean (FCM) algorithm. Our implementation of fuzzy C mean classification follows Moore et al. (2001). It calculates the centres of each class or cluster and the percentage membership of each class in the data at each pixel. The FCM algorithm also uses several validity functions to assess the optimal number of clusters to be chosen for the classification (Bezdek, 1973; Rezaee, 2010).



### 2.5. Optimal Cluster Validity Functions

A cluster validity function is a statistical measure used to select the optimal number of clusters in the classification (too many clusters would imply that individual clusters resemble each other; too few would imply that all possible cases are not covered). We have used two methods: the Xie-Beni index and the Partition Coefficient. These two methods are used only for selecting the optimal cluster number to run the fuzzy C-means classification. Cluster validity methods are statistical functions that determine the performance of a clustering procedure. Criteria of merit for a clustering method include the distance between clusters (separation) and the distribution of points around a cluster (compactness) (Deborah et al., 2010). We can rely on multiple validity functions to aid selection of the optimal cluster number. The principal strategy used is to cluster the data over a range of cluster numbers (n<sub>c</sub>) and evaluate each clustering result with each validity function (Moore et al., 2009).

The Partition Coefficient and the Xie-Beni index are cluster validity methods designed specifically for use with fuzzy algorithms. These two methods are preferred to aid selection of the optimal number of clusters in fuzzy classification (Halkidi et al., 2001).

### 2.6. Xie-Beni Index

The Xie-Beni index X is one of the measures used to determine the best cluster number for the fuzzy classification of a particular dataset. This index depends on the geometric properties of the dataset and the membership matrix. To calculate X, we need to calculate two quantities: the sum over all clusters of the mean squared distance of each data object from the centre c<sub>i</sub> of cluster C<sub>i</sub>; and the square of the minimum distance between two cluster centers (Xie and Beni, 1991). The ratio of these two quantities is the Xie-Beni index:

$$X = \left[\sum\_{i} \sum\_{\vec{x} \in C\_{i}} d^{2}(\vec{x}, c\_{i})\right] \Big/ \left[N \cdot \min\_{i, k \neq i} \{d^{2}(c\_{i}, c\_{k})\}\right]. \tag{2}$$

The smallest value of the index indicates the best cluster number (Halkidi et al., 2001; Zhao et al., 2009).
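Equation (2) can be illustrated with the sketch below. It assumes, for the purpose of the example, that each object is assigned to its nearest cluster centre and that d is the Euclidean distance; neither choice is stated explicitly in the text, and the function names are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def xie_beni(objects, centres):
    """Xie-Beni index in the form of Equation (2).

    Compactness: sum of squared distances of each object to its nearest
    centre (hard assignment, an assumption of this sketch).
    Separation: smallest squared distance between two distinct centres.
    """
    compactness = sum(
        min(squared_distance(x, c) for c in centres) for x in objects
    )
    separation = min(
        squared_distance(centres[i], centres[k])
        for i in range(len(centres))
        for k in range(len(centres))
        if k != i
    )
    return compactness / (len(objects) * separation)
```

Tight clusters that are far apart give a small X, which is why the smallest value over the tested range of cluster numbers is taken as the best.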

### 2.7. Partition Coefficient

The Partition Coefficient P is a validity function that uses the membership values (Fij) to provide the optimal cluster number. It measures the amount of overlap between clusters. It is defined as the ratio of the sum of squares of the membership matrix elements of all the clusters to the total number of objects.

$$P = \frac{1}{N} \sum\_{j=1}^{N} \sum\_{i=1}^{n\_c} F\_{ij}^2. \tag{3}$$

The index values lie in the range [1/n<sub>c</sub>, 1], where n<sub>c</sub> is the number of clusters. The closer the value is to one, the better the data are classified. The cluster number with the maximum partition coefficient is said to be the best cluster number to choose for classification (Bezdek, 1973; Bezdek et al., 1984).
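A short sketch of Equation (3) follows. It assumes the memberships of each object have been scaled to sum to one across clusters, which is the condition under which the stated range [1/n<sub>c</sub>, 1] holds; the function name and data layout are hypothetical.

```python
def partition_coefficient(memberships):
    """Partition Coefficient, Equation (3).

    memberships: one list per object, holding that object's membership in
    each of the n_c clusters (assumed to sum to 1 per object).
    """
    n_objects = len(memberships)
    return sum(f ** 2 for row in memberships for f in row) / n_objects
```

Crisp assignments (one membership equal to 1, the others 0) give P = 1, while memberships spread evenly over n<sub>c</sub> clusters give P = 1/n<sub>c</sub>.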

### 2.8. Optical Diversity

Optical diversity is an indicator of the overall variability in optical constituents at a given location and time. Optical diversity (H<sub>j</sub>) is defined here, following Mélin and Vantrepotte (2015), by analogy with the Shannon Diversity Index (Shannon, 2001),

$$H\_{j} = -\sum\_{i=1}^{n\_c} F\_{i,j}^{\*} \ln F\_{i,j}^{\*},\tag{4}$$

where $F\_{i,j}^{\*}$ is the normalized membership of the optical classes and n<sub>c</sub> is the number of classes represented. The membership $F\_{i,j}$ was normalized by the sum of $F\_{i,j}$ over all optical classes to obtain $F\_{i,j}^{\*}$:

$$F\_{i,j}^{\*} = (F\_{i,j}) / \left(\sum\_{i=1}^{n\_c} F\_{i,j}\right). \tag{5}$$
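Equations (4) and (5) can be combined in one small function. The sketch below treats classes with zero membership as contributing nothing to the sum (the usual convention that 0 ln 0 = 0, assumed here); the function name is hypothetical.

```python
import math


def optical_diversity(memberships):
    """Shannon-type optical diversity of one pixel, Equations (4) and (5).

    memberships: raw memberships of the pixel in each optical class.
    Returns 0.0 if the pixel has no membership in any class.
    """
    total = sum(memberships)
    if total <= 0.0:
        return 0.0
    normalized = [f / total for f in memberships]      # Equation (5)
    return -sum(f * math.log(f) for f in normalized if f > 0.0)  # Equation (4)
```

A pixel belonging entirely to one class has H = 0, and a pixel shared equally among eight classes reaches the upper bound ln 8 ≈ 2.08, so the values of 0.3 to 1.36 reported in this study sit between these two extremes.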

### 3. RESULTS AND DISCUSSION

### 3.1. Selection of the Optimal Class Number

The Xie-Beni Index and the Partition Coefficient were calculated for monthly climatologies of $\vec{x}$ for the study area, computed from the OC-CCI monthly Rrs climatologies, which are based on years 1998–2013. Monthly values allowed study of seasonal variations in the distribution of optical classes in this region, which is known for its pronounced seasonality. Climatologies were selected to minimize the effect of outliers through averaging, and also to improve the coverage and reduce gaps in data. Climatological data also provide a baseline against which trends in anomalies can be studied at a future date. Both indices varied between months, and the Xie-Beni index often showed a broad minimum, whereas the Partition Coefficient often showed a broad maximum, such that selection of the optical class number was not straightforward. Nevertheless, eight emerged as the optimal number. To aid the selection of optimal class number further, we also studied the maps of cumulative membership (sum of the memberships of all the classes) calculated using class numbers n<sub>c</sub> from 5 to 15. The maps were studied for evidence of over-classification (large areas where the cumulative membership was >1) and under-classification (large areas where cumulative membership was <1). This study also showed that n<sub>c</sub> = 8 gave the best compromise, with low numbers of both under-classified and over-classified pixels. Therefore, finally, eight classes were selected as the optimal cluster number for all the analyses presented here.
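The cumulative-membership check described above can be sketched as a simple count of under-classified and over-classified pixels. The tolerance `tol` around 1 is an assumption introduced for the example; the text does not state what threshold, if any, was applied.

```python
def classification_summary(memberships, tol=0.1):
    """Count pixels by cumulative membership.

    memberships: one list per pixel with its membership in each class.
    tol: hypothetical tolerance around a cumulative membership of 1.
    Returns (n_under, n_ok, n_over).
    """
    under = ok = over = 0
    for row in memberships:
        cumulative = sum(row)
        if cumulative < 1.0 - tol:
            under += 1   # classes do not cover this pixel well
        elif cumulative > 1.0 + tol:
            over += 1    # several classes claim this pixel
        else:
            ok += 1
    return under, ok, over
```

Running such a count for each candidate n<sub>c</sub> and comparing the results is one way to look for the compromise between under- and over-classification that the text describes.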

### 3.2. Identification of the Optical Classes

The mean spectra $\vec{M}\_i$ of the eight selected optical classes are shown in **Figure 2**. Optical class 1 is characterized by a maximum in the blue, with the signal decreasing progressively toward the red, indicative of clear oceanic waters. With increasing class number, the signal decreased steadily at the shortest wavelength (412 nm), and the maximum shifted toward longer wavelengths: the maximum is at 490 nm for class 6 and 555 nm for class 8. Conversely, class 1 has the minimum value in the red at 670 nm, whereas classes 7 and 8 have the highest values in the red. The values of the mean spectra at the six SeaWiFS wavelengths for each of the classes and their corresponding covariance matrix are provided in Tables S1, S2. It is useful to assess how these optical classes relate to Case-1 and Case-2 waters as defined by Morel and Prieur (1977) and Prieur and Sathyendranath (1981). From the shapes of the spectra, it appears that classes 1–6 are representative of Case-1 waters and classes 7 and 8 of turbid Case-2 waters.

The distributions of the dominant classes of representative months of the four seasons are shown in **Figure 3**. The mean (**Figure 2**) and covariance values of the optical classes were then used to classify the waters of the study area for all the months of the year, using the climatological satellite Rrs data as inputs, after log-transformation and normalization to obtain xE. Seasonal cycles used in description of the optical classes are: 1. southwest monsoon or summer monsoon (June–September), 2. northeast monsoon or winter monsoon (December–March), 3. spring intermonsoon (April–May), 4. autumn (fall) intermonsoon (October–November).
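For readers who wish to group monthly climatologies by season, the seasonal definitions above can be written as a small lookup. This is only an illustrative sketch: the names `SEASON_MONTHS`, `MONTH_TO_SEASON` and `season_of` are hypothetical, and months are numbered 1 (January) to 12 (December).

```python
# Seasons as defined in the text, keyed by name, with calendar month numbers.
SEASON_MONTHS = {
    "southwest monsoon": (6, 7, 8, 9),
    "northeast monsoon": (12, 1, 2, 3),
    "spring intermonsoon": (4, 5),
    "autumn intermonsoon": (10, 11),
}

# Inverted lookup: month number -> season name.
MONTH_TO_SEASON = {
    month: season
    for season, months in SEASON_MONTHS.items()
    for month in months
}


def season_of(month):
    """Return the season name for a calendar month number (1-12)."""
    if month not in MONTH_TO_SEASON:
        raise ValueError(f"month must be in 1..12, got {month!r}")
    return MONTH_TO_SEASON[month]
```

For example, `season_of(7)` returns the southwest monsoon, which is the season represented by July in the figures.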

### 3.3. Spatio-Temporal Variations of Optical Classes in the Northern Indian Ocean

### 3.3.1. Classes 1 and 2

Optical classes 1 and 2 vary strongly with season. They occur along with class 3 over deeper waters (>200 m). During the southwest monsoon season (June–September), these classes represent very few pixels in deeper waters near the Andaman Sea. In the intermonsoon period (October–November) and northeast monsoon (December–March), classes 1 and 2 are present in the deeper waters along the Southwest coast, West Bengal, and Andaman Sea. The Eastern India coastal current carries the low salinity waters (optical classes 1 and 2) of the Bay of Bengal to the southwest coast of India during the months of February and March in the winter monsoon season. In the months of March and April, the nearshore waters of the Somalia coast and Gulf of Aden are represented by optical classes 1 and 2, extending to the Gulf of Yemen and Oman. The region 17–20° N, 69–72° E (deeper waters) was also characterized by classes 1, 2, and 3 during the transition period (April–May). The chlorophyll concentration corresponding to these classes ranged from 0 to 0.2 mg m<sup>−3</sup> and the optical diversity index fell in the range from 1 to 1.3.

FIGURE 2 | Mean spectra of the eight optical classes, with the envelopes corresponding to the Mean ± SD.

### 3.3.2. Classes 3 and 4

Optical classes 3 and 4 occur between the 100 and 2,000 m isobaths (shallow to deeper depths). These classes show irregular boundaries in the offshore waters during the southwest monsoon season along the west and east coasts of India, extending to the deeper waters of the Andaman Sea. The classes are found near the Gulf of Aden and Oman waters only in June, i.e., during the onset of the southwest monsoon. In the autumn intermonsoon (October–November), the classes were distributed over the shallow depths (0–500 m) along the near-shore waters of the Gulf of Yemen, Oman, Arabian Sea, and Bay of Bengal. In the northeast monsoon, these classes occurred around the islands off the Somalia coast, Gulf of Yemen, west coast, and east coast of India. The Gulf of Oman waters flowing toward the Arabian Sea represent class 4 in May (spring intermonsoon). Chlorophyll concentration corresponding to these classes fell in the range of 0.5–0.75 mg m<sup>−3</sup> and the diversity index ranged from 0.3 to 0.9.

### 3.3.3. Class 5

This optical class dominates in regions between the 200 and 1,000 m isobaths. During the onset of the southwest monsoon, class 5 is prominent in the inner Persian Gulf, Strait of Hormuz, Gulf of Oman, Somalia, and the west and east coasts of India. At the end of the southwest monsoon season and the onset of the fall intermonsoon period, this class is distributed throughout the coastline in the depth range 0–500 m. This trend persists until the month of January (northeast monsoon) along the entire coastline. In the spring intermonsoon, this class is present toward the Persian Gulf and Gulf of Oman, flowing into the Northwest coast and further extending toward the east coast of India, including the Andaman Sea. This class has chlorophyll levels ranging from 0.75 to 1 mg m<sup>−3</sup> and the diversity index falls between 0.2 and 0.8.

### 3.3.4. Classes 6–8

Classes 6–8 dominate in the regions with depths <200 m. Class 6 is dominant in the Persian Gulf, which is characterized by dense, highly saline waters in all seasons. Chlorophyll concentration of regions with class 6 varied from 1 to 1.5 mg m<sup>−3</sup>. Classes 7 and 8 are present in the inner shelf regions with shallower depths influenced by boundary currents and river influx. These classes appear in the near-shore waters off the Somalia coast, Gulf of Oman, Inner Gulf of Kutch and Khambhat, Inner Ganges shelf, and Irrawaddy river basin near the Andaman Sea. Local wind-driven circulation brings in the waters of optical classes 7 and 8 from the major river deltas and minor rivers. The influxes from rivers are seasonally variable and rain-fed according to changing precipitation. In the northeast monsoon, waters belonging to classes 7 and 8 flow toward the Strait of Hormuz and into the Arabian Sea from December to April under the influence of strong northwest winds during the winter monsoon (Hunter, 1983), turning the Gulf of Oman waters into classes 7 and 8 in February. These classes do not show major variations in the transition periods. The chlorophyll concentration in the regions with classes 7 and 8 was high, ranging from 2 to 2.5 mg m<sup>−3</sup>. The diversity index of classes 6–8 was low, around 0.3.

### 3.4. Optical Diversity Index

The previous section describes the distribution of the dominant classes, but contains no information on contributions to the optical signal from non-dominant classes. The optical diversity index, which depends on the memberships of all classes represented in a pixel, provides complementary information on the extent to which non-dominant classes contribute to the signal. If a single class contributed to the optical signal of a pixel, then the optical diversity would be a minimum of 0.26 in our classification with 8 classes represented. On the other hand, if all classes contributed equally, then the optical diversity index would reach a maximum of 2.08.
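The upper bound quoted above can be checked with a short sketch. It assumes that the index has a Shannon-type form computed from memberships normalized to sum to one; Equations 4 and 5 are not reproduced here, so this is an assumption rather than the exact definition used in the study. With this generic form a pixel occupied by a single class gives H = 0, so the minimum of 0.26 quoted above must follow from details of Equations 4 and 5 that the sketch does not attempt to capture. The function name `shannon_diversity` is hypothetical.

```python
import math


def shannon_diversity(memberships):
    """Shannon-type index H = -sum(p * ln p) over normalized memberships.

    memberships: non-negative class memberships for one pixel.
    Classes with zero membership contribute nothing to the sum.
    """
    total = sum(memberships)
    if total <= 0:
        raise ValueError("at least one membership must be positive")
    h = 0.0
    for m in memberships:
        if m > 0:
            p = m / total  # normalize so the proportions sum to one
            h -= p * math.log(p)
    return h


# Eight classes contributing equally: H equals ln(8), about 2.08.
h_equal = shannon_diversity([1.0] * 8)

# A single contributing class: H equals 0 in this generic form.
h_single = shannon_diversity([1.0, 0, 0, 0, 0, 0, 0, 0])
```

The equal-membership case reproduces the maximum of 2.08 stated in the text, which also shows why the attainable range of the index depends on the number of classes.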

The optical diversity index H (Equations 4 and 5) was calculated for all months to study the seasonal and regional variations in optical diversity. The optical diversity index H (**Figure 4**) fell mostly between 0.3 and 1.36. Regardless of season, higher H values (1–1.36) were found in deeper waters off the southwest coast of India (6–15° N), around the Lakshadweep and Maldive Islands in the Arabian Sea, around the Andaman and Nicobar Islands in the Bay of Bengal and off Myanmar. The highest H values are found in these locations during the winter or northeast monsoon season (December–March), when the areas covered by high H values were also more extensive. High diversity indices were also found in the fall intermonsoon and spring intermonsoon periods. During the summer monsoon season (June–September), the diversity index lay mostly in the 0.5–1 range across the entire study region, with some pixels having an index >1 appearing in the deeper waters. High diversity indices (1–1.36) occur along the productive upwelling areas, the transition zones between coast and open ocean, oligotrophic waters and regions under the influence of boundary currents. Low diversity indices occur in the regions of most turbid waters, regions of high river water influx and inland waters.

### 3.5. Optical Classes, Optical Diversity, and Chlorophyll Concentration

In coastal waters, the remote-sensing reflectance spectra are affected not only by chlorophyll concentration, but also by suspended sediment load. The seasonal variability of the chlorophyll concentration for the representative months is shown in **Figure 5**. The remote-sensing reflectance value at 670 nm is often taken to be a measure of suspended sediment load. Therefore, in **Figure 6**, we have plotted the dominant optical class as a function of chlorophyll-a concentration and Rrs(670), for the monthly climatology of February, as an example. Only well-classified pixels (cumulative class membership >0.5) are plotted. We see a gradual progression in the optical classes 1–8, with increasing chlorophyll concentration and increasing Rrs values, clearly indicating that the optical classification is affected by both chlorophyll concentration and suspended sediment load. Since the chlorophyll concentration in Version 2 of OC-CCI was calculated with a single, global algorithm, we can discount the possibility that the relationship seen in **Figure 6** emerges from the use of different algorithms for different optical classes. On the other hand, it is worth discussing whether a single chlorophyll algorithm would work equally well in all optical classes in the coastal waters of the northern Indian Ocean. Tilstone et al. (2011) reported that there was good agreement between OC4v6 and another algorithm (OC5) in open-ocean and coastal waters with chlorophyll concentration up to 2 mg m<sup>−3</sup> for the Arabian Sea and the Bay of Bengal. In the current study, classes 7 and 8 had chlorophyll concentrations ranging from 1.5 to 2.5 mg m<sup>−3</sup>, quite close to the conditions discussed by Tilstone et al. (2011), so that we can assume that the OC4 algorithm was suitable even for these highly turbid classes.
Nevertheless, it would be interesting, in a future study, to explore the advantages of using algorithms designed for coastal waters (e.g., Le et al., 2013; Loisel et al., 2017; Tilstone et al., 2017).
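The pixel selection behind such a plot can be sketched as follows, assuming that the dominant class is the one with the largest membership and that a pixel is retained only when its cumulative membership exceeds 0.5, as stated above. The function names, the record layout and the sample values are hypothetical and purely illustrative.

```python
def dominant_class(memberships, min_cumulative=0.5):
    """Return the 1-based dominant class, or None for poorly classified pixels.

    A pixel counts as well classified only if the sum of its class
    memberships is strictly greater than min_cumulative.
    """
    if sum(memberships) <= min_cumulative:
        return None
    best = max(range(len(memberships)), key=lambda i: memberships[i])
    return best + 1


def scatter_points(pixels):
    """Build (chlorophyll, Rrs670, dominant class) tuples for retained pixels."""
    points = []
    for pixel in pixels:
        cls = dominant_class(pixel["memberships"])
        if cls is not None:
            points.append((pixel["chl"], pixel["rrs670"], cls))
    return points


# Invented example pixels with three classes for brevity.
pixels = [
    {"chl": 0.1, "rrs670": 0.0002, "memberships": [0.8, 0.1, 0.0]},
    {"chl": 2.0, "rrs670": 0.0040, "memberships": [0.0, 0.2, 0.6]},
    {"chl": 0.5, "rrs670": 0.0010, "memberships": [0.1, 0.2, 0.1]},
]
points = scatter_points(pixels)
```

The third example pixel is dropped because its cumulative membership is below the threshold, so only two points would be plotted.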

A similar plot (**Figure 7**) for the optical diversity index H reveals a more complex pattern, with very high and very low indices appearing in close juxtaposition to each other in clear waters (where both chlorophyll concentration and Rrs values are low). Strands of high values of the index also appear for higher values of chlorophyll concentration and Rrs. The differences in the distribution of H values, compared with **Figure 6** for the optical classification, suggest that optical diversity H perhaps tends to be high during transition between optical classes.

### 3.6. Comparison of Regional Optical Classes With Results of a Global Classification

The question remains whether the regional classification presented here yields results similar to those found in the global classification of Mélin and Vantrepotte (2015). The first difference we note is that the regional classification yielded only eight classes, whereas the global classification produced 16 distinct classes. The log-normalized mean reflectance spectra (**Figure 2**) of our optical classes 1–4 are similar to those of classes 10–16 of Mélin and Vantrepotte (2015). The spectral characteristics of their classes 8–16 are typical of clear waters, and are similar to those of our classes 1 and 2. We also see similarities between the optical signatures of our classes 6, 7, and 8 and classes 1–7 of Mélin and Vantrepotte (2015); both these sets are typical of highly turbid waters, with mineral particles and dissolved organic matter (Vantrepotte et al., 2012). For the optical diversity, the values of H from this study lie in the range 0–1.3, which is lower than the range (0–3) reported globally by Mélin and Vantrepotte (2015). These differences in values of optical diversity are associated, by definition, with the differences in the number of classes, which has a direct impact on the values of optical diversity (Mélin and Vantrepotte, 2015). It is important to note, therefore, that the values of optical diversity reported here are not directly comparable with those of Mélin and Vantrepotte (2015). Similarly, the differences in the class numbers have to be accounted for when comparing our results with those of Mélin and Vantrepotte (2015).

FIGURE 5 | Monthly climatologies of chlorophyll concentrations in the coastal waters of the Northern Indian Ocean. (Top left) July, representative of southwest monsoon season. (Top right) January, representative of northeast monsoon season. (Bottom left) May, representative of Spring intermonsoon season. (Bottom right) November, representative of Fall intermonsoon season. All monthly climatologies are calculated for the years 1998–2013.

# CONCLUDING REMARKS

We have implemented an optical classification using log-transformed, normalized remote-sensing reflectance (Rrs) datasets with a spatial resolution of 4 km. In this study, eight optical classes were obtained in the coastal waters of the northern Indian Ocean. Seasonally-reversing monsoons are a defining oceanographic characteristic of the Indian Ocean. Here, we have discussed variations in optical classes with reference to the southwest and northeast monsoon seasons of the study region. The distribution pattern of optical classes in the study region showed major variations between seasons. An example is the presence of optical classes 1, 2, 3, and 4 at latitudes 0–18° N during December–March, whereas they were not found in the months of June–September. The influence of class 5 in intermediate coastal waters is consistent in all the regions, with fewer variations in each month. Class 6 is also a minimal contributor to the coastal waters of India, restricted to the Persian Gulf in the northeast monsoon season. These patterns show that in the southwest monsoon season, the optical constituents of the coastline are affected mainly by precipitation and river water intrusion; this condition is not prevalent in the northeast monsoon season. The regional distribution of dominant optical classes, and how they are related to physical and biological oceanographic features and processes, is presented in Table S3.

We have also used the memberships of different optical classes in a given pixel to study optical diversity within a pixel. Both the dominant optical class and the optical diversity index appear to be related to the chlorophyll concentration and the remote-sensing reflectance at 670 nm (used here as an index of suspended sediment load), but in quite different ways. Whereas the dominant optical classes transition in a systematic manner from classes 1 to 8 with increasing concentration of chlorophyll and increasing Rrs(670), the diversity index appears to be high in areas of transition between optical classes. We also see that the diversity index was high in clear waters around coral islands and in deeper waters away from the shore. Since it is well known that biological diversity tends to be high when chlorophyll concentration is high, these results suggest that optical diversity indices might run counter to biological diversity. This suggestion can only be verified when data on phytoplankton diversity in the study area become available on a systematic and extensive basis. But once such relationships are established, optical diversity and optical classification would pave the way for mapping biological diversity at large scales, using remote sensing.

We opted for the Ocean Color Climate Change Initiative products for the study because of the long time series of data available, which would facilitate extension of our work to study trends and inter-annual variability, and also because of the better spatial coverage, especially during the monsoon season. However, the dataset is limited to six SeaWiFS wavebands in the visible, which was dictated by the historical sensor capabilities. The number of wavebands available also determined the extent to which optical diversity could be explored. As better-resolution data become available over long time scales from missions such as Sentinel-3, which carries the Ocean and Land Color Instrument (OLCI) sensor with 10 wavebands in the visible domain, it will become possible to investigate optical classes and optical diversity with higher spectral resolution, which may reveal additional optical classes that were not captured in the present analysis.

The optical classification presented in this work enables us to study the seasonal dynamics in the bio-optical characteristics of the coastal waters of the Northern Indian Ocean, and how they are related to the physical and biological processes. Spatio-temporal variations of the eight optical classes under the influence of seasonally reversing monsoons were profound. This study will serve as a first step for investigations of the inter-annual variations in the distribution of optical classes and their shifts in response to changing climatic conditions such as El Niño and La Niña events.

### AUTHOR CONTRIBUTIONS

SM: analyzed the datasets, ran the algorithms, and worked on interpretation of the results; TP and SS: implemented and led the study; JJ: assisted with statistical measures used in the study; GG: contributed to the ideas of physical and biological oceanography of the study region; TJ: provided the computational codes for the work. All the authors contributed to the text.

### ACKNOWLEDGMENTS

The authors thank the Department of Science and Technology, Science and Engineering Research Board, India for the Jawaharlal Nehru Science Fellowship to TP. The Director, Central Marine Fisheries Research Institute, India is acknowledged for the support and facilities to pursue this work. The study also contributes to the Ocean Color Climate Change Initiative (OC-CCI) of the European Space Agency. This is also a contribution to the activities of the National Centre for Earth Observations of the Natural Environment Research Council of the United Kingdom. We thank Dr. Tim Moore, Research Assistant Professor, University of New Hampshire, for providing the classification algorithm. We also thank Dr. Mike Grant, Plymouth Marine Laboratory, United Kingdom, Mr. Devi Varaprasad and Mr. Abhinav, National Remote Sensing Center, India, Mr. Tarun Joseph, Cochin University of Science and Technology, India and Mr. Muhammad Shafeeque, Central Marine Fisheries Research Institute, India for their extended support in programming and computation.

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fmars.2018.00087/full#supplementary-material


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Monolisha, Platt, Sathyendranath, Jayasankar, George and Jackson. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.