Sensitivity of a Satellite Algorithm for Harmful Algal Bloom Discrimination to the Use of Laboratory Bio-optical Data for Training

Early detection of dense harmful algal blooms (HABs) is possible using ocean colour remote sensing. Some algorithms require a training dataset, usually constructed from satellite images with a priori knowledge of the existence of the bloom. This approach can be limited if there is a lack of in situ observations, coincident with satellite images. A laboratory experiment collected biological and bio-optical data from a culture of Karenia mikimotoi, a harmful phytoplankton dinoflagellate. These data showed characteristic signals in chlorophyll-specific absorption and backscattering coefficients. The bio-optical data from the culture and a bio-optical model were used to construct a training dataset for an existing statistical classifier. MERIS imagery over the European continental shelf were processed with the classifier using different training datasets. The differences in positive rates of detection of K. mikimotoi between using an algorithm trained with purely manually selected areas on satellite images and using laboratory data as training was overall <1%. The difference was higher, <15%, when using modeled optical data rather than laboratory data, with potential for improvement if local average chlorophyll concentrations are used. Using a laboratory-derived training dataset improved the ability of the algorithm to distinguish high turbidity from high chlorophyll concentrations. However, additional in situ observations of non-harmful high chlorophyll blooms in the area would improve testing of the ability to distinguish harmful from non-harmful high chlorophyll blooms. This approach can be expanded to use additional wavelengths, different satellite sensors and different phytoplankton genera.


INTRODUCTION
Toxic phytoplankton species impact human health and the economy world wide (Kudela et al., 2015;Sanseverino et al., 2016). Events of enhanced growth of toxic phytoplankton (or Harmful algal blooms, HABs) are expected to be more frequent in a climate change scenario (Griffith and Gobler, 2020). Because of their event-like nature it is difficult to design monitoring and earlywarning systems using solely in situ sampling (Babin et al., 2008). However, the optical properties of some HABs species favor the use of ocean colour satellite remote sensing as a tool for detection (Cullen et al., 1997;Dierssen et al., 2015). In order to improve HABs satellite algorithms, a better understanding of the variability of the phytoplankton inherent optical properties (IOP) is, therefore, crucial.
The variability of optical properties among phytoplankton species has been studied through laboratory experiments (Bricaud et al., 1983(Bricaud et al., , 1988Ahn et al., 1992). In the context of HAB species, some authors have focused on the absorption coefficient, showing potential for detection using the fourthderivative of the phytoplankton absorption coefficient (a phy , m −1 ) (Millie et al., 1995(Millie et al., , 1997Staehr and Cullen, 2003). Indeed, this technique has been applied to the discrimination among phytoplankton groups using hyperspectral remote sensing reflectance, (R rs , sr −1 ), in preparation for new sensors (Xi et al., 2015(Xi et al., , 2017. Other laboratory studies have taken into account the optical backscattering coefficient (b bp , m −1 ) (Vaillancourt et al., 2004;Whitmire et al., 2010;Harmel et al., 2016), including the effects of different light regimes (Stramski and Morel, 1990;Poulin et al., 2018).
However, few studies have implemented bio-optical knowledge into practical remote sensing applications for HAB detection. From in situ sampling, a HAB dominated by dinoflagellate Karenia brevis in the Eastern Gulf of Mexico (Cannizzaro et al., 2008) presented lower than expected chlorophyll-specific backscattering coefficient and backscattering ratio (i.e.,b bp = b bp /b p ), which was used to establish a HAB detection criteria. These criteria were not applicable to the East China Sea (Shang et al., 2014), and a different index had to be developed regionally. Further, laboratory experiments showed a wide dispersion among phytoplankton species for chlorophyllspecific backscattering and demonstrated a relationship betweeñ b bp and cell size (Whitmire et al., 2010). Given the regional and biological differences recorded, it is therefore necessary to develop algorithms and approaches that can take into account these sources of variability to further develop HAB detection capabilities.
A regional HAB detection algorithm developed for the European Shelf (Kurekin et al., 2014) was developed to allow for these factors. It employs a fully automatic data-driven approach to identify key characteristics of R rs and derived quantities, and then applies these characteristics to classify pixels in satellite images into no bloom, non-harmful bloom, and harmful bloom categories (Miller et al., 2006). The performance of this algorithm depends on the choice of the training data. In the original implementation (Kurekin et al., 2014), the training data were defined by manually selecting regions of interest with a priori harmful phytoplankton species (i.e., Karenia mikimotoi and Phaeocystis globosa) from MODIS and MERIS sensor measurements. Thus, the algorithm was adapted to specific optical conditions, phytoplankton species and sensor characteristics. If any of these changed, the algorithm would need to be re-trained, in addition to the subjective component introduced by the manual selection of the areas in the image.
In this study, we propose an alternative approach to constructing a training dataset for the Kurekin et al. (2014) algorithm in the European Shelf by using laboratory bio-optical experiments, aiming to incorporate regional and species-specific variability of optical properties. In particular, we focused as an example species on the dinophyceae K. mikimotoi as it is known to occur in blooms on the Western English Channel on the European continental shelf (Barnes et al., 2015). Laboratory experiments on this species were used to retrieve chlorophyll specific IOP. These in turn were used to compute simulations of R rs to augment the training dataset for the classifier by incorporating other optically active components and making it more representative of natural conditions. We further performed a numerical experiment, a sensitivity study, using different configurations of the training datasets to investigate the benefit of using laboratory and modeled data. Simulations included a range of phytoplankton concentrations, as well as other optically active components (detritus and yellow substance or gelbstoff) in the absence of other blooms of non-harmful algae. The results of our sensitivity study confirm the potential of this approach to detect HABs based on species-specific bio-optical information, which is applicable to many ocean color sensors.

METHODS
The Kurekin et al. (2014) algorithm is a statistic operator that requires training data based on spectral features of the targeted phytoplankton species. It then calculates discrimination function parameters in the feature hyperspace (Miller et al., 2006). The original set of ocean color features was limited to arbitrary combinations of R rs from specific spectral bands of a particular satellite sensor. In this study, spectral ratios of R rs were also used.
The features are then automatically processed to select a reduced subset of the most relevant features by applying a Stepwise Discriminant Analysis (SDA) algorithm implemented in the statistical package "klaR" (Weihs et al., 2005). The best combination of features is selected iteratively, by adding more significant or removing less significant features one by one. The significance of features is estimated by applying the probability of correct classification criterion. By reducing the number of classification features, improvement of accuracy and computational efficiency was achieved. The features are used to classify pixels into the original classes no bloom, non-harmful, and harmful. For the current study, the unknown class has been added to represent data that cannot be related to any of known classes.
For this kind of statistical algorithm, in addition to the statistical method, the choice of training datasets representative of the different classes is critical and is the focus of this work. In the Kurekin et al. (2014) work the training datasets were subjectively defined from satellite images. Here, alternative ways to produce training datasets are presented using modeled and experimental data.

Model of R rs
As in previous studies (Shang et al., 2014), the above surface remote sensing reflectance (R rs ) was calculated with (Gordon et al., 1988): where the remote sensing reflectance just below the water surface (r rs ) is modeled as a function of the absorption (a(λ)) and backscattering (b b (λ)) coefficients: with g 0 = 0.089 and g 1 = 0.125, respectively. The spectrally varying a (m −1 ) and b b (m −1 ) were modeled as: Here a w and b bw were the absorption and backscattering coefficients of pure seawater (Pope and Fry, 1997), the km subindex refers to K. mikimotoi, and the bg sub-index refers to the background component. The background component was assumed to covary with chlorophyll concentration including: phytoplankton, gelbstoff (or colored dissolved organic matter) and detritus. The background component was therefore modeled using a generic phytoplankton model: the "new Case I model" as described in Hydrolight (Mobley and Sundman, 2016), where the inherent optical properties of particles solely co-vary with chlorophyll concentration, TChla bg in mgm −1 (Bricaud et al., 1998;Loisel and Morel, 1998). With this assumption, Equation (3) was rewritten as: a = a w + a p,bg + a g,bg + a p,km + a g,km where the contributions to background and K. mikimotoi to the particulate (p), and dissolved (g), compartments were modeled as follows. The absorption coefficients of background had contributions from living phytoplankton and detrital components (a p,bg = a phy,bg +a det,bg ) as well as from dissolved matter (a g,bg ). Only phytoplankton (a phy,km ) and dissolved matter (a g,km ) were considered as contributors to the absorption coefficient due to K. mikimotoi, but not particulate detritus. Particle backscattering for the background (b bp,bg ) and for K. mikimotoi were not further separated in other particulate compartments. Optical absorption and backscattering coefficients for background were described as a function of TChla bg using well validated models (see Supplementary Material, section 1). Absorption and particle backscattering coefficients for K. mikimotoi were computed using the formulae below and parameterized using results from a laboratory experiment (section 2.2).
The a ph,km was modeled as a function of its chlorophyll concentration (TChla km ): where the values of A a (λ) and B a (λ) were derived from measurements. The a dg,km was modeled in the same way as a g,bg (see Supplementary Material, section 1) but with a range of f from 0.2 to 1.0 with a step of 0.2, to cover more variability of a g,km .
The backscattering coefficient of K. mikimotoi (b bp,km ) was modeled as Equation (7): where the values of parameters A bb (λ) and B bb (λ) were derived from measurements. Data from the reflectance model were re-sampled to 0.1 nm spectral resolution by spline interpolation, multiplied by MERIS spectral response coefficients and integrated over the full wavelength range to obtain band-averaged R rs .

Laboratory Measurements of Optical Properties of Karenia mikimotoi
The optical properties and TChla km were obtained through sequential additions of a Karenia mikimotoi culture to a flowthrough system (Slade et al., 2010;Browning et al., 2015) following the method detailed in this section. Karenia mikimotoi is a naked unicellular dinoflagellate with diameter between 15 and 40 µm (K-0260, SCCAP). For this experiment it was grown at the University of Lisbon, where also the optical measurements took place. The culturing chamber had a light:dark cycle of 14:10 h. Karenia mikimotoi was grown at a temperature of 15 • C and salinity of 30 psu, in a L1 media and a photon flux density of 43 µmolm −2 s −1 . The health of the culture was monitored using microscopy counts and the optical experiment was performed when they reached exponential growth phase. The experimental setup consisted of a peristaltic pump (Watson Marlow 603S) connected to a hyperspectral absorption and attenuation meter (Wetlabs acs), and to a black chamber housing a three wavelengths backscatter meter (Wetlabs ECO-BB3). The system was cleaned successively with de-ionized water, diluted Extran and diluted HCl, then rinsed with de-ionized water previous to the experiment. After the cleaning cycle, the system was filled with pre-filtered seawater (Pall Acropak 0.2 µm/0.2 µm) and recirculated at 1 lmin −1 (or 30 r.p.m.) for the whole experiment. Different flow speed were tested (10 and 60 r.p.m.) with no significant changes in the data. Data were recorded for at least 3 min for each addition of culture (Browning et al., 2015). After each addition of the culture into the experimental setup the system was left to stabilize for 1-2 min, then data were recorded for 3 min with the acs and ECO-BB3, and finally water was collected for laboratory analysis.
Temperature, T, and salinity, S, were measured with a probe (YSI-85, Model 85/50 FT) before and after each culture addition and the average was used to correct the optical measurements.
Data from the acs and ECO-BB3 instruments were processed following standard protocols (Wetlabs, 2009;Whitmire et al., 2010). Median and interquartile range (IQR) for the inherent optical properties were calculated for each concentration of culture and the median values from the blank were subtracted (see Supplementary Material, section 2 for further details on data processing and data quality control procedures).
Water samples (100-500 ml) were collected from the system after each Karenia mikimotoi culture addition for particulate absorption, pigment analysis, and microscopy. Samples for spectrophotometric and HPLC pigment analysis were filtered onto Whatman GF/F 25 mm filters using low pressure pumps, flash frozen in liquid nitrogen and kept at −80 • C for <3 months. Spectrophotometric measurements (Shimazdu UV2450 with 0.5 nm resolution) of the particulate absorption coefficient of the sample retained on the filter (a p,km ) was used to derive the phytoplankton (a ph,km ) and particulate detritus, (a d,km ) absorption coefficients (Tassan and Ferrari, 1995) with βcorrection from laboratory experiments (Finkel and Irwin, 2001). High performance liquid chromatography (HPLC) analysis was done to determine phytoplankton pigments concentrations (Zapata et al., 2000;Mendes et al., 2007). Total chlorophyll a concentration (TChla km ) was defined as the sum of Chl-a + Chl-a allomers + Chl-a epimers concentrations. The health of the cultures was confirmed as the percentage of phaeopigments to TChla km plus phaeopigments, which was always ≤ 3%. Cell concentration of the culture was determined by microscopy counts. Depending on cell size and cell concentration, different counting chambers were used (Palmer-Maloney and Sedgwick-Rafter) following the standard procedures (Andersen, 2005). Linear regressions on the log 10 -transformed inherent optical data and TChla km were used to compute the coefficients needed for Equations (5) and (6). Spectral interpolation was necessary to match the inherent optical properties. Backscattering coefficient (b bp,km ) was interpolated to 0.5 nm interval to match the spectrophotometric a phy,km measurements by fitting a power-law (i.e., b bp,km (λ) = b bp,km (532) × (λ/532) −γ ).

Simulation Experiments and Evaluation
Three separate simulation experiments were performed using different training datasets combinations ( Table 1). This section describes the training datasets for each Experiment, the evaluation dataset and the statistics used to measure algorithm performance.

Training Datasets
The training dataset in Experiment 1 was taken as the reference. It was generated from MERIS sensor measurements of R rs at wavelengths 413, 443, 490, 510, 560, 620, 665, 681, and 709 nm. Absorption and backscattering coefficients derived from inversion algorithms (e.g., Smyth et al., 2007) were not used in this training dataset. Using IOP derived from MERIS data introduced additional uncertainty (Defoin-Platel and Chami, 2007;Werdell et al., 2018). Historical MERIS scenes with documented K. mikimotoi bloom events in the English Channel in years 2002-2004, were selected (Kelly-Gerreyn et al., 2004;Vanhoutte-Brunier et al., 2008). Areas of these scenes were manually delineated and labeled into no bloom, non-harmful, and harmful classes. In total five MERIS scenes were selected on 19 July 2002, 1 and 7 July 2003, 10 and 23 July 2004. To compose a training dataset the images were first sub-sampled by a factor of 4. Overall, 6.25% of satellite pixels were used for training and 93.75% were used for evaluation (section 2.3.2). The data in the training dataset were arranged in a 2D table format with R rs values and class labels of individual image pixels stored in rows. In total the training table contained 14,072 records of no bloom class, 12,116 records of non-harmful bloom class, and 10,294 records of harmful bloom class.
In Experiment 2, the records in the training table labeled as harmful bloom were removed. R rs spectra harmful bloom class results from the addition of the background optical properties from bio-optical models (section 2.1) to the K. mikimotoi optical properties derived from laboratory experiments (section 2.2), to include in the training dataset the effect of a mixed particle assemblage in a HAB event. Values used for TChla km varied from 1 to 15 mgm −3 with a constant step in logarithm scale resulting in 500 concentrations. TChla bg varied from 0.01 to 2 mgm −3 having 10 concentrations with an identical interval in logarithm scale, therefore a total of 5,000 combinations were obtained. Experiment 3, was the same as Experiment 2 but with the nonharmful bloom class defined by the background optical properties from bio-optical models (section 2.1), with TChla bg between 1 and 10 mgm −3 . From each experiment, a set of classifier coefficients was obtained after training. In all the Experiments, the no bloom class was defined using the manual selection of parts of the satellite images.

Evaluation Dataset
Due to lack of in situ data to perform an independent validation of the classification, the results from each Experiment were evaluated with the part of the training images that had been reserved (i.e., not used for training). The performance was quantified through statistical comparisons with the manually classified pixels. The same 5 MERIS scenes as those used for training were used to generate training and evaluation datasets for the HAB classifier (see section 2.3.1). The same evaluation dataset was applied in all three experiments, but the classification results were different because the training data were different.
The measures of performance included a confusion matrix and derived statistical measures: overall accuracy, kappa coefficient, errors of commission, and producer accuracy. Overall accuracy was calculated as the number of correctly classified pixels divided by the total number of classified pixels. The kappa coefficient measures the agreement between the evaluation  Table 2). dataset and the classification results in the range from 0 to 1. The data can be considered to be in perfect statistical agreement when kappa is 1 and disagree when kappa equals 0. Errors of commission is the measure of false positive, estimated for each class as the fraction of pixels that were classified incorrectly. Producer accuracy is calculated for each class as the probability that the pixels in this class were classified correctly.
Not all of the classified image pixels were used for estimation of the confusion matrix. If the probability of unknown class was higher than 0.6 or the probability of any of other three classes was lower that 0.6, the pixel was considered as unreliable and discarded.

Optical Properties: Modeled Data for Background and Laboratory Experiments for Karenia mikimotoi
The background model is constructed with the underlying hypothesis that at lower TChl bg , the smaller phytoplankton sizes dominate the optical properties. For absorption ( Figure 1A) this means that there is a higher slope for lower chlorophyll concentrations (i.e., less than ∼2 mg Chla m −3 ). Larger cells are expected to have lower per-chlorophyll absorption due to the package effect (Bricaud et al., 2004). Values from this laboratory experiment are greater than those predicted by the model, pointing toward higher TChla per cell in the culture. However, the values are close to other studies for HAB-forming dinoflagellates (Cannizzaro et al., 2008).
The backscattering coefficient is also modeled as a power law for the background (Figure 1B), following the same underlying assumption of smaller cells dominating optical properties of lower chlorophyll concentrations (Loisel and Morel, 1998). Measurements from the current laboratory experiment are lower than those predicted by the background model, but similar to other laboratory experiments. For example, b * bp,km (470) = 0.000525 m −2 mg Chla −1 is comparable to the lower values obtained for other species of dinophyceae (b * bp (442) from 0.0005786 to 0.0009194 m −2 mg Chla −1 ) (Whitmire et al., 2010). These laboratory values are low in comparison with other field studies in coastal waters (Antoine et al., 2011) and in areas with HAB blooms (Cannizzaro et al., 2008). This disagreement is to be expected, as the in situ studies incorporate the b bp signal from the whole population, which is an assemblage of phytoplankton and other particles (Stramski et al., 2001;Martinez-Vicente et al., 2010, 2012. In fact, for Experiment 2, the harmful class b bp (532) can vary from 0.00718 m −1 for 1.01 mg Chla m −3 to 0.01112 m −1 for 17 mg Chla m −3 , when the contributions of background and K. mikimotoi were added. As a comparison, Cannizzaro et al. (2008) predicts b bp (532) = 0.00855 m −1 for 17 mg Chla m −3 . The coefficients resulting from the power law fit to the laboratory data, used by the classifier, are summarized in Table 2.
The second set of parameters needed in Equations (7) and (8) are the chlorophyll-specific spectrally varying inherent optical properties (Supplementary Figure 1). For chlorophyll-specific absorption, the model predicts a flattening of the spectra at higher Data were fitted to the power function Y = A × X B , (see Equations 7, 8) using a Type-II linear regression (major axis), on log 10 -transformed variables. All regressions are significant (p < 0.005), 95% confidence intervals (C.I.) of the coefficients are given in parenthesis. The determination coefficient, r 2 , and the root mean square error, RMSE, are calculated from the log 10 -transformed variables. Considered four TChla concentrations after data quality control. chlorophyll concentrations. The median values obtained from the laboratory experiment are consistent with this prediction for the blue part of the spectra (400-500 nm) but they are higher for the red. However, the experiment is in close agreement with previous laboratory experiments, in particular with observations under low light conditions (see Figure 4 in Staehr and Cullen, 2003). Concerning the spectral chlorophyllspecific backscattering coefficient (Supplementary Figure 1), modeled values for the background follow also the assumption of smaller phytoplankton sizes dominating at lower TChl bg . This leads to higher chlorophyll-specific backscattering values (i.e., smaller phytoplankton is more efficient at backscattering light) and more features due to absorption effects on the backscattering spectral shape. Backscattering ratio (Figure 2) shows that the laboratory measurements are 6% higher than other experimental data from the literature specific to the same species (Harmel et al., 2016), and within range for other dinophyceae algae [b bp (442) from 0.0061 to 0.0210] (Whitmire et al., 2010). The laboratory experiment results also aligns with in situ observations (Whitmire et al., 2007;Cannizzaro et al., 2008).

Training Dataset From Modeled and Laboratory Data
The forward calculated R rs from background and laboratory experiments are shown in Figure 3, for different chlorophyll concentrations, alongside the training dataset derived from pixels in the satellite image manual selection. The no bloom class is defined in all Experiments from the manual selection in the satellite image. Figures 3B,D display the results for the background model reflectances. Increase in TChla controls the shape of the spectra. These data are used for training class nonharmful bloom and harmful bloom ( Table 1). Figure 3F shows modeled R rs spectra for K. mikimotoi using the specific optical properties derived from the laboratory experiments.
The median R rs spectrum for the no bloom class from satellite ( Figure 3A) compares well in magnitude and shape to R rs modeled for the lower TChl bg (Figure 3B). Median TChla for no bloom from satellite, computed using OC5, is 0.35 mg Chla m −3 and dispersion (half the inter-decile range = (Q90-Q10)/2) is 0.13 mg Chla m −3 . The harmful class from satellite ( Figure 3E) and from the laboratory experiments ( Figure 3F) are also in agreement of scale and spectral shape. TChla from the satellite is 8.15 ± 10.6 mg Chla m −3 , which encompasses the range of TChla km simulated. However, R rs spectra of the harmless bloom class from satellite ( Figure 3C) is lower than the R rs modeled (Figure 3D). Satellite TChla is 1.01 ± 0.52 mg Chla m −3 and is representative of the lower limit of the simulated range (i.e., from 1 mg Chla m −3 ) for the background, which could explain some differences in the results below. The classifier coefficients are available and attached to this paper (Supplementary Material, section 3). Figure 4 presents two examples of MERIS scenes with documented K. mikimotoi blooms. These scenes were selected to train and to compare the sensitivity of the classifier to training datasets in Experiments 1, 2, and 3 (see section 2.3). The enhanced MERIS true color images of the bloom are shown in Figures 4A,B. The harmful bloom class can be seen in the images as a reddish patch close to the center of the images. Figures 4C,D show the TChla as retrieved by standard chlorophyll algorithm for the area, highlighting the co-location of elevated TChla with harmful bloom class. The manual delineation of the area around those blooms for harmful bloom is shown in red in Figures 4E,F.

Detection of HABs in Satellite Data
The turquoise and darker greenish colors, mostly toward the Atlantic Ocean, are associated with no bloom and non-harmful bloom classes (Figures 4A,B). They match low to medium TChla (Figures 4C,D). The manually selected areas for these classes are blue and green respectively (Figures 4E,F). In these examples, the pixels manually selected for the non-harmful bloom class are mostly toward the Atlantic Ocean, with TChla∼1 mgm −3 , which explains the R rs (Figure 3C). A special case is the white bright patch on the Western English Channel, close to the coast of England (Figure 4A). This corresponds to a bloom of nontoxic coccolithophores species, which has not been identified as high TChla (Figure 4C) and has not been labeled as a separate class in the classifier. However, the chlorophyll algorithm is not always capable to discern "bright" waters from high chlorophyll concentrations. For instance, toward the North, in the Bristol Channel, an orangy-white patch, is a well documented location of high riverine contributions of suspended particulate matter (Neil et al., 2011) (Figures 4A,B). Finally, at the center of the Western English Channel, dark red patches (Figures 4A,B) match the corresponding extremely high chlorophyll concentration images (Figures 4C,D) from documented blooms (Kelly-Gerreyn et al., 2004;Vanhoutte-Brunier et al., 2008). These patches were used to define the harmful bloom class manually from satellite (Figures 4E,F). are distinguishable. Overall, high TChla can be considered a good indicator of areas with potential HAB, however, highly turbid waters in coastal areas can produce misleading high TChla when chlorophyll algorithms fail. Lack of in situ data to verify high-chlorophyll non-harmful bloom areas, preclude drawing any conclusion about the validity of using the current algorithm and training datasets to identify those, and it is an extension for this study.
The scenes in Figure 4 were further processed using the K. mikimotoi classifier, trained with the datasets in Experiments 1-3 to generate risk maps (Figure 5). Qualitatively, the three experiments produced overall similar maps with some interesting localized differences (Figure 5). The classification results were different in the Bristol Channel, where concentration of sediments was relatively high. No collocated in situ suspended particulate matter are available, but this is well known area of intense bottom resuspension of sediments and river runoff (Uncles, 2010;Uncles et al., 2015). In Experiment 1 (Figures 5A,B) the classifier, trained on satellite data, discriminated this region as HAB. This false positive detection disappeared in Experiments 2 and 3 (Figures 5C-F), Experiment 2, and (E,F) Experiment 3. Pixels classified as Harmful bloom are shown in red, non-harmful bloom in green, and no bloom in blue. Pixels classified as unknown, are presented in gray and black is land within contours and missing data over water, due to cloud cover or sun glint.
highlighting an improvement of performance in this optically complex region. Another observation in these examples is that the coverage by grey areas increased from Experiments 1 to 3. Indeed, the percentage of unknown class pixels (Table 3) shows a slight increase from Experiment 1 to Experiment 2. It has the highest value for Experiment 3, indicating where the algorithm fails to classify a pixel among one of the known categories. Figures 5E,F point to greater areas of grey (unknown class) coinciding with areas of lower TChla (Figures 4C,D). The increase in unknown class for Experiment 3 can be related to using a range of chlorophyll for modeling the non-harmful class with higher values than what is normally encountered in the area (i.e., ∼1.5 mg Chla m −3 from Smyth et al., 2010). It is worth noting that the coccolithophore bloom was classified as unknown in Figures 5A,C,E for the three Experiments. None of the satellite training images or simulated data for the three training classes included the coccolithophore bloom examples and in all three experiments the algorithm correctly discriminated these data as unknown class. No bloom 0.00 0.00 100 The matrix was normalized by the number of elements in each class.
Numerically, the comparisons of the classification algorithm was assessed using the confusion matrix and statistical measures derived from the confusion matrix. The results of classifier assessment in Experiments 1-3 are summarized in Tables 4, 5. Due to the limited availability of in situ data for validation, the evaluation of the experiments has been made using satellite data (see section 2.3), the focus being on the relative changes in the results.
Overall, there are small differences among the results from the Experiments. Importantly, Experiment 1 and 2 percentage classification of harmful bloom class is <1% and the greatest difference is <15% ( Table 4). The confusion matrix demonstrates that the accuracy of non-harmful bloom classification is lower for the Experiment 3. The differences in results between Experiment 1 and 3 may be due to the difference in the range of R rs for the ranges of TChla considered for the non-harmful bloom class, as discussed above. The small differences between Experiment 1 and 2 may point to expected variability of the optical properties within the K. mikimotoi culture. It follows from Table 4 that the classifier results in Experiment 2 have the lowest false positive rate for non-harmful blooms being classified as K. mikimotoi (i.e., 2.52). This observation is in good agreement with the classified maps in Figure 5 that showed reduced false alarms in the Bristol Channel for Experiment 2 and even fewer in Experiment 3.
According to the results, all three classifiers performed well, achieving overall accuracy above 0.91 and kappa value above 0.85 ( Table 5). The best performance was demonstrated by the classifier in Experiment 1 and the worst for the classifier in Experiment 3 with 7% lower overall accuracy, highlighting the importance of the choice of ranges in chlorophyll adapted to the region of study. However, the fact that the difference among using different training datasets was low (i.e., subjective in Experiment 1 vs. laboratory-derived in Experiment 2), supports the use of laboratory data to train this type of classifier.
The errors of commission and producer accuracy statistical measures for different classes are summarized in Table 5. These values are quite similar for Experiment 1 and Experiment 2, indicating that the performance loss is relatively small when the satellite training data for harmful bloom are replaced with laboratory derived data in Experiment 2. This points to a small degradation of the classification accuracy using laboratory data. A more significant loss in accuracy can be observed in Experiment 3, where the producer accuracy for non-harmful bloom class is reduced by almost 20% by using the bio-optical model instead of satellite-derived data for training.
Overall, the results demonstrate that the novel approach based on laboratory experiments can be used for simulation of R rs values and training of a HAB classifier with only slight degradation of the classification accuracy while reducing false positives due to turbid coastal waters. This study has highlighted the impact of the selection ranges for no bloom conditions that should be fit to simulate TChla similar to those found in the environment where the model is deployed. A smaller source of uncertainty could be related to the use of laboratory conditions as representative of natural conditions for K. mikimotoi. A wider set of culture conditions of the phytoplankton could allow for increasing the training dataset to be able to deal with the natural environmental variability to which the phytoplankton is exposed in the real ocean. We speculate that the low light culture conditions in our experiment could have affected the optical properties of the phytoplankton. While they are in agreement with other studies under similar conditions (Whitmire et al., 2010;Harmel et al., 2016), the variability with light intensity and adaptation should be considered and included in subsequent training datasets (Poulin et al., 2018). More complex modeling of R rs by different mixes of phytoplankton species and use of more advanced radiative transfer modeling Stramski and Mobley, 1997) could also make our training dataset more robust. However, it is the lack of in situ observations co-incident with either radiometry from in situ and/or satellite platforms that limits further advance (Tomlinson et al., 2009), as we were unable to validate our results against real-world observations. Therefore, more efforts to collate existing data and to gather new ones are required.

CONCLUSIONS AND FURTHER WORK
In this work, a novel approach, tested through a sensitivity study, to produce training datasets for HAB classification using machine learning methods has been proposed. The novelty of the approach consists in injecting the bio-optical knowledge from the literature and from a purpose built laboratory experiment into a training dataset for detection of a specific phytoplankton species that can cause health and economic harm to the coastal communities. The potential advantages of this approach include moving away from subjectivity in terms of selection of training datasets as well as not being sensor or even region specific. It is easy to envisage that this approach can be applied to radiometric sensors installed in multiple platforms such as ferries, drones or different satellites.
Laboratory measurements of absorption and backscattering coefficients for different concentrations of K. mikimotoi chlorophyll have been performed, matching existing observations at low light conditions of the culture. When these data were used to train a HABs classifier and applied to MERIS images from the European coastal shelf, there was <1% degradation of the capacity to detect the harmful bloom in comparison to using satellite data, but there was a better discrimination of false positives in turbid coastal waters. Further tests on the ability to separate non-harmful from harmful algal blooms when both have high chlorophyll concentrations are needed. Even replacing most of the training datasets with a combination of modeled and laboratory derived R rs did not degrade HAB detection significantly, although regional average chlorophyll concentrations need to be known a priory to improve the results.
The limited availability of suitable datasets to ground truth the results, constrains the conclusions of this study to a sensitivity analysis. If HAB detection from satellite is to progress to become an operational satellite product, in a similar way radiometry or Chla products are, the construction and open availability of standardized and purpose built relevant validation datasets should be the focus of future work. Indeed, when in situ observations are available (Caballero et al., 2020), local tuning of an algorithm can be achieved. The approach proposed here could be ported to different species by modifying the training datasets.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found at: https://doi.org/10.5281/ zenodo.4075095.

AUTHOR CONTRIBUTIONS
VM-V designed the experiment, carried out the experiment, and wrote the first draft of the manuscript. VB, CS, AA, and VV helped out with the experiment in the laboratory. AK and JL did the numerical experiments. PM discussed initial ideas. All co-authors provided comments to the manuscript.

FUNDING
This work benefited from funding from EC-FP7 AQUA-USERS (Grant 607325), Interreg Atlantic Area project PRIMROSE (Grant EAPA 182/2016), NERC ShellEye (Grant NE/P011004/1), Fundação para a Ciência e Tecnologia through the strategic project UIDB/04292/2020. This work was also supported by funding from the European Union's Horizon 2020 Research and Innovation Programme under grant agreement N 810139: Project Portugal Twinning for Innovation and Excellence in Marine Science and Earth Observation-PORTWIMS. Also supported by HABWAVE project LISBOA-01-0145-FEDER-031265, co-funded by EU ERDF funds, within the PT2020 Partnership Agreement and Compete 2020, and national funds through FCT, I.P.

ACKNOWLEDGMENTS
Thanks to C. Beltran for the analysis of HPLC samples and G.Dall'Olmo for initial advice on the setup of the optical chamber. We acknowledge the comments and suggestions from three reviewers who have contributed to improve this work.