Mapping Spatial Distribution and Biomass of Intertidal Ulva Blooms Using Machine Learning and Earth Observation

Opportunistic macroalgal blooms have been used for the assessment of the ecological status of coastal and estuarine areas in Europe. The use of earth observation (EO) data sets to map green algal cover based on a Normalized Difference Vegetation Index (NDVI) was explored. Scenes from Sentinel-2A/B, Landsat-5, and Landsat-8 missions were processed for eight different Irish estuaries of moderate, poor, and bad ecological status using European Union Water Framework Directive (WFD) classification for transitional water bodies. Images acquired during low-tide conditions from 2010 to 2018 within 18 days of field surveys were considered. The estimates of percentage coverage obtained from different EO data sources and field surveys were significantly correlated (R² = 0.94) with Cohen's kappa coefficient of 0.69 ± 0.13. The results showed that the NDVI technique could be successfully applied to map the coverage of the blooms and to monitor estuarine areas in conjunction with other monitoring activities that involve field sampling and surveys. The combination of widespread cloud coverage and high-tide conditions provided additional constraints during the image selection. The findings showed that both Sentinel-2 and Landsat scenes could be utilized to estimate bloom coverage. Moreover, Landsat, because of its legacy program, can be utilized to reconstruct the blooms using historical archival data. Considering the importance of biomass for understanding the severity of algal accumulations, an artificial neural network (ANN) model was trained using the in situ historical biomass samples and the combination of radar backscatter (Sentinel-1) and optical reflectance in the visible and near-infrared (NIR) regions (Sentinel-2) to predict the biomass quantity. The ANN model based on multispectral imagery was suitable to estimate biomass quantity (R² = 0.74). The model performance could be improved with the addition of more training samples.
The developed methodology can be applied in other areas experiencing macroalgal blooms in a simple, cost-effective, and efficient way. The study has demonstrated that both the NDVI-based technique to map spatial coverage of macroalgal blooms and the ANN-based model to compute biomass have the potential to become an effective complementary tool for monitoring macroalgal blooms where the existing monitoring efforts can leverage the benefits of EO data sets.
Graphical Abstract: Overall research workflow showing data types, study area, model development, and biomass results.

INTRODUCTION
Estuarine and coastal areas play a crucial socio-economic, biological, and environmental role as these environments provide multiple ecosystem goods and services, making them some of the most valuable ecosystems on earth (Costanza et al., 1997; Donkersloot and Menzies, 2015; Norton et al., 2018). Due to their high value, these areas have been focal points of human settlement and resource exploitation (Lotze et al., 2006), resulting in a long history of over-exploitation, habitat transformation, and pollution. This legacy has undermined ecological resilience and has obscured the magnitude of degradation in estuarine and coastal environments (Lotze et al., 2006; Airoldi and Beck, 2007).
Estuarine and coastal waters worldwide have been facing the problem of eutrophication and macroalgal blooms (Teichberg et al., 2010). In Europe, eutrophication is considered one of the main threats for aquatic ecosystems (Airoldi and Beck, 2007; Hering et al., 2010). This process is directly linked with nutrient over-enrichment because of increasing anthropogenic nutrient loadings, which significantly increased after the generalized use of industrial fertilizers following the Second World War (Cloern, 2001; Lotze et al., 2006; Diaz and Rosenberg, 2008). Due to the hydrological and ecological characteristics of estuaries, they are particularly susceptible to over-enrichment of nutrients and other pollutants from anthropogenic activities (Sfriso et al., 1992; Eyre and Ferguson, 2002). A clear sign of nutrient enrichment and environmental degradation in estuaries is the development of opportunistic macroalgal blooms and the loss of seagrass meadows (Valiela et al., 1997; Teichberg et al., 2010; Bermejo et al., 2019a). Because the terms macroalgal bloom and seaweed tide are both in common usage, they are used interchangeably in the present paper.
Macroalgal blooms undermine the ecosystem services that estuaries provide, and affect ecosystem functioning (Smetacek and Zingone, 2013). As in other parts of the world, some Irish estuaries contained large green tides in recent years (EPA, 2006; Ní Longphuirt et al., 2016; Wan et al., 2017). A recent Environmental Protection Agency (EPA) report has found that transitional waters (i.e., estuaries and coastal lagoons) in Ireland have poorer water quality when compared with other water typologies (i.e., groundwater, rivers, lakes, and coastal waters), with only 38% of water bodies in good or better ecological status.
The EU Water Framework Directive (WFD) 2000/60/EC (European Commission, 2000) and Marine Strategy Framework Directive (MSFD) 2008/56/EC (European Commission, 2008) are two of the most ambitious initiatives to prevent further deterioration of water bodies and associated ecosystems (Wan et al., 2017; Boon et al., 2020). These directives represent a change in the scope of water management from the local to the basin scale (Apitz et al., 2006). They are based on an ecological approach rather than a traditional physicochemical assessment (European Commission, 2000). This more recent approach is more holistic since it puts the ecosystem at the center of management decisions by considering ecology and biology at a larger scale (e.g., the whole river basin or adjacent coastal area) (Borja, 2005). Both directives require that coastal areas are periodically monitored to assess their achievement of "Good Ecological Status" and "Good Environmental Status" as per WFD and MSFD targets, respectively. The large expansion in monitoring required by the WFD and MSFD has created pressure from governments on their regulatory agencies to reduce the costs of monitoring while maintaining coverage and effectiveness (Borja and Elliott, 2013; Carvalho et al., 2019).
Marine macrophytes, including macroalgae and angiosperms such as saltmarsh and seagrass communities, are biological quality elements used to monitor and assess the ecological status of transitional and coastal waters for the WFD (e.g., Scanlan et al., 2007; Bermejo et al., 2012, 2013). Across the EU, monitoring of opportunistic macroalgal blooms is used to assess the ecological status (Wan et al., 2017), based on the relative coverage and biomass abundance of opportunistic macroalgae (Wilkes et al., 2018). Monitoring of coastal and estuarine environments can be demanding in terms of time, labor, and costs, and can sometimes pose significant logistical challenges (European Commission, 2008) such as coordination of field equipment, survey procedures, means of transportation, field crew, and safety. The gathering of this information in muddy environments, especially the mapping of macroalgal blooms, can present several impediments as it frequently requires the use of specialized vehicles such as hovercrafts, or can be very labor intensive. Although these field surveys are systematic and provide high-quality data regarding spatial coverage and biomass, the cost of such works could range from medium to high.
Remote sensing can offer an affordable complementary solution to field-based environmental monitoring. Remote sensing data sets can be freely available, provide wide spatial and temporal coverage, and easily allow methodological standardizations and comparability. A conventional remote sensing technique such as aerial photography has been applied to map seagrass and macroalgal distribution in coastal and estuarine environments (Jeffrey et al., 1995;Hernandez-Cruz et al., 2006;Nezlin et al., 2007). Similarly, the application of earth observation (EO) satellite data to assess the severity and extension of algal blooms in coastal environments has grown in recent years (Cristina et al., 2015;Xing and Hu, 2016;Zhang et al., 2019). With the continued development of technology, unmanned aerial vehicles (UAV) are also being used to monitor green tides and seaweed blooms in marine environments (Xu F. et al., 2017;Bermejo et al., 2019b;Taddia et al., 2019;Jiang et al., 2020). Remote sensing methods have been shown to provide reasonable estimates of the algal coverage on the ground (Hernandez-Cruz et al., 2006;Nezlin et al., 2007), but with the availability of EO data sets with higher temporal, spatial, and spectral resolution, further improvement and development can be attained.
Different techniques such as image thresholding (Cavanaugh et al., 2010, 2011; Cui et al., 2012; Bell et al., 2015), visual interpretation (Donnellan and Foster, 1999; Gower et al., 2006; Pfister et al., 2017), supervised classification (Volent et al., 2007; Casal et al., 2011; Bermejo et al., 2020), and unsupervised classification (Fyfe et al., 1999; Duffy et al., 2018) have been used in vegetation mapping in coastal and estuarine areas. Among the thresholding techniques, band ratios, vegetation indices, density, and biomass are commonly used (Richards, 2013), whereas for supervised classification, spectral angle mapper and maximum likelihood classifications are more conventional approaches (Schroeder et al., 2019). Ground-truth data are used to train a supervised classification, whereas such information is not utilized for unsupervised classification. For both classification types, the results need to be validated with ground-truth data. Despite their simplicity, one disadvantage of both supervised and unsupervised classification is that they produce errors due to digital noise, which must be removed carefully (Schroeder et al., 2019). In thresholding, unlike classification, the threshold is either determined from the ground-truth data or validated afterwards to check its effectiveness.
Many machine learning-based studies applying aerial or remote sensing imagery rely on the greenness of the imagery to map green tides. Although this technique is effective, some areas of the bloom could likely be underestimated, as the technique cannot easily delineate the bloom in its entirety. This unreliability arises because the technique does not account for the spectral information available in the near-infrared (NIR) region, where plants exhibit the majority of their photoactivity, such as reflection (Tucker and Sellers, 1986). The Normalized Difference Vegetation Index (NDVI) is a proxy for vegetation health and greenness, and its value ranges from −1 to 1, where higher values correspond to healthy vegetation and lower values correspond to a lack of vegetation (D'Odorico et al., 2013; Ke et al., 2015; Zhu and Liu, 2015; Zhang H. K. et al., 2018). Since the NDVI technique uses the NIR bands, which are not visible to the naked eye, it can detect the signature of vegetation that can go undetected when only visible green bands are used. Although numerous other vegetation indices are in use (Silleos et al., 2006; Bannari et al., 2009; Xue and Su, 2017), including the Enhanced Vegetation Index [EVI, primarily for Moderate Resolution Imaging Spectroradiometer (MODIS)], Maximum Chlorophyll Index [MCI, primarily for MEdium Resolution Imaging Spectrometer (MERIS)], and Floating Algae Index (FAI), NDVI was employed in the present study primarily because it incorporates the visible and NIR bands, which are available at 10 m resolution in Sentinel-2. Additionally, indices developed for detecting vegetation or chlorophyll in aquatic conditions, such as FAI and MCI, are not applicable in the current context as macroalgal blooms are being mapped on tidal flats. Other studies (Siddiqui and Zaidi, 2016; Siddiqui et al., 2019; Taddia et al., 2019) have used NDVI and similar vegetation indices for mapping seaweed, but these studies mainly focused on seaweed immersed in the water.
For the studies involving seaweed species in water, data sets from ocean color sensors have been used to obtain estimates of chlorophyll present in the water (Gower et al., 2006, 2008). Considering the size of the estuaries and the need to map these blooms at a higher spatial resolution, ocean color sensor-based techniques are not relevant to many estuaries globally. Recent studies have shown that NDVI can be successfully used for mapping biomass and density of intertidal macroalgae (Conser and Shanks, 2019; Praeger et al., 2020; Salarux and Kaewplang, 2020).
The assessments of the algal blooms are usually accomplished by comparing spatial coverage, but biomass can provide greater insights about the severity of the blooms (Rossi et al., 2011; Xiao et al., 2019). Furthermore, biomass results are more helpful for allocating resources and adopting mitigation measures. Despite the usefulness of biomass mapping, few studies focus on mapping biomass using remote sensing data sets (Hu et al., 2017; Xiao et al., 2017, 2019). Most of these studies relied on reflectance computed in the laboratory environment in order to develop the biomass model. Those models were later used to generate biomass using the data from the MODIS optical sensor. Ocean color sensors such as MODIS are ineffective in mapping bloom patches that are smaller than a few hundred meters in size due to their coarse resolution of 500 m (Karki et al., 2018). Also, considering the spatial extent of the estuarine areas, it is essential to use remote sensing data with greater spatial resolution that can discriminate between various magnitudes of biomass on the tidal flats.
In addition to optical sensors, application of radar, for example, Advanced Land Observing Satellite-1 (ALOS) PALSAR for biomass estimation, at larger scales such as in forestry is common (Jha et al., 2006;Le Toan et al., 2011;Hame et al., 2013); however, its application can be explored at a finer scale for algal biomass estimation. In recent years, there have been many successful applications of Sentinel-1 technologies for biomass monitoring (Ndikumana et al., 2018;Periasamy, 2018;Crabbe et al., 2019) including those combining radar and optical data sets (Chang and Shoshany, 2016;Laurin et al., 2018;Navarro et al., 2019;Wang et al., 2019) or using radar for bloom forming Ulva species (Geng et al., 2020). The application of machine learning in the field of macroalgal blooms is increasing as demonstrated by recent studies (Zavalas et al., 2014;Kotta et al., 2018;Qiu et al., 2018;Liang et al., 2019;Kim et al., 2020).
Although the application of NDVI is conventional, the present study aims to optimize the benefit of NDVI combined with the application of radar and artificial neural networks (ANN) to predict the biomass of Ulva blooms. The benefit ANN offers over the traditional linear regression-type approach is the ability to model non-linear relationships (Huang, 2009; Karlaftis and Vlahogianni, 2011). An ANN offers the potential to deal with a large number of training samples and model complex relationships taking advantage of multiple input variables (Bourquin et al., 1998). Unlike other models, ANN offers the scope for future additional optimization by the inclusion of more samples, which in turn increases the robustness (Alwosheel et al., 2018). Therefore, the addition of more samples and the consideration of further variables provide a learning opportunity to the ANN model, which improves predictability over time (Nadikattu, 2017). These neural networks, when adequately trained, can model the natural environment, making them suitable for big-data applications such as remote sensing. Due to these scalable and expandable qualities, the ANN-based technique was adopted in the current study. Despite these benefits, ANN has some drawbacks: it can be considered a "black box" because of its complex algorithms (Dayhoff and DeLeo, 2001). In addition, machine learning techniques require a comparatively large number of training samples, which may be difficult to obtain for smaller scale studies. More importantly, the requirement of robust computing and programming platforms frequently discourages quick and easy implementation. The primary goal of the current study was to evaluate remote sensing as a supplementary tool for the monitoring of macroalgal blooms in Irish estuaries, where the presence of higher cloud coverage places an additional constraint.
In this study, macroalgal bloom mapping based on satellite imagery was compared with in situ mapping for ground-truthing and validation purposes. The potential of machine learning methodologies was explored to map the biomass distribution since the higher resolution of the newer sensors, such as Sentinel-2 with 5-day revisit time and 10 m spatial resolution, accompanied by the greater size of the data, demands robust computing resources. The integration of these approaches in EO can take advantage of the recent technological advances in the field of data science and artificial intelligence (Ali et al., 2015). To address this challenge, the potential of an ANN was explored using the information extracted from Sentinel-1 radar backscatter and Sentinel-2 optical reflectance. The historical biomass data collected from field surveys were combined with the data obtained from the EO to develop the biomass model.

Study Area
The current research was conducted on eight estuarine areas most affected by macroalgal blooms across the Republic of Ireland (Figure 1)  Dublin (4,488,300 m²). These areas show a moderate, poor, or bad ecological status as assessed for the WFD; the parameters driving status included loss of seagrass meadows, general physicochemical properties, or the development of large macroalgal blooms (EPA, 2019¹). Although there were different numbers of estuaries in each category (moderate: 4; poor: 3; and bad: 1), standard field surveying techniques were conducted regardless of their status. Consistent with the field protocol, identical EO mapping techniques were also applied. It was important to include a varied range of ecological status conditions in this study to make sure that the proposed technique for mapping blooms was not limited to a narrow set of environmental conditions.
In this study, seaweed blooms resembling terrestrial vegetation present in the estuaries and tidal flats, excluding the salt marshes, were mapped during the low tides. These algal patches must be mapped during the low tide condition when there is no water above them. It is crucial to note that estuaries in Ireland are intertidal in nature, and blooms present in the tidal flat region may not have water above them except during high tide conditions. In six of the eight sampling locations, seagrass meadows were absent, or their presence was negligible (i.e., Clonakilty), and only Bannow Bay had conspicuous seagrass meadows present. Bannow Bay primarily includes intertidal zones with predominant macroalgal growth, and the distinction between seagrasses and Ulva was not conducted because of the interspersed, or sometimes negligible, growth of seagrasses among the macroalgal blooms. Apart from the practical reasons, green algae and seagrasses are difficult to separate from each other at the current spectral and spatial resolution (Kutser et al., 2020). Since the discrimination between seagrasses and macroalgae is not possible with Sentinel-2, the study aims to develop a methodology so that field validation can be performed where substantial levels of macroalgal growth occur and any potential false positive incidences due to the presence of seagrasses can be verified. This is an example of field monitoring and EO complementing each other, reducing logistical and human resource costs, and enhancing environmental quality assessment. Seagrasses in Bannow Bay are routinely monitored and mapped as a part of obligations under the WFD and the data confirm no risk of false observations from the EO mapping.
¹ www.catchments.ie

Field Survey
In Ireland, as well as in other cold-temperate regions, the maximum development or peak of macroalgal blooms occurs during the summer (Jeffrey et al., 1995; Bermejo et al., 2019a, 2020). For this reason, the monitoring of the estuaries and the field sampling were concentrated between late June and early October. The current study focused on blooms from 2010 to 2018, because of the lack of overlap between field surveys and Landsat acquisitions prior to that date. Bloom extension and biomass abundance were obtained from the WFD surveys conducted by the EPA to assess the ecological status of opportunistic algae blooms on the dates shown in Table 1. The table shows the dates and locations for which both field spatial coverage and biomass data sets were available and were collected as a part of the WFD monitoring. Since the WFD method primarily focuses on surveying of algal mats that are mostly attached, spatial coverage and algal biomass were assessed in situ. The outer edges of the algal accumulations were mapped at low tide using a mapping grade Global Positioning System (GPS) unit with an accuracy of a meter. A light hovercraft was used in areas where the sediment was too soft to allow safe access or where the algal beds were too large to allow safe mapping during a single tidal cycle. A series of transects were taken through each patch, haphazardly distributed 0.5 m² quadrats were taken along each transect, and their GPS locations were recorded. The percentage cover and algal biomass in each quadrat were recorded. Biomass from each quadrat was collected, washed, and rinsed in fresh seawater to remove sand and debris, squeezed dry, and the weight recorded as g/m² wet weight.
The data were compiled into five sub-metrics (i.e., total percentage cover, total patch size as a percentage of available intertidal habitat, average biomass on the intertidal area, average biomass in affected area, and percentage of quadrats with algae entrained into sediments) to provide a WFD assessment for the estuaries (Wan et al., 2017). To meet the requirement for sufficient training data for model development, additional data collected as a part of the Sea-MAT Project (Bermejo et al., 2019b), not shown in Table 1, were used. These biomass abundance (g/m²) data, collected between June 2016 and August 2017 following a similar methodology, were exclusively used for ANN training and validation.
Table 1 note: The dates of field survey are italicized, and the serial number in the left column corresponds to the locations shown in Figure 1.

Earth Observation Mapping of the Spatial Coverage
The mapping of macroalgal blooms using satellite imagery comprises several steps, from data download to the generation of the map (Figure 2). The study utilizes the EO data sets from the Sentinel-2 and Landsat (5 and 8) missions.

Data Acquisition
The identification of the dates was based on the availability of the field survey data collected from 2010 to 2018 as a part of the WFD monitoring program. Given the limited number of field data sets and the need to find matching EO scenes with low-tide and cloud-free conditions, images acquired either before or after the field surveys were considered without distinction. This is considered accepted practice in the field of remote sensing and is unlikely to affect the outcome of the study for mapping Ulva blooms. For each location, the archival Sentinel-2 data were screened for cloud-free scenes within two and a half weeks of the field monitoring program under conditions of low tide. In the absence of Sentinel-2 MSI scenes, Landsat scenes were selected. Table 1 provides the detailed information about the field survey and corresponding source of EO data sets for each of the eight locations. Considering the small number of historical WFD monitoring data with associated geospatial information and the difficulty in finding corresponding EO scenes, data from 2010 were selected despite the fact that there were no matching scenes for the two subsequent years (2011 and 2012). Additionally, 2010 is the only year for which data from Landsat-5, which is identical to Landsat-8 from the application point of view, were used. Thus, the inclusion of 2010 despite a gap of two years and the usage of Landsat-5 was important for this study. Sentinel-2 Level 2A (atmospherically corrected) products covering the areas under consideration were identified and downloaded from ESA's Sentinel Hub website. Similarly, Level 2 Landsat-5 TM and Landsat-8 OLI products were downloaded from the USGS Earth Explorer's website.
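The scene-selection rule described above can be sketched in a few lines of Python. This is only an illustration, not the authors' procedure: the function name `select_scene`, the tuple layout of the candidates, and the tie-breaking by smallest time gap are assumptions, and the candidates are assumed to have already been screened for low tide and cloud-free conditions.

```python
from datetime import date

MAX_GAP_DAYS = 18  # maximum survey-to-acquisition gap reported in the text


def select_scene(survey_date, candidates, max_gap_days=MAX_GAP_DAYS):
    """Pick one scene for a field survey date (hypothetical helper).

    candidates: list of (sensor, acquisition_date) tuples, assumed to be
    already screened for low-tide and cloud-free conditions.
    Scenes before or after the survey are treated alike; Sentinel-2 is
    preferred and Landsat is used only as a fallback.
    """
    in_window = [
        (sensor, acquired)
        for sensor, acquired in candidates
        if abs((acquired - survey_date).days) <= max_gap_days
    ]
    if not in_window:
        return None  # no usable scene for this survey
    # Rank by sensor priority first, then by absolute time gap (assumption).
    return min(
        in_window,
        key=lambda c: (
            0 if c[0] == "Sentinel-2" else 1,
            abs((c[1] - survey_date).days),
        ),
    )


# Illustrative dates only; they are not taken from Table 1.
survey = date(2017, 7, 20)
scenes = [
    ("Landsat-8", date(2017, 7, 19)),
    ("Sentinel-2", date(2017, 8, 2)),
    ("Sentinel-2", date(2017, 8, 20)),
]
chosen = select_scene(survey, scenes)
```

In this made-up example the Sentinel-2 scene from 2 August is chosen even though the Landsat-8 scene is closer in time, which mirrors the stated preference for Sentinel-2 whenever a scene falls inside the window.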

Pre-processing
The second step involved pre-processing of the scene in order to reduce the size of the files to allow quick processing in the freely available Sentinel Application Platform (SNAP) software from ESA. In the case of Sentinel-2, the first pre-processing step was to resample the entire image to the resolution of the 10 m bands (blue, green, red, or NIR). This step brought all the coarser bands (20 and 60 m) to a common resolution of 10 m. This is the most essential step before doing spatial and spectral sub-setting since it helps to significantly reduce the computing time. After resampling, each scene was resized and clipped to the extent of each location under investigation. During the same step, the spectral sub-setting was completed to retain the specific bands required for further processing (blue, green, red, and NIR). In the case of Landsat-5 and Landsat-8, the scene was resampled to the resolution of the 30 m bands prior to spatial and spectral sub-setting.

NDVI Calculation
Although several remote sensing indices are in use for vegetation mapping, NDVI is the most widely used index (Xue and Su, 2017). The method used in this study is based on the NDVI for mapping and delineating the bloom. It utilizes the characteristic increased reflectance in the NIR region and decreased reflectance in the red region of the electromagnetic spectrum exhibited by vegetation (Jensen, 1986; Tucker and Sellers, 1986). The NDVI is calculated using Eq. 1.
$$\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{Red}}{\mathrm{NIR} + \mathrm{Red}} \tag{1}$$
where NIR and Red are the reflectances in the near-infrared and red bands, respectively.
The next step was to compute NDVI using red and NIR bands in SNAP software. During the band specification, corresponding bands were identified for Sentinel-2, Landsat-8, and Landsat-5. All the processing after this step was done using ArcGIS software and Python 2.7. Since NDVI computation involves the use of the NIR band, it can overestimate the bloom coverage by including microphytobenthos present in the sediment that contributes to the primary production in an estuarine environment (Launeau et al., 2018). To prevent this risk, the results from NDVI needed to be verified with field data. One important advantage of using NDVI was its ability to detect live vegetation, due to the consideration of NIR reflectance exhibited only by photosynthetically active plants.
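As a minimal sketch of the band arithmetic, the standard definition NDVI = (NIR − Red) / (NIR + Red) can be written in plain Python as follows. The study itself computed NDVI in SNAP; the function names `ndvi` and `ndvi_grid` and the nested-list raster layout are assumptions made only for this example.

```python
def ndvi(nir, red):
    """NDVI for a single pixel from NIR and red reflectance."""
    total = nir + red
    if total == 0:
        return float("nan")  # undefined when both reflectances are zero
    return (nir - red) / total


def ndvi_grid(nir_band, red_band):
    """Apply ndvi() pixel by pixel to two equally sized 2-D lists."""
    return [
        [ndvi(n, r) for n, r in zip(nir_row, red_row)]
        for nir_row, red_row in zip(nir_band, red_band)
    ]


# Illustrative reflectances: a vegetated pixel and a bare-sediment-like pixel.
example = ndvi_grid([[0.5, 0.1]], [[0.1, 0.1]])
```

A pixel with high NIR and low red reflectance yields a clearly positive value, while equal reflectance in both bands yields zero, which is why a threshold on NDVI can separate vegetated from unvegetated tidal flat.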

Generation of Algal Bloom Map
The Corine (Coordination of Information on the Environment) land cover data set was used to define the area of interest (AOI, i.e., intertidal mudflats) for each location. Following the Corine land cover classification, the classes of interest corresponded with the tidal flats (4.2.3) and estuaries (5.2.2) labels. This facilitated the removal of terrestrial vegetation and saltmarshes from consideration.
Visual inspection was used to determine the threshold of the NDVI values that corresponded with the bloom-forming seaweed in the natural color composite. In most cases, NDVI values greater than 0.15 and 0.20 for Sentinel-2 and Landsat images, respectively, represented Ulva blooms. For each location, the vegetation pixels were segregated after determining the threshold. This technique was able to segregate bloom patches bigger than a few meters.
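The thresholding step can be illustrated with a short Python sketch. The two threshold values come from the text; everything else (the dictionary `NDVI_THRESHOLDS`, the function `bloom_mask`, and the nested-list raster layout) is hypothetical. In practice the threshold was set per location by visual inspection, so the values here are defaults rather than fixed constants.

```python
# Typical thresholds reported in the text; site-specific values may differ.
NDVI_THRESHOLDS = {"Sentinel-2": 0.15, "Landsat": 0.20}


def bloom_mask(ndvi_values, aoi_mask, sensor, threshold=None):
    """Flag pixels inside the AOI whose NDVI exceeds the sensor threshold.

    ndvi_values and aoi_mask are equally sized 2-D lists; aoi_mask holds
    True for pixels inside the intertidal area of interest.
    """
    if threshold is None:
        threshold = NDVI_THRESHOLDS[sensor]
    return [
        [bool(inside) and value > threshold
         for value, inside in zip(value_row, aoi_row)]
        for value_row, aoi_row in zip(ndvi_values, aoi_mask)
    ]


# Illustrative values: 0.18 passes the Sentinel-2 threshold but not Landsat's.
values = [[0.18, 0.40], [0.05, 0.30]]
aoi = [[True, True], [True, False]]
sentinel_mask = bloom_mask(values, aoi, "Sentinel-2")
landsat_mask = bloom_mask(values, aoi, "Landsat")
```

Pixels outside the AOI are never flagged, regardless of their NDVI, which corresponds to excluding terrestrial vegetation and salt marshes before delineating the bloom.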
Manual verification, subjective judgment, and refinements are essential steps of the mapping workflow to ensure that the bloom pixels are represented correctly. Spatial delineation of the macroalgal blooms using the NDVI technique was the initial step, and the subsequent computation of biomass relied on the spatial extent outlined. Correct spatial delineation ensured that only healthy Ulva tissue, as opposed to decomposing tissue, was considered. In specific locations, it was necessary to eliminate areas which corresponded to terrestrial vegetation. Due to the coarse resolution of Corine land cover data sets, a few areas also included artificial structures and salt marshes. The minimum mapping area for Corine land cover data sets is 50,000 m² for land cover change and 100 m for linear units. For features smaller than these minimum mapping units, a generalized class is reported in the Corine database. Because of such boundary generalizations for small features, these areas were carefully clipped out from the AOI polygons. The final step involved the generation of the vector and raster outline of the algal blooms for all locations. The area of the bloom was computed, and the percentage coverage was calculated relative to the AOI of each site under consideration using Eq. 2.
$$\text{Percentage Cover} = \frac{\text{Area delineated by NDVI technique in the AOI}}{\text{AOI}} \times 100 \tag{2}$$
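As a worked illustration of Eq. 2, the following Python sketch converts pixel counts to areas and then to percentage cover. The function name `percentage_cover`, the boolean-mask inputs, and the square-pixel assumption are choices made for this example only; the study computed areas in ArcGIS.

```python
def percentage_cover(bloom_mask, aoi_mask, pixel_size_m=10.0):
    """Return (bloom area in m2, AOI area in m2, percentage cover).

    bloom_mask and aoi_mask are equally sized 2-D lists of booleans.
    Pixels are assumed square with side pixel_size_m (10 m for
    Sentinel-2, 30 m for Landsat).
    """
    pixel_area = pixel_size_m ** 2
    aoi_pixels = 0
    bloom_pixels = 0
    for bloom_row, aoi_row in zip(bloom_mask, aoi_mask):
        for is_bloom, in_aoi in zip(bloom_row, aoi_row):
            if in_aoi:
                aoi_pixels += 1
                if is_bloom:
                    bloom_pixels += 1  # only bloom inside the AOI counts
    if aoi_pixels == 0:
        raise ValueError("AOI contains no pixels")
    bloom_area = bloom_pixels * pixel_area
    aoi_area = aoi_pixels * pixel_area
    return bloom_area, aoi_area, 100.0 * bloom_area / aoi_area


# Illustrative 4 x 2 site: 8 AOI pixels, 2 of them flagged as bloom.
bloom = [[True, False], [True, False], [False, False], [False, False]]
aoi = [[True, True], [True, True], [True, True], [True, True]]
result = percentage_cover(bloom, aoi)
```

With 10 m pixels, this toy site has 200 m² of bloom in an 800 m² AOI, giving a percentage cover of 25.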

Validation and Statistical Analysis
To test the comparability and consistency of macroalgal bloom mapping using satellite imagery and field surveys, and to identify possible disagreement between methodologies, Pearson correlation analyses, paired t-tests (Xu M. et al., 2017), and Cohen's kappa analyses were conducted. The limited number of samples for Landsat shows marginal significance compared to other groups, whereas Sentinel with native bands shows higher significance. The root-mean-square error (RMSE) was also calculated to examine the error, where a lower value indicated better estimates. To assess the influence of the temporal gaps between field surveys and satellite images, the correlation between relative mismatches and the time lapse between samplings was evaluated. All statistical analyses were performed using Minitab software, ArcGIS, and Python packages. When necessary, Shapiro-Wilk and Levene's tests were used to ensure compliance with normality and homoscedasticity assumptions. In all statistical analyses, significance was set at a 5% risk of error.
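Three of the agreement measures named above can be sketched with the Python standard library alone. This is a didactic sketch and not the authors' Minitab or ArcGIS workflow: the function names are hypothetical, and the text does not state which units (pixels, polygons, or sites) were compared for Cohen's kappa, so the label sequences here are an assumption.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rmse(estimated, observed):
    """Root-mean-square error; lower values indicate better estimates."""
    n = len(estimated)
    return math.sqrt(sum((e - o) ** 2 for e, o in zip(estimated, observed)) / n)


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two categorical label sequences of equal length."""
    labels_a, labels_b = list(labels_a), list(labels_b)
    n = len(labels_a)
    categories = set(labels_a) | set(labels_b)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    expected = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories
    )
    if expected == 1.0:
        return float("nan")  # undefined when only one category is ever used
    return (observed - expected) / (1.0 - expected)


# Illustrative values only; these are not results from the study.
eo_cover = [10.0, 25.0, 40.0]
field_cover = [12.0, 24.0, 43.0]
r_squared = pearson_r(eo_cover, field_cover) ** 2
error = rmse(eo_cover, field_cover)
kappa = cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0])
```

Squaring the Pearson coefficient gives the R² form in which the agreement between EO and field estimates is reported in this paper.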
The time gap between the EO data acquisition and field survey varied from the same day to a maximum of 18 days (+/−), with an average of 12 days for 22 pairs of observations. Of those 22 observations, 12 were Sentinel-2 and 10 were Landsat acquisitions, as we primarily focused on finding Sentinel scenes and selected Landsat only when the former was not available. Since it was not possible to obtain both Sentinel and Landsat scenes for each location on the same date, a direct comparison of the influence of resolution under identical conditions was not possible. To address this, the Sentinel-2 bands were resampled to 30 m for comparison with the results from the native 10 m bands. This allowed the overall effect of resolution on the estimated coverage to be compared at the same location, with fine and coarse pixels, on the same day. Further, the upscaling of the Sentinel bands to 30 m facilitated the comparison with the original Landsat bands.
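The resampling step can be sketched as a block aggregation from 10 m to 30 m pixels. The source does not state which resampling method was used, so the mean over non-overlapping 3 × 3 blocks shown here is an assumption, and the input patch is synthetic.

```python
def aggregate_mean(grid, factor=3):
    """Average non-overlapping factor x factor blocks of a 2D list of numbers."""
    rows, cols = len(grid), len(grid[0])
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be multiples of the factor")
    out = []
    for r in range(0, rows, factor):
        out_row = []
        for c in range(0, cols, factor):
            block = [grid[r + i][c + j] for i in range(factor) for j in range(factor)]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out


# Synthetic 6 x 6 patch at 10 m (values are arbitrary, for illustration only).
band_10m = [[float(r * 6 + c) for c in range(6)] for r in range(6)]
band_30m = aggregate_mean(band_10m, factor=3)  # 2 x 2 patch at 30 m
print(band_30m)
```

Each 30 m output pixel summarizes nine 10 m pixels, which is why small bloom patches along pixel boundaries can be blurred at the coarser resolution.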

Computation of Algal Biomass Using Machine Learning
Building upon the spatial coverage extracted using the NDVI delineation approach, an ANN model was developed to quantify the biomass distribution in the estuarine area. Sentinel-1 Synthetic Aperture Radar (SAR) and Sentinel-2 MSI data were used to develop the ANN model. The in situ biomass samples (point locations in g/m²) collected as part of the WFD monitoring program and those obtained under the Sea-MAT project (Bermejo et al., 2019b) were compiled as the response variable for the model. The entire biomass computation process can be broken down into data acquisition and processing, identification of determining variables, model development, and application, as shown in Figure 3.

Data Acquisition and Processing
Sentinel-2 scenes acquired for mapping spatial coverage of Ulva blooms were also used for biomass calculation. In addition to NDVI, two more products, the percentages of green and red reflectance, were calculated using visible and NIR bands. Altogether, 11 Sentinel-2 scenes corresponding to biomass surveys were used for computing optical variables. Apart from optical data, Sentinel-1 SAR scenes, acquired from ESA's site, provide radar information in C-band at a spatial resolution of around 10 m. Sentinel-1 images were acquired in either ascending or descending modes covering the study area. The standard radar data processing chain (Small and Schubert, 2008; Small, 2011; Filipponi, 2019; Veci, 2019) was followed using ESA's Sentinel Toolbox. The processing steps include: (i) radiometric calibration and calculation of radar backscatter; (ii) speckle filtering; and (iii) terrain flattening and geometric correction. The final product of the processing was radar backscatter in decibels (Raney et al., 1994). Sentinel-1 SAR is dual polarized, capable of HH/HV and VH/VV acquisition, but VH/VV is the default mode. Polarization can affect the results, and cross-polarized (VH) data were found to provide better results. Altogether, 16 radar scenes corresponding to the biomass field survey data and acquired within 5 days of the data collection were processed. In addition to satellite-derived variables at a resolution of around 10 m, biomass data collected at the field resolution of 0.5 via field survey were used in the study.
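The per-pixel variables described above can be sketched as follows. NDVI and the decibel conversion follow their standard definitions. The exact formula for the percentage of green and red reflectance is not given in the text, so the version here (each band as a share of the summed green, red, and NIR reflectance) is an assumption, as are the function names and sample values.

```python
import math


def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    total = nir + red
    if total == 0:
        return 0.0
    return (nir - red) / total


def band_percentage(band, green, red, nir):
    """ASSUMED definition: band reflectance as a percentage of green + red + NIR."""
    total = green + red + nir
    if total == 0:
        return 0.0
    return 100.0 * band / total


def to_decibels(sigma0_linear):
    """Convert linear radar backscatter to decibels: 10 * log10(sigma0)."""
    if sigma0_linear <= 0:
        raise ValueError("linear backscatter must be positive")
    return 10.0 * math.log10(sigma0_linear)


# Hypothetical surface reflectances and linear VH backscatter for one pixel.
green, red, nir, vh_linear = 0.10, 0.05, 0.45, 0.01
pixel_ndvi = ndvi(nir, red)
green_pct = band_percentage(green, green, red, nir)
red_pct = band_percentage(red, green, red, nir)
vh_db = to_decibels(vh_linear)
print(pixel_ndvi, green_pct, red_pct, vh_db)
```

In practice these values would be computed per pixel from calibrated rasters; the scalar form is used here only to keep the arithmetic visible.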

Identification of Determining Variables
Independent variables that could explain the variability of biomass in the study area were identified. At first, all the potential variables that could be related to biomass were considered. Among such variables, NDVI and the percentage reflectance in green, red, and infrared wavelengths were selected as determining variables. Similarly, radar backscatter was considered as one of the determining variables since it is a measure of surface roughness. Before the independent variables were selected, tests of significance and redundancy were carried out using correlation and variance inflation factor (VIF) analysis (O'Brien, 2007), and only four variables were shortlisted for inclusion in the model training because these were found to be both significant and non-redundant. Inclusion of these additional variables for their non-linear contribution is expected to prevent the potential saturation of NDVI at higher biomass values (Huete et al., 2002; Garroutte et al., 2016; Xiao et al., 2017).
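A redundancy check of this kind can be sketched with the identity that the VIF of each predictor equals the corresponding diagonal element of the inverse of the predictors' correlation matrix. This is a generic, dependency-free illustration rather than the procedure used in the study, and the two predictor columns are invented.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def invert(matrix):
    """Gauss-Jordan inverse (with partial pivoting) of a small square matrix."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def variance_inflation_factors(columns):
    """VIF per predictor: diagonal of the inverse correlation matrix."""
    k = len(columns)
    corr = [[pearson_r(columns[i], columns[j]) for j in range(k)] for i in range(k)]
    inv = invert(corr)
    return [inv[i][i] for i in range(k)]


# Two invented predictor columns with a correlation of 0.8.
predictor_a = [1.0, 2.0, 3.0, 4.0, 5.0]
predictor_b = [2.0, 1.0, 4.0, 3.0, 5.0]
vifs = variance_inflation_factors([predictor_a, predictor_b])
print(vifs)  # each equals 1 / (1 - 0.8**2)
```

A VIF close to 1 indicates little redundancy; which cut-off to apply is a judgment call discussed by O'Brien (2007), and the threshold used in the study is not stated in the text.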

Development of the Model
An ANN models the relationship between the dependent and independent variables with the help of training and validation data. The model consists of multiple inputs, activation functions, hidden layers, and an output, which are connected via artificial neurons and transmit information through structured layers (Dayhoff and DeLeo, 2001). These interconnected neurons work analogously to the neurons in the nervous system of organisms, and the network learns the relationship among input variables through multiple iterations or epochs (Zurada, 1992; Agatonovic-Kustrin and Beresford, 2000; Hu et al., 2018). An ANN consists of different combinations of input variables with associated empirical weights and bias terms (Huang, 2009). Backpropagation is one of the techniques by which parameters such as the weights and bias terms are adjusted through repeated forward and backward passes until the minimum error is achieved (Rumelhart et al., 1986; Aggarwal, 2018; Hu et al., 2018). Model training, and hence hyperparameter tuning, proceeds over several iterations until a stable solution is achieved. With each iteration of the ANN model, these configurations are tuned so that the structured layers of neurons can model the expected output (Bardenet et al., 2013). During ANN development, a small portion of the data is set aside for validation to prevent overfitting (Shahin et al., 2005; Piotrowski and Napiorkowski, 2013). This step helps ensure that the model generalizes and that its performance does not degrade outside of the training samples. Thus, the training samples should be representative so that the neurons can learn to model the complex relationships adequately.
The ANN model was developed using a total of 346 biomass samples, where the magnitude of biomass (g/m²) was the response variable and the remaining variables (NDVI, percentage of green reflectance, percentage of red reflectance, and radar backscatter) were used as determining variables. Of the total samples, 20% of the data points were used as validation data to prevent the model from overfitting due to excessive training. The hyperparameters, such as the number of hidden layers and the number of iterations needed to achieve a stable solution, were determined based on the error statistics reported by the model using backpropagation-based learning algorithms (Huang, 2009). The fully trained ANN model was achieved with five hidden neurons when the number of epochs reached 50. The learning process was continued until the model performance plateaued, at which point the error was minimal for both the validation (20%) and training (80%) data sets. The entire process was accomplished using Google's TensorFlow (Abadi et al., 2016), a free and open-source library, with the Python programming language. The input data preparation was done using ArcGIS, where a table consisting of all input variables was generated.
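The configuration described above (four inputs, five hidden neurons, 50 epochs, an 80/20 split) can be illustrated with a small dependency-free network trained by backpropagation. This is a stand-in for illustration only, not the TensorFlow model used in the study: the data are synthetic, and the tanh activation, learning rate, weight initialization, and per-sample (stochastic) updates are all assumptions.

```python
import math
import random

random.seed(0)
N_INPUTS, N_HIDDEN, EPOCHS, LEARNING_RATE = 4, 5, 50, 0.05

# Synthetic stand-in data (NOT the study's samples): 346 rows of four scaled
# predictors and a scaled response with a mildly non-linear dependence.
X = [[random.uniform(-1, 1) for _ in range(N_INPUTS)] for _ in range(346)]
y = [0.8 * x[0] - 0.5 * x[2] + 0.3 * x[1] * x[3] + random.gauss(0, 0.05) for x in X]

split = int(0.8 * len(X))  # 80% training, 20% validation
X_train, y_train = X[:split], y[:split]
X_val, y_val = X[split:], y[split:]

# Small random initial weights; w1[j] holds the input weights of hidden neuron j.
w1 = [[random.uniform(-0.5, 0.5) for _ in range(N_INPUTS)] for _ in range(N_HIDDEN)]
b1 = [0.0] * N_HIDDEN
w2 = [random.uniform(-0.5, 0.5) for _ in range(N_HIDDEN)]
b2 = 0.0


def forward(x):
    """Return hidden activations and the linear output for one sample."""
    hidden = [math.tanh(sum(w * v for w, v in zip(w1[j], x)) + b1[j])
              for j in range(N_HIDDEN)]
    return hidden, sum(w * h for w, h in zip(w2, hidden)) + b2


def mse(xs, ys):
    """Mean squared error of the current network on a data set."""
    return sum((forward(x)[1] - t) ** 2 for x, t in zip(xs, ys)) / len(xs)


initial_val = mse(X_val, y_val)
history = []  # (training MSE, validation MSE) after each epoch
for epoch in range(EPOCHS):
    for x, t in zip(X_train, y_train):
        hidden, out = forward(x)
        err = out - t  # derivative of 0.5 * err**2 with respect to the output
        for j in range(N_HIDDEN):
            # Backpropagate through the output weight and the tanh activation,
            # using w2[j] before it is updated.
            grad_h = err * w2[j] * (1 - hidden[j] ** 2)
            w2[j] -= LEARNING_RATE * err * hidden[j]
            for i in range(N_INPUTS):
                w1[j][i] -= LEARNING_RATE * grad_h * x[i]
            b1[j] -= LEARNING_RATE * grad_h
        b2 -= LEARNING_RATE * err
    history.append((mse(X_train, y_train), mse(X_val, y_val)))

print("validation MSE before/after:", initial_val, history[-1][1])
```

Tracking both errors per epoch makes the plateau visible: training would normally stop once the validation error no longer improves, which is the overfitting safeguard described in the text.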

Application of the Model
After its development, the ANN model was applied to generate biomass values from the input variables. To apply the model, the input data sets were extracted and compiled in the form of a table. The TensorFlow-generated results were later converted to a geographic information system (GIS) raster for mapping and further analysis. The biomass quantity can be predicted for any area for which the input data sets are available.
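The table-to-raster step can be sketched as placing each predicted value back into its grid cell. The study performed this conversion with GIS software; the record layout (row, column, value), the nodata value, and the predictions below are hypothetical.

```python
NODATA = -9999.0


def table_to_grid(records, n_rows, n_cols, nodata=NODATA):
    """Place (row, col, value) records into a 2D grid; other cells get nodata."""
    grid = [[nodata] * n_cols for _ in range(n_rows)]
    for row, col, value in records:
        if not (0 <= row < n_rows and 0 <= col < n_cols):
            raise IndexError(f"cell ({row}, {col}) is outside the grid")
        grid[row][col] = value
    return grid


# Hypothetical predicted biomass (g/m^2) for three bloom pixels in a 3 x 4 grid.
predictions = [(0, 1, 420.0), (1, 1, 610.5), (2, 3, 95.0)]
biomass_grid = table_to_grid(predictions, n_rows=3, n_cols=4)
for row in biomass_grid:
    print(row)
```

Cells outside the NDVI-delineated bloom keep the nodata value, which mirrors the restriction of the biomass model to the delineated area. Writing the grid to a georeferenced file would require a GIS library and is not shown.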

Spatial Coverage of the Bloom
The extent of the macroalgal blooms from field surveys ranged between 213,100 and 5,425,900 m² for Lower Blackwater Estuary (2017) and Tolka (2017), respectively. There was a total of 22 EO estimates with corresponding field survey data, some of which are shown in Figure 4. A significant correlation between EO estimates and survey measurements was observed, which was independent of the satellite mission used (p-values < 0.01). All the correlation analyses yielded coefficients of determination close to 1 (between 0.93 and 0.98) and a good fit between the observations (Figure 5). The spatial coverages from EO show more detailed delineation than those mapped on the ground since the field campaign was more concentrated on the predominant and accessible regions of the Ulva blooms. The scatter plots for all the observations made for eight estuaries using Sentinel-2 and Landsat are presented in Figure 5 together with the R² value, slope, and intercept for each plot.
Although the resampled Sentinel-2 bands, upscaled to 30 m, are a derivative of bands acquired originally at 10 m resolution, they provided a similar level of performance in terms of delineation of the green algal blooms and one-to-one correspondence with the field data. The EO data sets show a slight overestimation compared with the field data, as shown in Figure 5. This overestimation is slightly higher in Figures 5B,C, which correspond to the Sentinel-2 original and resampled products, respectively. For each scatter plot, the equation of the trend line shows the magnitude of the bias, and the slope of the trend line shows the level of estimation (over or under) between EO and field survey. Comparing the rate of over- and underestimation, Landsat seems to be consistent, although exhibiting a slight overestimation as indicated by the upward shift of the trend line compared to the 1:1 dotted line. Sentinel-2, in contrast, shows overestimation for lower magnitudes and a slight underestimation for higher magnitudes of coverage. These under- and overestimations of Sentinel-2 and Landsat imagery seem to compensate each other on the combined scatter plot in Figure 5A. The slope of the regression lines (Figure 5) was close to 1 for all correlation analyses, and the results of the paired t-test (Figure 6 and Table 2) suggest no methodological bias. The p-values for all groups of observations are higher than 0.05, as shown in Table 2. Overall, there was no bias, as can be seen from Figure 5 and as suggested by the paired t-test.
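The reading of a trend line in terms of over- and underestimation can be made concrete with an ordinary least-squares fit. The data below are invented to mimic the pattern described for Sentinel-2 (slope below 1 with a positive intercept) and are not values from Figure 5.

```python
def linear_fit(x, y):
    """Ordinary least-squares slope, intercept, and R^2 of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Invented percentage covers: field survey (x) and EO estimate (y).
field_cover = [10.0, 20.0, 30.0, 40.0, 50.0]
eo_cover = [13.0, 22.0, 31.0, 40.0, 49.0]

slope, intercept, r2 = linear_fit(field_cover, eo_cover)
# Coverage at which the trend line crosses the 1:1 line (valid when slope != 1).
crossover = intercept / (1.0 - slope)
print(slope, intercept, r2, crossover)
```

With a slope below 1 and a positive intercept, the EO estimate exceeds the field value below the crossover and falls short of it above, which is the over-then-under pattern described for Sentinel-2.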
Regarding the error statistics, the results show that the RMSE is lowest for Landsat, followed by the RMSE for the combination of resampled Sentinel-2 and native Landsat bands at 30 m. The Pearson correlation between EO estimates and field measurements was likewise highest for Landsat, followed by the combination of Sentinel-2 and Landsat observations. Based on the RMSE and Pearson correlation values, Landsat appears to have performed better than Sentinel-2 despite the latter's finer resolution. Nevertheless, the low number of observations for Landsat is insufficient to establish statistically that it performs better than Sentinel-2. Overall, the results show that native Sentinel-2 bands perform better than resampled bands, as evidenced by lower RMSE. The resampled Sentinel-2 bands nevertheless provided good results, with an error that was not significantly higher than that of the native bands.
The error analysis was done by comparing the distribution of the residuals computed between the EO estimates and the field survey data. Beyond these inter-comparisons of the residuals, it is equally important to see whether there was any notable correlation between the discrepancies in percent coverage (EO estimates versus field measurements) and the corresponding time lag between them. Thus, the differences in percent coverage were analyzed against the time lags. Figure 6 shows the differences in bloom coverage estimated using EO and field surveys. From the magnitude of the residuals, it is evident that the differences cannot be considered different from zero in any case, except for a marginal difference in the case of Landsat. With very few observations for less than 1 week and more than 2 weeks, it was difficult to draw any conclusion about the magnitude of the effect due to the time gap. Figure 7 shows the number of days between those observations and the absolute difference between the percentage coverages. The individual breakdown of Cohen's kappa for each Sentinel-2 and Landsat observation is indicated by its label and color code. This figure aids visualization of the distribution of data in relation to Cohen's kappa, which measures the agreement between field and EO estimations. Cohen's kappa also provides important information about changes in the position of the bloom or the degree of spatial agreement between EO and field data. Most of the observations clustered around 10-13 days without showing any trend with the difference in the percentage coverage or Cohen's kappa (Figure 7), and the Pearson correlation coefficient indicated no correlation (Pearson's r < 0.01) between them.

Biomass Computation of the Algal Blooms
The ANN model was developed by using NDVI, percentages of green and red reflectance, and radar backscatter to compute the biomass in g/m². Only the non-redundant and significant variables were used in the ANN model development based on the result from the redundancy test using VIF (O'Brien, 2007) and correlation analysis. Figure 8 shows the scatter plots of biomass with each of the determining variables, where the correlation with red reflectance was highest, followed by NDVI, green reflectance, and the radar backscatter. The predicted result was compared with the biomass data measured from the field survey, with an RMSE value of 471.70 and an adjusted R² of 0.74, as shown in Figure 8E.
The ANN model was used to compute the biomass images for several estuarine areas. The biomass image shows the distribution of bloom biomass within the areas delineated by the spatial coverage mapping technique based on NDVI. Figure 9 shows the biomass distribution where both the EO scene acquisition and the field survey were conducted in the summer of 2016 and 2018 for Clonakilty (Figure 9A) and Courtmacsherry (Figure 9B), respectively. Similarly, the computed biomasses for Malahide and Tolka for the summer of 2017 are shown in Figures 9C,D, respectively.

Spatial Coverage of the Bloom
The current study used the visible and NIR bands from the optical sensors and applied the NDVI delineation technique along the tidal flats. Overall, the NDVI technique showed a finer-scale delineation of blooms than that obtained in the field. The generally finer delineation from the EO estimation could be due to its sensitivity to even small algal patches, including microphytobenthos present in the sediment (Launeau et al., 2018). This seems more evident for Sentinel-2 than for Landsat, most likely because of its higher spatial resolution. In contrast, the field survey is generally restricted to the main areas of the intertidal environment that can be accessed, as shown in Figure 4. Considering the high average monthly rainfall in Ireland (80-130 mm; Walsh, 2012) and associated cloud coverage, it is difficult to limit the temporal gaps to a couple of days or to exclusively select scenes either before or after the field surveys. Despite this, a good correlation, a negligible bias, and a slope close to one were obtained between satellite estimates and field survey results, as during the peak bloom phase (from June to September) the bloom extent might remain more or less constant (relative standard deviation 13-18%; Monagail et al., 2021, unpublished data).
Despite these satisfactory results, there may be small variations in the estimates due to the growth of the bloom during the time between the satellite overpass and the field surveys. This may explain the relatively lower Cohen's kappa in a few locations (around 0.4, Figure 7). Similarly, the movement of large quantities of seaweed biomass because of wind and tidal currents (Gower et al., 2008; Qiao et al., 2009; Cui et al., 2012) has not been considered in this study because algal mats producing blooms in Irish estuaries are mostly attached to the substrate. Additionally, variation in NDVI can arise from the different stages of macroalgal blooms, such as healthy and photosynthetically active versus decomposing macroalgal tissue, which can give rise to slightly different measures of NDVI and lead to small variations in the spatial coverage. The NDVI-based technique eliminates the need to filter out dead or decomposed vegetation, in contrast to other indices that rely entirely on the spectral reflectance in the visible part of the spectrum, such as the red, green, or blue regions. Due to the seasonal growth of the macroalgae, the rate of photosynthesis varies with time; thus, the use of NDVI can correctly account for the corresponding variation in electromagnetic signals indicative of vegetation health (Erener, 2011; Turvey and Mclaurin, 2012). This is particularly important in our case because macroalgal blooms may consist of algae at various stages of the life cycle, such as mature algal tissue or actively decomposing mass. Furthermore, minor discrepancies may have resulted from the methodologies currently adopted during the field measurements, including human error and sampling bias. These issues were unavoidable considering environmental constraints such as high cloud coverage and high-tide conditions, and field limitations such as field safety and accessibility.
Planning of the field surveys around the satellite overpass can help avoid inconsistencies due to daily changes in coverage (Carl et al., 2014). Additionally, there could be some unavoidable errors in the estimation due to the signal mixing that occurs when blooms are present along a pixel boundary. Regardless of the sources of error, our data showed an excellent fit when assessing bloom coverage, with a very high R² value (0.94) and an average kappa value of 0.69 ± 0.13, suggesting good agreement between field and EO observations.
The above observations provide evidence that the mapping results from 10 and 30 m resolutions did not differ significantly at the current scale of monitoring of algal blooms. This finding is especially helpful in an Irish context, where the number of usable scenes is particularly limited by cloud cover as well as by the need for low-tide conditions. For areas with optimal weather conditions, the ability to use both missions offers the additional advantage of more frequent monitoring. Taking into account the average monthly rainfall of 80 mm even during the drier months (April-July) and 130 mm during the wetter months (October-January) (Walsh, 2012) in Ireland, it is important to note that rainfall and cloud cover are higher here than in many other countries. Given these limitations, in situations where appropriate Sentinel-2 scenes are not available, Landsat acquisitions could still be used for mapping the blooms in these estuaries without greatly compromising the quality of the results. In addition, data from the legacy Landsat acquisitions or any other optical missions could be utilized to reconstruct and analyze the trend of historical blooms in any area (e.g., Bermejo et al., 2020).
The results from the current study show the higher overall effectiveness of the NDVI technique compared to similar findings elsewhere (Cavanaugh et al., 2010;Siddiqui and Zaidi, 2016;Siddiqui et al., 2019;Taddia et al., 2019). Intertidal green algae and giant kelp were mapped using the Sentinel-2 imagery, where only 66% of field survey locations showed the presence of kelp within a distance of 300 m (Mora-Soto et al., 2020).
A similar study (Fauzan et al., 2017) conducted using Sentinel-2 for acquiring the percentage coverage of seagrass showed an overall accuracy of 61% with a coefficient of determination of 0.51. In some cases, the adopted technique was limited by the size of the study area (area > 10,000 m²; Mora-Soto et al., 2020), whereas the current study could map the bloom down to a minimum extent of a few pixels using the NDVI delineation technique. The greater effectiveness in the present study can be explained by the combination of the adopted AOI delineation technique and the greatly reduced biodiversity of eutrophic mudflats, which are mostly covered with Ulva alone. This approach may be challenging to implement in other, more biodiverse intertidal areas.

Biomass Computation of the Algal Blooms
The biomass showed a high level of correlation with the red reflectance of the Ulva bloom (Figure 8C). This is expected because of the increased reflectance of the plants in the visible and NIR region (Jensen, 1986). Reflectance in the green region of the electromagnetic spectrum also shows good correlation with the algal biomass, as for any green vegetation. Radar backscatter shows a low level of correlation, but its inclusion slightly improved the model performance, probably because of the non-linear contribution of radar that is considered in the ANN. The correlation with red reflectance was negative, whereas it was positive with the rest of the variables. Inclusion of NDVI, which is a quantitative measure of healthy vegetation, ensured that only live and photosynthetically active algal tissue was considered. This is the reason why the biomass model was applied only in the area primarily delineated by the NDVI technique.
The biomass predicted using the input variables was compared with the biomass data collected from the field survey (Figure 8E). There seems to be a high level of correspondence between the modeled and true values, as exhibited by the R² value (0.74) and the low RMSE (471.70), considering the relatively small number of training samples (n = 346). Movement of bloom patches due to tides and wind, positional error due to GPS, or the growth of blooms during the time period between field surveys and satellite acquisitions might be responsible for outliers in the scatterplot that are not explained by the independent variables (i.e., the 26% of variance not accounted for by R²). With a greater number of iterations and the addition of hidden layers, the R² value could be increased further, but that could lead to model overfitting, thereby reducing model predictability (Zhang et al., 1998; Jin et al., 2004; Liu et al., 2008). Thus, to preserve model generalization, the model was trained optimally with the help of error statistics reported during the ANN development. The model training was stopped when convergence was reached, where the error was at a minimum for both training and validation.
The distribution pattern of the biomass corresponds with the point data collected during the field surveys. The range of the observed and computed biomass also matches for all the sites. Figure 9 shows the computed biomass for Clonakilty (2016), Courtmacsherry (2018), Malahide (2017), and Tolka (2017). In Clonakilty, the higher magnitudes of biomass seem to be scattered uniformly over the areas where point samples were collected. In contrast, in Malahide, higher biomass is concentrated mostly in the areas where most of the point samples were taken, which suggests some level of sampling bias in the adopted methodology. The biomass distributions in both Courtmacsherry and Tolka are uniform and correspond well with the field survey data points, as shown in Figure 9. Although the results from the ANN are not optimal, the model can be significantly improved with the inclusion of more training samples. With the addition of more samples, a robust point-to-point data calibration and validation exercise could be accomplished in the field, considering the biomass estimation at 10 m resolution.

CONCLUSION
This work involves the validation of EO data-processing techniques to develop an easily implemented methodology for mapping the spatial distribution and biomass of estuarine and coastal macroalgal blooms. The predicted results were compared with previously collected historical field survey data for ground-truthing the model outputs. The application of EO data sets from Sentinel-2, Landsat-5, and Landsat-8 was investigated to map seaweed blooms at eight different estuarine locations covering moderate, poor, and bad ecological status as designated by the WFD monitoring program. The study combined EO imagery and field survey data from 2010 to 2018 to map green algae, mostly Ulva blooms. The percentage coverage was computed and compared with the results from the ground, where EO estimates showed a good correlation with the corresponding field results, which is very satisfactory for WFD monitoring. Both Sentinel-2 and Landsat imagery provided good estimates despite their different spatial resolutions. The findings show that the resampled Sentinel-2 bands still provide results close to the ones from the original bands.
An ANN model was developed using several determining variables to compute the biomass in the areas already delineated by the NDVI mapping technique. The results of the biomass computed using the ANN show excellent correspondence with the general distribution of the biomass survey results. The model presents significant potential for improvement with the addition of more data points. The current work presents the exploratory efforts to investigate the use of machine learning and artificial intelligence for remote sensing. The ANN model showed an R² value of 0.74 with an RMSE of 471.70. Therefore, this study demonstrates that with the combination of remote sensing derived variables, it is possible to delineate as well as quantify the magnitude of the seaweed blooms.
Future work could involve planning fieldwork designed to study the influence of resolution on the estimation of coverage. The sampling methodology should be improved to reduce sampling bias, and sampling coverage should be increased. These improvements will help to account for variance in the biomass currently not explained by the modeling. Preparation of the fieldwork includes scheduling it to coincide with satellite overpasses on cloud-free days to optimize the number of usable scenes. This will help obtain more accurate estimates by minimizing the temporal gap between scene acquisition and field surveys. Additional biomass samples will be collected to retrain the ANN model so that the error can be minimized and optimum performance can be achieved. The developed technique is easily replicable, cost-effective, and has the potential to be implemented as an effective monitoring tool because of its low overhead and extensive geographical coverage. The study has demonstrated that both the NDVI-based technique to map spatial coverage of macroalgal blooms and the ANN-based model to compute biomass have the potential to become effective complementary tools for monitoring macroalgal blooms, where the existing monitoring efforts can leverage the benefits of EO data sets.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/supplementary material. Further inquiries can be directed to the corresponding author/s.

AUTHOR CONTRIBUTIONS
SK processed the remote sensing data and prepared the manuscript. LM secured the funding for the project. LM, RB, and RW supervised the project and helped in the manuscript development. RB, RW, LM, and MM collected the field data. RB helped with the statistical analysis. AM provided computing and technical support. ED, MH, JH, P-EM, and OF helped in interpreting the results. All authors contributed to the article and approved the submitted version.