The Wastewater Contamination Index: A methodology to assess the risk of wastewater contamination from satellite-derived water quality indicators

One of the major sources of pollution affecting inland and coastal waters is the discharge of poorly treated or untreated wastewater, particularly in urbanized watersheds. Excess nutrients, organic matter, and pathogens cause an overall deterioration of water quality and impair valuable ecosystem services. The detection of wastewater pollution is essential for the sustainable management of inland and coastal waters, and remote sensing can monitor wastewater contamination over extended spatial scales with frequent revisits. This study employed satellite-derived water quality indicators and spatiotemporal analysis to assess the risk of wastewater contamination in Conceição Lagoon, a coastal lagoon in Southern Brazil. Using an analytical model, three water quality indicators were derived from Level-2A Sentinel-2 MSI images: the absorption coefficient of chlorophyll-a, the combined absorption coefficient of detritus and coloured dissolved organic matter, and the backscattering coefficient of suspended particulate matter. The temporal standardized anomalies of each water quality indicator were calculated for the period 2019–2021, and their anomalies during a known outfall event were used to evaluate spatial variation modes. The spatial mode explaining most of the variability was used to estimate weights for the water quality indicator anomalies in a linear transformation that can indicate the risk of wastewater contamination. Results showed that the wastewater spatial mode for this region was characterized by positive anomalies of the backscattering coefficient of suspended particulate matter and of the absorption coefficient of detritus combined with coloured dissolved organic matter, each with a relative importance of 50%. This spatiotemporal analysis was formulated as the Wastewater Contamination Index.
With the aid of photographic records and additional meteorological and water quality data, the results of the index were verified for wastewater outfall events in the study area. The methodology for constructing the proposed Wastewater Contamination Index can be applied to other locations and can be a valuable tool for the operational monitoring of wastewater contamination.


Introduction
Aquatic systems provide a wide range of valuable ecosystem services, including drinking water, food provisioning, recreation, water purification, and climate regulation (Grizzetti et al., 2016). As a result, their surroundings have historically been associated with human occupation (Di Baldassarre et al., 2010), and environmental degradation is often observed (Meyer, 2009; Yang et al., 2019). One of the major disturbances affecting water bodies in urbanized watersheds is related to poorly treated or untreated wastewater spills, especially in developing countries (Nyenje et al., 2010). These spills can be caused by irregular connections of sewage to the stormwater network, leaks and overflows from the sewerage system itself, poor performance of the wastewater treatment plant and its outfall pipe, or inadequate installation of septic tanks (Fono and Sedlak, 2005). The excess organic matter and nutrients in wastewater entering aquatic systems can raise the trophic state and cause algal blooms and oxygen depletion, resulting in an overall deterioration of water quality and ecological imbalance (El Mahrad et al., 2020). Pathogens associated with faecal contamination also pose health risks for humans and other living organisms (Cui et al., 2019). Wastewater spills can therefore negatively affect the chain of services and goods provided by water bodies and cause irreversible damage to the ecosystem. Furthermore, climate change has the potential to exacerbate wastewater contamination issues, as pollutant loads have been shown to be positively correlated with precipitation (Buerge et al., 2003), and extreme rainfall events are projected to increase in frequency and intensity in the coming decades (IPCC, 2018).
The dispersion of wastewater entering a water body is influenced by various environmental factors (e.g., hydrodynamics, stratification, wind) (Hoerger et al., 2014), and its identification is paramount for supporting management actions. Once a wastewater plume is detected, efforts can be efficiently invested in sealing the source, restoring affected areas, protecting vulnerable species, and preventing human intoxication through direct or indirect contact (e.g., bathing, swimming, or ingestion of aquaculture products). Customarily, wastewater spills are monitored with in situ measurements of tracers, such as microbiological indicators (Hughes and Thompson, 2004) or chemical compounds (e.g., caffeine and pharmaceuticals) (Buerge et al., 2003; Benotti and Brownawell, 2007). Nevertheless, this approach lacks the combination of spatial coverage, frequency, and agility required to effectively minimize the impacts of contamination (Petrenko et al., 1997).
In this respect, remote sensing emerges as a necessary complement that provides near-real-time monitoring at extended spatial scales, especially with recently developed sensors (El Mahrad et al., 2020). Satellite observations are moving towards operational monitoring of key spectral and biophysical features (Groom et al., 2019) that can be associated with wastewater plumes. Reduced surface roughness caused by the presence of oil or surfactants can be detected by active microwave sensors (radar), temperature gradients are captured by thermal sensors, and water colour differences caused by phytoplankton, organic matter, and suspended material can be tracked with optical sensors. Petrenko et al. (1997), who pioneered the optical characterization of wastewater plumes, showed with in situ measurements that scattering and attenuation coefficients were distinctly higher in wastewater plumes than in background waters. Later, Marmorino et al. (2010) analysed airborne hyperspectral and infrared imagery to detect sewage discharge on the south coast of Florida, United States, where high levels of coloured dissolved organic matter (CDOM) and lower sea surface temperature were found to be associated with the wastewater plume. In Southern California, United States, Gierach et al. (2017) related wastewater plumes to lower temperatures and reduced radar backscattering. For another outfall diversion event at the same location, Trinh et al. (2017) associated wastewater with lower temperature and increased chlorophyll-a concentration derived from Landsat 8 imagery. The only consistent indicator across these cases is the lower temperature, which is explained by wastewater being discharged via submerged outfall pipes and the resurfacing of deeper, colder waters.
The optical signals associated with wastewater are highly complex and covary according to the scale and form of the discharge, meteorological conditions, interaction with other local phenomena (e.g., seasonal phytoplankton blooms), and the local optical properties of the water (Gancheva et al., 2020). Emphasizing the need to consider multiple features simultaneously to account for this complexity, Harringmeyer et al. (2021) showed that, in similarly CDOM-rich waters at the same location, chlorophyll-a levels differed between wastewater-contaminated and other land-influenced waters. Therefore, an optimal combination of different features into a single indicator could be a promising approach for the operational monitoring of wastewater plumes. To our knowledge, this has not yet been explored.
This study proposes a methodology for constructing such an ensemble indicator to assess the risk of wastewater contamination in aquatic systems from optical remote sensing. The focus is on the use of satellite-derived water quality indicators, temporal standardization, and spatial variation mode analysis to generate the Wastewater Contamination Index (WCI). The index is intended to flag anomalous water quality indicators over time and to capture the spatial differences between wastewater plumes and ambient waters. Here we use Sentinel-2 MSI imagery and an analytical model to retrieve the water quality indicators, although these could be derived from any preferred satellite sensor and hydro-optical model. The applicability of the method is investigated in Conceição Lagoon, a coastal water body in Southern Brazil that suffers from frequent wastewater spills.

Study area
Conceição Lagoon is a subtropical coastal lagoon located on Santa Catarina Island, southern Brazil (Figure 1). The water body is a popular natural attraction in the region, widely used by local residents and tourists for multiple purposes: fishing, transportation, recreation, and sports (Martini et al., 2006). The lagoon has a surface area of about 20 km² and a heterogeneous bathymetry, with extensive sandbanks of less than 1 m water depth contrasting with muddy sediment regions of up to 8.7 m water depth (Dias et al., 2014). According to its morphological features and water properties, the lagoon is traditionally divided into three subsystems: North Lagoon, Central Lagoon and South Lagoon (Martini et al., 2006; Lisboa et al., 2008; Lüchmann et al., 2008; Fontes et al., 2010). Its only connection to the Atlantic Ocean, a narrow 2.8 km channel in the Central Lagoon, dissipates approximately 84% of the tidal effects, especially the high-frequency astronomical component (99% attenuation) (Godoy et al., 2008). The lagoon's exchange with the ocean is thus limited, with hydrodynamic studies suggesting water residence times of 150 to 270 days (Silva, 2021). As such, water motion is controlled primarily by wind and by hydrological conditions in the contributing watershed (i.e., precipitation, evaporation, runoff) (Silva et al., 2017). The urbanized area in the watershed increased from 13% to 23% between 1990 and 2019 (Odreski et al., 2021). The sanitation infrastructure, however, did not develop at the same rate: it serves only 57% of the population, and there are frequent issues with pumping station overflows and irregular sewer connections (Odreski et al., 2021). Both the deficient sewage system and non-sewered areas contribute to sustained organic matter and nutrient loading into the lagoon (Cabral et al., 2019).
This has led to the chronic eutrophication of Conceição Lagoon, with recurrent algal blooms and hypoxia/anoxia conditions (de Barros et al., 2017; Silva et al., 2017; Cabral et al., 2019). The rupture of the disposal pond of a Wastewater Treatment Plant (WWTP) in late January 2021, which discharged 79,000 m³ of wastewater into Conceição Lagoon (CASAN, 2021), had severe consequences for the lagoon's ecosystem and highlighted the urgency of monitoring wastewater spills in the area. Owing to its magnitude and the presence of a visible plume in the following days, this event serves as a suitable training sample to investigate signals that could be used to identify wastewater from space.

Sentinel-2 MSI images
Sentinel-2 MultiSpectral Instrument (MSI) imagery from the European Space Agency (ESA) was used because its high spatial and temporal resolution and radiometric quality make it well suited to aquatic studies (Caballero et al., 2020). The two polar-orbiting satellites carrying the MSI (Sentinel-2A and Sentinel-2B) together provide multispectral images of the study area every 2-3 days, with a spatial resolution of 10 m in the visible and near-infrared bands. Level-2A products, atmospherically corrected with the Sen2Cor processor, were downloaded from the Copernicus Open Access Hub¹. This product is only available for images from December 2018 onwards; therefore, the period of analysis was defined as January 2019 to December 2021. In total, 138 images acquired for the tile covering the study area (tile index: T22JGQ) were selected to compose the Sentinel-2 time series, taking into account cloud cover over the lagoon. The number of images per year was consistent, ranging from 42 to 49. Each month was represented by at least one image, except for October 2021, when no cloud-free image was acquired (see Supplementary Figure 1). The scene classification map, part of the Level-2A product, was used to mask clouds and cloud shadows.
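The cloud-masking step can be sketched with NumPy as follows. The text does not list which scene-classification classes were masked. The sketch assumes the standard Sen2Cor SCL codes for cloud shadows (3), medium- and high-probability clouds (8, 9) and thin cirrus (10). It also assumes that the 20 m SCL layer has already been resampled to the 10 m band grid. `mask_clouds` is a hypothetical helper name.

```python
import numpy as np

# Assumed set of Sen2Cor Scene Classification (SCL) codes treated as invalid:
# 3 = cloud shadows, 8 = cloud medium probability,
# 9 = cloud high probability, 10 = thin cirrus.
INVALID_SCL = (3, 8, 9, 10)


def mask_clouds(band: np.ndarray, scl: np.ndarray, invalid=INVALID_SCL) -> np.ndarray:
    """Return a float copy of `band` with cloud and shadow pixels set to NaN.

    `scl` must already be on the same grid as `band` (SCL is delivered at 20 m).
    """
    if band.shape != scl.shape:
        raise ValueError("band and SCL layer must share the same grid")
    out = band.astype(float)  # copy, so the input array is left untouched
    out[np.isin(scl, invalid)] = np.nan
    return out


# Usage on a toy 2 x 2 scene (SCL 6 = water).
reflectance = np.array([[0.010, 0.020], [0.030, 0.040]])
scl_layer = np.array([[6, 9], [3, 6]])
masked = mask_clouds(reflectance, scl_layer)
```

Setting masked pixels to NaN rather than zero lets later NaN-aware statistics skip them.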

Auxiliary data
Meteorological data were used to support the analysis and to investigate the influence of meteorological conditions on wastewater spills and plume dispersion. The data were acquired from a weather station of the Brazilian Institute of Meteorology (INMET)², located approximately 15 km from Conceição Lagoon (Station Code: A806). Hourly measurements of precipitation, wind direction and wind speed were downloaded for the period between January 2019 and December 2021.
To assess the contamination risk, we used in situ measurements of Escherichia coli (E. coli), a faecal indicator bacterium. These measurements are systematically collected and made available by the Environmental Institute of Santa Catarina (IMA/SC) as part of its bathing water monitoring programme³. The programme has nine monitoring points in Conceição Lagoon (Figure 2), which are sampled weekly in the high season (November-March) and monthly in the low season (April-October). For the 2019-2021 period, 80 measurements were available per point.
Other water quality measurements and technical reports from researchers at the Federal University of Santa Catarina (UFSC) were considered to characterize wastewater outfall incidents and the status of the lagoon. In addition, photographic records of anomalous conditions observed in Conceição Lagoon were used to further evaluate the results. These were gathered online from news media and are referred to when used.

Retrieval of water quality indicators
The 2SeaColor model⁴ (Salama and Verhoef, 2015) is an analytical model that retrieves Inherent Optical Properties (IOPs) from remote sensing reflectance (Rrs). The retrieved IOPs reflect the composition and state of the water (IOCCG, 2006) and therefore represent Water Quality Indicators (WQIs). The model simulates light interactions within the water by analytically solving the two-stream radiative transfer equations (forward model), including three radiation components: downwelling direct and diffuse fluxes and upwelling diffuse flux. To reduce the number of unknowns, parametrizations are implemented as in Yu et al. (2016). An inversion scheme, based on spectral optimization with non-linear least-squares fitting, is used to derive water surface IOPs. Because initialization values are important in non-linear optimization, they are defined based on Lee et al. (1999). The equations of the forward model, parametrizations and initialization values are detailed in the aforementioned references and summarized for convenience in Supplementary Table 1.
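The sketch below illustrates the inversion step by fitting three unknowns to a multi-band reflectance spectrum with SciPy's `least_squares`. It does NOT reproduce 2SeaColor. The two-stream forward model is replaced by a simplified quadratic subsurface rrs(u) relationship, with u = bb/(a + bb). The band set, spectral shapes, slopes, pure-water absorption values and coefficients are all illustrative assumptions, not the study's parametrization. The synthetic check inverts a spectrum generated by the same forward model.

```python
import numpy as np
from scipy.optimize import least_squares

# Illustrative band set: Sentinel-2 MSI band centres B1-B5 (nm).
WAVELENGTHS = np.array([443.0, 490.0, 560.0, 665.0, 705.0])
# Placeholder pure-water absorption (1/m); substitute a published spectrum.
A_WATER = np.array([0.007, 0.015, 0.062, 0.43, 0.70])
# Hypothetical chlorophyll-a absorption shape, normalised to 1 near 440 nm.
CHL_SHAPE = np.array([1.0, 0.7, 0.3, 0.45, 0.05])
S_DG = 0.015             # assumed exponential slope of a_dg (1/nm)
Y_SPM = 1.0              # assumed power-law exponent of b_bspm
G0, G1 = 0.0949, 0.0794  # illustrative coefficients of rrs = G0*u + G1*u**2


def forward_rrs(params):
    """Simplified forward model (NOT 2SeaColor): subsurface rrs per band."""
    a_chla, a_dg, bb_spm = params  # the three WQIs at 440 nm
    a = (A_WATER + a_chla * CHL_SHAPE
         + a_dg * np.exp(-S_DG * (WAVELENGTHS - 440.0)))
    bb = bb_spm * (440.0 / WAVELENGTHS) ** Y_SPM  # water backscattering ignored
    u = bb / (a + bb)
    return G0 * u + G1 * u ** 2


def invert_rrs(rrs_obs, x0=(0.1, 0.1, 0.01)):
    """Spectral optimisation by bounded non-linear least squares."""
    def residuals(params):
        # Relative residuals, so low-reflectance red bands still count.
        return (forward_rrs(params) - rrs_obs) / rrs_obs

    fit = least_squares(residuals, x0=np.asarray(x0, dtype=float),
                        bounds=(1e-6, np.inf))
    return fit.x


# Synthetic check: invert a spectrum produced by the same forward model.
truth = np.array([0.05, 0.30, 0.02])
estimate = invert_rrs(forward_rrs(truth))
```

The code fits relative rather than absolute residuals. Red and red-edge reflectances are much smaller than blue ones, so absolute residuals would let the blue bands dominate the fit.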
In this study, we adapted 2SeaColor to Sentinel-2 MSI images to derive three WQIs: (i) $a_{chla}(440)$, the absorption coefficient of chlorophyll-a; (ii) $a_{dg}(440)$, the combined absorption coefficient of detritus and coloured dissolved organic matter (CDOM); and (iii) $b_{bspm}(440)$, the backscattering coefficient of suspended particulate matter (SPM), all at 440 nm. These indicators were selected because wastewater plumes have previously been associated with high levels of CDOM (Marmorino et al., 2010; Ayad et al., 2020) and SPM (Ayad et al., 2020). High chlorophyll-a concentrations have also been observed in the wake of wastewater outfalls.
2SeaColor has been shown to perform well in both freshwater (Salama and Verhoef, 2015) and turbid coastal waters (Arabi et al., 2016; Yu et al., 2016; Arabi et al., 2018). Since the proposed method for formulating the WCI relies on the WQI anomalies and their spatial and temporal variability (see section 2.4), validation of the IOP retrievals specifically for the study area was not considered a prerequisite. Areas with water depths shallower than 1.5 m, based on the bathymetric map provided by Horn et al. (2022), were masked to avoid bottom reflectance effects on the retrievals. The resulting time series of WQI images was arranged as a raster cube, with the third dimension representing time.
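The masking and stacking step can be sketched in NumPy, assuming the WQI images and the bathymetric map are already co-registered on the same grid. The reprojection itself is not shown. The function name and the choice to also mask pixels without bathymetry are illustrative assumptions.

```python
import numpy as np


def build_raster_cube(wqi_images, depth, min_depth=1.5):
    """Stack co-registered 2-D WQI images into a (rows, cols, time) cube.

    Pixels shallower than `min_depth` (m) are set to NaN in every time step.
    Pixels without bathymetry (NaN depth) are also masked, because the
    comparison `depth >= min_depth` is False for NaN.
    """
    cube = np.stack([np.asarray(img, dtype=float) for img in wqi_images], axis=-1)
    depth = np.asarray(depth, dtype=float)
    if cube.shape[:2] != depth.shape:
        raise ValueError("WQI images and bathymetry must share the same grid")
    keep = depth >= min_depth
    cube[~keep] = np.nan  # the 2-D boolean mask applies to all time steps
    return cube


# Usage: two time steps on a 2 x 2 grid, depths in metres.
depth_map = np.array([[0.5, 2.0], [3.0, np.nan]])
cube = build_raster_cube([np.ones((2, 2)), 2.0 * np.ones((2, 2))], depth_map)
```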

Formulation of the wastewater contamination index-WCI
The steps taken for the formulation of the WCI from the WQIs raster cube are summarized in Figure 3 and the processes are detailed in the sub-sections below.

Standardization of WQIs in time
Standardization with respect to time removes the effects of site-specific characteristics and enables comparison between different locations based on the distance of values from the average conditions (Dabernig et al., 2017). Large standardized anomalies (positive or negative) indicate unusual conditions at each specific location.
Therefore, the first step in the formulation of the Wastewater Contamination Index (WCI) was to standardize each WQI image in time and obtain its anomalies according to the following equation (pixel-by-pixel):

$$\mathrm{anom}_{WQI}(x, y, t) = \frac{WQI(x, y, t) - \mu_{WQI}(x, y)}{\sigma_{WQI}(x, y)} \qquad (1)$$

where $WQI(x, y, t)$ is the water quality indicator at pixel (x, y) and time step t, and $\mu_{WQI}$ and $\sigma_{WQI}$ are the temporal mean and standard deviation of the pixel time series, respectively.

FIGURE 3
Flowchart of the formulation of the Wastewater Contamination Index (WCI). x, y, and t are the three dimensions of the raster cube, where x and y represent the pixel coordinates and t the time step of the image. Numbers denote the section in which the process is described.

³ https://balneabilidade.ima.sc.gov.br/
⁴ Code for processing point radiometric measurements available at https://github.com/suhybsalama/2SeaColor

Frontiers in Environmental Science frontiersin.org
This process generates a time series of standardized anomaly images for each WQI, arranged as a raster cube. On a comparable scale, the images in the raster cube show where, and by how much, each WQI at a specific time step deviates from its time-series average. The presence of seasonality in the resulting WQI anomaly raster cube was also evaluated by examining the mean monthly WQI anomaly per subsystem of the lagoon together with the local climatology.
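The standardization described above (pixel value minus its temporal mean, divided by its temporal standard deviation) can be sketched as a NaN-aware NumPy operation over a (rows, cols, time) cube. NaNs from cloud or shallow-water masks are skipped when computing each pixel's statistics. The study does not state whether the population or sample standard deviation was used. The sketch assumes the population form (NumPy's default `ddof=0`).

```python
import warnings

import numpy as np


def standardized_anomalies(cube):
    """Pixel-wise temporal standardisation of a (rows, cols, time) cube.

    anom = (WQI - temporal mean) / temporal standard deviation, computed per
    pixel while ignoring NaNs (cloud or shallow-water masks). Pixels that are
    always masked, or constant in time, come out as NaN.
    """
    cube = np.asarray(cube, dtype=float)
    with warnings.catch_warnings():
        # Silence "mean of empty slice" and division warnings for such pixels.
        warnings.simplefilter("ignore", category=RuntimeWarning)
        mu = np.nanmean(cube, axis=-1, keepdims=True)
        sigma = np.nanstd(cube, axis=-1, keepdims=True)  # ddof=0 (assumed)
        return (cube - mu) / sigma


# Usage: one pixel observed at three dates, the second one cloud-masked.
series = np.array([[[1.0, np.nan, 3.0]]])
anomalies = standardized_anomalies(series)
```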

Selection of training sample
The image acquired on 08-02-2021 was selected as a training sample to investigate the signals in the WQI anomalies associated with wastewater contamination in Conceição Lagoon. Wastewater is defined here as untreated or partially treated domestic sewage originating from incidental ruptures or leaks. This image captured the spatial dynamics of a wastewater plume (hereafter named the reference plume) coming from the area where the Conceição Lagoon WWTP disposal pond ruptured on 25-01-2021. Presumably owing to the large volume of the outfall (79,000 m³) and favourable hydrodynamic conditions (southerly winds), a plume was still identifiable in the natural-colour image acquired 14 days after the rupture (Figure 4). Note that no rainfall was recorded on the three preceding days, so the plume was not runoff-related. We assume that the signals associated with the plume of this specific event are representative of other wastewater plumes in the area.

Spatial variation mode analysis
The WQIs anomalies of the training sample were extracted and used as input for a Principal Component Analysis (PCA). This technique allows for reducing the dimensionality of the data and finding a linear combination of WQIs anomalies that best captures the differences between the wastewater plume and the background waters.
PCA is widely used for data compression, and its applicability in remote sensing to derive indices that synthesize the effects of multiple indicators has been demonstrated (Ingebritsen and Lyon, 1985; Hu and Xu, 2019; Guo et al., 2020; Yu et al., 2021). According to Hu and Xu (2019), formulating indices based on the first principal component is an unbiased method that objectively assigns weights to variables so that their linear combination best explains the variation in the data. Furthermore, Ayad et al. (2020) showed that PCA of reflectance spectra enabled the distinction of stormwater, wastewater, and clean water on the Southern California coast.
The classic approach of PCA is based on the eigendecomposition of the covariance matrix. Eigenanalysis shows that any symmetric matrix, such as the covariance of a centred matrix, COV(A), can be decomposed as:

$$\mathrm{COV}(A) = V L V^{T} \qquad (2)$$

where V is an n × n matrix with orthogonal eigenvectors as columns that give the direction of transformation, and L is an n × n diagonal matrix of eigenvalues. The projection of the centred matrix A onto the orthogonal matrix V gives the Principal Components (PC) matrix:

$$PC = A V \qquad (3)$$

Each column of PC is a Principal Component. Each eigenvector (column of V) associated with a PC gives the direction of transformation of the data, and its corresponding eigenvalue quantifies the variance explained. The PCs are ordered by decreasing explained variance, so the first PC, obtained with the first eigenvector, contains most of the information in the dataset.

FIGURE 4
Natural colour composite of the image used as a training sample (08-02-2021), in which the reference wastewater plume coming from the area of the disposal pond burst is visible.
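In this case, the training sample has the three WQI anomalies as variables, which can be represented by the following array:

$$A = \begin{bmatrix} \mathrm{anom}_{a_{chla}}^{(1)} & \mathrm{anom}_{a_{dg}}^{(1)} & \mathrm{anom}_{b_{bspm}}^{(1)} \\ \vdots & \vdots & \vdots \\ \mathrm{anom}_{a_{chla}}^{(m)} & \mathrm{anom}_{a_{dg}}^{(m)} & \mathrm{anom}_{b_{bspm}}^{(m)} \end{bmatrix}$$

where $\mathrm{anom}_{a_{chla}}$, $\mathrm{anom}_{a_{dg}}$ and $\mathrm{anom}_{b_{bspm}}$ are the standardized anomalies of the WQIs in the training sample, n is the number of variables (3), and m is the number of pixels of the 2D image subset with k rows and i columns once reshaped into a single column (m = k × i).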
PCA is applied to A after standardizing the variables (columns) of the input array, before computing the covariance matrix. This avoids biasing the result towards the variable with the largest values and variance. The first PC is the transformation given by the first eigenvector, $V_1 = [v_1, v_2, v_3]$:

$$PC_1 = v_1\,\mathrm{anom}_{a_{chla}} + v_2\,\mathrm{anom}_{a_{dg}} + v_3\,\mathrm{anom}_{b_{bspm}} \qquad (4)$$

$PC_1$ is then used to establish weights for a linear combination (LC) that explains most of the spatial variability in the training sample, capturing the difference between the plume and background waters. Provided the elements of the eigenvector all have the same sign (all positive or all negative), the weights (w) can be calculated as follows:

$$LC = w_1\,\mathrm{anom}_{a_{chla}} + w_2\,\mathrm{anom}_{a_{dg}} + w_3\,\mathrm{anom}_{b_{bspm}}, \qquad w_n = \frac{v_n}{\sum_{j=1}^{3} v_j} \qquad (5)$$

where $v_n$ is the n-th element of the first eigenvector.
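The PCA and weighting steps can be sketched with NumPy's symmetric eigensolver (`numpy.linalg.eigh`). This is an illustrative reimplementation, not the authors' code. Pixels with a missing value in any indicator are dropped, and columns are standardized before the covariance is computed. The weights are taken as the first-eigenvector elements divided by their sum. This reading of the weighting reproduces the 49.5%/49.5%/1% split reported in the results. The demo uses synthetic anomalies, not data from the study.

```python
import numpy as np


def pca_weights(anom_chla, anom_dg, anom_bbspm):
    """First-eigenvector weights for the three WQI anomaly images.

    Returns (v1, explained_fraction, weights) with weights = v1 / sum(v1).
    """
    A = np.column_stack([np.ravel(anom_chla), np.ravel(anom_dg),
                         np.ravel(anom_bbspm)])      # m pixels x n = 3
    A = A[~np.isnan(A).any(axis=1)]                  # drop masked pixels
    A = (A - A.mean(axis=0)) / A.std(axis=0)         # standardise columns
    eigvals, eigvecs = np.linalg.eigh(np.cov(A, rowvar=False))
    order = np.argsort(eigvals)[::-1]                # descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    v1 = eigvecs[:, 0]
    if not (np.all(v1 > 0) or np.all(v1 < 0)):
        raise ValueError("first eigenvector has mixed signs; weights undefined")
    return v1, eigvals[0] / eigvals.sum(), v1 / v1.sum()


# Synthetic demo: a_dg and b_bspm share a common "plume" signal,
# while a_chla is only weakly related to it.
rng = np.random.default_rng(0)
plume = rng.normal(size=(50, 50))
anom_dg = plume + 0.2 * rng.normal(size=(50, 50))
anom_bbspm = plume + 0.2 * rng.normal(size=(50, 50))
anom_chla = 0.1 * plume + rng.normal(size=(50, 50))
v1, explained, weights = pca_weights(anom_chla, anom_dg, anom_bbspm)
```

Dividing by the sum makes the weights insensitive to the arbitrary sign that eigensolvers return. An all-negative eigenvector gives the same positive weights as its all-positive counterpart.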
To evaluate whether the training sample had distinct spatial dynamics compared with the images of other days, we applied the same process to the other 137 images and compared their first eigenvectors. The comparison was based on the Spectral Angle Mapper (SAM) technique (Kruse et al., 1993), which calculates the angle between an n-dimensional test vector and a reference vector. The larger the angle between the reference eigenvector and that of another image, the less similar they are.
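The Spectral Angle Mapper comparison reduces to the arccosine of the normalized dot product between two vectors. In the minimal sketch below, the function name is illustrative. The "other" eigenvector is a hypothetical example; the reference uses the first-eigenvector coefficients reported for the training sample.

```python
import numpy as np


def spectral_angle_deg(test, reference):
    """Spectral Angle Mapper: angle in degrees between two n-D vectors."""
    test = np.asarray(test, dtype=float)
    reference = np.asarray(reference, dtype=float)
    cos_angle = np.dot(test, reference) / (np.linalg.norm(test) * np.linalg.norm(reference))
    # Clipping guards against round-off pushing |cos| slightly above 1.
    return float(np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))))


# Reference: first-eigenvector coefficients reported for the training sample.
reference = np.array([0.02, 0.92, 0.92])
other = np.array([0.60, 0.55, 0.58])  # hypothetical eigenvector of another image
angle = spectral_angle_deg(other, reference)
```

Eigenvectors are defined only up to sign, and `numpy.linalg.eigh` imposes no sign convention. A sign-flipped eigenvector therefore yields 180° minus the original angle. A reproduction may wish to fix a sign convention before comparing, for example by making the largest-magnitude element positive.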

Application of the weights and rescaling
The weights found in the previous step are applied to all the images of WQI anomalies, generating a new raster cube of LC images. From the empirical distribution function of this raster cube, we defined the minimum ($LC_{min}$) and maximum ($LC_{max}$) values as those bounding the central 98% of the distribution (i.e., excluding the lowest 1% and the highest 1% of values). We then rescaled the dataset to obtain the Wastewater Contamination Index (WCI):

$$WCI = \frac{LC - LC_{min}}{LC_{max} - LC_{min}} \qquad (6)$$

The result is a raster cube in which the pixel values vary mostly between zero and one.
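Assuming the rescaling is the usual min-max form with percentile-based bounds, it can be sketched with `numpy.nanpercentile`. The sketch uses NumPy's default linear interpolation between order statistics. The study's exact ECDF convention is not specified, so the bounds may differ marginally. Values outside the bounds are deliberately not clipped, which is why the result lies only mostly between zero and one.

```python
import numpy as np


def rescale_to_wci(lc_cube, lower_pct=1.0, upper_pct=99.0):
    """Rescale an LC raster cube to WCI using percentile-based bounds.

    Returns (wci_cube, lc_min, lc_max). NaNs (masked pixels) are ignored
    when estimating the bounds and stay NaN in the output.
    """
    lc_cube = np.asarray(lc_cube, dtype=float)
    lc_min, lc_max = np.nanpercentile(lc_cube, [lower_pct, upper_pct])
    wci = (lc_cube - lc_min) / (lc_max - lc_min)
    return wci, float(lc_min), float(lc_max)


# Usage on a toy, evenly spaced "cube" of 101 values.
lc = np.arange(101.0)
wci, lc_min, lc_max = rescale_to_wci(lc)
```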

Risk classes definition
To classify the WCI values according to the risk of wastewater contamination, we compared the Cumulative Distribution Functions (CDFs) of the WCI and of the E. coli measurements in Conceição Lagoon. The E. coli empirical CDF was constructed from 720 measurements taken between 2019 and 2021, 80 for each of the nine sampling points (sites) in the lagoon. The WCI empirical CDF was generated by extracting, for the whole time series, the mean WCI value within a 100 m buffer around each E. coli sampling point. There were between 123 and 132 values per point (varying due to cloud cover), totalling 908. No values were extracted for points P61 and P66 because their buffer areas coincided with the shallow-water mask.
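The 100 m buffer extraction can be sketched with NumPy. The sketch assumes a north-up grid with square 10 m pixels, with sampling points already converted to pixel indices; that conversion is not shown. It treats the buffer as all pixels whose centres lie within 100 m of the point. How partial pixels were handled in the study is not stated, and `buffer_mean` is a hypothetical helper name.

```python
import numpy as np


def buffer_mean(image, row, col, radius_m=100.0, pixel_size_m=10.0):
    """Mean of valid pixels whose centres lie within `radius_m` of (row, col).

    Returns NaN when every pixel inside the buffer is masked.
    """
    image = np.asarray(image, dtype=float)
    rows, cols = np.indices(image.shape)
    dist_m = np.hypot(rows - row, cols - col) * pixel_size_m
    values = image[(dist_m <= radius_m) & ~np.isnan(image)]
    return float(values.mean()) if values.size else float("nan")


# Usage on a toy 30 x 30 scene: the outlier at (0, 0) is about 212 m away
# from the point at (15, 15), so it lies outside the 100 m buffer.
scene = np.ones((30, 30))
scene[0, 0] = 100.0
site_mean = buffer_mean(scene, 15, 15)
```

Returning NaN for a fully masked buffer mirrors the situation of points P61 and P66, where no value could be extracted.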
Since WCI and E. coli are both assumed to represent wastewater contamination, their CDFs were compared. E. coli is taken as a reference independent variable that directly indicates wastewater contamination: the bacterium is abundant in human and animal faeces and is found in natural waters only when they have received recent faecal contamination. According to Brazilian legislation⁵, waters are considered of excellent quality, indicating a low risk of contamination, if E. coli levels are at most 200 MPN⁶/100 mL in 80% of the samples collected at the same site over 5 weeks. When E. coli exceeds 800 MPN/100 mL in more than 20% of the samples collected over 5 weeks, the water is considered unsuitable for primary contact because of the high risk of contamination. Based on this, we defined the following E. coli thresholds to characterize the risk of wastewater contamination:
• Low: <200 MPN/100 mL
• Medium: 200 to 800 MPN/100 mL
• High: >800 MPN/100 mL
The CDF percentiles matching these E. coli thresholds can then be extracted and transferred to the WCI CDF. This yields the WCI values that correspond to the low, medium and high risk of contamination classes.
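The percentile transfer and the subsequent classification can be sketched with NumPy as follows. The function names are illustrative. Empirical quantiles come from `numpy.quantile` with its default linear interpolation, which may differ slightly from the study's ECDF convention. The demo inputs are hypothetical, constructed so that 58% and 80% of the E. coli samples fall in the low and low-plus-medium classes. Those are the shares reported for the lagoon, which mapped to WCI thresholds of 0.41 and 0.56.

```python
import numpy as np


def transfer_thresholds(ecoli, wci, low_limit=200.0, high_limit=800.0):
    """Map E. coli risk limits onto the WCI scale by matching ECDF percentiles.

    The shares of E. coli samples in the low class (< low_limit) and in the
    low + medium classes (<= high_limit) are read from the E. coli ECDF;
    the WCI values at those same quantiles become the WCI thresholds.
    """
    ecoli = np.asarray(ecoli, dtype=float)
    wci = np.asarray(wci, dtype=float)
    wci = wci[~np.isnan(wci)]
    p_low = np.mean(ecoli < low_limit)
    p_low_medium = np.mean(ecoli <= high_limit)
    return tuple(float(v) for v in np.quantile(wci, [p_low, p_low_medium]))


def classify_risk(wci, thresholds):
    """Return 0 (low), 1 (medium) or 2 (high); NaN inputs stay NaN."""
    low, high = thresholds
    wci = np.asarray(wci, dtype=float)
    classes = np.full(wci.shape, np.nan)
    classes[wci < low] = 0
    classes[(wci >= low) & (wci <= high)] = 1
    classes[wci > high] = 2
    return classes


# Hypothetical inputs: 100 E. coli samples and 101 WCI values at the sites.
ecoli = np.array([100.0] * 58 + [500.0] * 22 + [1000.0] * 20)
wci_at_sites = np.linspace(0.0, 1.0, 101)
thresholds = transfer_thresholds(ecoli, wci_at_sites)
risk = classify_risk(np.array([0.1, 0.7, 0.9, np.nan]), thresholds)
```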

Detection of wastewater spills in Conceição Lagoon
To evaluate the capability of the WCI to detect wastewater plumes in Conceição Lagoon, we investigated the signals in images associated with known outfalls that occurred in the area between 2019 and 2021. Although at least six incidents were reported in that period, most of them either did not coincide with a satellite overpass or were obscured by cloud cover. For this study, two reported wastewater outfall events were analysed by inspecting the WCI maps at the locations of the expected plume. Environmental conditions (i.e., wind, precipitation, water quality measurements) and photographic records were also considered in the analysis.

WWTP disposal pond rupture-25-01-2021
The disposal pond of the Conceição Lagoon Wastewater Treatment Plant (CL WWTP) burst on 25-01-2021 after intense rainfall, releasing approximately 79,000 m³ of effluent-sediment mixture directly into Conceição Lagoon⁷. The pond is where the sewage is disposed of after treatment, but monitoring showed that on several occasions the efficiency of the WWTP was below legal requirements, with high biochemical oxygen demand (BOD), nitrogen and phosphorus levels (ARESC, 2021). The event is estimated to have released 1.44 tons of BOD, 2.04 tons of ammoniacal nitrogen, 0.43 tons of nitrate and 0.36 tons of total phosphorus into the lagoon (Odreski et al., 2021). After the event, three (partially) cloud-free images were available, for which the WCI maps were investigated: 03-02-2021, 05-02-2021, and 08-02-2021.

Leak from sewage pipe-19-05-2020
On 19-05-2020, a leak from a sewage pipe was identified in Centrinho da Lagoa, the neighbourhood's urban centre, next to the South Lagoon. An inspection by the city's Environmental Agency (FLORAM) verified that, owing to a broken sewage pipe, raw waste was being discharged into the stormwater system and directly into the lagoon⁸. The pipe was believed to have been leaking for a few weeks already, as laboratory analysis from 27-04-2020 showed high nutrient levels, low dissolved oxygen and the presence of faecal indicators (UFSC, 2020). In addition, reports of large patches of scum (decaying biofilm of bacteria, diatoms, cyanobacteria and dinoflagellates) and fish die-offs between 14 and 20 May (UFSC, 2020) evidenced the effect of the ongoing contamination. To evaluate the capability of the WCI to capture this event, we used the maps of 11-05-2020 and 16-05-2020.

Formulation of the wastewater contamination index-WCI
Figure 5 shows the mean monthly WQI anomalies for the three subsystems of the lagoon (North, Central, and South), together with the local temperature and precipitation climatology. No clear seasonal pattern can be observed, although there are indications that high positive anomalies of all WQIs coincide with the increase in precipitation at the beginning of the (austral) spring in September. Since the events analysed in this study correspond to scenes from February and May, potential seasonal effects are not expected to interfere with the interpretation. Figure 6 shows the WQI anomalies of the training sample, which were used as input to the spatial variation mode analysis. The $a_{chla}$ anomalies are positive and homogeneous across the entire lagoon, with little spatial variability. In contrast, $a_{dg}$ has extremely high positive anomalies throughout the Central and North subsystems, while its values indicate average conditions in the South Lagoon. For $b_{bspm}$, high positive anomalies originate from the area of the disposal pond burst, with hotspots in the Central Lagoon and on the margins of the North Lagoon. Reflecting the spatial variability of the WQI anomalies, the first Principal Component ($PC_1$) of the training sample, which explained 56.6% of its variability, was given as:

$$PC_1 = 0.02\,\mathrm{anom}_{a_{chla}} + 0.92\,\mathrm{anom}_{a_{dg}} + 0.92\,\mathrm{anom}_{b_{bspm}} \qquad (7)$$

The comparison between the first eigenvectors of the reference (training sample) and the other 137 images showed angle differences ranging from 17° to 178°, with a mean of 54°. This indicates that the reference eigenvector provides a distinct rotation compared with the other eigenvectors, and confirms the atypical conditions seen in the training sample.

FIGURE 5
Monthly means of (A) climatological temperature and precipitation in the study area, (B) anomalies of the suspended particulate matter backscattering coefficient in the studied period, (C) anomalies of the combined detritus and coloured dissolved organic matter absorption coefficient in the studied period, and (D) anomalies of the chlorophyll-a absorption coefficient in the studied period. Error bars represent the standard deviation. Temperature and precipitation climatologies were obtained from the Brazilian Institute of Meteorology (INMET) and refer to the period 1992-2020.

⁷ https://g1.globo.com/sc/santa-catarina/noticia/2021/01/25/alagamentoatinge-casas-e-arrasta-carros-na-lagoa-da-conceicao-emflorianopolis-assustador-diz-coordenador-da-defesa-civil.ghtml
⁸ https://ndmais.com.br/meio-ambiente/teste-constata-esgoto-nalagoa-e-casan-pode-ser-multada-em-r1-milhao/
The reference $PC_1$ rotation implies that the anomalies of SPM backscattering and of absorption by combined detritus and CDOM are equally important in differentiating the wastewater plume (49.5% each), whereas absorption by chlorophyll-a provides a very low contribution (1%). This suggests that the wastewater plume in this case is characterized by simultaneous positive anomalies of $a_{dg}$ and $b_{bspm}$. For that reason, the optimal linear combination of WQI anomalies (LC) for wastewater detection was defined as:

$$LC = 0.50\,\mathrm{anom}_{a_{dg}} + 0.50\,\mathrm{anom}_{b_{bspm}} \qquad (8)$$

Figure 7 shows the empirical cumulative distribution function (ECDF) of the LC raster cube, from which the minimum (1% probability) and maximum (99% probability) values for rescaling were defined as −1.35 and 2.28, respectively.
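As a worked check on the reported numbers, dividing each coefficient of the first eigenvector in Eq. (7) by their sum reproduces the relative importances quoted above. Inserting the reported rescaling bounds into a min-max rescaling (the form assumed here) shows where average conditions fall on the WCI scale:

$$
\begin{aligned}
w_1 &= \frac{0.02}{0.02 + 0.92 + 0.92} = \frac{0.02}{1.86} \approx 0.011, \qquad w_2 = w_3 = \frac{0.92}{1.86} \approx 0.495,\\
\mathrm{WCI} &= \frac{LC - (-1.35)}{2.28 - (-1.35)} = \frac{LC + 1.35}{3.63}, \qquad LC = 0 \;\Rightarrow\; \mathrm{WCI} = \frac{1.35}{3.63} \approx 0.37.
\end{aligned}
$$

Consider a pixel whose $a_{dg}$ and $b_{bspm}$ anomalies are both zero, i.e., both at their temporal means. Under this assumption, it sits at WCI ≈ 0.37, just below the 0.41 boundary between the low and medium risk classes given in the next paragraph.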
Figure 8 compares the empirical CDFs of the E. coli measurements and the WCI values. The E. coli CDF shows that 58% of the measurements fall within the low-risk threshold (<200 MPN/100 mL) and 80% within the upper threshold of the medium-risk class. Transferring these percentiles to the WCI CDF, the thresholds for the risk of contamination are defined as follows:
• Low: WCI < 0.41
• Medium: 0.41 < WCI < 0.56
• High: WCI > 0.56

As an indirect verification, we checked the distribution of the risk classes based on the WCI and on the E. coli measurements for each of the sampling sites during the studied period (Figure 9). For most of the sites, the WCI captured distribution features very similar to those indicated by the E. coli measurements. Exceptions are seen for points 38 and 62, where the frequency of the high-risk class in particular was underestimated by the WCI. For point 43, the WCI overestimates the medium- and high-risk classes and underestimates the low-risk class. It should be noted that these three points are located at the transition between the South and Central subsystems of the lagoon (see Figure 2), which has inherently complex dynamics. In addition, the morphology of this region makes it highly susceptible to adjacency effects.

Maps of WQIs temporal anomalies for the training sample (08-02-2021).

FIGURE 7
Empirical Cumulative Distribution Function (ECDF) of the Linear Combination (LC) raster cube, from which the minimum (prob = 1%) and maximum (prob = 99%) values were extracted for rescaling LC into WCI.
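Transferring percentiles between the two CDFs amounts to quantile matching. The ECDF probability of an E. coli limit is read from the E. coli sample. The WCI value at the same probability then becomes the class threshold. The sketch below assumes that the WCI values used in the comparison have already been extracted (e.g. at the sampling points). It uses hypothetical function names and toy data. It also assigns values that fall exactly on a threshold to the upper class, a boundary choice the text does not specify.

```python
import numpy as np


def ecdf_prob(samples, limit):
    """Empirical probability of a measurement falling below `limit`."""
    return float(np.mean(np.asarray(samples, dtype=float) < limit))


def transfer_thresholds(probs, wci_values):
    """WCI values at the given ECDF probabilities (quantile matching)."""
    return np.nanquantile(np.asarray(wci_values, dtype=float), probs)


def classify(wci, low_max=0.41, high_min=0.56):
    """Risk class per WCI value: 0 = Low, 1 = Medium, 2 = High.

    np.digitize puts a value equal to a threshold into the upper class.
    """
    return np.digitize(wci, [low_max, high_min])


# Toy data (hypothetical): E. coli in MPN/100 mL and matching WCI values
ecoli = np.array([80, 150, 190, 260, 900])
wci = np.array([0.20, 0.35, 0.40, 0.50, 0.70])
p_low = ecdf_prob(ecoli, 200.0)                # 3 of 5 below 200 MPN/100 mL
print(transfer_thresholds([p_low], wci))       # WCI value at the same probability
print(classify(np.array([0.30, 0.45, 0.60])))  # [0 1 2]
```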

Analysis of wastewater spill events

WWTP disposal pond rupture (25-01-2021)
In the WCI map of 03-02-2021 (Figure 10A), 9 days after the rupture, we observe a very high risk of contamination to the East of the area where the contents of the disposal pond entered the lagoon. This suggests that the wastewater plume was transported into that area, which shows high positive anomalies for $a_{dg}$ and $b_{bspm}$ (Supplementary Material). Between the day of the accident and the date of this map, the predominant wind direction was from the North, mostly with light breeze intensity (<3.5 m s⁻¹). Yet, there is a residual hydrodynamic flux from the area of the burst towards the East (Silva et al., 2017) that could have outweighed the effect of the wind in this case and transported the wastewater plume there. As verified in the corresponding natural-colour image (not shown), unmasked clouds and shadows introduced noise that restricted the analysis in other parts of the lagoon, especially in the North and South sectors.
The WCI map of 05-02-2021 (Figure 10B) shows a medium-low risk (0.25–0.45) of wastewater contamination throughout the lagoon. The very high-risk spots seen in the North Lagoon result from clouds that were not entirely masked in the Sentinel-2 scene classification map. The absence of relevant spatial patterns indicates that, at this point, the lagoon's water was well mixed. Between the date of the previous WCI map and this one, the predominant winds were from the North (N) and North-East (NE), with a relatively high frequency (5%) of stronger winds (3.5–5.5 m s⁻¹) from the NE, which could have provided good mixing and dilution conditions.
In the WCI map of 08-02-2021, we see a plume with a high risk of wastewater contamination coming directly from the area where the disposal pond burst (Figure 10C). This suggests that the burst area remained a source of contamination up to 2 weeks after the outfall event. Two factors could explain this: the large amount of contaminated sediment carried during the accident, which accumulated around the margins of the lagoon and formed a delta⁹, and the favourable dispersion conditions. From the date of the previous WCI map until this one, the predominant wind direction was South-East (SE) (>50% frequency), mostly with speeds higher than 3.5 m s⁻¹. This wind direction facilitates the resuspension and transport of the accumulated sediment towards the North of the lagoon, as captured by the WCI map. A very high risk of contamination is seen in the extreme North of the lagoon. This agrees with reports of massive fish die-offs, oxygen depletion and toxic algal blooms in the region in the following weeks, starting on 22-02-2021 (UFSC, 2021). It was debated whether these were an effect of the disposal pond burst or of another pollution spill in the North sector itself. That second spill was caused by the opening of drainage ditches around the irrigation field of Barra WWTP (see location in Figure 1) on 16-02-2021. A combination of factors producing synergistic effects was also considered. No conclusion was reached. However, signs of contamination were already present on 08-02-2021, and the predominant wind between then and 22-02-2021 was still from the SE. Our results therefore support the hypothesis that the events were related.
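Wind summaries such as the predominant direction and the frequency of winds above 3.5 m s⁻¹ between two map dates can be derived from station records. The sketch below is a hypothetical illustration. It assumes records of wind direction (in degrees, the direction the wind blows from) and speed (m s⁻¹) for the interval of interest. It also assumes an 8-sector compass with sectors centred on N, NE, E, and so on. This section does not specify the data source or sector scheme used in the study.

```python
import numpy as np

SECTORS = np.array(["N", "NE", "E", "SE", "S", "SW", "W", "NW"])


def to_sector(direction_deg):
    """8-point sector for each wind direction; each sector spans 45 degrees centred on its label."""
    d = np.asarray(direction_deg, dtype=float) % 360.0
    return SECTORS[np.floor((d + 22.5) / 45.0).astype(int) % 8]


def wind_summary(direction_deg, speed_ms, strong=3.5):
    """Predominant sector, frequency per sector, and fraction of records faster than `strong`."""
    sectors = to_sector(direction_deg)
    labels, counts = np.unique(sectors, return_counts=True)
    freq = dict(zip(labels.tolist(), (counts / counts.sum()).tolist()))
    predominant = str(labels[np.argmax(counts)])
    strong_frac = float(np.mean(np.asarray(speed_ms, dtype=float) > strong))
    return predominant, freq, strong_frac


# Toy hourly records between two map dates (hypothetical values)
direction = [0, 10, 350, 135, 140]
speed = [2.0, 4.0, 5.0, 1.0, 3.0]
print(wind_summary(direction, speed))
```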

Leak from sewage pipe (19-05-2020)
Both WCI maps, of 11-05-2020 and 16-05-2020 (Figure 11), show a low risk of wastewater contamination in most of the lagoon; these dates are 8 and 3 days before the leak was discovered. The shallow waters along the margins prevent evaluation of the conditions closer to the leak area. Still, on 11-05-2020 we observe a path with a medium risk of contamination (>0.41) originating from the leak and spreading to the East margin of the South Lagoon. On 16-05-2020, the higher-risk area spread towards the rest of the South Lagoon, reaching the West branch. Even though the risk is higher and can be distinguished from the background, the values still represent a medium risk of contamination. This is due to the conditions seen in the WQI anomalies of the corresponding dates (Supplementary Material), in which the plume is characterized by positive anomalies of $a_{dg}$ and negative anomalies of $b_{bspm}$. This is probably related to the characteristics of the event, i.e., a small-scale continuous leak from a broken pipe with no association with stormwater or sediment entrainment. A technical report (UFSC, 2020) supports the plume transport to the South Lagoon indicated by the WCI: it describes scum patches coinciding with the areas of higher contamination risk during the event. Photographic records also show scum patches in that area (Canto da Lagoa) on 18-05-2020¹⁰.

Contribution of WQIs anomalies in distinguishing the reference wastewater plume
The spatial variation mode analysis of the training sample provided a linear combination that best captured the reference wastewater plume, with equally high weights for $\mathrm{anom}_{b_{bspm}}$ and $\mathrm{anom}_{a_{dg}}$ (50% each) and a negligible contribution from $\mathrm{anom}_{a_{chla}}$. Note that the variables were standardized during PCA to avoid bias towards the variable with higher overall values and variance.

FIGURE 9
Distribution of the risk of contamination classes (Low, Medium, and High) for each of the in situ sampling points based on the E. coli measurements and extracted from the WCI maps.
10 See first picture in https://www.nsctotal.com.br/colunistas/renato-igor/poluicao-na-lagoa-da-conceicao-em-florianopolis-nao-pode-serpolitizada

The high weight obtained for $\mathrm{anom}_{b_{bspm}}$ is physically sound: given the nature of the reference plume (i.e., a wastewater-slurry mixture from the disposal pond flash flood), elevated concentrations of suspended solids are expected. This agrees with Ayad et al. (2020), who showed that mixed stormwater and wastewater had medium to high turbidity levels associated with suspended sediments. As for $a_{dg}$, wastewater is known to be closely associated with dissolved organic matter (Hudson et al., 2007), which explains why its anomaly plays an important role in distinguishing the plume. This is also in line with the results of Marmorino et al. (2010), who found sewage discharge to be related to high levels of CDOM.
The negligible effect of $\mathrm{anom}_{a_{chla}}$ in distinguishing this plume is in accordance with other studies that could not identify a detectable satellite-derived chlorophyll-a feature in wastewater-contaminated areas (Seegers et al., 2017). Possible reasons given by the authors include the short duration of the outfall, chlorination in the case of treated wastewater, or strong mixing and dilution. We also argue that phytoplankton growth due to a wastewater spill would likely be a delayed effect following nutrient enrichment. Such growth would also depend on other environmental conditions, such as temperature, stratification and light availability.

Limitations of the WCI
A major assumption in this study is that the signals associated with the reference wastewater plume (derived from one specific outfall event) are representative of other wastewater plumes in the area. This was a large-scale event, with an extensive visible plume even weeks after the rupture. It therefore constituted a useful training sample for investigating the satellite-derived features associated with wastewater. Nevertheless, the weights given to the WQI anomalies in the linear combination that defines the WCI carry embedded characteristics of this one event. An example is the large influence of suspended sediments, due to the event's flash-flood nature, in combination with organic material. We would therefore expect plumes that resemble the reference to be more easily detected and classified with a higher risk of contamination. In contrast, when only $a_{dg}$ is positively anomalous, which could in many cases also indicate wastewater contamination, the WCI may not capture the risk level well enough (false negative). It is also possible that $b_{bspm}$ is anomalously high due to strong resuspension related to winds or extreme events. In that case, the WCI indicates a high risk of contamination that does not reflect reality (false positive). When the index is used as an operational tool, the individual WQI anomaly maps can always be consulted to help assess the situation, since they are generated in the process anyway.
Examples of potential underestimation of contamination risk were seen in the WCI maps of 05-02-2021 (Figure 10B), 11-05-2020 and 16-05-2020 (Figure 11). All of these maps were associated with wastewater spill events, but the SPM backscattering coefficient was low compared with average values. Therefore, even with markedly positive anomalies of $a_{dg}$ dominating light attenuation, the risk was still interpreted as medium-low (0.3–0.5), because the negative $b_{bspm}$ anomalies cancelled part of the signal. It should be noted that CDOM is more conservative than SPM: SPM decreases rapidly due to settling, whereas CDOM decays over weeks to months through photodegradation (Nezlin et al., 2008). The WCI does not currently take this into account.
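A worked example with hypothetical anomaly values illustrates this cancellation effect. Both rows use $\mathrm{anom}_{a_{dg}} = 2.0$. The first row uses $\mathrm{anom}_{b_{bspm}} = -1.5$ and the second uses $\mathrm{anom}_{b_{bspm}} = 0$. Both apply Eq. (8) and the rescaling bounds of −1.35 and 2.28 reported above. For illustration only, they assume that WCI is a linear min-max rescaling of LC between those bounds.

$$
\begin{aligned}
\mathrm{LC} &= 0.50\,(2.0) + 0.50\,(-1.5) = 0.25, & \mathrm{WCI} &= \frac{0.25 + 1.35}{2.28 + 1.35} = \frac{1.60}{3.63} \approx 0.44,\\
\mathrm{LC} &= 0.50\,(2.0) + 0.50\,(0.0) = 1.00, & \mathrm{WCI} &= \frac{1.00 + 1.35}{2.28 + 1.35} = \frac{2.35}{3.63} \approx 0.65.
\end{aligned}
$$

The $a_{dg}$ anomaly is the same in both rows. A negative $b_{bspm}$ anomaly places the pixel in the medium class (0.41 < WCI < 0.56). An average $b_{bspm}$ places it in the high class (WCI > 0.56).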
In addition, overestimation of SPM backscattering could also lead to misinterpretation of the contamination risk. Overestimation of $b_{bspm}$ in the 2SeaColor model is caused mainly by unmasked glint, clouds, bottom reflectance, foam, or boats, all of which cause high NIR reflectance. Three improvements could minimize it:
• a more efficient cloud masking algorithm for coastal and inland waters, such as WiPE (Ngoc et al., 2019);
• the inclusion of bottom reflectance in the model, so that shallow waters can also be evaluated;
• a routine to detect boats or other artificial objects.
Ideally, a larger number of images portraying known wastewater plumes in varied conditions should be used. This would give the weights of the linear combination, and hence the WCI, more representativeness and statistical strength. More in situ water quality measurements coinciding with satellite overpasses on days of known contamination would also be desirable. However, sanitation companies do not commonly disclose information about wastewater spills, especially in developing countries, as this might have negative implications for them. In addition, overflows from sewage pumping stations are typically associated with heavy rainfall and cloud cover. This constitutes a constraint of the method, since optical remote sensing relies on clear skies. Yet, depending on the scale, duration and circulation conditions during the outfall, plumes might still be detectable after a few days, as seen in the cases analysed in this study. Ayad et al. (2020) highlight the importance of considering circulation patterns in wastewater studies, as there is a knowledge gap regarding the residence time and dilution of the plumes. According to Silva (2013), the circulation patterns of Conceição Lagoon are governed primarily by wind, flow in the contributing streams, and the meteorological tide. Wind was the only one of these parameters available for the study period. Considering it in the analysis highlighted its influence on the transport of wastewater plumes and on the overall dynamics of the Conceição Lagoon system. Nevertheless, including flow and meteorological tide data could make the analysis more robust, because the processes that contribute to the detection of contamination could then be isolated.

Applicability of the WCI to other locations
The methodology applied here to construct the WCI is readily transferable to other locations, and such applications are desirable for further verification. It only requires a time series of WQIs with at least two dates capturing a wastewater plume (one for training and another for testing). It should be noted that the span of the time series influences the calculation of the WQI anomalies (and consequently the WCI), as the mean and standard deviation depend on it. It is also important to investigate the seasonality of the WQIs in the study area and, if seasonality is present, to calculate the anomalies from seasonal statistics. If the WQIs have a distinct seasonality, calculating the mean and standard deviation over the whole period will not remove it. Seasonal characteristics will then remain in the standardized anomalies and bias the interpretation.
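The minimal sketch below shows the difference between whole-period and seasonal standardization. It assumes a WQI raster cube with one image per date and uses calendar months as the seasonal grouping. Both the grouping and the function name are illustrative choices, not the study's implementation.

```python
import numpy as np


def standardized_anomalies(cube, months=None):
    """Standardized temporal anomalies (x - mean) / std of a WQI cube.

    cube:   array (n_dates, ny, nx); NaN marks masked pixels.
    months: calendar month (1-12) of each date. If None, the mean and standard
            deviation are taken over the whole time series; otherwise they are
            computed separately for each month (a simple seasonal climatology).
            A group with a single image has std = 0 and yields NaN anomalies.
    """
    cube = np.asarray(cube, dtype=float)
    if months is None:
        return (cube - np.nanmean(cube, axis=0)) / np.nanstd(cube, axis=0)
    months = np.asarray(months)
    anoms = np.full_like(cube, np.nan)
    for m in np.unique(months):
        sel = months == m
        group = cube[sel]
        anoms[sel] = (group - np.nanmean(group, axis=0)) / np.nanstd(group, axis=0)
    return anoms


# Toy single-pixel series with a seasonal offset between January and February
series = np.array([1.0, 3.0, 10.0, 14.0]).reshape(4, 1, 1)
months = [1, 1, 2, 2]
print(standardized_anomalies(series).ravel())          # January negative, February positive
print(standardized_anomalies(series, months).ravel())  # seasonal offset removed
```

With whole-period statistics, both January values stay negative and both February values stay positive. The seasonal signal is still present. Monthly statistics remove it.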
The choice of the training sample(s) is also important. It should consist of scenes that capture the spatial dynamics of wastewater plumes considered representative of the study area. A higher number of training samples should enhance representativeness and is therefore preferred. The training sample should cover the whole area of interest in which the WCI will be applied. This ensures that site-specific characteristics are considered when differentiating the background waters from the plume. The optical complexity of the area and the characteristics of the wastewater spill will influence the weights given to the WQI anomalies in the spatial variation mode analysis. The result is an optimal linear combination for indicating the risk of contamination that is unique to the location. Although the weights are expected to vary from location to location, the overall methodology is transferable and could constitute a useful tool for water management.

Conclusion
This study investigated how satellite-derived water quality indicators could be used to detect wastewater contamination in aquatic systems. We proposed a novel methodology that uses temporal anomalies of water quality indicators and their spatial patterns during a known outfall event to derive the Wastewater Contamination Index (WCI). The WCI indicates the risk of wastewater contamination. It was verified qualitatively for two outfall events in Conceição Lagoon, Southern Brazil, with the aid of meteorological data, photographic records, and technical reports. One of the main findings concerned how a typical wastewater plume in Conceição Lagoon differs from background uncontaminated waters. The plume could be distinguished by simultaneously high positive anomalies of the SPM backscattering coefficient and of the combined absorption coefficient of detritus and CDOM. This characteristic is attributed to the elevated concentration of solids in outfalls associated with stormwater and the high organic matter content of waste. The method can be applied to other inland and coastal waters and can therefore constitute a valuable management tool for monitoring wastewater contamination.

Author contributions
AD, JT, DW, and MS contributed to the conception, methodology, investigation, and analysis of the study. AD and MS were responsible for coding and processing the data. AD wrote the original draft and prepared the visualizations. MS, DW and JT reviewed and edited the manuscript. All authors contributed to the manuscript revision, read, and approved the submitted version.