The Application of an Artificial Neural Network to Quantify Anthropogenic and Climatic Drivers in Coastal Phytoplankton Shift

The overlapping effect of anthropogenic activities and climate change are major drivers for a shift in coastal marine phytoplankton biomass. Linear regression analyses are not sufficient to detect the nonlinear relationship between complex environmental factors and phytoplankton shift. Here, an Artificial Neural Network (ANN) model is applied to quantify the relative contribution of pearl oyster farming, temperature and rainfall on phytoplankton increases in Cygnet Bay, Australia. The result shows that increased oyster farming ranks among the most important factors for phytoplankton increases, with a relative importance of 54% for diatoms and 74% for dinoflagellates; temperature plays a second important role with a positive impact on diatoms (relative importance of 25%) but a negative impact on dinoflagellates (relative importance of 19%); rainfall is the least important which enhances diatom biomass only (relative importance of 21%). Our ANN analysis provides a useful approach for quantifying the complex interrelationships affecting phytoplankton shift.


INTRODUCTION
Over the past several decades, coastal phytoplankton has been facing strong and rapid changes forced by a combination of human activities and climate change (Boyce et al., 2010;Cloern and Jassby, 2010).Nutrient enrichment caused by aquaculture, coastal development and pollution are just a few of the many ways humans impact coastal seas altering phytoplankton dynamics (Liu et al., 2013;Glibert, 2020).Meanwhile, climate-driven changes in temperature, precipitation and large-scale climate patterns can have major consequences for phytoplankton through changes in hydrologic conditions or macronutrient concentrations (Lewandowska et al., 2014;Thompson et al., 2015;Lin et al., 2020).In general, it is thought that climate change will result in a decrease in marine phytoplankton productivity as ocean warming enhances stratification, reducing nutrient supply to the upper ocean (Behrenfeld et al., 2006;Boyce et al., 2010).However, the situation tropical coastal seas may well be different with changes to rainfall patterns and cyclone frequency and intensity, coastal land use and even wildfires, resulting in increased nutrient inputs which enhance coastal phytoplankton productivity (Yuan et al., 2020;Liu et al., 2022).Understanding how these multiple environmental variables have influenced phytoplankton in the past is fundamental for the prediction of future changes, and to help develop policies for ecosystem health (Levin et al., 2015).
Paleoecological analysis of geochemical proxies in the sediments is increasingly being applied to investigate and interpret environmental changes, due to its high efficiency to acquire long-term ecological records (Smittenberg et al., 2004;Xing et al., 2016;Yuan et al., 2018).Most of these studies have assumed a simple linear relationship between environmental factors and phytoplankton, and attributed to the most relevant factor via straightforward visual inspection or linear regression analyses.However, the complex nature of phytoplankton dynamics, in particular multivariate (Irwin et al., 2012;Mutshinda et al., 2013) and nonlinearity (Jochimsen et al., 2012;Feng et al., 2021), make traditional statistical tools such as Pearson correlation analysis and Principal component analysis (PCA) difficult to explain all the complex interrelationships affecting phytoplankton dynamics.There is an imperative to apply more elaborate methods to better understand the complex relationship between the environmental forces and phytoplankton response.Artificial Neural Network (ANN) is a type of machine learning algorithm inspired by the functioning of the brain and nervous system (Rumelhart et al., 1986).The capability of approximating nonlinear multivariate functions with high accuracy and stability makes them particularly suitable to solve complex ecological problems (Gevrey et al., 2003).This approach has been proved to attain better performance than traditional linear statistical tools (Maravelias et al., 2003) and successfully applied to solve a range of relation recognition problems from landscape (Buckland et al., 2019) to microbial community (Santos et al., 2014).Specifically for phytoplankton research, ANNs have been applied in different aquatic environments such as rivers (Jeong et al., 2006), lakes (Li et al., 2007), estuaries (Coutinho et al., 2019), and open ocean (Mattei & Scardi, 2020) to understand the influence of different environmental factors on phytoplankton.
The Kimberley coast, tropical Australia (Figure 1), is a remote region with a small population living in small townships spread of a very large area, but is undergoing increasing pressures from anthropogenic activities and climate variability (Lough, 2008;O'Donnell et al., 2015).Field observations and paleoecological data suggested that significant increases in phytoplankton productivity have occurred over the past several decades (Furnas & Carpenter, 2016;Liu et al., 2016;Yuan et al., 2018).Most evidence demonstrated that climate-induced changes in temperature and rainfall were the dominant drivers because warming directly affected phytoplankton metabolic rates, and rainfall associated with increased tropical cyclones can enhance nutrient supply (Yuan et al., 2018;Yuan et al., 2020).In addition, nutrient enrichment from mariculture (Liu et al., 2016), seasonal urban runoff and contaminated groundwater inputs (Estrella et al., 2011;Gunaratne et al., 2017) can elevate phytoplankton production significantly at the local scale.It is acknowledged that multiple factors may jointly increase phytoplankton biomasses in this region, however, the effects of each driver still lack quantitative assessment.The climatic factors, temperature and rainfall, and anthropogenic history are significantly correlated and so estimated effects will also be correlated.Ideally, a processbased ecological model would be used to simulate phytoplankton responses to climatic and anthropogenic stressors.However, these mechanistic models are not yet established in the Kimberley coast.Therefore, the data-driven empirical approach of ANN analysis would be highly useful, considering its strong capability to unravel the complex relationships between ecological variables.
In this study, we explored the potential for novel applications of machine learning techniques (ANN) to improve our understanding of the relationship between climatic and anthropogenic disturbances and long-term phytoplankton changes.Firstly, we analyzed four biomarker proxies (TEX 86 H index, long-chain n-alkanes, brassicasterol, and dinosterol) in the sediment core obtained from Cygnet Bay by Liu et al. (2016) to reconstruct the records of Sea Surface Temperature (SST), rainfall, and the diatom and dinoflagellate biomasses, respectively.The applications of these biomarker proxies in coastal waters of north-western Australia have been validated by previous studies (Burns et al., 2003;Yuan et al., 2018).Then, an ANN model was built with the paleoecological data and collected historical archives.This ANN model was carefully validated to ensure that it could provide accurate results without biases or overfitting.We demonstrate the use of the ANN model to quantify the relative importance of pearl oyster farming, SST and rainfall for diatom and dinoflagellate changes during the past century.

Core Information
The core used in this study was from Liu et al. (2016) collected by SCUBA divers in 2011.Liu et al. (2016) chose two sites, with one site under the direct influence of pearl oyster farming and the other a reference site 1.5 km away from pearl oyster farming and aimed to understand the impact of pearl oyster farming on the sediment quality in Cygnet Bay.Chronology and geochemical parameters including total organic matter, carbon and nitrogen isotopes, and C/N were analyzed in both sites (Liu et al., 2016).A subsequent study by Yuan et al. (2018) chose the reference site and analyzed biomarker proxies including TEX 86 , long-chain

Biomarker Analysis
The core was sectioned at 1 cm intervals.Then, the subsamples were extracted with dichloromethane/MeOH, and the neutral lipids were extracted with hexane and separated into two fractions using silica gel chromatography.The nonpolar lipid fraction was eluted with hexane, and the polar lipid fraction was eluted with dichloromethane/methanol.Subsequently, the polar fraction was divided into two parts, one derivatized using N, O-bis(trimethylsilyl)-trifluoroacetamide (BSTFA) and the other filtered by PTFE membrane (0.45 μm).The long-chain n-alkanes, brassicasterol, and dinosterol were quantified by GC (Agilent 6890N) with an FID detector.Glycerol Dialkyl Glycerol Tetraethers (GDGTs) analysis was performed by HPLC-MS (Agilent 1200/Waters Micromass-Quattro UltimaTM Pt) with APCI probe and Prevail Cyano Column.
., ., Eq. ( 2) where H stood for high temperature regions, the numbers 1-3 indicated the number of cyclopentane rings in GDGTs and Cren' was the regioisomer of crenarchaeol.
The long-chain n-alkanes (C 27 + C 29 + C 31 ), specific to higher land plants, were widely used to access the impact of terrestrial input in marine sediments in terms of changes in rainfall, river discharge or dust input (Eglinton & Hamilton, 1967;Seki et al., 2003).A previous paper in Cygnet Bay have compared the downcore variations of long-chain n-alkanes with rainfall as well as the river discharge, and they found very similar increasing patterns (Yuan et al., 2018).The increasing trend of long-chain n-alkane in Core CB1 also matched well with the instrumental rainfall observations (Figure S1).Thus, we used long-chain n-alkanes to represent rainfall (terrestrial influence) in this study.

Data Source and Statistical Analysis
The practice of the pearl industry in Western Australia has typically involved two steps: first, wild pearl oysters Pinctada maxima with appropriate sizes were collected by fishing vessels in shallow coastal waters; second, the collected pearl oysters were cultured to produce pearls in pearl farms.The pearl production and pearl farm size were closely related to the annual caught numbers of wild pearl oyster shells (Fletcher et al., 2006) (Rodionov, 2004;Rodionov and Overland, 2005) was written in VBA for Microsoft Excel to detect the timing and magnitude of the shift changes in each record.The cutoff length (l) was set to 10 with a significant level of 5%.

Artificial Neural Network (ANN) Model
A typical ANN's structure consists of 3 interconnected layers: an input layer, one or more hidden layers, and an output layer, with each layer containing one or more neurons (Supplementary Figure S2).The neurons are connected with the next layer neurons with adjustable weights and each connection represents one computation.During training, the weights are adjusted over each iteration towards values that can minimize the errors between calculated output and actual output.This procedure is repeated until the errors become small enough.A more detailed description can be found in Coutinho et al. (2019) and Krogh and Anders (2008).
We carried out ANN analyses in the R environment version 4.1.1(R Core Team, 2021).The 'nnet' function from package 'nnet' (Ripley et al., 2016) was used to train networks.The 'garson' and 'olden' functions from package 'NeuralNetTools' (Beck, 2018) were used for importance analysis.The 'lek profile' function from package 'NeuralNetTools' was used for sensitivity analysis.A three-layer ANN was built with three input neurons (pearl oyster farming, SST and rainfall), one hidden neuron and two output neurons (diatom and dinoflagellate abundances) (Supplementary Figure S2).We chose pearl oyster farming, SST and rainfall as the input variables in the ANN model because they were explicitly suggested as the main drivers for phytoplankton changes in this region (Liu et al., 2016;Yuan et al., 2018;Yuan et al., 2020).The pearl oyster farming activities were assumed zero before it was established in 1960.All input and output data during the period from 1910 to 2011 were normalized by scaling it into the [0, 1] interval and randomly divided into the training and validation samples prior to training.ANN performance was evaluated by calculating the Pearson Correlation Coefficients for training (PCCt) and validation (PCCv) samples and the Root Mean Square Errors for training (RMSEt) and validation samples (RMSEv).High PCC (>0.5) and low RMSE values (<1) for both training and validation samples indicated that networks were considered valid.To achieve the best performance of ANN networks without overfitting, these networks were carefully validated following Coutinho et al. (2019).Briefly, ANN was tested with different numbers of hidden neurons (from 1 to 10), training iterations (from 2 to 1024) and percentage of samples used for training (from 5% to 95%).One hundred networks were trained for each combination of parameters.We found no evidence that the changing hidden neurons and percentage of samples used for training will influence the network performance (Supplementary Figures S3, S4).The networks with training iterations above 32 produced comparable PCCt and RMSEt as well as PCCv and RMSEv values (Supplementary Figure S5).Thus, we built the final ANN using the parameters as follows: hidden neuron was set to 1; samples were randomly and evenly divided into training and validation sets; maximum number of iterations was set to 1000; reltol was set to 0.01; and weight decay was set to 0.001.Then, we proceeded to put all data (n = 91) for training and selected the network with the lowest RMSEt as the best network.Finally, the relative importance of the input variables was estimated in the best network based on the 'Olden' method (Olden et al., 2004).This approach calculated the summed product of connection weights between each input-hidden neuron connection and hidden-output neuron connection.The positive values indicated that the input variable had a positive association with the output variable, and the negative values indicated the input variable had a negative association with the output variable.A sensitivity test was also performed in the best network to understand how each input variable individually influenced the output variables.The output variables were modelled with changing one input variable at a time, while all the other input variables were held constant at their 10th, 30th, 50th, 70th and 90th percentile values (Supplementary Figure S6).

Temporal Patterns of Biomarker Proxies
The brassicasterol contents, representing diatom biomass in this study, varied from 7.7 to 57.7 μg g -1 TOC.They remained steady until the late-1970s and then showed two increasing shifts in 1978 and 1995 according to the STARS analysis (RSI: 0.6 and 1.3; Figure 2A).The dinosterol contents, representing dinoflagellate biomass, varied from 15.8 to 85.0 μg g -1 TOC.They displayed a gradual increasing pattern with three increasing shifts in 1935, 1978and 1998 (RSI: 0.4 (RSI: 0.4, 0.2 and 0.2; Figure 2B).
A previous study validated the application of TEX 86 H index for SST reconstruction in Cygnet Bay (Yuan et al., 2018).In Core CB1, TEX 86 H temperature ranged from 25.3 to 29.4°C and displayed three different periods (Figure 2C): a low temperature period of 1900-1957 (average of 26.5°C), a warming period of 1957-1986 with three increasing shifts (RSI: 1.1, 1.0 and 0.2), and a high temperature period of 1986-2011 (average of 28.5°C).The contents of long-chain n-alkanes ranged from 36.5 to 135.5 ng g -1 .They showed an increasing trend over time with three increasing shifts in 1961, 1983 and 2003 (RSI: 0.5, 0.9 and 1.5; Figure 2D).The historical archives showed that the pearl oyster farming industry in Cygnet Bay was first started in 1960 and intensified in the 1980s, which can be reflected in the total pearl oyster catch in the Broome area (Figure 2E).
Pearson correlation analysis showed that SST, rainfall and pearl oyster farming were significantly correlated with diatom and dinoflagellate biomasses (Table 1).Due to their close associations, PCA analysis failed to further distinguish the difference between each environmental factors and phytoplankton biomasses.Only one meaningful principal component, which accounted for 77.5% of the total variance, was extracted in the dataset containing the four biomarker proxies and pearl oyster farming history.

Relative Importance of Environmental Variable
The ANN model was successful in predicting the responses of diatoms and dinoflagellates.The best network displayed a PCC value of 0.86 and RMSE value of 0.11 for diatoms, and a PCC value of 0.75 and RMSE value of 0.15 for dinoflagellates.For both diatoms and dinoflagellates, pearl oyster farming consistently ranked as the most important factor, followed by SST, with rainfall the least important (Figures 3B, C).Pearl oyster farming was positively associated with diatoms and dinoflagellates with a relative importance of 54% for diatoms and 74% for dinoflagellates (Figure 3).SST had a positive association for diatoms with a relative importance of 25%, while had a negative association for dinoflagellates with a relative importance of 19% (Figure 3).Rainfall had a positive association with diatoms with a relative importance of 21% but was not as relevant with dinoflagellates (Figure 3).

Environmental Factors Influence Phytoplankton
In this study, diatom and dinoflagellate biomasses show significant increases in the 1970s and the 1990s (Figures 2A,  B).The main drivers modulating phytoplankton can be approximately evaluated by linear statistical methods.Both Pearson correlation analysis and PCA analysis suggest that SST, rainfall and pearl oyster farming could contribute to the phytoplankton increases, but their separate effects cannot be distinguished.The 'Olden' method in the ANN enables visualization of the relative importance of each input variable and generates a number of new observations (Figure 3).The estimated relative importance shows that pearl oyster farming overwhelms SST and rainfall controls on diatom and dinoflagellate increases, which is consistent with current understanding.According to the total organic carbon and nitrogen records from the same site, there is a significant increase since the pearl oyster farming established in the 1960s; the δ 13 C, C/N and δ 15 N records indicate that the increased organic matter and nitrogen are mainly contributed by autochthonous sources, suggesting a more significant influence from pearl oyster farming than terrestrial input (Liu et al., 2016).A direct comparison of diatom records (biogenic silica) inside and outside the farming area also indicates that the impact of pearl oyster farming on diatom abundance is much greater than that of climate change (Liu et al., 2016).These lines of evidence are in agreement supporting that pearl oyster farming plays a dominant role in phytoplankton increases in Cygnet Bay.
Bivalve farming affects phytoplankton primarily through two distinct mechanisms (Forrest et al., 2009): bottom-up control of stimulating phytoplankton growth caused by nutrient remineralization in pseudofeces and feces (Sarà et al., 2011;Gallardi, 2014;Liang et al., 2019) and top-down control of depleting phytoplankton caused by bivalve grazing (Petersen et al., 2008;Jiang et al., 2019).The effect of bivalve grazing on phytoplankton may be most significant with a high stocking density of bivalve farming (Smaal et al., 2013) when decreasing bivalve biomass reaches a point at which the bottom-up effect begins to increase phytoplankton production.This process has been demonstrated in mesocosm (Prins et al., 1995) and field studies (Trottet et al., 2008;Sarà et al., 2011), both reveal enhanced phytoplankton abundances at low stocking density.The stocking density of pearl oysters in Cygnet Bay and other parts of north-western Australia is a very small size (<16 250 shells per square nautical mile) compared with other bivalve farming (Jelbart et al., 2011), hence, it can create a mutually beneficial environment for both phytoplankton and bivalve.The modern long-line culture in Cygnet Bay since the 1980s uses long line cages to suspend pearl oysters in waters.This culture method can reduce current speed significantly (Plew  et al., 2005), which induces nutrient accumulation within the farming area and could also favor phytoplankton growth.Temperature and rainfall are important determinants, but their influence markedly varies between diatoms and dinoflagellates (Figure 3A).Tropical phytoplankton used to have their optimum temperatures close to annual mean temperatures (Thomas et al., 2012).A small amount of warming can therefore easily exceed their optimum temperatures and lead to sharp declines in growth rate (Thomas et al., 2012).However, culture experiments demonstrate that evolutionary thermal adaptation can help phytoplankton to avoid the negative effect of warming, but this ability differs across phytoplankton species (Padfield et al., 2016;Thomas et al., 2016;Jin and Agustí, 2018).Several diatoms, particularly tropical species, can adapt to the warming environment rapidly by increasing their optimal temperature and maximum growth rate (Jin and Agustí, 2018;Schaum et al., 2018).Thus, the different adaptation capacities of diatoms and dinoflagellates could result in their opposite response to warming SST in Cygnet Bay.In common with previous studies (Yuan et al., 2018;Yuan et al., 2020), the ANN results also indicate that diatoms obtain greater benefits from increasing rainfall than dinoflagellates.This is because increasing rainfall not only brings rich silicate nutrients through river discharge but also creates turbulent conditions, both are better for diatom growth (Margalef, 1978;Irwin et al., 2012).

ANN Model for Palaeoecological Research
The application of palaeoecological data in ANN modeling allows us to combine the strength of both approaches.The palaeoecological approach is uniquely valuable in acquiring ecological data and assessing the impacts of environmental changes at supra-decadal timescales.As the raw data accumulate, the use of statistical models to advance our understanding of ecological dynamics is becoming increasingly important.Several conventional statistical approaches, including multiple linear regression (MLR) (Liu et al., 2022), principal component analysis (PCA) (Yuan et al., 2020), canonical correspondence analysis (CCA) (Liu et al., 2018) have been widely applied for a long time in palaeoecological studies.However, increasing evidence indicates that many ecological issues possess the characteristics of nonlinearity, time variation, and multiple forcing mechanisms Multiple Factors Driving Phytoplankton Shift (Andersen et al., 2009), and these conventional approaches to data analysis have presented limitations in revealing inner relationships and providing results prediction.The ANN methodology is capable to handle big data and complex nonlinear relationships; thus, it can make up for the inadequate understanding of mechanisms by using palaeoecological and conventional statistical approaches.The present study illustrates the feasibility and usefulness of neural networks in the field of palaeoecology; and it may prove promising to be applied to similar palaeoecological datasets from other ecosystems.Despite the clear advantages, limitations of our ANN model still exist.For example, both sediment records and collected historical archives (pearl oyster catch numbers) are used as input data in our ANN model.There are asynchronous errors between the two different datasets, which could have hampered the ANN performance.Finding a pearl farming indicator in the sediment would be helpful to reduce this error.The model also did not incorporate some factors that likely influence phytoplankton dynamics such as nutrient structure (the ratio of macronutrients) (Burford et al., 2012;Armbrecht et al., 2015), tidal mixing and light variability (Mclaughlin et al., 2019;Mclaughlin et al., 2020).These factors are regarded as important drivers determining phytoplankton abundance in the modern Kimberley coast, but lack for an understanding of their historical changes as well as their impacts on phytoplankton in the past.In addition, the ANN model cannot be used to predict the values outside the range of training data.The model presented in this example is trained based on the palaeoecological data during the past century and is unable to predict how future phytoplankton under higher sea temperatures will change.Thus, the training dataset in the ANN model should cover as wide a range as possible to improve the predictable capabilities.

CONCLUSIONS
We have shown the capacity to use an ANN model coupled with paleoecological datasets to improve our understanding of anthropogenic and climatic factors driving long-term phytoplankton changes.The results in Cygnet Bay highlight that pearl oyster farming plays a dominant role for both diatoms and dinoflagellates increases, while climatic factors exert different influences between diatoms and dinoflagellates determined by their ecophysiological traits.Our ANN analysis may prove promising in solving complex paleoecological issues.Possible elaborations of this approach in future work will introduce more related environmental factors and incorporate more paleoecological data.

FIGURE 1 |
FIGURE 1 | A schematic map showing the sampling sites * Core CB1) along the Kimberley coastline in North-Western Australia.

FIGURE 2 |
FIGURE 2 | Profiles of TOC normalized brassicasterol and dinosterol contents (A, B), TEX 86 H temperature (C), long chain n-alkane contents (D), and total pearl oyster catch (E).The solid lines show shift trends assessed by sequential t test analysis of regime shift and numbers are regime shift index.Significant level is set to p< 0.05.The dashed line in total pearl oyster catch is calculated by assuming an average weight of 1 kg for each pearl oyster.

FIGURE 3 |
FIGURE 3 | Bar (A) and pie plots (B, C) representing the relative importance of environmental factors for diatoms and dinoflagellates estimated from ANN weights through the 'Olden' method.The pie plots are calculated by the absolute values of the relative importance of environmental factors.
Liu et al., 2016)y, who own and manage Cygnet Bay Pearls, started farming in 1960 at low density; the farm size and stock level expanded rapidly since modern long-line culture established in the 1980s (www.cygnetbaypearlfarm. com.au/; also seeLiu et al., 2016).These changes show an excellent coincidence with the total pearl oyster catch in the Broome area from 1960 to 2011 (Figure2E; data obtained from the Department of Primary Industries and Regional Development; www.fish.wa.gov.au).Despite being a quota managed fishery, generally, pearl oyster catch is regarded as a good indicator the scale of pearl farming activity despite some fluctuations in demand for wild shell driven by hatchery produced shell supplementing wild caught shell from the mid-1990s to 2006 (when disease affected hatchery production) and the 2008 global financial crisis affecting demand for pearls (James Brown, Cygnet Bay Pearls, personal communication).Thus, we used pearl oyster catch numbers in the Broome area to represent the farming intensity in Cygnet Bay.The data from 1960 to 1978 was collected with catch weight (tonne).To keep consistency, we calculated the catch number for this period by assuming an average weight of 1 kg per pearl oyster.Interpolation analysis used Origin 8.0 software (mathematics: interpolate) to reduce asynchronous errors between sediment records and collected historical archives.Pearson correlation analysis and principle component analysis (PCA) used SPSS 16 (Statistical Package for the Social Sciences Inc.) to detect linear correlations between phytoplankton biomass and environmental variables.The sequential t test analysis of regime shifts (STARS; www.BeringClimate.noaa.gov)
*Correlation is significant at p< 0.01.