Missing the Reef for the Corals: Unexpected Trends Between Coral Reef Condition and the Environment at the Ecosystem Scale

It is incontrovertible that many coral reefs are in various stages of decline and may be unable to withstand the effects of global climate change, jeopardizing vital ecosystem goods and services to hundreds of millions of people around the world. An estimated 50% of the world's corals have already been lost, and those remaining may be lost by 2030 under the “business as usual” CO2 emissions scenario. However, the foundation of these predictions is a surprisingly sparse dataset, wherein ~0.01–0.1% of the world's reef area has been quantitatively surveyed. Further, the available data comprise observations at the 1–10 m scale, which are not evenly spaced across reefs, but often clustered in areas representing focused survey effort. This impedes modeling and predicting the impact of a changing environment at the ecosystem scale. Here we highlight deficiencies in our current understanding of the relationship between coral reefs and their environments. Specifically, we conduct a meta-analysis using estimates of coral cover from a variety of local surveys, quantitatively relating reef condition to a suite of biogeophysical forcing parameters. We find that readily available public data for coral cover exhibit unexpected trends (e.g., a positive correlation between coral cover and multi-year cumulative thermal stress), contrary to prevailing scientific expectations. We illustrate a significant gap in our current understanding, and thereby prediction, of coral reefs at the ecosystem scale that can only be remedied with uniform, high-density data across vast coral reef regions, such as that from remote sensing.


INTRODUCTION
Coral reefs are distributed throughout the world's tropical oceans, directly occupying an estimated area of 250,000-600,000 km² [Figure 1; (Smith, 1978; Kleypas, 1997; Spalding and Grenfell, 1997)]. These values correspond to ∼0.05-0.15% of the global ocean area, respectively, and about 5-15% of the shallow sea areas within 0-30 m depth. Coral reefs have ecological and economic importance that is disproportionately large relative to their areal extent. The global economic valuation of the direct and indirect use of coral reefs has been estimated near $10 trillion annually (Costanza et al., 2014). While the accuracy of this dollar amount is debatable, it is certain that coral reefs are important to the cultural and economic lives of hundreds of millions of people around the world, providing food for innumerable small subsistence economies, shoreline protection, superlative recreational resources, and biological storehouses for the biotechnology industry (Moberg and Folke, 1999).
Hundreds (if not thousands) of research papers have documented human impacts to reefs at the local scale. These are well summarized in several review papers and volumes (Smith and Buddemeier, 1992; Connell, 1997; Dollar and Grigg, 2004) and include destructive fishing practices, mining calcium carbonate, anchor damage, oil spills, and land-use practices leading to terrestrial sediment deposition, with subsequent resuspension that decreases water transparency vital for reef photosynthesis. The common result of all these impacts has been an ecological shift from coral- to algal-dominated reef benthic community structures. Accompanying this shift has been a similar decline in diversity of reef flora and fauna (Pandolfi et al., 2003). Compounding local scale impacts, reefs are thought to be among the first ecosystems to respond critically and dramatically to global climate change (Parry et al., 2007). The convolved influences of rising sea surface temperature and increasing ocean acidification are predicted to exacerbate and entrench a global phase-shift in reef benthic community structure away from coral-rich to coral-poor (Smith and Buddemeier, 1992; Hoegh-Guldberg et al., 2007, 2017; Pandolfi et al., 2011; Gattuso et al., 2015).
The concern for reef futures (and indeed documented ongoing coral loss at several reefs) has motivated assessment and monitoring efforts around the world. Local and regional surveys have disparate objectives and often utilize different methods, but they invariably share a common metric for reef status: benthic cover, or the relative abundances of corals, various algae, and sediment, which are fundamental reef benthic types (Done, 1992, 1995; Connell, 1997). It is coral cover, in particular, that receives the most attention from those assessing and predicting reef trends (e.g., Gattuso et al., 2014b).
The typical assessment approach is to make small-scale observations (10s of m) of benthic cover at multiple, well-separated (100s of m to km) sites using three common methods: (1) site-specific photo-quadrats or photo-mosaics that detail species composition at 1-10 m scales; (2) tape-measure or video transects that quantitatively describe community structure at 10-100 m scales; (3) manta-tows (pulling an observer behind a boat) that provide semiquantitative descriptions of the reef community at 100+ m scales. The different methods produce data with varying accuracies (Jokiel et al., 2015). The first two produce statistically rigorous results, with typical accuracies (95% confidence interval) of ±10% cover. The manta-tow is highly subjective, relying on individual diver estimates of coral cover into broad categories (e.g., coral = 0, 1-10, 11-30, 31-50, 51-75, 76-100%) at 2-5 min intervals (Miller and Müller, 1999). Boat and underwater logistics for these surveys limit total global surveyed reef area to 10-100s of km², or on the order of 0.01-0.1% of the world's reef area.
Our current understanding of integrated global reef condition derives from syntheses of these local studies (Connell, 1997; Pandolfi et al., 2003; Wilkinson, 2008; Moritz et al., 2018). These reports provide the best available estimates for current reef condition, as well as predictions for trajectories of future reef condition. Ultimately, because these reports are based on disparate and sparse data sources of variable quality, it is not possible to determine the accuracies of their assessments. Importantly, to the best of our knowledge, there has been no attempt to investigate patterns of global reef condition in relation to local and global influences. While many environmental factors have been invoked to explain loss of coral for specific reefs, there are no generalizable models. In this study, we employ a variation of space-for-time substitution (Pickett, 1989; Blois et al., 2013; Damgaard, 2019) to explore the relationships between coral cover and the biogeophysical forcings that are generally accepted as impacting reefs. We assume that reefs globally have similar functions and similar responses to forcing factors (an assumption that is implicit in the aforementioned global syntheses). Thus, coral cover is an emergent property that varies from reef to reef due to differences in the forcing factors. Using only existing data sets, we investigate coral cover response to individual factors and to the combined suite of all factors.

METHODS
Geospatial processing was performed using QGIS and MATLAB®. All statistical analyses were performed using MATLAB®.

Local Survey Analysis Approach
Coral cover estimates used in the analysis are available from the Coral Reef Information System and the Australian Institute of Marine Science (see Data Availability Statement). Gathering and collating local surveys from 1999 to 2017 was a non-trivial task. Few data are readily available online, and those that are available exist in a wide variety of formats. Each original data source used different observation methods, with corresponding differences in accuracy and precision. Available surveys from quantitative methods (e.g., photoquadrat/point-counts and line transects) were organized to include geographic coordinates, survey year, and estimates of coral cover. Some datasets required calculation of benthic cover from raw data (i.e., photoquadrat point counts, line-point transect counts, and line transects). Observations shallower than 30 m were retained, because that is the depth threshold typically invoked to separate shallow reefs from deeper, mesophotic reefs (e.g., Gould et al., 2021). The most recent year's observation was retained when multitemporal observations existed for a given geographic site. All quality-controlled data from quantitative survey methods were ingested in GIS. Geographic coordinates were converted to a global cylindrical equal area projection. In all, there were 32,887 individual reef survey data points. Data for 3,289 sites were derived from line-point-intercept transects, and the remainder were from point-counts of underwater photographs. Table 1 lists the number of source data points from each year. None of the sources provided information about data accuracy.
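The conversion of geographic coordinates to a cylindrical equal-area projection can be illustrated with a minimal sketch. It assumes a spherical Earth and a standard parallel at the equator (the Lambert form); the text does not state which variant or ellipsoid was used in QGIS, so the function name, radius, and default parallel below are illustrative assumptions only.

```python
import math

# Assumption: mean spherical Earth radius in metres; the study's GIS
# projection may have used an ellipsoid instead.
EARTH_RADIUS_M = 6_371_000.0


def cylindrical_equal_area(lon_deg, lat_deg, std_parallel_deg=0.0):
    """Project lon/lat (degrees) to x/y (metres), spherical equal-area form.

    std_parallel_deg is a hypothetical parameter; 0 gives the Lambert
    cylindrical equal-area projection.
    """
    k = math.cos(math.radians(std_parallel_deg))
    x = EARTH_RADIUS_M * math.radians(lon_deg) * k
    y = EARTH_RADIUS_M * math.sin(math.radians(lat_deg)) / k
    return x, y
```

Because the projection preserves area, a fixed cell size in projected metres covers the same reef area at any latitude, which is what makes a uniform grid meaningful.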
Data were binned to 1 km × 1 km grids to approximate the reef ecosystem scale by (1) finding all data points in a grid cell, (2) identifying the most common (mode) year of observation, and (3) averaging only data from the mode (most commonly observed) year and the year prior. For example, if 2014 was the most represented year in a cell, then data were averaged across 2013-2014. If the data in a cell were multimodal or amodal, data from later years were retained preferentially to earlier years. The total dataset included 3,826 grid cells. Table 1 lists the number of final data points from each year considered in the study. The selection of a 1-km cell size balances upscaling from the surveyed communities to the ecosystem level, while moderately downscaling the oceanic data products (see below).
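The three binning steps can be sketched as follows. This is an illustrative reading, not the authors' MATLAB code: the function name and input layout are hypothetical, and ties between years (multimodal or amodal cells) are resolved here by simply taking the latest tied year, which is one possible interpretation of "later years were retained preferentially."

```python
from collections import Counter, defaultdict
from statistics import mean


def bin_surveys(points, cell_size_m=1000.0):
    """Bin survey points into square grid cells.

    points: iterable of (x_m, y_m, year, coral_cover_pct), with x/y in
    projected metres. Returns {cell: (mode_year, mean_cover, n_used)}.
    """
    # Step 1: collect all observations falling in each grid cell.
    cells = defaultdict(list)
    for x, y, year, cover in points:
        key = (int(x // cell_size_m), int(y // cell_size_m))
        cells[key].append((year, cover))

    binned = {}
    for key, obs in cells.items():
        # Step 2: most common year; ties go to the latest year (assumption).
        counts = Counter(year for year, _ in obs)
        top = max(counts.values())
        mode_year = max(y for y, n in counts.items() if n == top)
        # Step 3: average only the mode year and the year before it.
        kept = [c for y, c in obs if y in (mode_year, mode_year - 1)]
        binned[key] = (mode_year, mean(kept), len(kept))
    return binned
```

For a cell with two 2014 observations, one from 2013, and one from 2010, the 2010 value is dropped and the other three are averaged.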

Biogeophysical Forcings Selection
Natural, balanced coral reefs are mosaics of coral, algae, and sand. Together, these benthic types drive the structure and function of the ecosystem (Stoddart, 1969; Kinsey, 1985). When corals die, algae rapidly colonize their skeletons. Healthy reefs usually increase coral coverage during recovery from stress, with coral often returning to the pre-disturbance level (Connell, 1997). In contrast, algae and rubble gradually dominate degraded reefs, and there is little or no recovery of coral. Because corals are largely responsible for reef construction and maintenance, their environmental limits determine the distribution and condition of reefs (Kleypas et al., 1999). The abiotic parameters most affecting the distributions of coral are sea surface temperature (specifically the accumulation of thermal stress), light (photosynthetically available radiation, PAR), salinity, carbonate saturation state (specifically for aragonite, Ωarag), and wave/water motion (Smith and Buddemeier, 1992; Kleypas, 1997; Kleypas et al., 1999; Hoegh-Guldberg et al., 2007; Pandolfi et al., 2011).
The ocean absorbs about 30% of human CO2 emissions (Sabine et al., 2004; Khatiwala et al., 2009), reducing pH and thus Ωarag. This, in turn, is hypothesized to negatively affect the rate of calcification of reef-building corals (Gattuso et al., 2014a). Similarly, the ocean also absorbs about 90% of excess heat trapped by greenhouse gases (Levitus et al., 2012), increasing ocean temperatures. Coral bleaching is a stress response that is famously dependent upon a given coral species' limits with respect to intensity and duration of temperature excursions above local summer climatological maxima. Accumulated thermal stress has been used to estimate potential areas of coral bleaching and subsequent mortality (bleaching alert area, or BAA). Wave action influences the zonation, morphology, and distribution of reef corals, and is a driving force for sediment and nutrient transport, coastal dynamics, and geomorphology (Stoddart, 1969; Grigg, 1998). The general observed pattern is that greater wave action, represented by significant wave height (SWH), negatively influences coral cover through sediment scouring or direct mechanical removal (Dollar and Tribble, 1993; Grigg, 1998).
Light, specifically photosynthetically available radiation (PAR), is critical for the fundamental reef processes of photosynthesis and calcification, which is correlated with photosynthesis (Gladfelter, 1985; Kinsey, 1985; Yentsch et al., 2002). Factors that influence PAR include cloud cover and water turbidity. The former limits delivery to the ocean. The latter arises largely through resuspension of sediments by wave action, but also through the second-order effect of anthropogenic activities, such as land-use change (e.g., deforestation, agriculture, dredging, coastal development) and marine pollution. There is evidence for a positive correlation between coral cover and PAR (Yentsch et al., 2002).
Enhanced nutrient loading (marine pollution) stimulates planktonic productivity, reducing the light that reaches the benthos for coral utilization (Smith et al., 1981). Human-sourced nutrient loading is not directly a major factor in the relative balance between coral and algae in most cases, though it can have an impact under specific conditions (Szmant, 2002; Atkinson, 2011).
Herbivorous fish regulate algal growth and are important for maintaining coral reef ecosystem processes (Ogden and Lobel, 1978). However, reef ecosystems are experiencing increased fishing pressures (overfishing), as well as destructive fishing practices, in association with growing population densities (Edwards et al., 2014). Reduced regulation of algae is expected to negatively impact coral populations.

Biogeophysical Forcings Analysis Approach
Biogeophysical forcing data used in the analysis include:
• 5 km, daily BAA data from NOAA Coral Reef Watch;
• 3-hourly SWH data from Wavewatch III hindcast models run by CSIRO (Durrant et al., 2019) and NOAA;
• 9 km, monthly SeaWiFS and 4 km, monthly Aqua-MODIS PAR data from the NASA Ocean Biology Distributed Active Archive Center;
• Coral species richness as the number of species ranges that overlapped a given grid cell. Species ranges were taken from the IUCN Red List of Threatened Species (IUCN, 2014);
• Ωarag from literature (Jiang et al., 2015); and
• Coastal development threat, marine pollution threat, overfishing threat, sedimentation threat, and integrated local threat data from the World Resources Institute, Reefs at Risk Revisited (Burke et al., 2011).
BAA, PAR, and Ωarag were interpolated by natural neighbor to the 1 km grid of the coral cover dataset. SWH was interpolated by fitting thin-plate spline surfaces to the points in the neighborhood surrounding each coral cover grid cell. Coral species richness and local threats were interpolated by nearest neighbor. BAA, PAR, and SWH were compiled over multiyear periods (i.e., 1995-2017 for BAA, 1997-2017 for PAR and SWH). For BAA and PAR, the mean was calculated at each grid cell using all data from the cell year to 10 years prior (or as far back as the dataset permitted). For BAA, the 10-year maximum value and 10-year linear trend were calculated for each 1 km × 1 km grid cell. For SWH, the occurrences of wave heights >3 times the median absolute deviation from the median were tallied as SWH "events." Coral cover was evaluated individually against each biogeophysical forcing parameter by calculating the Spearman rank correlation coefficient. The parameters' convolved influences were explored via construction of multivariate regression models, with model accuracy determined by comparing predicted against actual coral cover. Each model utilized ten-fold cross-validation. Among several models considered, including both parametric and non-parametric, ensemble regression trees provided the best predictive ability. The model was further refined via Bayesian optimization of hyperparameters (MATLAB® Regression Learner App), including ensemble method (bagged), minimum leaf size (7), number of learners (365), and number of predictors to sample (11).
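Two of the steps above, tallying SWH "events" and computing the Spearman rank correlation, are simple enough to sketch in plain Python. The sketch rests on assumptions the text does not settle: events are counted only on the high side of the median, the median absolute deviation is left unscaled, and tied values share their average rank. All function names are hypothetical.

```python
from statistics import median


def count_swh_events(swh, k=3.0):
    """Count wave heights more than k MADs above the median (one-sided)."""
    med = median(swh)
    mad = median(abs(v - med) for v in swh)  # unscaled MAD (assumption)
    return sum(1 for v in swh if v - med > k * mad)


def _ranks(values):
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman coefficient: Pearson correlation of the ranks of x and y."""
    rx, ry = _ranks(x), _ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

Because the coefficient depends only on ranks, it captures any monotonic relationship between coral cover and a forcing parameter, not just a linear one.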
The relative importance of each predictor in the final model was estimated by first finding the mean squared error (MSE) for each predictor for each node in the tree, which was calculated as the error of the node weighted by the probability of the node. Next, the importance of a predictor for each split was calculated as the difference between the MSE of the parent node and the total MSE of the two child nodes. Finally, the relative importance of the predictor was calculated by summing MSE across all splits at each branch node in the model. Partial dependence plots were generated, following Friedman (2001), to illustrate the average relationship between predictor variables and predicted responses. To further visualize the relationships between predictor variables and predicted responses, individual conditional expectation plots were created for each disaggregated observation, following Goldstein et al. (2015).
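The importance calculation and the partial dependence idea can be sketched for a single tree as follows. This is a simplified illustration, not the MATLAB implementation: the data structures, function names, and predictor labels are hypothetical, and any averaging over the learners in the ensemble is omitted.

```python
def node_risk(values, n_total):
    """Node MSE weighted by node probability (share of all samples)."""
    m = sum(values) / len(values)
    mse = sum((v - m) ** 2 for v in values) / len(values)
    return mse * len(values) / n_total


def split_gain(parent, left, right, n_total):
    """Parent risk minus the total risk of the two child nodes."""
    return node_risk(parent, n_total) - (
        node_risk(left, n_total) + node_risk(right, n_total)
    )


def predictor_importance(splits, n_total):
    """Sum split gains per predictor.

    splits: list of (predictor_name, parent_values, left_values, right_values).
    """
    totals = {}
    for name, parent, left, right in splits:
        totals[name] = totals.get(name, 0.0) + split_gain(parent, left, right, n_total)
    return totals


def partial_dependence(predict, rows, feature_index, grid):
    """Mean prediction with one feature forced to each grid value.

    The per-row predictions inside the loop are the individual
    conditional expectation (ICE) curves; their mean is the partial
    dependence.
    """
    curve = []
    for g in grid:
        preds = [predict(r[:feature_index] + [g] + r[feature_index + 1:]) for r in rows]
        curve.append(sum(preds) / len(preds))
    return curve
```

A predictor that is never chosen for a split accumulates zero importance, and a predictor whose partial dependence curve is flat has little influence on the predicted response.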

RESULTS
Synthesis of readily and publicly available coral cover data revealed sparseness of surveys across regions and within reefs. Depending on the estimate used, these 3,826 reef cells represent 0.64% (of 600,000 km²) to 1.53% (of 250,000 km²) of the world's reef area. (Note the areas actually surveyed are far smaller.) The clear majority (86.8%) of binned 1 km × 1 km cells had fewer than 10 reported observations (Figure 2).
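The two percentages follow directly from the number of 1 km² cells and the two global reef area estimates. A short check, with a hypothetical helper name:

```python
def percent_of_reef_area(n_cells, reef_area_km2, cell_area_km2=1.0):
    """Share of global reef area covered by the gridded cells, in percent."""
    return 100.0 * n_cells * cell_area_km2 / reef_area_km2


low = percent_of_reef_area(3826, 600_000)   # larger reef area estimate
high = percent_of_reef_area(3826, 250_000)  # smaller reef area estimate
print(f"{low:.2f}% to {high:.2f}%")
```

These figures treat every cell as fully surveyed; since each cell typically holds only a handful of observations covering tens of square metres, the area actually observed is orders of magnitude smaller.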
Observed trends in coral cover were as expected with respect to 10-year maximum BAA (Figure 3B, see number at top-right of panel for Spearman rank correlation coefficient), 10-year BAA trend (Figure 3C), number of SWH events (Figure 3E), number of coral species (Figure 3G), and local marine pollution threat (Figure 3J). However, the remaining individual correlations of coral cover with environmental factors known to impact reef distribution and condition did not exhibit expected trends (Figure 3). We found no significant trend with respect to Ωarag (Figure 3D), and an increasing trend in association with mean BAA (Figure 3A). We found a negative trend of coral cover in association with light availability (Figure 3F). With the exception of marine pollution, we found contradictory trends for local human threats (Figures 3H,I,K,L).
FIGURE 2 | Spatially sparse and methodologically nonuniform in-water surveys. Readily available, publicly downloadable reef survey data highlight the current data paucity paradigm within reefs and across regions. The dataset bins 32,887 scattered observations into 1 × 1 km grid cells. Colored dots indicate number of observations in grid cells (higher numbers overlay lower numbers). Inset shows histogram for observations per cell; note log scale for vertical axis. Of 3,875 cells, 60.7% comprise only a single observation, 83.3% have ≤ 5 observations, only 9.1% have ≥ 20 observations. Some observations presented as single values are actually averages of several underlying data points (e.g., Australian Institute of Marine Science data for the Great Barrier Reef), while other observations require computation of benthic cover from raw counts (e.g., most NOAA data for Pacific islands). This compounds the fundamental issue that data from different regions are acquired using different methods for different purposes. These are major hindrances for regional and global reef syntheses.
Correlations were weak between coral cover and each forcing parameter, regardless of direction or statistical significance (Figure 3). Of course, the various biogeophysical forcings do not act individually, but simultaneously. However, the optimized bagged ensemble of regression trees obtained an adjusted r² of 0.43 (Figure 4A), which means that there was more unexplained variance in coral cover than explained by the combined set of biogeophysical forcings.
Interestingly, despite showing no correlation with coral cover on an individual basis, the most important predictor in the multivariate model was Ωarag (Figure 4B). This was followed by 10-year mean BAA, then the group of 10-year BAA trend, PAR, SWH events, and number of coral species. The remaining variables had relatively low importances. These importances were borne out in the partial dependence and individual conditional expectation plots (Figure 4C). For example, Ωarag showed tight clustering of predicted responses, with variation across the range of Ωarag. Conversely, the local threat variables showed a wide spread of predicted responses, with virtually no variation across threat levels.

DISCUSSION
At the very least, some of the results are unexpected. Both 10-year mean BAA and Ωarag show individual trends (or lack thereof) counter to prevailing scientific thought, but both are important in the context of multiple forcings. All but one of the local threats have trends counter to prevailing scientific thought, and none are important in the multivariate model. Perhaps most surprisingly, coral cover is poorly predicted by all of the biogeophysical forcings, both individually and collectively. Taken at face value, these results would necessitate a reevaluation of our understanding of the relationship between coral reefs and their environment.
Before challenging scientific wisdom built through innumerable peer-reviewed studies and across decades, it is prudent to ensure the results are wholly valid. It is possible that the underlying coral cover data are fundamentally flawed. To be sure, no error estimates are provided with any of the source data. However, those data are produced from multiple, independent research and monitoring programs, each comprising several individual observers and technicians. While errors are inherent to collated data sets of this size, it is not reasonable to expect that all, or even the bulk, of these programs generated bad data. It must be assumed that the data represent best available estimates of coral cover.
A similar issue holds for the biogeophysical forcings data. BAA and PAR are derived from satellite data products, SWH from regional wave model hindcasts, and local threats from statistical models of diverse data sets. Ωarag and coral species richness are derived from observational data, extrapolated through spatial modeling. Thus, each of these parameters has its own associated errors, which can impact the analyses in this study. However, each parameter is supported by painstaking, peer-reviewed research, and each therefore represents a best available estimate.
FIGURE 3 | Unexpected trends in the relationships between coral cover and biogeophysical forcings at the ecosystem scale. Gridded coral cover values are plotted against environmental factors, revealing trends contrary to prevailing scientific expectations. Scatterplots relate coral cover to continuous variables, with colors indicating point densities (blue is lower density, red is higher density). Boxplots show coral cover segmented by local anthropogenic threats, with first and third quartiles (blue box), median (red line), extent of data (black whiskers), and outliers (red plus symbols, determined using 1.5 × IQR rule). Numbers at top-right of all plots indicate Spearman rank correlation coefficients (ns = not statistically significant).
A clear limitation of this study is the size of the data set. Countless other coral cover data sets are sure to exist, but they are not made readily available by their owners. Their inclusion would undoubtedly increase both the density and the geographic extent of the current data set, adding confidence to the analytical results. So, a strong recommendation can be made that, in the interest of fostering information exchange and scientific advancement, data sets should be made publicly available. The ideal situation would be an online active archive center to which researchers and managers can submit data, which can then be searched, collated, and compiled by users around the world. Data should have a consistent format, and error estimates should be required. All submissions should be accompanied by complete metadata.
Despite its limitations, the present data set comprises the best available information on collocated coral cover and biogeophysical forcings. This study's true limitation is that the data are not suited to the task at hand. Space-for-time substitution may be inappropriate across two decades and two oceans of reef survey data, because reefs in different regions and times may have different responses to environmental forcings. Certainly, there is the issue that the scales of virtually all existing surveys can easily miss ecosystem-scale variability (Edmunds and Bruno, 1996). A few 10s of m² of in-water reef survey simply cannot adequately represent 1 km² (= 1 × 10⁶ m²) of reef. Ultimately, these bivariate and multivariate results highlight the problem of using sparse, disparate data to synthesize a global model of reefs. Available data utilized in this study are not sufficient to quantitatively relate ecosystem-scale coral reef condition to the surrounding environment. Non-sparse data of uniform quality are required.
FIGURE 4 | Poor predictability of coral cover using combined biogeophysical forcing parameters. Of multivariate models fit to the data set, a bootstrap aggregated (bagged) ensemble regression tree provides the best predictions of coral cover. However, the model only explains 43% of the variability in coral cover, reinforcing the proposition that available data are insufficient for quantitative modeling of coral reef condition with respect to environment. (A) Model predictions vs. actual coral cover, with colors indicating point densities (blue is lower density, red is higher density). (B) Relative importance for each predictor in the multivariate model. (C) Partial dependence plots (red lines) and individual conditional expectation plots (blue lines) illustrating model prediction responses to each predictor variable.
Owing to logistical and cost limitations inherent to field surveys, it is not possible to collect uniform, high-density, in-water data at the ecosystem scale across reef regions, much less globally. Remote sensing offers the only viable approach (Mumby et al., 1999, 2001; Hochberg, 2011; Hedley et al., 2016). However, current satellite sensors are largely inadequate for assessment of global coral reef status as they lack the necessary combined spectral and spatial capabilities to accurately quantify benthic cover (Hochberg and Atkinson, 2003; Dekker et al., 2018). Numerous case studies have demonstrated the ability for multispectral sensors with moderate or high spatial resolution (e.g., Landsat, WorldView, PlanetScope) to produce reasonably accurate maps of reef "habitats" or "biotopes," which has led to global efforts such as the Millennium Coral Reef Mapping Project and the Allen Coral Atlas. In these efforts, habitat labels are created as qualitative descriptors comprising various combinations of substrate (e.g., sand, limestone, rubble), benthic functional type (e.g., coral, algae, seagrass), reef type (e.g., fringing, patch, barrier), and/or location within the reef system (e.g., slope, flat). Mapping of these non-standardized "habitats" has become common (reviewed in Hedley et al., 2016). These efforts have generated good depictions of reef location and ecological zonation within reefs, but their underlying data lack the ability to spectrally discriminate between the fundamental reef benthic types, i.e., live coral from algae (Hochberg and Atkinson, 2003). Thus, their map products fail to answer one of the most fundamentally important questions to reef ecology: How much coral do the world's reefs possess?
Several satellite missions are under development that may meet the necessary requirements for measuring reef condition at the ecosystem scale (Dekker et al., 2018), such as the NASA Surface Biology and Geology (SBG) mission (National Academies of Sciences and, 2018; Cawse-Nicholson et al., 2021). SBG and others may provide useful data for reefs, at least to reasonable light penetration depths of 10-15 m (Hochberg et al., 2003). However, given that many reefs are already in decline, the information provided by these missions may come too late for application to policy and management. Airborne remote sensing offers an alternate solution. One example is the NASA Earth Venture Suborbital-2 (EVS-2) COral Reef Airborne Laboratory (CORAL) mission (https://coral.jpl.nasa.gov/, https://coral.bios.edu). During 2016-2017, CORAL used the NASA Jet Propulsion Laboratory Portal Remote Imaging SpectroMeter (PRISM, https://prism.jpl.nasa.gov/) to survey sections of the Great Barrier Reef, Main Hawaiian Islands, Mariana Islands, Palau, and Florida, ∼5,000 km² reef area in all (∼1% of the world's reef area). CORAL quantified benthic cover and modeled primary production and calcification at the reef scale, with the aim of improving our understanding of the relationship between reef ecosystems and their environments. Results are forthcoming after final data and analyses quality assurances.

AUTHOR CONTRIBUTIONS
EH conceived the project and led the data analysis. EH and MG collected data and wrote the manuscript. All authors contributed to the article and approved the submitted version.

FUNDING
This work was supported by funding from the National Aeronautics and Space Administration (NASA) Grant NNX16AB05G. A portion of this research was carried out at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with NASA. The work of EH was carried out at the Bermuda Institute of Ocean Sciences.