Experimenting With the Past to Improve Environmental Monitoring

Long-term monitoring programs are a fundamental part of both understanding ecological systems and informing management decisions. However, there are many constraints which might prevent monitoring programs from being designed to consider statistical power, site selection, or the full costs and benefits of monitoring. Key considerations can be incorporated into the optimal design of a management program with simulations and experiments. Here, we advocate for the expanded use of a third approach: non-random resampling of previously-collected data. This approach conducts experiments with available data to understand the consequences of different monitoring approaches. We first illustrate non-random resampling in determining the optimal length and frequency of monitoring programs to assess species trends. We then apply the approach to a pair of additional case studies, from fisheries and agriculture. Non-random resampling of previously-collected data is underutilized, but has the potential to improve monitoring programs.

Keywords: statistical power, population trends, data-poor fisheries, species monitoring, resampling

LONG-TERM ENVIRONMENTAL MONITORING
Long-term monitoring programs are an essential piece of modern ecological research and conservation science (Hughes et al., 2017). Numerous studies have demonstrated that long-term monitoring can make disproportionately large contributions to advancing scientific understanding and policy (Giron-Nava et al., 2017). Environmental monitoring programs, like the USA-based Long Term Ecological Research (LTER) Network, as well as compilations of time series, like the Living Planet Index, show the scope of long-term datasets now available (Magurran et al., 2010; Foundation, 2016). Furthermore, with the advent of infrastructure that connects and stores data collected by a wide variety of professional and amateur naturalists, monitoring should continue to become more feasible and cost-effective. For example, large-scale citizen science programs, like iNaturalist (https://www.inaturalist.org/) and eBird (https://ebird.org/home), allow for increased collection of data documenting species occurrence, extent, and relative population size, and provide platforms to support data use and reuse across a variety of applications (Sullivan et al., 2009; Joppa, 2017). Similarly, numerous new technologies, including eDNA and drones, will bring down the cost of monitoring through automation while increasing the taxonomic, temporal, and spatial resolution of observations (Bohmann et al., 2014; Hodgson et al., 2018). All of these efforts will increase the number of species monitored, as well as the quantity and quality of the data collected, to previously unimaginable levels.

CHALLENGES OF MONITORING PROGRAMS
Despite the recognized importance of long-term monitoring programs, key questions remain. Several authors have pointed out that long-term monitoring programs are not necessarily designed in a way that best addresses the questions of interest (Legg and Nagy, 2006; Nichols and Williams, 2006; Field et al., 2007; McDonald-Madden et al., 2010; Lindenmayer et al., 2012, 2020). For instance, suppose that a site was selected to monitor the dynamics of a bird population near a university. A long-term study could certainly reveal the population trend for that specific location. However, the site may have been chosen specifically because the population was at high abundance at the beginning of the study, which causes a site-selection bias (Fournier et al., 2019). Populations naturally vary, both in time and in space, so selecting a site because of particular population attributes can confound the very patterns the program seeks to monitor. Birds in this hypothetical population may undergo a cyclic dynamic related to resource exploitation, or rotate between different patches for nesting from year to year. Thus, when we ask new questions of long-term monitoring data, we have to think carefully about how the monitoring program was originally designed and whether we have adequate statistical power (Lindenmayer and Likens, 2010; White, 2019), as well as the risks of making type I vs. type II errors (Mapstone, 1995). These considerations, amongst others, are especially relevant when data from different sources are combined for comparisons, which is increasingly common (Magurran et al., 2010; Keith et al., 2015; Giron-Nava et al., 2017; White, 2019). Lastly, the tradeoff between the information gained from monitoring and the cost of monitoring has to be considered (Bennett et al., 2018).

DESIGNING MONITORING PROGRAMS
Principles from experimental design, including randomization and replication, are key components in designing any monitoring program (Seavy and Reynolds, 2007). However, many tools from experimental design are inadequate for monitoring programs. For example, by their very nature, data from monitoring programs require handling spatial and temporal autocorrelation between sampling points. To manage these issues, much of the work on optimizing monitoring programs has its roots in decision science (McDonald-Madden et al., 2010). Decision theory allows one to build a structured process to decide between alternative solutions while accounting for costs and benefits (Raiffa, 1968; McDonald-Madden et al., 2010; Conroy and Peterson, 2012). In addition, decision theory allows for the incorporation of uncertainties (McCarthy, 2014). In the context of environmental monitoring, decision theory can help formalize the process of selecting sites and the specific survey design (Gerber et al., 2005; Chades et al., 2011; Tulloch et al., 2013). As an example, Hauser et al. (2006) explored how frequently a managed population of red kangaroo (Macropus rufus) in Australia should be monitored. They used a simulation approach to determine how frequently monitoring should occur given tradeoffs between the costs of monitoring and the potential insights for management. They found that an adaptive monitoring program outperformed the standard fixed-interval monitoring.
This leads to one of the most important contributions of decision theory to environmental monitoring: the value of information (Canessa et al., 2015; Maxwell et al., 2015; Bennett et al., 2018). Value of information theory explicitly accounts for the information gained from performing some action, and the costs associated with doing so, when designing a management plan. For example, Bennett et al. (2018) used value of information theory in the context of threatened plant management in Southern Ontario. They simulated a situation where a conservation agency is willing to pay landowners to protect part of their land. However, species occurrence is uncertain on each plot of land. Thus, it is necessary to assess the value of the information gained from monitoring against its costs. They were able to demonstrate that the value of the information gained from monitoring increased with high levels of species detectability and low costs of monitoring.
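To make the value-of-information reasoning concrete, the following is a minimal sketch, not the model of Bennett et al. (2018). It assumes a single plot, a hypothetical payoff structure (protecting costs `cost` and yields `benefit` only if the species is present), and a survey that never produces false positives. All function names and numbers are illustrative assumptions.

```python
def expected_value(p_present, benefit, cost):
    """Best expected payoff when acting on a belief p_present.

    Hypothetical payoffs: protecting the plot costs `cost` and yields
    `benefit` only if the species is present; not protecting yields 0.
    """
    return max(p_present * benefit - cost, 0.0)


def value_of_survey(prior, benefit, cost, detectability):
    """Expected gain from surveying once before deciding.

    Assumes no false positives: a detection proves presence, while a
    non-detection lowers the belief in presence via Bayes' rule.
    """
    p_detect = prior * detectability
    if p_detect < 1.0:
        posterior_if_missed = prior * (1 - detectability) / (1 - p_detect)
    else:
        posterior_if_missed = 0.0
    with_survey = (
        p_detect * expected_value(1.0, benefit, cost)
        + (1 - p_detect) * expected_value(posterior_if_missed, benefit, cost)
    )
    return with_survey - expected_value(prior, benefit, cost)


# Illustrative numbers only: the value of surveying rises with detectability.
for d in (0.0, 0.5, 1.0):
    print(d, value_of_survey(prior=0.3, benefit=100, cost=40, detectability=d))
```

Under these assumptions, monitoring is worthwhile only when the value returned by `value_of_survey` exceeds the cost of the survey itself, which is the comparison described above.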

Simulations
To address the challenges associated with designing monitoring programs, three classes of tools are available to design and evaluate them. First, the most commonly used approach is simulation modeling (Gerrodette, 1987; Rhodes and Jonzen, 2011). Using prior knowledge about the system in question, a virtual ecologist approach (Zurell et al., 2010) uses simulation models constructed to incorporate key factors that affect species dynamics. With an appropriate model, simulations can then be run for a variety of scenarios, including changing the number of samples taken per year, altering the number of sites sampled, and sampling for different lengths of time (Rhodes and Jonzen, 2011; Barry et al., 2017; Christie et al., 2019; Weiser et al., 2019; White, 2019). Simulations can also be useful in deciding which streams of data to use (Weiser et al., 2020) or in assessing the effect of changing sampling methodology during the course of a study (Southwell et al., 2019). Much prior work has also used simulations to better understand optimal sampling schemes for invasive species (Chades et al., 2011; Holden and Ellner, 2016). Although powerful, this approach is limited to systems in which many aspects of the biology are already known or can be reasonably estimated.

Experimental and Comparative
Second, experiments can also be used to test the effect of different sampling protocols. As in the case of simulation models, experiments with different levels of monitoring, or different monitoring approaches, can be used. Experiments provide replication, which is important for understanding the probability that monitoring will achieve the desired goals given inherent variability in the system. A related approach would simply be to compare different sampling regimes across systems to evaluate which are the most successful at realizing monitoring goals. Indeed, integrated population modeling was developed as an analytical approach to identify and address discrepancies between data taken by differing methodologies or at differing times in a species' life history (Saunders et al., 2019). This method has been applied with great success to advance understanding of the trajectories of populations of well-monitored taxa such as waterfowl (Arnold et al., 2018). However, the key disadvantage of this approach is that, like simulation models, integrative modeling approaches rely on the availability of large amounts of data documenting multiple facets of a species' biology. Of course, experiments providing such multi-faceted data are infeasible or impossible for many systems.

FIGURE 1 | (A) The general process of non-random resampling of past data, from left to right (i.e., sequentially, starting with data from farthest into the past), includes: dividing data into non-random subsamples (based on the question of interest), calculating metrics on those subsamples, and comparing the subsample metrics to the metric for the combined (i.e., "true") dataset. (B) The same process as in (A), but for the specific example of examining the minimum number of years required to detect long-term population trends (White, 2019). The pair of panels on the bottom right show how (C) the average slope and (D) the probability of correctly identifying a trend change with the number of years monitored.

Non-random Resampling
Here, we advocate for the expanded use of a third approach: non-random resampling of previously-collected monitoring data. Non-random resampling involves artificially "degrading" a complete data set into smaller data samples for comparison. This concept leverages existing information by starting with long-term monitoring data already collected for a system. The data are then subsampled, or divided, in non-random ways depending on the question of interest (Figure 1A). Then a metric (for example, a mean or a slope) is calculated for each subsample. Each subsample metric is then compared to the metric for the complete data (all the data combined). The complete data act as a "true value" for comparison. This is analogous to simulation studies where the true parameters are known (Bolker, 2008). Thus, the assumption that the complete data set can serve as a "true" comparison is critical. Non-random resampling differs from random resampling approaches (e.g., jackknifing, bootstrapping), where random subsamples are taken to estimate various statistics. With non-random resampling, we learn about the elements of a good monitoring program by examining which subsamples of the data are most influential and how many subsamples are needed to have a high probability of detecting the true value of the metric. Bahlai et al. (2020) describe this technique from a computational viewpoint specifically for time series. Although this approach has been used previously (Grantham et al., 2008; Bennett et al., 2016; Wauchope et al., 2019; White, 2019; Bahlai et al., 2020; Cusser et al., 2020), its adoption has been largely informal and rarely stated explicitly. This approach is best described with a simple example (Figure 1B). White (2019) studied how many years of monitoring were required to detect population trends. For a given time series, White (2019) examined all possible contiguous subsamples of different lengths of time.
For instance, a 10-year time series contains two 9-year subsamples, three 8-year subsamples, and so forth. This differs from taking random subsamples of the data because the subsamples are chosen to maintain the temporal autocorrelation. White (2019) then calculated the population trend (i.e., the slope from linear regression) for each subsample (Figure 1B). The fraction of subsamples of a particular length that had the same overall trend as the complete time series (i.e., the "true trend") is the statistical power. Thus, the minimum time series length required was the shortest length that met a sufficiently high threshold of statistical power. White (2019) applied this approach to 822 population time series, allowing for comparison across species and systems. Using resampling of the breeding bird survey, Wauchope et al. (2019) found that sampling for a short period, or infrequently, was adequate to determine the direction of a species trend, i.e., positive or negative. However, more frequent and longer monitoring was required to estimate the percent change over time.
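The windowing procedure described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the code used by White (2019): a subsample "agrees" with the full series when its regression slope merely has the same sign, the 0.8 power threshold is an arbitrary choice, and the function names and example counts are hypothetical.

```python
from statistics import mean


def ols_slope(y):
    """Least-squares slope of y against time steps 0, 1, 2, ..."""
    n = len(y)
    t_bar = (n - 1) / 2
    y_bar = mean(y)
    num = sum((t - t_bar) * (v - y_bar) for t, v in enumerate(y))
    den = sum((t - t_bar) ** 2 for t in range(n))
    return num / den


def sign(x):
    return (x > 0) - (x < 0)


def power_by_length(series, min_len=3):
    """For each window length, the fraction of contiguous windows whose
    slope has the same sign as the slope of the complete series."""
    true_sign = sign(ols_slope(series))
    power = {}
    for length in range(min_len, len(series) + 1):
        # Contiguous windows preserve the temporal ordering of the data.
        windows = [series[i:i + length]
                   for i in range(len(series) - length + 1)]
        matches = sum(sign(ols_slope(w)) == true_sign for w in windows)
        power[length] = matches / len(windows)
    return power


def minimum_years(series, threshold=0.8, min_len=3):
    """Shortest length such that it and every longer length reach the
    power threshold (threshold value is an assumption)."""
    power = power_by_length(series, min_len)
    best = None
    for length in sorted(power, reverse=True):
        if power[length] < threshold:
            break
        best = length
    return best


# Hypothetical annual counts for a 10-year time series.
counts = [5, 3, 4, 6, 8, 7, 9, 12, 11, 14]
print(power_by_length(counts))
print(minimum_years(counts, threshold=0.9))
```

Requiring that every longer length also meets the threshold avoids reporting a short length that passes only by chance; a stricter agreement rule (for example, also requiring a statistically significant slope) could be substituted in `power_by_length`.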
Monitoring programs need to be designed to both adequately address a question of interest and to be cost effective (Grantham et al., 2008; Rout et al., 2014; Maxwell et al., 2015; Bennett et al., 2018). In this context, it is essential to study the trade-off between the information gained from monitoring and the cost. For example, Bennett et al. (2016) used resampling approaches to study monitoring requirements for diatoms in lake samples. They found that in several cases much lower levels of sampling, in terms of the number of lakes sampled and observer effort, were required to ensure accuracy. This translated into potentially millions of dollars in savings (Bennett et al., 2016). Bruel and White (2020) used a similar approach to show how lake soil core sampling could be optimized to ensure accuracy in detecting ecosystem shifts while also reducing costs.

FIGURE 2 | […] (Keller et al., 2017; Stock et al., 2019). (B) Parameter estimates for linear regression of three subsamples of data: deep trawls, shallow trawls, and all combined data (Note: the number of records was kept consistent for the three groups). (C) Estimate for the effect of being in a rockfish conservation area (inRCA) on catch for different amounts of data included. The horizontal, dashed line is the "true" estimate, which is the estimate when 100% of the data is included.
Frontiers in Ecology and Evolution | www.frontiersin.org

Non-random resampling of past data also helps address some common issues with other approaches to designing monitoring programs. For instance, mechanistic simulation models require at least some knowledge of the basic species biology in order to construct the model (Zurell et al., 2010). In addition, although this is also possible with simulations, resampling approaches already explicitly account for the inherent temporal and spatial autocorrelation of monitoring data. Resampling approaches are also particularly useful in situations where experiments are expensive or impractical. Lastly, resampling approaches are quick and easy to implement. Along with these advantages, resampling of past data has two major limitations. First, monitoring data must already be available for the system of interest or a related one. Second, the previously-collected data need to be a good representation of the system dynamics, because the full data act as the "true representation" of the system. Similarly, if resampling approaches are used for one system to learn about another, the original system has to be a good representation of the latter in terms of the general system dynamics. Nonetheless, long-term monitoring programs are always under pressure from logistical and financial constraints. These resampling methods can be a useful tool for researchers and managers to refine and update programs as funding changes and still achieve their research goals.

Fisheries Management
Non-random resampling of past data can be applied in a variety of contexts beyond estimating long-term population trends. For example, to study data-poor fisheries (Dowling et al., 2015; Table 1 of Chrysafi and Kuparinen, 2015), past work has primarily used simulation models. To study data-poor fisheries using random and non-random resampling, one should instead start from data-rich fisheries. The goal would be to artificially degrade the data-rich examples to the point that the fishery would be considered data-poor (Figure 2). We can then see how various methods for data-poor fisheries perform, given that we have the full data set to act as a "true" comparison. As an example, we took data on darkblotched rockfish (Sebastes crameri) from the U.S. West Coast Groundfish Bottom Trawl Survey (Keller et al., 2017; Stock et al., 2019). We then conducted two "experiments" with the data. First, suppose we only had access to shallow or deep data because of technology limitations. We can test the effect of these data limitations by non-randomly resampling the data depending on whether the samples came from deep or shallow trawls. The regression parameter estimates differ depending on which depths are included (Figure 2B). Second, suppose instead that we examine the effect of degrading the data to only a fraction of the total records available. In contrast to the first experiment, this is random resampling, as we are taking random samples from the data. We see that model estimates for the effect of being within a rockfish conservation area are not accurate until a large fraction of the original data is included (Figure 2C).

TABLE 1 | Example questions that could be addressed using non-random resampling.

Question: How many test water wells should be drilled to understand subsurface water flow?
Non-random resampling approach: We would start with an example system where a large number of test wells produced accurate dynamics. Then, we artificially degrade this data by using fewer test wells. Last, we would examine when the predicted dynamics change as a result of having fewer test wells.

Question: What is the effect of not being able to identify microorganisms to the species level?
Non-random resampling approach: We first select data from a well-resolved tree that does identify organisms to the species level. Then, we artificially degrade the data by pretending the tree is only resolved to the genus or family level. We could then study the effect of not identifying organisms to the species level.

Question: What is the effect of scuba diving depth limitations on estimates of biological diversity?
Non-random resampling approach: We would use high-quality diver survey data that was collected along a gradient of depths. We would then artificially degrade the data by removing deeper dives. We could then compare the diversity metrics when all the data is included versus only shallow dives.
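The two fisheries "experiments" described in this section, a non-random split by trawl depth and a random reduction in the number of records, can be sketched as follows. This is an illustration on synthetic records, not the trawl survey analysis itself. The record fields, the 300 m depth cutoff, the fractions, and the use of a difference in means in place of a regression coefficient for the conservation-area effect are all assumptions, and the sketch does not equalize the number of records across groups.

```python
import random
from statistics import mean


def slope(x, y):
    """Least-squares slope of y on x (simple linear regression)."""
    x_bar, y_bar = mean(x), mean(y)
    num = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    den = sum((a - x_bar) ** 2 for a in x)
    return num / den


def depth_slope(records):
    """Slope of catch against depth for a set of trawl records."""
    return slope([r["depth"] for r in records],
                 [r["catch"] for r in records])


def rca_effect(records):
    """Mean catch inside minus outside the conservation area.
    Returns None if either group is empty."""
    inside = [r["catch"] for r in records if r["in_rca"]]
    outside = [r["catch"] for r in records if not r["in_rca"]]
    if not inside or not outside:
        return None
    return mean(inside) - mean(outside)


# Synthetic survey (hypothetical): catch peaks at mid depth and is
# higher inside the conservation area.
records = [
    {"depth": 100 + 2 * i,
     "in_rca": i % 2 == 0,
     "catch": 100 - (2 * i - 200) ** 2 / 1000 + (5 if i % 2 == 0 else 0)}
    for i in range(201)
]

# Experiment 1 (non-random): keep only shallow or only deep trawls.
shallow = [r for r in records if r["depth"] < 300]
deep = [r for r in records if r["depth"] >= 300]
depth_estimates = {
    "shallow": depth_slope(shallow),
    "deep": depth_slope(deep),
    "all": depth_slope(records),
}

# Experiment 2 (random): keep a random fraction of all records and
# compare each estimate with the full-data ("true") estimate.
rng = random.Random(1)
true_effect = rca_effect(records)
effect_by_fraction = {}
for fraction in (0.1, 0.25, 0.5, 1.0):
    k = round(fraction * len(records))
    effect_by_fraction[fraction] = rca_effect(rng.sample(records, k))

print(depth_estimates)
print(true_effect, effect_by_fraction)
```

In this synthetic example the shallow-only and deep-only subsets give slopes of opposite sign while the combined data give a slope near zero, which mirrors how a depth-limited survey could mislead about the relationship with depth.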

Agricultural Practices
Agricultural management recommendations are often based on conclusions from short- to medium-term field trials (ca. 1-5 years), and it is common to observe contradictory findings between trials. When multiple factors are considered, such as crop water use, greenhouse gas emissions, and relative profitability of practices, responses may vary dramatically and unequally with short-term environmental variation. Cusser et al. (2020) applied a non-random sequential resampling algorithm (Bahlai, 2019) to long-term data examining the effects of tillage practices on productivity, sustainability attributes, and return on investment. They found that, because of high natural variability in the system, 15 years of data were required to observe the "true" pattern of difference between treatments in soil water availability and crop yield, and that more than a third of the sampled sequences shorter than 10 years led to outright misleading results. Furthermore, they were unable to detect consistent treatment differences in nitrous oxide emissions, although non-random resampling indicated that spurious trends could be observed in observation periods as long as 9 years. Finally, although the profitability of adopting a new tillage practice was highly variable in the initial years after adoption, by 10 years after adoption, 86% of resampled windows indicated a net financial gain associated with the change. Although it is unlikely that practitioners making management decisions can consistently rely on a decade of data to guide them, the results of the non-random sequential resampling of long-term data provide guidance on reconciling apparently differing trends between trials. Furthermore, non-random resampling gives land managers insight into the likelihood and duration of a particular management outcome in a variable environment.

CONCLUSIONS
Data from long-term monitoring programs are used in assessing trends in environmental observations, understanding system dynamics, and making management decisions. It is critical that these monitoring programs be designed to address our questions of interest (Field et al., 2007; McDonald-Madden et al., 2010; Lindenmayer et al., 2020). This is particularly relevant when new questions are asked of monitoring data or when data from disparate monitoring programs are combined. We show that non-random resampling of past monitoring data can be used to understand sampling requirements and the consequences of bias (Figures 1, 2). This approach can be applied to a variety of systems and questions (Table 1) beyond environmental monitoring. In addition to simulations and experimental approaches, we argue that non-random resampling of past data should be used more widely to study questions related to sampling design. Combined with information on the cost of monitoring, this approach also helps identify when ecological monitoring is a good investment and when it may be a waste of effort that does not serve a program's aims and objectives. Further, resampling approaches can be used as part of adaptive management to refine monitoring programs. Continued research in this area will allow scientists and managers to better evaluate past efforts and to design new monitoring programs using evidence-based approaches.

AUTHOR CONTRIBUTIONS
EW and CB conceived the ideas, designed the methodology, and analyzed the data. Both authors contributed critically to the drafts and gave final approval for publication.