Identification of Shared Spatial Dynamics in Temperature, Salinity, and Ichthyoplankton Community Diversity in the California Current System With Empirical Dynamic Modeling

Identifying spatially shared dynamics is a key component of community ecology studies as they provide evidence of common responses to environmental factors. We apply co-prediction, an empirical dynamic modeling method (EDM), where values in one time series are predicted from another to quantify shared dynamics in the California Cooperative Fishery Oceanographic Investigation (CalCOFI) dataset composed of spatially explicit physical and biological measurements. Co-prediction can arise in the absence of correlation between two time series. The survey dates to 1951 and consists of a semi-regular, fixed-station design off the west coast of the USA. While the California Current is a dynamic system with multiple identified regimes, we found evidence of coherence measured in terms of spatially shared dynamics in salinity, temperature, Shannon index of ichthyoplankton abundance, and single-species ichthyoplankton abundance throughout the CalCOFI survey area. Leave-one-out hindcast skill, without including any knowledge of shared dynamics was significant in 27 stations for salinity data, 36 for temperature data, and 33 for Shannon index (out of 81 total stations). We then evaluated hindcast skill when including shared dynamics via composite libraries, in which correlated or co-predicted time series are concatenated to produce denser attractors. The number of correlated stations was generally higher than the number of co-predicted stations, but hindcast skill from composite libraries of correlated stations did not increase hindcast skill. Composite libraries of co-predicted stations had significant leave-one-out hindcast skill in 60 stations for salinity data, 60 for temperature, and 72 for Shannon index. Additionally, we found evidence of nonlinear relationships, as nonlinear hindcasts accounted for nearly all of these significant stations. While there were high levels of correlation among stations, co-prediction proved a more robust method of identifying shared dynamics. Shared dynamics were largely concentrated south of Point Conception, considered an oceanographic and biological breakpoint, although in some cases shared dynamics spanned this boundary. Taken together, we apply EDM to present the first, to our knowledge, evaluation of station-specific hindcast skill and provide a view of the realized spatial structure occurring in the physical and biological dynamics of the California Current system.

Identifying spatially shared dynamics is a key component of community ecology studies as they provide evidence of common responses to environmental factors. We apply co-prediction, an empirical dynamic modeling method (EDM), where values in one time series are predicted from another to quantify shared dynamics in the California Cooperative Fishery Oceanographic Investigation (CalCOFI) dataset composed of spatially explicit physical and biological measurements. Co-prediction can arise in the absence of correlation between two time series. The survey dates to 1951 and consists of a semi-regular, fixed-station design off the west coast of the USA. While the California Current is a dynamic system with multiple identified regimes, we found evidence of coherence measured in terms of spatially shared dynamics in salinity, temperature, Shannon index of ichthyoplankton abundance, and single-species ichthyoplankton abundance throughout the CalCOFI survey area. Leave-one-out hindcast skill, without including any knowledge of shared dynamics was significant in 27 stations for salinity data, 36 for temperature data, and 33 for Shannon index (out of 81 total stations). We then evaluated hindcast skill when including shared dynamics via composite libraries, in which correlated or co-predicted time series are concatenated to produce denser attractors. The number of correlated stations was generally higher than the number of co-predicted stations, but hindcast skill from composite libraries of correlated stations did not increase hindcast skill. Composite libraries of co-predicted stations had significant leave-one-out hindcast skill in 60 stations for salinity data, 60 for temperature, and 72 for Shannon index. Additionally, we found evidence of nonlinear relationships, as nonlinear hindcasts accounted for nearly all of these significant stations. While there were high levels of correlation among stations, co-prediction proved a more robust method of identifying shared dynamics. Shared dynamics were largely concentrated

INTRODUCTION
One main objective of ecology is to understand how the environment influences biological organisms from individual to ecosphere scales, and identification of shared dynamics or synchrony across space and time is a valuable tool for achieving this goal (Hsieh et al., 2005). For example, marine fish populations shift distributions in response to ocean warming (Pinsky et al., 2013), and this kind of knowledge is necessary to inform resilient fisheries management (Wilson et al., 2018) or ecosystem-based fisheries management (Pikitch et al., 2004). Application of modern statistical analyses have the capacity to further understanding of synchronous dynamics and holds the potential to improve forecasting (Sugihara et al., 2012). Augmenting forecasting skill is paramount to effective marine management and is particularly important in times of rapid environmental change (Jacox et al., 2020).
The abundance of species can fluctuate in response to combinations of abiotic and biotic factors, and these dynamics can be shared across species-specific populations in space and time. Ecological studies have focused on identifying patterns of synchrony, defined to be shared fluctuations between time series of population abundance. Synchrony exists for species across a spectrum of sizes ranging from protists (Holyoak and Lawler, 1996), insects (Williams and Liebhold, 2000;Tobin and Bjørnstad, 2003), fish (Myers et al., 1995(Myers et al., , 1997Fromentin et al., 2000), and birds (Bellamy et al., 2003). In terrestrial and marine systems, synchrony between two populations decreases as a function of distance (Ranta et al., 1995;Sutcliffe et al., 1996;Bjørnstad et al., 1999), and estimation of this decay is a key component of spatiotemporal models (Cressie and Wikle, 2015;Thorson et al., 2015).
Synchrony can be quantified with parametric statistical methods (Gouhier and Guichard, 2014), and the specific definition of synchrony can depend on methodology (Liebhold et al., 2004). For the most part, parametric methods involve computing some metric (e.g., correlation, variance, or semivariance) between two time series (Bjørnstad et al., 1999;Koenig, 1999). Spatial synchrony is measured by relating the calculated metric to the geographic distances between survey sites. Analysis of residual correlation is one approach of relating synchrony to environmental changes (Buonaccorsi et al., 2001). Analysts will fit a model to the data, e.g., autoregressive models or linear models, then quantify correlations between residuals. Correlated residuals suggest that both time series experienced a common response to an external (e.g., environmental) factor (Buonaccorsi et al., 2001). The challenge with this approach is that correlated residuals may be the result of model misspecification, which is difficult to identify.
Empirical dynamic modeling (EDM) is a non-parametric analytic method that may be an alternative method to quantify shared dynamics without requiring assumptions of independence or statistical distributions. Broadly, the EDM approach focuses on identifying the factors that govern dynamics in natural systems. Takens' theorem of time-delay embedding, a key component of EDM, demonstrates that lags of a single time series can reconstruct the dynamics of the unobservable system (Takens, 1981). This approach primarily distinguishes between observational noise and chaotic dynamics, but has proven applicable to ecological systems. Prediction with EDM outperforms parametric predictions in simulated chaotic ecological systems (Perretti et al., 2013a,b;Munch et al., 2017), and improves forecast skill in salmon runs  and fish recruitment (Munch et al., 2018). The methods also identify causal relationships between sardine landings and sea surface temperature (Deyle et al., 2013).
Time series can be synchronous even when not fluctuating in unison, and EDM can identify shared dynamics in the absence of traditional correlation (Sugihara et al., 2012). Co-prediction is a method of identifying time series driven by the same forces in ways that are not readily apparent. Technically, co-prediction involves predicting values of one time series from another time series. If predictions are significant (see methods for significance criteria), the time series are assumed to have dynamic similarity. Applications of co-prediction can identify interspecific dynamics (Liu et al., 2012) and relationships between fish populations and environmental covariates (Liu et al., 2014).
In addition to identifying synchrony, correlation and coprediction can inform forecasting through composite libraries. Composite libraries are a means of using spatial replicates of comparatively short time series to understand system dynamics (Hsieh et al., 2008), but potentially have the same benefits for longer time series. Composite libraries are composed of multiple time series concatenated together. Individual time series that contain as few as five observations can form composite libraries that detect causal relationships in simulated data with both observation and process error (Clark et al., 2015). In an in vivo ecological setting, composite libraries can identify the shared dynamics of albacore (Thunnus alalunga) across the North Pacific Ocean . Composite libraries may be a powerful approach to improve both hindcasting and forecasting in the California Current ecosystem.
The California Cooperative Oceanographic Fisheries Investigation (CalCOFI) program is among the longest-running oceanographic surveys in the world (Gallo et al., 2019). The survey has a fixed station design that has at times collected physical oceanographic (e.g., temperature, salinity) and plankton (e.g., zooplankton, ichthyoplankton) samples from southern Baja California, Mexico to Vancouver Island, Canada. However, the most comprehensive temporal coverage ranges from the US-Mexico border to the San Francisco Bay area extending from the nearshore to roughly 500 km offshore (Gallo et al., 2019). Although the original intent of CalCOFI was to better elucidate the factors causing the collapse of the Pacific sardine population in the 1940s (Hewitt, 1988), data from the survey now extend beyond fisheries applications and serve as an indicator of overall ecosystem status (Sugihara et al., 2011;Thompson et al., 2018;Harvey et al., 2019). As climate change continues to impact ocean dynamics, CalCOFI is poised to contextualize changes in the California Current Ecosystem (CCE) and help predict changes in fish and zooplankton communities.
The CCE is highly dynamic and characterized by interannual and interdecadal variability (Rykaczewski and Checkley, 2008;Thompson et al., 2018). Distinct water masses mix in the CCE (Bograd et al., 2015), and each body of water is associated with a particular biological community (McClatchie et al., 2018). As a result, fish assemblages fluctuate interannually (McClatchie et al., 2018). The CCE is also subject to larger scale climatic forcing . For example, there was a rather abrupt shift from relatively cold to warm conditions in 1976 that induced long-lasting increases in warm water associated fishes in southern California (Peabody et al., 2018). Given the complicated and dynamic nature of physical and biological properties of the CCE, EDM may be a tool to better understand shared dynamics and augment forecasting across the system.
Here, we leverage the rich spatial and temporal resolution of CalCOFI to identify shared dynamics and measure the ability of these similarities to improve hindcast (as a proxy for forecast) prediction skill. We focus on physical (salinity and temperature) and biological time series (species-specific ichthyoplantkon and Shannon index of ichthyoplankton diversity). The goals of this study are both methodological and empirical.
The first methodological goal is to quantify the degree of correlation and co-prediction to identify shared dynamics of physical and biological data among sample stations. The second methodological goal is test whether composite libraries made up of co-predicted time series (significantly predict one time series from another) improve hindcast skill in comparison with composite libraries of correlated sites. Our attempts to improve hindcast skill can serve as a template to conduct forecasts in the future. We hypothesize that there is synchrony in the CalCOFI data and that co-prediction will be a more robust method of identifying shared dynamics than correlation. That is, composite libraries of co-predicted stations will have higher hindcast skill than composite libraries of correlated stations. Notably, we explore synchrony and hindcasting within each variable (e.g., can temperature at station X be predicted by temperature at station Y) but do not search for patterns between variables (e.g., we do not test if temperature can predict fish abundance or diversity).
The ecological goal is to evaluate the extent of shared dynamics within physical and biological time series. We hypothesize that shared dynamics in the physical and biological time series will be mostly localized either north or south of Point Conception, a known biogeographic and oceanographic breakpoint (Hubbs, 1948). Knowledge of both shared dynamics and the spatial scale of dynamics may have important implications for survey design. Depending on the strength of shared dynamics, it may be possible to identify relatively redundant sampling locations to optimize sampling effort.

Data Preparation
CalCOFI currently collects physical and biological data from each of 104 stations in winter and spring between the United States/Mexico border and San Francisco and 75 stations from the United States/Mexico border to approximately San Luis Obispo in summer and fall (Figure 1). Because some stations are sampled more regularly than others, we culled the analysis to include 81 stations (Figure 1), with observations spanning from 1951 to 2017. We used observations from winter and spring surveys as the temporal and spatial coverage was highest. CalCOFI shifted from annual to mostly triennial sampling from 1971 to 1983, resulting in no seasonal surveys in some years. Although CalCOFI strives to collect from exactly the same location for a given station, in reality the precise location can vary somewhat from cruise to cruise. Hence, for a particular station, we average observations (see below) within 5 km of a cardinal station location.
CalCOFI samples a myriad of physical factors throughout the water column with a conductivity temperature depth instrument (McClatchie, 2014;Gallo et al., 2019). We focus here on temperature and salinity because these variables are known to impact the distribution and abundance of many marine species (Thompson et al., 2014(Thompson et al., , 2017. For each station, we calculate mean temperature and salinity between the surface and 100 m and average these means across winter and spring cruises for each station. The main, long-running CalCOFI biological observations are collected with plankton nets. We focus on ichthyoplankton collected with bongo nets lowered to 210 m (or within 10 m of the bottom at shallow stations) and towed to the surface at a constant speed and at a 45 • angle (McClatchie, 2014). Quantifying the abundance of larval fishes is a comprehensive method for assessing the dynamics of most fishes in an ecosystem because although adults occupy different habitats, larval fishes from most species reside in the upper 200 m of the water column and can thus be sampled simultaneously. Several studies demonstrate that ichthyoplankton abundance correlates with the spawning stock biomass of fishes (Moser and Watson, 1990;Moser et al., 2001;Ralston et al., 2003;Ralston and MacFarlane, 2010). CalCOFI plankton samples are preserved at sea in a trisbuffered 5% formalin solution. Ichthyoplankton are identified in a laboratory based on morphology (Moser, 1996). CalCOFI provides time series for hundreds of fishes. Raw larval counts are multiplied by a standard haul factor that accounts for differences in water filtered and divided by the percent of the FIGURE 1 | Map of stations within the CalCOFI survey grid and the data availabilities for each individual station (referenced by line-station) for 1951-2017. The map (left column) shows stations used for this analysis (black points) and those that were not (white points). The arrow indicates the location of Point Conception. The data availability plot (right column) shows survey records for years with winter-spring values (black tiles), summer-autumn values (gray tiles; not used in analysis), and no coverage (white tiles). CalCOFI shifted from annual to mostly triennial sampling from 1971-1983 resulting in no surveys in some years. In the early years of CalCOFI, the survey area extended north to British Columbia, Canada and south to Baja California, Mexico (not shown).
samples with high zooplankton volumes are often subsampled (Smith, 1977). Final abundances are expressed as larvae per 10 m 2 surface area. CalCOFI surveys typically sample each of the stations we use in this study 1-4 times per year (Figure 1), and we calculate annual averages from winters and springs. Note that CalCOFI shifted from annual to mostly triennial sampling from 1971 to 1983 resulting in no surveys in some years (e.g., 1979).
Many of the fishes in this study share similar adult habitat affinities and are subject to comparable fishing pressure. Both factors are known to comparably and predictably affect fish population dynamics (Hsieh et al., 2005). To evaluate whether fishes in particular groups exhibit similar patterns of synchrony, we assign each taxa to five groups delineated by Hsieh et al. (2005): oceanic-unfished, coastal-fished, coastal-oceanic-fished, coastal-bycatch, and coastal-unfished. For each species, we select time series with at least 25% non-zero values as some species are only rarely observed at particular stations. For example, some mesopelagic fishes are never or very rarely found at shallow, coastal stations.
Finally, we generate time series of single value Shannon diversity index values (H s,t ) using the equation: at each station (s) in year (t) for species (i). The proportional abundance (p) is multiplied by the natural logarithm of p and summed from species i to the total number of species (R).
A single H value was calculated across winter and spring for each year and station. Additionally, the Shannon diversity values were calculated based on all the species reported, which in most cases included additional species to the 36 fish taxa in the single-species analysis. Although we can now identify almost all taxa to species, taxonomic knowledge was less developed at the beginning of CalCOFI. To keep time series used to calculated diversity consistent, we group some species to the 1950s level taxonomic resolution. We use 60 taxa in calculations of Shannon diversity and focus on the 36 most common species for singlespecies analyses.

Synchrony
We conduct and compare correlational and co-prediction analyses to evaluate synchrony among stations for two physical parameters (temperature and salinity), abundances of 36 fish taxa, and fish diversity. In addition, we evaluate the spatial extent of shared dynamics north and south of Point Conception. Specifically, we quantify the proportion of stations that are significantly synchronous with paired stations within and between northern and southern regions. We follow this analysis by calculating the mean Euclidian distance (km) separating significantly synchronous stations in the north and south. Finally, we assess if synchrony patterns vary among the five fish groups (e.g., ocean-nonfished, coastal-fished). We determine if northsouth synchrony patterns differ depending on whether synchrony was assessed with correlation vs. co-prediction analyses.

Correlation
We calculate correlation coefficients (i.e., correlational synchrony) between time series from combinations of station pairs with time lag-0. Our criteria for significance is a statistically significant Spearman's rho coefficient (p < 0.05). We do not apply sequential Bonferroni corrections that alleviate type-1 error that can arise with multiple testing because our objective is to compare results from correlation and co-prediction analyses rather than evaluate the significance of any one particular time-series.

Co-prediction
A key step in EDM co-prediction analysis is to identify the dimensionality of the data. In nature, a number of physical and environmental factors drive a population; a time series is a onedimensional observation of the factors' effects on the population. Fortunately, ecological models do not require analysts to identify all of the factors, rather a comparatively few number of variables can be representative of the dynamics (Schaffer and Kot, 1985). Similarly, Takens' theorem, a tenet of EDM, formalizes this idea that a single time series and some number of lags (dimension; E) are representative of system dynamics (Takens, 1981;Sugihara and May, 1990). Per Takens' theorem, an M-dimensional system converges to a d-dimensional attractor, and a single time series of observations y t , t = 1, 2, . . . , T and lagged coordinates of y (at time step τ) Y t = y t , y t−τ , . . . , y t−Eτ can reconstruct the d-dimensional attractor (Takens, 1981). This requires that a time series is sufficiently long to capture attractor dynamics. Simplex projection (hereafter referred to as simplex) along with sequentially locally weighted global linear maps (s-map) are two key EDM forecasting methods. The distinction is that simplex identifies the dimensionality of the system, and s-map characterizes the data as linear or non-linear (Sugihara, 1994). S-map will generally outperform simplex if system dynamics are non-linear.
We first identify the dimensionality of time series in the CalCOFI dataset with simplex. Simplex projection takes a weighted average of the nearest neighbors' trajectories depending on the specified dimension (E). Given an E, a time series and its E-lagged coordinates, we seek to predict a valueŷ t+1 . In order to predict this value, we use the Euclidean distance d(y t , y s ) between y t and y s and calculate the weights w i (t) as: where n(t, i) specifies the index of the i-th closest neighbor to y n(t,1) . The predictionŷ t+1 is: We use leave-one-out cross-validation and select the E with the highest correlation between predicted and observed values. We evaluate E values ranging from 1 to 10 and identify the bestfitting E for each time series.
Next, we evaluate co-prediction in which we take the bestfitting E for one time series (library) and predict values in another time series (predictee) with simplex. Co-prediction quantifies the dynamic similarity between time series, and has been used to identify interspecific and species-environment dynamics (Liu et al., 2014) and nonlinearities (Liu et al., 2012) in the FIGURE 2 | The numbers of significantly correlated vs. numbers of significantly co-predicted stations for salinity, temperature, and Shannon indices of diversity. Each point represents the numbers of each correlated or co-predicted with a particular station (e.g., station A is correlated with 70 stations and co-predicted with 10 stations). Points are slightly transparent to allow overplotting. For nearly all stations, the numbers of correlated stations were higher than the number of co-predicted stations. Dashed lines indicate the one to one line. Composite libraries were composed of correlated or co-predicted stations based on these results.
Northwest Atlantic. We conclude significant co-prediction if there are positive and statistically significant correlations between predicted and observed values (ρ > 0; p < 0.05) and mean absolute scaled error (MASE) values less than 1. Consider a dataset from timesteps t = 1, . . . , n. We generated a hindcast prediction, F t , at each timestep. Forecast error was: where Y t is the observation at time t. Scaled error q t was: and MASE was: A MASE value less than 1 indicates that the prediction had lower error than that from a naïve predictor, which uses the prior year value as a prediction (Hyndman and Koehler, 2006). Again, we evaluate co-predictability between stations for each set time series (i.e., salinity, temperature, Shannon index of abundance, and single-species ichthyoplankton) for each station, but do not attempt to conduct co-prediction across sets of time series (e.g., we do not predict salinity from temperature). Further details on EDM are available in the documentation for the R package "rEDM" 1 and Chang et al. (2017).

Hindcasting
We construct composite libraries (i.e., concatenated time series) of the significantly correlated and significantly co-predicted stations identified here.
We generate hindcast predictions from three data scenarios for each time series. Consider time series A from a specific 1 https://github.com/ha0ye/rEDM station for a particular data set (e.g., salinity). We s-map hindcast (leave-one-out) values of time series A from: (1) time series A; (2) the composite library of time series A and time series correlated with A; and (3) the composite library of time series A and time series co-predicted by A. Once again, we replicate these three scenarios for each of the time series but do not attempt comparisons between data sets (e.g., predicting Shannon index from salinity). Our goal here is to quantify the ability of correlated and co-predicted stations to improve hindcast skill.
We use s-map with leave-one-out cross validation to evaluate hindcast skill. S-map is an extension of simplex that has an additional parameter (θ) that controls the strength of nearestneighbor weighting. S-map can make both linear (θ = 0) and non-linear (θ > 0) predictions (Sugihara, 1994). We select E   and θ from time series A based on the values that maximize the correlation between the leave-one-out predictions and observations. We then use the E and θ values with s-map to hindcast values for time series A from time series A, correlated composite libraries, and co-predicted composite libraries. The criteria for statistical significance is positive correlations between predicted and observed values (ρ > 0, p < 0.05) and lower error than that of a naïve prior year predictor (MASE < 1).

Synchrony
We found evidence of synchrony between stations within all time series (salinity, temperature, Shannon index, and 36 single-species ichthyoplankton abundances) with both correlational and co-prediction analyses. Generally, there were many more correlational relationships than co-predicted relationships (Figure 2). Each of the 81 stations was correlated and co-predicted with at least one other station for each of the temperature, salinity, and Shannon index data. Correlated and co-predicted stations were most concentrated south of Point Conception. A minority of the correlated pairs are north of Point Conception: 7% for salinity, 8% for temperature, and 17% for Shannon index ( Table 1). We found a similar pattern for co-predicted pairs north of Point Conception: 11% for salinity, 18% for temperature, and 6% for Shannon index (Table 1). Note, that 22 of the 81 stations (27%) were located north and 59 of 81 stations (73%) were located south of Point Conception. Generally, stations closer to shore and south had the highest correlation and co-prediction with other stations (Figure 3).
Adjusting for the distribution of stations north (22 stations) and south (59) of Point Conception by representing values in terms of proportions (e.g., 10 out of a possible 22 and 12 of a possible 59) for each library station resulted in slightly more balanced relationships across Point Conception. Stations south of Point Conception co-predicted with roughly the same proportions of stations north and south for salinity, temperature, and Shannon index (Figure 4). Library stations north of Point Conception were more co-predicted with stations north for salinity and temperature, whereas the proportions for Shannon index were roughly equal (Figure 5). A majority of the predictee stations for single-species ichthyoplankton library stations were concentrated south of Point Conception ( Table 2). For library stations north of Point Conception, at least half of the predicted stations were south of Point Conception for coastal-oceanic-fished and oceanic species ( Table 2). For library stations south of Point Conception, at least half of the predicted stations were also south for coastal-fished, coastal-oceanic-fished, and oceanic species categories ( Table 2).

Composite Libraries
Individual stations showed evidence of hindcast skill. Leave-oneout predictions for a particular time series were significant in 27 stations for salinity, 36 for temperature, and 33 for Shannon index of 81 total stations ( Table 3).
Composite libraries generally resulted in a greater number of significantly predicted stations. The number of significantly predicted stations from composite libraries of correlated stations was 28 for salinity, 36 for temperature, and 42 for Shannon index ( Table 3). Predictions from composite libraries of co-predicted stations were significant in 60 stations for salinity, 60 for temperature, and 72 stations for Shannon index (Table 3). For salinity and Shannon index, co-prediction was a more robust method of identifying shared dynamics than correlation.
Generally, significant hindcast skill was highest with nonlinear predictions, indicated by θ values greater than 0. Nonlinear

Coastal-unfished
Pacific argentine Argentina sialis Across 33% (n = 12) 56% (n = 9) S-S 67% (n = 12) 44% (n = 9) The Point Conception column indicates whether the library and predicted stations crossed (Across), both north (N-N), or both south (S-S) of Point Conception. Generally the highest percentage of relationships were both south of Point Conception.
predictions resulted in significance for roughly 80% of the stations in the single station and composite library scenarios ( Table 3). For the co-predicted composite library scenario, nonlinear s-map predictions accounted for nearly all the significant results ( Table 3).
The CalCOFI survey has variable temporal and spatial sampling frequencies due to logistical and financial challenges common to any long-term ecological survey. Stations off northern California in the CalCOFI grid had stretches with no winter and spring surveys. Because we compare s-map predictions to lagged observations, assuming a lagged observation from say 10 years prior may bias MASE calculations. In other words, a poor predictor compared to a lagged observation from many years prior may result in lower MASE values. To control for this, we filtered time series such that the maximum gap was 3 years or less and recalculated MASE. The number of significant stations was relatively unchanged, and the Predictions were made with s-map using either a single station, composite library of correlated stations, or composite library of co-predicted stations. Values in parentheses show the percentage of significant stations with θ values greater than 0 (indicating nonlinearity).
number of significantly co-predicted composite library stations decreased by 1-3 stations (Supplementary Table 1).
Inclusion of co-predicted stations improved hindcast skill of Shannon index for mostly offshore stations (Figure 6). Additionally, composite libraries resulted in significant predictability for all three of salinity, temperature, and Shannon index in 38 of 81 stations. Although there were some cases, particularly for salinity and Shannon index, where predictions from composite libraries of co-predicted stations had MASE values of 0.6 (indicating that error from predictions was 60% the error from a lagged observation; Figure 6).
Prediction for individual species was highest with copredicted composite libraries. For many species, hindcast skill was highest with composite libraries of co-predicted stations, and this trend was strongest in oceanic and coastal-fished species (Table 4).

DISCUSSION
We find evidence of spatially shared dynamics in salinity, temperature, Shannon index, and individual ichthyoplankton species as measured by correlation and co-prediction. Leveraging knowledge of shared dynamics via composite libraries of correlated or co-predicted stations generally improved hindcast skill across all data types. However, although synchrony is more evident from correlation than co-prediction, co-prediction is a more robust method to significantly hindcast salinity, temperature, Shannon index, and nearly all single-species ichthyoplankton. Taken together, we demonstrate the utility of co-prediction in identifying shared dynamics and find evidence of widespread nonlinear spatial structure in physical and biological observations across the CalCOFI survey area. To our knowledge, this study serves as the first evaluation of stationspecific hindcast skill of the CalCOFI data set. Identifying shared dynamics with co-prediction is an important step in constructing composite libraries. Previous studies that implemented composite libraries used all available time series from individual species (Hsieh et al., 2008) or locations Clark et al., 2015). Our results show that identifying shared dynamics with co-prediction is an important step to improve hindcast skill. Longer composite libraries composed of more stations (identified through correlation) did not result in higher hindcast skill than copredicted stations, with the exception of Shannon index. While we did not explicitly have a scenario of composite libraries with all 81 stations, composite libraries for salinity and temperature mostly included between 70 and 80 correlated stations. Coprediction quantifies the degree to which two time series are generated from the same underlying process and has the potential to identify relationships in the absence of positive correlation (Engle and Granger, 1987).
We found evidence of nonlinear relationships in the CalCOFI survey data. A majority of the significant results came from nonlinear predictions, with s-map θ values greater than 0, across data types and composite library scenarios. These findings are consistent with previous analyses of CalCOFI data which found nonlinearities in biological time series (Hsieh et al., 2005). These studies utilized out-of-sample forecasting, in contrast to the methods used here, but found that physical time series had high dimensionality and linear dynamics. Thus, it is likely that fish populations have nonlinear responses to environmental forces and have nonlinear relationships across space.
While this study focuses on hindcasting, the methods used here may be extended to out-of-sample forecasting to better identify and predict regime shifts. The transition to out-ofsample forecasting may yield insight to the characteristics of a system undergoing a regime shift. For example, a system undergoing a regime shift may be characterized by a composite library of co-predicted stations undergoing a decrease in forecast skill. Additional indicators may be a shift in the number and orientation of co-predicted stations or a transition between linear and nonlinear dynamics. If analyses extend to include multivariate analyses, there may be time-varying changes in interactions, similar to those identified in Deyle et al. (2016).
The California Current is characterized by physical and biological regimes, and here we show that stations across space demonstrate shared dynamics through multiple regimes over the roughly 70 year span of CalCOFI observations. Studies of principal components in over 100 time series, both physical and biological, found regime shifts in 1976 and 1989 (Ebbesmeyer et al., 1991;Hare and Mantua, 2000). Shifts in the Pacific Decadal Oscillation from a negative to positive phase were hypothesized to precede shifts in biological regimes (McFarlane and Beamish, 2001;Moser et al., 2001). Indeed, a previous study has identified five ichthyoplankton assemblage regimes in analysis of the southern portion of the CalCOFI survey area (Peabody et al., 2018). The combination of co-prediction, composite libraries, and s-map can potentially improve the capability to track system dynamics of a regime change. This work remains to be done but is a logical next step.
We found shared dynamics to be largely concentrated south of Point Conception, although this result may be influenced by skewed station distributions north and south of Point Conception. Point Conception is a well-known biogeographic break within the CCE (Allen et al., 2006) with sharply contrasting water masses north and south of Point Conception (Lynn et al., 2003). Ocean conditions north of Point Conception tend to be dominated by the equatorward-flowing California Current which is cold and relatively fresh as well as cold, salty upwelled water closer to shore that is induced by strong equatorward winds (Checkley and Barth, 2009). As a result, water temperature often increases abruptly south of Point Conception within the Southern California Bight (Checkley and Barth, 2009;Thompson et al., 2016). Co-prediction identifies shared dynamics between two time series but does not measure causal relationships. Convergent cross mapping (Sugihara et al., 2012) can identify causality between time series, and analyses that apply this method may identify mechanistic relationships between stations. Inclusion of additional oceanographic measurements such as oxygen, phosphate, and silicate may further enhance analyses of the movement and forcing of distinct water masses.
Shannon index have significant hindcast skill in 89% of stations (n = 81). While our focus is on the 60 most-caught taxa (used to calculate Shannon index), our results indicate that there are likely common factors driving shared dynamics across space. Physical conditions cascade to affect zooplankton abundances, which fluctuated in synchrony from 1949 to 1969 (Bernal and McGowan, 1981). Additionally, taxa with similar life histories and adult habitats track each other even when they are uncorrelated with environmental conditions (Hsieh et al., 2005). This is another area of future research, and convergent cross mapping, another EDM method, is one extension to identify causal relationships between populations and environmental conditions or interspecific interactions.
There are multiple possible ecological explanations for the predictability in species like Northern anchovy and bigscales. Recruitment may be an important factor influencing shared ichthyoplankton dynamics in the CalCOFI data. Recruitment generally stabilizes metacommunities (Gouhier et al., 2010(Gouhier et al., , 2011 although the relative levels of recruitment can influence synchrony and stability differently (Townsend and Gouhier, 2019). Additionally, local recruitment synchronizes mussel populations across 1,800 km of coast (Gouhier et al., 2010). The rich time series of available data in the Southern California Bight would allow for analyses relating egg time series (collected on CalCOFI cruises) to ichthyoplankton time series to young-ofyear surveys and adult catches to evaluate interactions across all life history stages. EDM works best at predicting recruitment for short-lived, fast-growing species (Munch et al., 2018), and the inclusion of multiple variables may further improve forecast skill. Oceanographic currents in the Southern California Bight have been characterized to identify metapopulation networks (Watson et al., 2011).
While the EDM approach is generally robust to some missing values, additional modeling approaches may not be. The composite library approach has higher predictive skill than using the previous year's value as the forecast. In an analysis of multiple time series forecast methods, this naïve predictor had the highest short-term predictive skill for 2,379 time series of vertebrate population indices (Ward et al., 2014). Shannon index results were likely more predictable as they integrate the year-toyear variability in individual species. Species like bigscales, blue lanternfish, rockfish, Northern anchovy, and Pacific hake all had the most predictable dynamics suggesting that there may be a small number of species driving Shannon index in each year.
Evaluation of out-of-sample predictability was beyond the scope of this study but is a logical next step. Out-of-sample forecasting skill will likely increase if causal relationships exist in the CalCOFI data. Convergent cross mapping (Sugihara et al., 2012) and its spatial applications (Clark et al., 2015) are natural extensions of this analysis and may identify relationships between physical variables like temperature and salinity and biological variables like Shannon index of diversity. The CalCOFI dataset is an ideal dataset for such analysis due to the high spatiotemporal resolution and multiple types of observations. Comparing temperature and salinity directly to ichthyoplankton time series misses key components of the community structure. Likely, there are multiple levels of interactions relating physical conditions to phytoplankton to zooplankton to ichthyoplankton . Additionally, analysis may need to adopt a finer temporal scale to identify seasonal drivers. Here, we used averages of physical and biological measurements across winter and spring, which may have smoothed signal in the data. S-map coefficients may elucidate time-varying interactions between biological and physical data sets (Deyle et al., 2016;Ushio et al., 2018). Finally, additional methodologies such as EDM Gaussian processes (Munch et al., 2017;Rogers and Munch, 2020) and regularized s-map (Cenci et al., 2019) may offer improvements in both in-sample and out-of-sample prediction skill.
The analysis we have presented here, and the analytic next steps outlined above, are motivated by both the desire to understand the ecological dynamics of the CCE and the need to identify analytic methods that can support future survey design/reorganization efforts. There are numerous financial and logistical challenges associated with conducting large-scale surveys, and it is difficult to maintain constant sampling efforts year to year. Co-prediction and composite libraries can provide a means of prioritizing survey sites by identifying partial redundancies in the CalCOFI survey grid. In the case that sampling efforts reduce, locations with strongly shared dynamics may be redundant, in that sampling in these areas may not provide additional information. Locations without shared dynamics may be high priorities because they contribute to a more comprehensive survey of an area.

DATA AVAILABILITY STATEMENT
Publicly available datasets were analyzed in this study. This data can be found here: https://calcofi.org/ccdata.html.
Fund and the McQuown Chair in Natural Sciencies, University of California, San Diego.