A Regional Earth System Data Lab for Understanding Ecosystem Dynamics: An Example from Tropical South America

Tropical ecosystems experience particularly fast transformations largely as a consequence of land use and climate change. Consequences for ecosystem functioning and services are hard to predict and require analyzing multiple data sets simultaneously. Today, we are equipped with a wide range of spatio-temporal observation-based data streams that monitor the rapid transformations of tropical ecosystems in terms of state variables (e.g., biomass, leaf area, soil moisture) but also in terms of ecosystem processes (e.g., gross primary production, evapotranspiration, runoff). However, the underexplored joint potential of such data streams, combined with deficient access to data and processing, constrain our understanding of ecosystem functioning, despite the importance of tropical ecosystems in the regional-to-global carbon and water cycling. Our objectives are: 1. To facilitate access to regional “Analysis Ready Data Cubes” and enable efficient processing 2. To contribute to the understanding of ecosystem functioning and atmosphere-biosphere interactions. 3. To get a dynamic perspective of environmental conditions for biodiversity. To achieve our objectives, we developed a regional variant of an “Earth System Data Lab” (RegESDL) tailored to address the challenges of northern South America. The study region extensively covers natural ecosystems such as rainforest and savannas, and includes strong topographic gradients (0–6,500 masl). Currently, environmental threats such as deforestation and ecosystem degradation continue to increase. In this contribution, we show the value of the approach for characterizing ecosystem functioning through the efficient implementation of time series and dimensionality reduction analysis at pixel level. Specifically, we present an analysis of seasonality as it is manifested in multiple indicators of ecosystem primary production. We demonstrate that the RegESDL has the ability to underscore contrasting patterns of ecosystem seasonality and therefore has the potential to contribute to the characterization of ecosystem function. These results illustrate the potential of the RegESDL to explore complex land-surface processes and the need for further exploration. The paper concludes with some suggestions for developing future big-data infrastructures and its applications in the tropics.

Tropical ecosystems experience particularly fast transformations largely as a consequence of land use and climate change. Consequences for ecosystem functioning and services are hard to predict and require analyzing multiple data sets simultaneously. Today, we are equipped with a wide range of spatio-temporal observation-based data streams that monitor the rapid transformations of tropical ecosystems in terms of state variables (e.g., biomass, leaf area, soil moisture) but also in terms of ecosystem processes (e.g., gross primary production, evapotranspiration, runoff). However, the underexplored joint potential of such data streams, combined with deficient access to data and processing, constrain our understanding of ecosystem functioning, despite the importance of tropical ecosystems in the regional-to-global carbon and water cycling. Our objectives are: 1. To facilitate access to regional "Analysis Ready Data Cubes" and enable efficient processing 2. To contribute to the understanding of ecosystem functioning and atmosphere-biosphere interactions. 3. To get a dynamic perspective of environmental conditions for biodiversity. To achieve our objectives, we developed a regional variant of an "Earth System Data Lab" (RegESDL) tailored to address the challenges of northern South America. The study region extensively covers natural ecosystems such as rainforest and savannas, and includes strong topographic gradients (0-6,500 masl). Currently, environmental threats such as deforestation and ecosystem degradation continue to increase. In this contribution, we show the value of the approach for characterizing ecosystem functioning through the efficient implementation of time series and dimensionality reduction analysis at pixel level. Specifically, we present an analysis of seasonality as it is manifested in multiple indicators of ecosystem primary production. We demonstrate that the RegESDL has the ability to underscore contrasting patterns of

INTRODUCTION
Novel data streams in the Earth system sciences are becoming available at unprecedented rates (Boulton, 2018). Given that many data streams are regularly improved and frequently its spatio-temporal resolution is increased, we overall face a quasiexponential growth of data volumes (Guo, 2017;Reichstein et al., 2019). The "data rich world" has become a challenge widely acknowledged across disciplines, but also opened a novel window of opportunity. For example, in the Earth system sciences, we expect a deeper understanding of a wide range of processes that remain to be insufficiently understood today (Scholze et al., 2017;Gentine et al., 2018;Reichstein et al., 2019). In particular, these data have large potential to reduce uncertainties in the quantification of global hydrological fluxes (Miralles et al., 2011;Beck et al., 2016;Ciabatta et al., 2018;Shen et al., 2018), atmosphere-biosphere exchange of carbon, water and energy (Dorigo et al., 2011Green et al., 2017;Konings and Gentine, 2017;Papagiannopoulou et al., 2017;Ryu et al., 2019;Jung et al., 2020), the estimation of biodiversity patterns (Asner et al., 2015;Ma et al., 2020), and the interactions of all these processes as mediated by key ecosystem functional properties (Reichstein et al., 2014;Musavi et al., 2015;He et al., 2019).
The reason for this optimism comes especially from emerging opportunities in interpreting a wide array of data streams that jointly monitor the same system from different viewpoints. Examples are the monitoring of land ecosystems with multiple sensors at different wavelengths via satellite remote sensing, e.g., from the optical to the radar domain (Joshi et al., 2016;Anaya et al., 2020;Heckel et al., 2020), the joint analysis of field measurements and remotely sensed data Meyer et al., 2019), and productions of ensembles of multiple data sets that integrate process-based understanding (Musavi et al., 2017). In general, it is the multitude of climate data sets that allow researchers to understand the multivariate and multifaceted nature of land-dynamics in relation to climate variability (Kraemer et al., 2020;Mahecha et al., 2020). Big-data perspectives of this kind in the Earth system context are therefore highly relevant to improve our understanding of ecological processes, e.g., effects of land use and climate change, and other fundamental transformations on the functioning of land ecosystems.
Given that many of the relevant data streams are retrieved from space, they create a unique opportunity to understand dynamics, trends and tipping points in those regions of the Earth that often lack dense in-situ observation networks. This is an advantage for low and mid-income countries, which nevertheless experience the fastest and most severe ecological and social transformations (Hansen et al., 2013;Leblois et al., 2017;McNicol et al., 2018;Song et al., 2018). We specifically point out to tropical ecosystems for two reasons. On the one hand, these regions experience fast ecological transitions e.g., due to rapid socioeconomic development (Dávalos et al., 2011;Bathiany et al., 2018;Armenteras et al., 2019a). For instance, the tropical Andes are the most critical hotspot for biodiversity in the world due to human encroachment, deforestation and land use change (Orme et al., 2005;Etter et al., 2008;Poveda et al., 2011). On the other hand, tropical ecosystems constitute relevant controls on the global carbon and water cycles. The Amazon forest, for instance, plays a significant role in the global carbon balance (Chambers et al., 2001;Pan et al., 2011;Phillips and Lewis, 2014;Hubau et al., 2020), and strongly regulates water and moisture recycling at the continental scale (Poveda et al., 2006;Zemp et al., 2014Zemp et al., , 2017. In turn, the Andes and the low-lying Amazon constitute a coupled system whereby the low-lying Amazon exports atmospheric water to the Andes by the winds, while the Andes export surface water, sediments and nutrients to the Amazon, which highlights their mutual interdependence and the deleterious impacts of deforestation on both sides for the integrity of the system's functioning (Builes-Jaramillo et al., 2018). Additionally, other ecosystems play a crucial role in biogeochemichal cycles. In general, savannas have been considered main drivers of interannual variability in the carbon cycle (Ahlström et al., 2015), and wetlands, including flooded savannas and swamp forest, are significant contributors of methane emissions (Bloom et al., 2017).
However, the countries hosting these highly relevant ecosystems often happen to be those where knowledge on Earth system dynamics happens to be at a comparatively early stage. Limited resources for science and technology often hinder dealing with these highly interdisciplinary challenges. One practical reason might be that research e.g., into large-scale biosphere-atmosphere interactions may require big-data infrastructures, data hosting facilities, and numerical preparation that is hardly achievable by local institutions in the long-term. This is why great hopes are today on global data facilities that may become fundamental game changers in this context. Specifically, we refer to such facilities that are not only providing data access but also provide users' independence for developing and executing analysis. Prominent examples such as Google Earth Engine (GEE) (Gorelick et al., 2017;Tamiminia et al., 2020), or the Climate Data Store give access to a wide set of data streams accompanied by analytics facilities. However, these platforms usually provide the data as is, which means they are a collection of satellite images that are stored image by image for example as a stack of GeoTIFFS or as NetCDF files chunked as latitude-longitude maps. If the task is understanding the complex dynamics of time series or spatio-temporal patterns, having efficient computational access to the time dimension is key for boosting temporal studies at individual pixels or selected regions. Therefore, a certain amount of data pre-processing is necessary to give analytical tools the possibility for efficient access to data along all axes of the n-dimensional cube (e.g., latitude, longitude, time, variables, ensemble members, etc.). In addition, none of these platforms are particularly developed for the analysis of tropical ecosystems in terms of specific data availability and analytic capacity to trigger the understanding of regional Earth system processes.
In an ongoing joint European-Colombian collaboration since 2016, we identified the lack of open platforms that not only share free and open data, but enable analyzing them for specific problems such as regional conservation issues that should accompany a suite of regional monitoring and synthesis efforts (Sierra et al., 2017), let alone with the accompanying meta-data information. Starting from there, we have worked on the conceptual outline of a data infrastructure to study land ecosystem dynamics in space and time in tropical South America. We developed a regional Earth System Data Lab (RegESDL) for northern South America to facilitate big-data analytics efficiently based on cloud infrastructures. Our approach is twofold: first, it provides analysis ready data cubes (ARDCs) that can be augmented by almost any other conventional spatial data set. Second, and this is key, it provides the opportunity to apply any arbitrary set of userdefined functions and algorithms on the generated data cubes. The idea is that time, space, and variables, are all dimensions that can be equally relevant to the problem under investigation and therefore need to be treated alike ).
Many regional data cube efforts have recently emerged around the Open Data Cube (ODC) concept that was originally developed for Australia (Lewis et al., 2017). Mostly, ODCs facilitate access to pre-processed satellite imagery. This idea is now applied to e.g., Armenia (Asmaryan et al., 2019), Colombia Bravo et al., 2017), Kenya (Killough, 2019), Switzerland (Giuliani et al., 2017), among other countries. Previous work has focused on the implementation and perspectives of these and related initiatives that vary from the software development to its establishment as national projects for supporting decision making (Giuliani et al., 2017;Killough, 2019). However, these initiatives are based on a latitudelongitude grid for data storage. This dramatically limits efficient access to time series which comes with an expensive computational cost when operating on the raw data directly because the spatial dimensions is the main unit of access and processing. Compared to all these initiatives, our approach is committed to support efficient access to any suite of geographical dataset. This allows users to explore time, variables, space, and other dimensions in its equal right . We also regard it as essential to take user-defined functions seriously and prioritize them. Users should be able to use the full power of a programming language to write algorithms, including calling into third-party libraries and map them over the entire data cube in a way that is equally efficient. In this study, we support moderate spatial resolution for understanding Earth System interactions addressing regional challenges; varying from technical aspects of data quality, acquisition and management, to high complexity due to landscape heterogeneity.
The purpose of this paper is to introduce the RegESDL for northern South America and illustrate its potential to characterize land-surface processes in relation to climatic and land use drivers. Using the example of ecosystems complex seasonality, we illustrate the approach by combining dimensionality reduction and time series analytics tools. The paper is structured as follows: First, we briefly introduce the RegESDL architecture, the implemented facilities and computational approach. Second, we describe the available datasets and the regional context. Third, we showcase how to operate on the RegESDL by characterizing seasonal dynamics in tropical ecosystems. For this, we use a multivariate set of remote sensing derived indicators related to ecosystem productivity. Seasonality in the tropics is well characterized from a climatological point of view, but hardly described from the point of view of ecosystem functioning. Finally, we discuss the findings of our study and provide some guidance on how the RegESDL should help to advance research in the tropics across disciplines such as biodiversity from both a conceptual and technical standpoint.

The Regional Earth System Data Lab Architecture
The RegESDL is a twofold approach for big-data analytics of spatio-temporal variables. It is conceptually and technically an extension of the the Earth System Data Lab (ESDL) , and its guiding principle is to treat all data dimensions such as latitude, longitude, time, variables, and new dimensions (i.e., outcomes from processing), all alike. Thus, all data sets are treated as elements of the same "hypercube". A formal mathematical definition of data cubes and how to operate on them is provided in Mahecha et al. (2020). The first fold is based on ARDCs which are a set of data cubes gridded at the same spatio-temporal resolution (see section 2.2). The second fold is the analytics software that tackles the issue of working with large datasets that are too big to fit into a computer's RAM. With our approach this is not any longer a critical limitation given that many user functions do not operate on the whole dataset at once, but can operate independently on slices of the data cube along a given dimension.
Traditionally, there are many tools for analyzing data sliced along the space dimensions and well established libraries like the Geospatial Data Abstraction Library (GDAL/OGR Contributors, 2020) used as main dependency in geospatial libraries of different geographical information system software such as QGIS and programming languages such as R and Python. However, in our cubing approach, slicing is not only efficient along the space dimension but also along all dimensions i.e., time, variables, and any other thinkable dimension a cube might have (e.g., frequency domain after time series decomposition). The idea is that users only have to define and implement their basic functions at the minimum dimensions that the computation would operate individually in the data cube. The system then handles each operation efficiently, i.e., the implemented logic for slicing can be applied to solve spatial, temporal, or multivariate problems all in the same highly efficient way.
The ESDL software uses split-apply-combine methods (Wickham, 2011) to facilitate the repeated application of userdefined functions to sub-cubes. Thus, in the Julia interface, in order to implement a customized function to be applied to the entire cube or its lower-dimensional sub-cubes, the user first simply implements it for the lowest possible dimensionality. As an example, an operation on a single time series only needs to be implemented as a function that takes a vector (i.e., onedimensional array) as its input. Likewise, a function intended to operate on two-dimensional latitude-longitude slices of the data only needs to be designed in a way that it expects a matrix argument, not a three or four-dimensional cube. The user simply ignores the fact that later the function is applied to a higherdimensional object, for example along variables or any remaining cube dimension. Then this function is passed to a higher-level processing pipeline, along with the definition of input and output dimensions and a highly optimized computation kernel is generated by the processing package.
The users can, too, allow any ARDCs axis to interact with data stored in other formats such as one or multi-dimensional arrays. For example, indices of climate variability such as El Niño Southern Oscillation (ENSO) could either be defined simply as a vector or transformed into a one-dimensional cube. In both cases, the index interacts with the corresponding ARDCs axis in a similar way than an apply function in the time domain. These split-apply-combine methods are common tools in data-science oriented languages like R, Python and Julia for in-memory datasets. However, for larger-than memory datasets it is not so easy to find a suitable solution at hand. While frameworks like Apache Hadoop and Spark (Vavilapalli et al., 2013;Zaharia et al., 2016) provide solutions for unstructured, tablelike data, they would not fit for the challenges provided by structured n-dimensional arrays. A very promising approach is the combination of xarray and dask for efficient and scalable splitapply-combine computations in combination with Zarr as a storage backend.
The implementation of this data cube approach takes advantage of the latest cloud-ready formats for big chunked spatio-temporal data sets. Here we use the Zarr format (https://zarr.readthedocs.io/ en/stable/spec.html) that focuses in cloud technologie storage and can be efficiently handled in Python (https://zarr.readthedocs.io/en/ stable/) and Julia (https://github.com/meggart/Zarr.jl). In addition to the Python xarray interface we offer another interface implemented in Julia, a novel high-level programming language for scientific computing, in the ESDL.jl package by Fabian Gans (co-author), and the most up-to-date documentation is always available in the respective GitHub repository https://github.com/esa-esdl/ESDL.jl (last visit Apr 17, 2020). All the Julia ESDL.jl packages and facilities have been transferred to the RegESDL offering a very flexible and efficient way for processing. The ESDL software is open source and available under the MIT license.

Analysis Ready Data Cubes
In our study, we defined ARDCs as spatio-temporal datasets, usually provided by different sources, stored in a uniform grid and located at common chunks. Figure 1 shows a schematics of the ARDCs main features and the stepwise generation. ARDCs support immediate interaction between different data sources. Rechunking is unnecessary for analysis along the spatial or temporal dimension speeding up the processing. ARDCs in the RegESDL extend from latitude 14°N to 14°S and longitude -83°W to -60°W. The spatial resolution is 0.0083°× 0.0083°(approximately 0.9 km × 0.9 km at the equator). We consider this spatial resolution a good compromise for regional studies, preserving sufficient spatial details for general climate patterns, ecosystem gradients and main relief features. Nevertheless, it is limited in very steep areas and heterogeneous landscapes in the Andes. The temporal resolution is 8-daily and the covered period is from 2001 to 2014. When necessary, data has been resampled or interpolated to match the spatio-temporal grid. Temporal and spatial aggregation was done using the package gridtools (https:// github.com/esa-esdl/gridtools), further details are included in Table S1. The total size of the ARDCs is 3.03 TB. We present the RegESDL ARDCs in three categories; time series, descriptive variables and national layers. A comprehensive list of the data with details of the original resolution, interpolation method, spanning time, source and license is in the Supplementary Tables S1-S5.
Time series are mainly data sets from models or satellite products related to vegetation and climate (Supplementary Table S1). We compiled data of gross primary productivity (GPP), evapotranspiration, shortwave radiation, photosynthetically active radiation (PAR) and diffuse PAR from the Breathing Earth System Simulator (Ryu et al., 2011;Jiang and Ryu, 2016;Ryu et al., 2018) describing ecosystem functioning. The selected products from the Moderate Resolution Imaging Spectroradiometer (MODIS) are 8-daily composites of leaf area index, fraction of absorbed photosynthetically active radiation (FPAR), and 16-daily composites of the enhanced vegetation index (EVI) and normalized difference vegetation index (NDVI) which are values of standing vegetation and greenness. Day and night land surface temperature was also obtained from MODIS. Albedo data was acquired from the Quality Assurance for Essential Climate Variables project (http://www.qa4ecv.eu/). Precipitation data was provided from two sources; the Tropical Rainfall Measuring Mission (TRMM, https://trmm. gsfc.nasa.gov/) and The Climate Hazards Group Infrared Precipitation with Stations (Funk et al., 2015). Time series with different temporal resolutions were also included such as the annual land cover maps from the European Space Agency (ESA) (ESA, 2017), annual vegetation cover fraction from MODIS, monthly annual averages of cloud coverage (Wilson and Jetz, 2016) and monthly fire data from MODIS. We also incorporated quality flags from MODIS products. The quality flag criteria was implemented on the original files (i.e., sinusoidal projection) and it is documented in Supplementary Table S2. After defining the pixels with acceptable quality, data was Frontiers in Earth Science | www.frontiersin.org July 2021 | Volume 9 | Article 613395 reprojected to WGS84. For detailed information of all data products refer to the original documentation, references are included in the supplementary and bibliography. Another important element relates to descriptive ecosystem variables. This category collects global datasets without a time dimension and describes some ecosystem features (Supplementary Table S3). There are 45 layers in total; canopy height (Simard et al., 2011), 11 variables of the Harmonized World Soil Database (FAO and ISRIC, 2012) and 23 of Soil Grids (Hengl et al., 2014) associated with soil composition and chemical properties at different strata. These ecosystem variables reflect a specific ecosystem state which is of importance for characterization, but they lack on offering a dynamic perspective. Alternative products bring the possibility for investigating soil water dynamics (e.g., soil moisture), however the coarse spatial resolution constrains its current use in our study.
Because the RegESDL also aimed to support the Colombian Biodiversity Observation Network (BON) geographical layers of Colombia were included (Supplementary Table S4). These data layers have been ingested from governmental web portals or from scientific publications. Layers in vector format were transformed to the target grid. Borders of administrative units (IGAC, 2010) and national natural parks (PNN, 2015) were included. Maps of wetlands (Flórez et al., 2016), agriculture frontier (MADR-UPRA, 2017) and biotic units (Londoño et al., 2017) were also added for further ecological analyses, as well as comparative interpolations of mean annual precipitation (Álvarez-Villa et al., 2011).
The RegESDL can be operated through different ways. First, the RegESDL can be accessed locally using Julia. In this case, the loaded datasets are exclusively the ones required for the analysis. Second, the RegESDL can be completely downloaded to any local machine using Python. Explanatory scripts of how to access the RegESDL are included in the supplementary and at http://doi. org/10.5281/zenodo.5068004.
It is important to highlight that datasets described previously are the first core of the RegESDL. Nevertheless, the RegESDL is prepared for interacting with new data sources and can be easily expanded by users ingesting their own data as NetCDF or CSV files using the ESDL.jl package. The imported data layers must share the same grid extent to warranty Interoperability among multiple datasets. Vector files i.e., shapefiles are also supported, they can be loaded and converted to a grid for further processing in the ESDL environment.

The Regional Earth System Data Lab Coverage
The RegESDL focuses on tropical ecosystems of northern South America. It covers Colombia, Ecuador, Venezuela, and partially Brasil, Bolivia, Panama and Peru. From now on we refer as regional scale to the area covered by the RegESDL, and local scale to the country level and finer geographical units. We selected this region because it is facing a rapid ecosystem transformation due to land use change and urgently needs tools that help to understand ecosystem dynamics, contributing to fill a knowledge gap in the countries that encompass this region. Having a ready RegESDL to facilitate big-data analytics for ecosystem function is key on understanding one of the most diverse regions of the world. This region is dominated by extensive natural tropical ecosystems, and multiplicity of climates related to topographic gradients, trade winds, and the meridional migration of the Intertropical Convergence Zone that drives annual variability. The highest altitude is in the peruvian Andes in Cordillera Blanca with more than 6,500 masl. Dominant land cover types are broadleaved evergreen forest open or close > 15% (4,740,043 km 2 ), grassland (474,959 km 2 ) and shrubland (266,967 km 2 ) according to the ESA land cover classification (ESA, 2017) (Figure 2). Regional climate interaction with the Andes creates a variety of microclimates that cause two major hydrological regimes; unimodal and bimodal seasonal dynamics in many hydrometeorological processes. In general, all feedbacks in the water cycle are governed by complex lateral interactions across the Amazon and the Andes, but also locally caused by precipitation recycling due to orographic gradients (Poveda et al., 2006;Bedoya-Soto et al., 2019;Espinoza et al., 2020). The mean annual precipitation records in the RegESDL area range from less than 100 mm in the Peruvian coast, upper limit with Atacama desert, to more than 11,000 mm in the Choco region of Colombia, which is perhaps the rainiest region on Earth (Poveda and Mesa, 2000;Yepes et al., 2019). The diurnal cycle of temperatures is the most salient feature of tropical climatology. Variations in temperature are often larger within a day than throughout the year (Hastenrath, 1991), with strong effects on the diurnal cycle of precipitation in the tropical Andes (Poveda et al., 2005). The region currently faces increasing rates of deforestation and land cover change.
In the following, we emphasize the territory of Colombia which is considered the third most biodiverse country and a hotspot for biological conservation (Myers et al., 2000;Andrade, 2011). Currently, Colombia is facing a massive transformation of natural ecosystems due to various socioeconomic transitions (Baptiste et al., 2017;Sierra et al., 2017;Salazar et al., 2018). Deforestation is now reaching national protected areas (Armenteras et al., 2019b;Clerici et al., 2020), and the agricultural frontier is also expanding to other natural ecosystems (Miles et al., 2006;Etter et al., 2008;Bianchi and Haig, 2013;Patino and Estupinan-Suarez, 2016;Correa Ayram et al., 2020). The RegESDL incorporated national layers to facilitate analysis at this scale with a special focus on biodiversity and ecosystems research (see section 2.2). Moreover, some variables were selected to provide seasonal and longer-term information to Biomodelos (http:// biomodelos.humboldt.org.co/es) aiming to get a more dynamic perspective of species distribution models. For this particular reason they both share the same grid extent. The RegESDL also aimed to contribute to the development of Essential Biodiversity Variables (EBVs) in mega-diverse tropical countries. In this case we thought on a top-down approach for biodiversity monitoring. The Colombia BON, one of our partners, has done an extensive development and implementation on the topic.

Case Study Question
In the following, we showcase the potential of using the RegESDL for assessing seasonality in northern South America. Here, we investigated the joint variability of multiple time series, and contextualized the identified patterns in different spaces (i.e., climate space, geographical space, along biodiversity gradients). This analysis had two purposes. One was to carry out a seasonality analysis pixel wise for the region. The other one emphasized on the Colombian territory and its biotic units. A schematic of the RegESDL and a workflow overview is in Figure 3.

Seasonality in Northern South America
We focused on seasonal phenology, one of the fundamental dynamics of most ecosystems of the world (Schwartz, 1998). In tropical ecosystems, however, seasonal dynamics are hardly understood and not well characterized (Wu et al., 2016). On the one hand, this is due to data quality issues, but on the other hand it reflects the fact that the tropics embrace extremely heterogeneous landscapes. For northern South America, a large level of annual variability in phenological variables has been reported, particularly for savannas, dry forest and wetlands . Subannual data acquired from tree ring cores (Giraldo et al., 2020), and from space via solar induced fluorescence (SIF) (van Schaik et al., 2018) recently also suggested annual seasonality in tropical rainforest. From the functional perspective, Nemani et al. (2003) reported radiation as a limiting factor of plant growth and net primary productivity in the tropics that likewise hints at some seasonality. What remains unclear from the literature is: i) how strongly are seasonal patterns reflected by terrestrial ecosystem dynamics, ii) whether we can delineate unimodal or bimodal regions, and iii) how do patterns of annual and semi-annual seasonality distribute spatially. If we understand where unimodal and bimodal seasonailities predominate in land-atmosphere interactions, we could achieve a better predictive understanding of the imprints of extreme climate events such as ENSO, and climate change signals on ecosystems.
Terrestrial seasonal dynamics should be contained in all remote sensing indicators related to green biomass and primary production. Available data sets in the RegESDL are partly direct remote sensing vegetation indices of greenness, i.e., NDVI (Tucker and Sellers, 1986) and EVI (Huete et al., 1997(Huete et al., , 2002. But also, we can analyze GPP and FPAR that are closely related to vegetation activity. Conceptually, these variables represent different processes which are related and physiologically connected, but they are not exchangeable. Of course, the closer to the actual process under interest (e.g., GPP), the more Frontiers in Earth Science | www.frontiersin.org July 2021 | Volume 9 | Article 613395 7 model assumptions are contained in the data (Jiang and Ryu, 2016). The advantage of the original remote sensing signals is that they are closer to the purely observational signal, but not necessarily directly related to the process of interest. Yet, we can assume that all these time series somehow reflect the seasonal cycle of vegetation productivity, yet coupled via more or less direct mechanisms that all reflect seasonal dynamics of green vegetation.

Characterization of the Mean Seasonal Cycle Pixel Wise
In order to capture the seasonal variability of vegetation while accounting for the redundancy of the different vegetation related signals, we conducted a principal components analysis (PCA). PCA is a dimensionality reduction method that seeks new dimensions (components) in the feature space to explain the largest variance, and does it recursively based on orthogonal basis functions. In this sense, it provides common modes of variability at the pixel level that serve to assess seasonality. For our analysis, we selected GPP, NDVI, EVI and FPAR variables spanning from 2001 to 2014 (data set size 111.24 GB). Variables were gap filled using the Mean Seasonal Cycle (MSC) method. Gaps were clustered in rivers and waterbodies and its amount varied among variables; while GPP excluded all data pixels related to water, MODIS products preserve or exclude them irregularly. The next step was to standardize the time series to mean zero and variance of one ( μ 0 and δ 1), and then applied PCA at pixel level. Explained variances by the leading components described how representative the leading mode is for the different vegetation variables.
Based on the Fast Fourier power spectrum, we estimated the contribution of annual and semiannual oscillations of the MSC pixel wise using the discrete Fast Fourier transform library (FFTW) included in the Julia programming language. Here, the idea is that the ratio between the spectral power of the annual and semiannual oscillations can be quantified if the annual or semiannual oscillations dominate the seasonality of the signal. Theoretically, values of <1 indicate a dominance of the semiannual oscillations, values close to 1 are related to an equal contribution of seasonal and semi-annual modes, while values of >1 reflect a dominance of the annual oscillation. We classified the FFTW outcomes as follows: The first component is the sinusoidal function offset, the second component corresponds to the annual oscillation, whereas the third and fourth components together (6 and 4 months respectively) correspond to the semiannual oscillation. Subsequently, we computed the contribution of both annual and semiannual oscillation to the entire signal. Only pixels with at least one direct retrieval at each MSC time step were included in the analysis. The quantification of direct retrievals was based on the quality flags from MODIS variables (Supplementary Table S2). The computation is documented in Jupyter notebooks included as supplementary material and at the GitHub repository https://github.com/linamaes/Regional_ESDL.

Seasonality Characterization of Biotic Units in Colombia
For our second study, we used the biotic units map of Colombia produced by Londoño et al. (2017) as a level of aggregation, aiming to bring a functional understanding of units with biological diversity connotation. The delineation of these biotic units was based on a Beta diversity criteria defined as the taxonomic composition variation (Supplementary Figure S1). Each unit was defined by having a unique set of species that was significantly different to the species set of all other units. Beta diversity was computed from species distribution models of amphibians, birds, mammals, reptiles and plants available in Biomodelos, a collaborative platform that integrates models and expert knowledge (Olaya-Rodríguez et al., 2018;Velásquez-Tibatá et al., 2019). Our approach looks at biodiversity based on the hierarchical concept developed by Noss (1990), and seeks a functional perspective including different taxa similar to Radeloff et al. (2019). It is not related to spectral diversity for ecosystem function assessment.
We calculated the MSC of biotic units using the first principal components (PC) obtained from the pixel wise analysis. First, we computed the average and standard deviation of all pixels within each unit. Then, we calculated the ratio and fraction values of the averaged MSC based on the Fast Fourier power spectrum as we explained in section 2.4.2. We also analyzed the biotic units outcomes in light of climatic variability. For this, we selected variables that describe annual bioclimatic conditions such as precipitation of the driest month, maximum temperature of the warmest month, mean temperature of the driest quarter from WorldClim (Fick and Hijmans, 2017), and median annual cloud frequency (Wilson and Jetz, 2016). These variables were used in the climate space and were related to the MSC ratio by biotic units.

Summarizing Multivariate Vegetation Dynamics
The first PCs from the PCA captured the largest vegetation variability pixelwise (Figure 4). When exploring pixels from different land cover types, we found overall that PC1 captured the main MSC features of each variable (Supplementary Figure  S2). In general, for broadleaf evergreen trees and grassland the variables' trajectories and peaks overlapped most of the time, although the signal amplitude differed. The shrubs-herbaceous flooded cover was the one presenting the most contrasting trajectories between variables. The main contributors to PC1 are different along the regions (Supplementary Figure S3), NDVI contributed the most in arid and semiarid regions such as the Caribbean and Orinoquia savannas and at the Pacific coast of Ecuador and Peru, whereas EVI is the larger contributor in the Amazon and Biographic Choco i.e., the wettest regions. Some pixels were excluded from the analysis due to data quality. They are mainly located along the Pacific coast, in the higher Andenean mountains, and in the transition between the mountains and lowlands. We observed different regional patterns of variance explained by the three principal components. PC1 ( Figure 4A) explains the largest variance in ecosystems mainly dominated by savannas in the Orinoco and Caribbean basins of Colombia and Venezuela, and lowlands of the Magdalena-Cauca river basin. Similar pattern occurs in the Pacific coast of Ecuador and Peru. These regions are characterized by very low precipitation during the dry season and are dominated by grassland. PC2 ( Figure 4B) shows a homogeneous spatial pattern, being slightly lower in arid and semiarid ecosystems. Otherwise, PC3 ( Figure 4C) dominates the broadleaf evergreen forest. The explained variance by the PCs range from; 0.28 to 0.86 for PC1, 0.07-0.31 for PC2, and 0.03-0.24 for PC3. These shows that variance PC2 and PC3 can carry similar amount of variance in some regions. Overall, Figure 5 shows that PC1 explains more than 40% of variance in grassland, shrublands and herbaceous cover with shrubs and trees. But it is certainly limited in broadleaf evergreen forest where captures between 20% and 40%, which is also a region known by large data gaps due to clouds.

Quantification of Temporal Dynamics
The map of seasonality ratios ( Figure 6) depicts a spatially heterogeneous dominance of annual variability in the region that can be associated to the major landscapes. Annual cycles dominate the land-surface dynamics along the Ecuadorian and Peruvian coast and northern Venezuela. Large-scale patches of semi-annual oscillations are found in the flooded savannas of Orinoquia, and inter-Andean valleys in Colombia. Similar contributions of annual and semi-annual oscillations are observed partly in the Caribbean coast and foothills. The northwestern Amazon basin does not have a uniform pattern, although it shows a slight trend with dominance of bimodality close to the Equator (the wettest region), and unimodality toward South. Yet, it is hard to determine due to the optical sensors limitations in the rainforest. However, care must be taken in interpreting this figure, taking into the account the variability represented from the total signals ( Figure 4) and the amplitude of the oscillations. In this regard, we estimated the fraction between both assessed oscillations (i.e., annual and semiannual) and the entire signal based on the power spectrum of Fast Fourier. These values are found in Figures 6, 7.

Seasonal Dynamics of Colombian Biotic Units
In order to understand the spatial variability of vegetation seasonality and its links to biodiversity we used the biotic units of Colombia. We observe that the units with higher amplitude are also the ones where annual and semiannual variability have a fraction value of >0.7 (Figure 7). Overall, we observe that seasonality in the biotic units is extremely different; peaks of vegetation activity are reached at different times across Colombia, unimodal and bimodal seasonality are equally important at the national level, and regions with lower variability explained by these regimes have to be further explored. These could be associated with the dominance of fast oscillations as reported by Linscheid et al. (2020) for some tropical regions.
We used bioclimatic variables as a first proxy to understand seasonal variability. We selected precipitation of the driest month, maximum temperature of the warmest month, mean temperature of the driest quarter (Fick and Hijmans, 2017), and median annual cloud frequency Wilson and Jetz (2016). As an overview, six biotic units with contrasting patterns are presented. We observe that the lowest values of monthly annual precipitation differ significantly within biotic units. In Arauca and Baja Guajira-Cesar values are less than 30 mm ( Figures 8A,D) whereas in Micay values are above 300 mm on average ( Figure 8F). Interestingly, in the region Magdalena-Medio & Depresion Momposina, there are two clear hotspots ( Figure 8E). Pixels with precipitation in the driest month of >50 mm show a bimodal dynamics in vegetation dynamics, while pixels with drier conditions show a higher importance of annual oscillations. This could be associated with the distribution of  Figure  S4). It is important to highlight that in Colombia, the largest temperature variability is mostly found on a diurnal basis than along the year (Hastenrath, 1991). Due to the orographic conditions of our study area, clouds are a major limitation for passive sensors, however recording these conditions is also informative in an ecological context. Biotic units with a strong dry season are the ones with lower values of annual cloud frequency. Nevertheless, they either show a centralized pattern as Baja Guajira or a large spectrum as Magdalena Medio, similar to the one observed in Patia (Supplementary Figure S5). Micay exposes high cloud cover as it is expected due to its location in the biogeographic Choco region, one of the wettest places on Earth (Poveda and Mesa, 2000;Yepes et al., 2019).
As we have shown in this case study, the seasonal land-surface dynamics in northern South America is surprisingly complex and not trivially explainable by climatic factors alone. Our results are a step forward to reveal interactions between biotic and abiotic components in tropical ecosystems. From a technical perspective, we show that multi-dimensional analysis in any dimension i.e., along physical variables, time, and space, can be performed very efficiently with a few lines of code. The code (available at http://doi.org/10.5281/zenodo.5068004) can be applied to any other data cube, e.g., the ones that are now emerging from the Coupled Model Intercomparison Project Phase 6 (CMIP6) archive and used to understand if patterns as reported here, are similarly identifiable in such global simulations. If they were not, this would suggest that one of the most basic processes of land-surface dynamics in state-of-theart models is not well represented yet.

Computational Performance of the Regional Earth System Data Lab
During the multivariate vegetation analysis, we used a set of four variables (i.e., GPP, NDVI, EVI, FPAR) available as ARDCs at the RegESDL with a total size of 111.24 GB. The estimated time processing for the main steps was: 27 min for the time series standardization, 2.5 min for PCA and 2.8 min for computing the Fast Fourier spectrum. All these computations were carried out at the pixel level using the RegESDL for time series analysis. Overall, these estimates are broad because timing for data loading and processing are hardly discernible, furthermore speed is also affected by other features as parallel processing in other cores. Nevertheless, these timing values bring a general picture of the Frontiers in Earth Science | www.frontiersin.org July 2021 | Volume 9 | Article 613395 computational performance. A comparison between the spatial and temporal RegESDL computational performance, for time series analysis, is in the supplementary ( Supplementary Table S6) and the code is available at http://doi.org/10.5281/zenodo.5068004.

DISCUSSION
Open data and open source software can be seen as the major triggers for recent advances in environmental and Earth system sciences. A remarkable example are the achievements after the Landsat archive was opened for ecosystem monitoring, assessment of climate impacts, among others (Wulder et al., 2016). Moreover, data streams are continuously improved; spatio-temporal and radiometric resolutions increased with the development of new satellites and sensors (e.g., the Sentinel constellation). This increasing data availability also raises enormous challenges for data management and ways to interact with them. Data cubes have been a common solution for tackling this problem bridging the code to the data, and therefore simplifying data storage and processing (Lewis et al., 2017;Giuliani et al., 2019;Gomes et al., 2020;Tamiminia et al., 2020;Yao et al., 2020). Nevertheless, most initiatives prioritize spatial grids for data storing (Gorelick et al., 2017;ODC, 2021), and work with specific data sets such as climate or satellite products exclusively. In comparison, our approach considers that all data dimensions are equally important whether they are space, time, variables, or frequency. This has two main advantages: i) Facilitate access to other axes different from latitude and longitude. ii) Implement the cube structure to outputs as is for inputs. With the ESDL package new-axes can be easily added to cubes during processing. Users have full functionality for defining new-axes of a "hypercube", assign their name and type class. In addition, the RegESDL links data across disciplines offering a multidimensional perspective for environmental sciences. Thus, we are capable of analyzing multiple data sources efficiently at spatial and temporal dimensions which is key for tropical regions facing rapid transformation. Trees-flood. sa: Tree cover, flooded, saline water. Note that the y-axis range varies from 0 to 30 to 800 mm, and the color scheme range differs between plots. "n" is the total number of pixels in each biotic unit.
Frontiers in Earth Science | www.frontiersin.org July 2021 | Volume 9 | Article 613395 4.1 Perspectives of the Regional Earth System Data Lab The RegESDL started as an initiative for supporting analysis that integrate Earth science and biodiversity within a tropical geographical range. The main objectives were to get a dynamic perspective of environmental conditions for biodiversity, to contribute to the understanding of ecosystems functioning and to explore atmosphere-biosphere interactions. Until now, studies have focused mostly on evaluating land cover and ecosystems structure for the region. This has been pivotal for the assessment of deforestation and in some extent forest degradation (Armenteras et al., 2016;Pacheco-Pascagaza et al., 2018;Ramírez-Delgado et al., 2018;Meyer et al., 2019;Anaya et al., 2020). As well, different passive and active sensors have been used for ecosystems delineation (Estupinan-Suarez et al., 2015;Flórez et al., 2016), estimation of vegetation biomass and extraction of canopy features from individual trees and forest using Lidar (Asner et al., 2012;Li et al., 2012;Ferraz et al., 2016;Jeronimo et al., 2018;Ferraz et al., 2020). However, fewer studies are carried out for investigating ecosystem function even though global products such as GPP are available from several sources i.e., GPP estimates derived from satellite retrievals (Running et al., 2004), coupled to process-based models (Ryu et al., 2011;Jiang and Ryu, 2016), and data driven methods (Jung et al., 2011(Jung et al., , 2020. This can be partially related to the large uncertainties regarding tropical ecosystems due to atmospheric conditions and limited ground data for models calibration and validation. But also to the early state of Earth science development in the region. The growing availability of new retrievals from upcoming satellite missions and advanced mathematical methods offer new information and alleviate some gaps regarding data accuracy and quality. Promising variables such as SIF open a possibility to improve estimates of GPP and phenological changes (Porcar-Castell et al., 2014;Walther et al., 2016;Sun et al., 2017;Merrick et al., 2019). Recently launched and upcoming satellites missions will deliver hyperspectral and lidar data globally that will facilitate structure and functional biodiversity assessment. Satellite missions such as the DLR Earth Sensing Imaging Spectrometer, the Global Ecosystem Dynamics Investigation (GEDI) or the Surface Biology program will provide key information for evaluating vegetation stress, vegetation traits, and improving carbon and water fluxes estimation. Thus, tools and long-term projects are needed to warrant timely and efficient access to these new data streams. In this sense, the ESDL software offers a suitable framework to address the technical developments required by the unprecedented volume of coming datasets, and most importantly for integrating different suites across research disciplines. Recently, Cremer et al. (2020) implemented the ESDL for higher resolution analysis, specifically for assessing Sentinel-1.
In fact, the RegESDL is an example of a regional effort for offering a common ground to assess Earth system science, ecosystem function, and explore links to biodiversity that could be realized in practice. Specifically during our showcase we presented different modes of seasonality in the region. Studies of bimodal precipitation regimes have been carried out globally (Knoben et al., 2019) and regionally in East and Central Africa, and northern South America (Poveda et al., 2006;Hawinkel et al., 2015). Nevertheless, such patterns are poorly explored at the ecosystem level. In East and Central Africa double peaks of vegetation greenness were observed by Hawinkel et al. (2015) at annual scales. Recently, Turner et al. (2020) reported bimodality in vegetation activity for savannas in California based on SIF measurements that were previously jeopardized when using vegetation indices. Our outcomes revealed that these double peaks on vegetation activity are also occurring in regions of northern South America. But further analysis are needed to understand the mechanistic process of bimodality in the region. For example, it is expected to observe different timing for leaf flushing in savannas. Hypothetically, this might be driven by water pulses and different vegetation strata. Otherwise, studies using passive sensors are limited in some regions despite of having more than 10 years data. In this context, data integration from upcoming missions with higher spatial and multispectral resolution and active sensors is key. Moreover, the biotic units analysis showed that the dominance and intensity of seasonality in the tropics is very heterogeneous, and it requires deeper analysis on the drivers that are not covered in this paper.
Future studies in the RegESLD aim to exploit time series analysis to evaluate tropical ecosystems dynamics at different time scales. Also, a next step to increase our understanding of how biodiversity is related to ecosystem function is to integrate species distribution data, from platforms such as the Global Biodiversity Facility (GBIF), with data from the RegESDL. This will allow the scientific community to understand how the spatial distribution of specific taxa (not necessarily plants) could respond not only to abiotic variables but also to dynamics of the land surface. Moreover, this can be a benchmark to predict patterns of species migrations by global warming.

Comparison with Alternative Regional Projects
Colombia has done a major effort to develop strategies for big data generation and management. The National Institute of Environmental Studies and Meteorology has released the latest Climate Forecast System Reanalysis models for the region from hourly to daily temporal resolution, as well as national climate change scenarios and analyses of extreme events thresholds (Ruiz M.et al., 2020) (http://bart.ideam.gov.co/wrfideam/). Another example is the development of the Colombian ODC Bravo et al., 2017) to reduce redundancy in satellite imagery acquisition, pre-processing and storage. Simultaneously, the scientific community is taking advantage of GEE to assess rapid socio ecological challenges such as deforestation using satellite imagery or products derived from passive (Clerici et al., 2020) and active sensors (Anaya et al., 2020). From the biodiversity perspective, Colombia is contributing significantly in an international context; it is the only country with a National BON in the tropics, its biodiversity information system (SIB Colombia, https://sibcolombia.net/) contributes to the GBIF, and since 2017 different projects are supporting the assessment and implementation of EBVs; specifically for improving species distribution model platforms (Velásquez-Tibatá et al., 2019), supporting biodiversity decision making (http://biotablero.humboldt.org.co/, http://www. bosproject.org/en/), and participates in the calibration of new satellites missions such as GEDI in tropical dry, moist and rain forest (Fagua et al., 2019), TRMM and the Global Precipitation Measurement (Vallejo-Bernal et al., 2020). All of these efforts point to an urgent necessity for developing a common framework that improves data access and management and facilitates ecosystem assessments with a more dynamic and functional perspective, but also for evaluating the rapid natural and anthropogenic changes that are occurring. The RegESDL aimed to bring the initial foundations for these developments, integrating data from biospheric, and atmospheric variables with biodiversity.

Technical Challenges for the Future
One of the major challenges of all data cube facilities is that data is constantly being updated. Not adopting a "living data paradigm" is one of the major obstacles that reduces the user-update of project-based infrastructures. Also in our case, no matter how convincing the concept and implementation might be, it will not persist as a stand-alone implementation next to a machinery that can update the underlying data archives in near-real time. This is why we hope that the analytic framework as it was developed here can soon be coupled to data cube facilities that solve the underlying data availability issue. In fact, in Europe, for instance, the Data and Information Access Services initiatives are about to realize such an approach such that initiatives like the RegESDL can be placed on top of them. The difficulty today remains that the cube type of data access along all dimensions remains often not well addressed and that the idea is rarely to give the full spectrum of functionalities to the users as ESDL does. This is to enable users to map arbitrary functions. Otherwise, initiatives centered on specific satellites products have opted for automated ingestion based on rapid data acquisition from platforms as GEE (Giuliani et al., 2017). However, this alternative is hard to extrapolate when working with multiple data sources.
We believe that in the future, the ESDL software should not merely be a facility that hosts predefined ARDCs but become a service to generate them tailored to individual user needs. These user-defined cubes should be able to ingest arbitrary gridded and non-gridded data. Particularly, higher resolution datasets are requested when moving from regional to local studies or for in-situ data up scaling. Here, accessing high-resolution and latest global data products e.g., SIF from various sources on demand and in tandem with other existing data suites is key. In this sense, ARDCs may be static and pre-processed, transient, automatically updated, or lazy, whatever suits best its intended usage. This means that the ESDL needs to serve data streams from multiple sources, and therefore handle multi-resolution data within the same framework.
Only if we can convince the major infrastructure providers of our concept, we soon will address the most pressing data analytic bottlenecks that remain open: That is, the need to further generalize spatio-temporal data analytics. The current methods implemented in the ESDL are highly efficient in exploiting highdimensional time series and maps, but lack one fundamental requirement: Spatio-temporal interactions and spatio-temporal contextualized data analytics are essentially not possible and limit our capacity to study e.g., telecouplings at large scales or lateral transport processes. This is of particular relevance, when aiming to simulate e.g., water transport in space and time via e.g., surface hydrology or atmospheric interactions. Another challenge for the future is certainly brining in latest advances in machine learning i.e., Deep Learning (DL), as one of the most relevant and rapidly developing fields. In principle this is possible already today, but whenever a DL model requires dealing with e.g., structured data (either spatial, temporal or spatio-temporal as in the Earth sciences), the cube-slicing approach is suboptimal and needs to be redesigned to efficiently map DL methods on the cube. This is particularly promising for understanding e.g., biotic dynamics in the tropics that are often not well constrained by our ecohydrological understanding.
Another major challenge is Interoperability. The exponential data generation and advancements in computer and software technology reinforced the urgency of data exchange between research infrastructure systems. This has to be accompanied by common metadata conventions, open algorithms and software documentation (Kissling et al., 2015;Wilkinson et al., 2016;Hardisty et al., 2019). For Earth observation data cubes, Giuliani et al. (2019) defined two tiers to approach this issue: an upstream tier which is the ODC infrastructure, and a downstream tier which is centered on the user's interaction. Here, we focused on data ingestion from several data sources which was crucial for multivariate analysis. With this approach we achieved to integrate different environmental layers from the biosphere and atmosphere. The usage of Zarr format for data storage, that is in the process of becoming the standard for the Open Geospatial Consortium (OGC, 2020), facilitates exchange with other geo initiatives such as the CMIP6 model. As well, the implementation of downstream software as xarray opens the possibility to develop backends with existing data cubes. We followed the Climate and Forecast metadata conventions (http://cfconventions.org/) that supports properties of the data. What is still remaining is the possibility of working with multi-resolution data, which is one of the following steps. These characteristics are favorable to Interoperability, but this is indeed a topic on constant development by the community.

Engaging the User Community
A long-term sustainability of the RegESDL requires an active users community. On one hand, the ESDL software is open, documented and its main features, i.e., the cube generation and analytical tools, can operate independently which brings versatility for forward software development and applications. On the other hand, consolidating such a community requires active dissemination and training to operate the RegESDL efficiently, besides constant technical support. Having a consolidated community is a key step for warranting the establishment and development of our tool. We see as potential users research groups and universities that are interested in the spatio-temporal understanding of the biosphere-atmosphere interactions, implications of climate variability, land cover change, and biodiversity loss in tropical ecosystems.
Moreover, embracing big data initiatives, such ours, is also beneficial for supporting environmental studies that support environmental policy. For example, the Colombian government has appointed a roadmap for a strong bioeconomy development to the International Mission of Experts in Science, Technology and Innovation (https:// minciencias.gov.co/mision-sabios/documentos), but there are many unknowns about ecosystems dynamics and functioning nation wise that need to be solved for creating such a plan. Studies bringing new insights for informing the sustainable goals, or the Intergovernmental Panel on Climate Change as well as research for understanding ecosystems functioning and ecosystem resilience under climate change scenarios could be efficiently carried out in the RegESDL. In this sense, our initiative could contribute to existing platforms that provide scientific input for decision making at local level.

CONCLUSION
To our knowledge the ESDL is the first data cube implementation with an emphasis on representing interactions across the water cycle, carbon cycle, and climate system . It has been successfully used to understand biosphere-atmosphere interactions at multiple time-scales (Linscheid et al., 2020), analyzing specific variables of ecosystems to climate extremes (Flach et al., 2020), and has enabled studying the multivariate nature of land-surface dynamics globally (Kraemer et al., 2020). The RegESDL has been developed to more specifically explore biodiversity as yet another thematic domain. Our goal was bridging the gap between Earth science and biological diversity that includes ecosystems composition, structure and function (Noss, 1990;Randin et al., 2020).
In this regard, the satellite era has brought us the possibility of gaining ecosystems dynamics understanding from systematic measurements over time and at larger spatial scales. Advances in the EBVs (Pereira et al., 2013), functional traits and functional ecosystem properties (Reichstein et al., 2014;Musavi et al., 2015) are supporting this development. And it is increasingly recognized that one has to consider the functional dimension of biodiversity in its own right, which requires working with high temporal resolutions. In particular, remote sensing observations are of uttermost importance for the analysis of the EBVs (Skidmore et al., 2015;Pettorelli et al., 2016;Giuliani et al., 2020;Randin et al., 2020). Recently, the EBVs for Species Population started to develop a framework for working with space-time-species cubes and proposed it as a suitable model for the oncoming challenges of big data (Jetz et al., 2019) (European BON http://biodiversity.eubon.eu/essential-biodiversityvariables). As well, Randin et al. (2020) provided a detailed review of possible synergies between remote sensing products and biophysical process variables that are key for species distribution models. Furthermore, Hardisty et al. (2019) developed a framework for Interoperability between research infrastructures related to EBVs. Hence, the successful implementation of the RegESDL for studying ecosystem dynamics in northern South America can be seen as a step forward on the development of EBVs related to function and structure in tropical ecosystems.
Specifically on our case study, we show the variety of seasonal trajectories in northern South America. We computed simple metrics to compare semiannual and annual regimes, derived from vegetation variables of productivity, greenness and standing vegetation available at the RegESDL. Nevertheless, our seasonality analysis approach relies on passive sensors which are limited in rainy regions despite of having long time series. This reinforced the necessity of synergies between multiple sensors and data sources. Using the biotic units map of Colombia, we observed heterogeneity on the seasonality modes and the timing of maximum activity among units that are characterized by Beta diversity values. Our findings contribute to elucidate the large seasonal variability in hotspots of biodiversity which is key for differentiated management of tropical ecosystems. Complementary analysis in light of local climate variability needs to be further investigated to identify the main drivers.

DATA AVAILABILITY STATEMENT
The RegESDL can be accessed using Julia or Python. The dataset is available for free download using Python. Complementary, data can also be accessed locally using Julia. In this case, only data used in the analysis is loaded. Explanatory scripts of how to access the RegESDL, and how to reproduce the main analysis and figures are available in the supplementary and at http://doi.org/10.5281/ zenodo.5068004. Supplementary Table S7 lists all the scripts and their location. The most up-to-date code can be found at https:// github.com/linamaes/Regional_ESDL.

AUTHOR CONTRIBUTIONS
LE-S performed all analyses. The study was designed by LE-S and MM. LE-S and MM wrote the paper with substantial input from all other authors. FG, UW, and LE-S acquired and pre-processed the data. FG and LE-S worked in the RegESDL cube generation.