DATA REPORT article
Front. Earth Sci.
Sec. Hydrosphere
Volume 13 - 2025 | doi: 10.3389/feart.2025.1595943
This article is part of the Research Topic: New Challenges for Baltic Sea Earth System Research
Baltic Hydro-Climatic Data: A regional data synthesis for the Baltic Sea Drainage Basin
Provisionally accepted
- 1 Department of Physical Geography, Faculty of Science, Stockholm University, Stockholm, Stockholm, Sweden
- 2 Department of Sustainable Development, Sustainability Science and Engineering, School of Architecture and the Built Environment, Royal Institute of Technology, Stockholm, Stockholm, Sweden
Understanding and predicting water flows and storage changes on land is crucial for addressing water-related scientific and practical challenges across disciplines, geographical regions, and scales. The degree to which the various hydrological, environmental, ecological, biogeochemical, geological, atmospheric, and climate sciences consistently and realistically capture and represent analogous parts of the terrestrial water system and its spatio-temporal dynamics remains largely unknown. Coherently addressing these terrestrial water knowledge gaps is essential for tackling key scientific and practical questions (Zarei & Destouni, 2024a), which requires deciphering the complex interactions between the water system, the climate system (Gudmundsson et al., 2017), and the land and water uses of various human activities in the landscape (Althoff & Destouni, 2023). The increasing availability of regional to global hydro-climatic data has expanded access to key water variables. However, purely observational datasets that can fully represent entire catchments and their total water balance closure remain highly limited. Direct measurements of the lateral (horizontal) runoff water fluxes (R) through the landscape are particularly limited and especially needed for catchment-wise water balance closure; such measurements are available through stream discharge monitoring, which integrates total runoff across a whole catchment and yields catchment-average R by dividing the measured discharge by the contributing catchment area. With regard to the vertical water fluxes, precipitation (P) and evapotranspiration (ET), precipitation data are available from relatively widespread meteorological monitoring stations, enabling measurement-based interpolation or extrapolation of catchment-average P.
In contrast, ET measurements are much scarcer, and ET often requires model-based estimation because direct flux measurements are too sparse to cover whole hydrological catchments. Different datasets are available for addressing the questions and deciphering the interactions involved in the hydrological processes, fluxes, storages, and their changes. Each discipline and sector may then use its preferred dataset, based on some selected combination of ground measurements, satellite observations, and model-based data, to represent, investigate, and predict changes in the terrestrial water system at the scale and world region in focus (Yang et al., 2019; Zhang et al., 2016). Ideally, the different datasets should be consistent, but major discrepancies often emerge (Bring et al., 2015; Ghajarnia et al., 2021; Zarei & Destouni, 2024b), making it essential to distinguish which (if any) datasets are realistic, which diverge considerably and are thus unrealistic and unreliable, and why.
Meeting this distinction challenge is key for selecting relevant and effective measures to mitigate and/or adapt to threatening changes and risks in water fluxes, storage, and quality. The national and international measures required within the transboundary Baltic Sea Drainage Basin (BSDB) to protect and restore ecosystem health in the Baltic Sea are an important regional example where this challenge needs to be met. The BSDB includes various national and transboundary hydrological catchments, where changes in the freshwater fluxes from land, the nutrient and pollutant loads they carry to the sea, and the related coastal-marine eutrophication impacts are management concerns with many associated knowledge gaps and uncertainties (Vigouroux and Destouni, 2022).
Notably, two-thirds of the BSDB water management districts are delineated according to the Water Framework Directive of the European Commission, meaning that a substantial portion of the hydrological catchments draining into the Baltic Sea falls under international jurisdiction (Nilsson et al., 2004), requiring a coordinated and harmonized approach to address the gaps and uncertainties and to ensure effective management and policy implementation.
However, comprehensive, harmonized, and openly accessible hydro-climatic data with sufficient spatiotemporal coverage at the BSDB scale remain lacking, even though they are essential for relevant understanding and mitigation of the Baltic Sea eutrophication and pollution (Hannerz & Destouni, 2006; Destouni et al., 2017). Such data are needed to decipher and quantify the key drivers affecting the nutrient and pollutant loading from land to the coastal waters, as well as the variations and changes in the freshwater flows, quality, and inputs to the sea across the BSDB. The data are vital for the development, design, and evaluation of national and international management plans for water quality and ecosystem health improvements in the Baltic inland, coastal, and marine waters, as well as for needed scientific knowledge advancements on crucial water resource dynamics under the changing regional climate and other environmental and societal conditions.
The BSDB covers ~1.7 million km² of land and is influenced by large spatial and temporal (seasonal and long-term) variations in hydro-climatic (e.g., precipitation and snowmelt), water quality, and anthropogenic conditions across the multiple countries that it encompasses (Hannerz & Destouni, 2006; Andersson et al., 2015).
To meet the challenge of comprehensive open accessibility to relevant, quality-checked, and harmonized hydro-climatic data across these diverse conditions, we here present such a regional data synthesis, entitled the Baltic Hydro-Climatic Data (BHCD), for 69 main non-overlapping hydrological catchments with continuous data time series over the 30-year period 1980-2010 within the BSDB (Figure 1). The Baltic catchments with relevant data availability included in the BHCD collectively cover approximately 722,235 km² of land, accounting for nearly half of the total BSDB land area. This synthesis is derived by extracting data for the Baltic catchments represented within the Global Hydro-Climatic Data (GHCD) compilation provided by Zarei & Destouni (2024b), and represents a broad range of hydrological and climatic conditions with multi-decadal temporal and widespread spatial data coverage of the BSDB.
The selection of catchments in the BHCD, as globally in the GHCD, was based on strict criteria to ensure comprehensive and harmonized spatiotemporal data coverage, open accessibility, and direct comparability across the comparative datasets included in the synthesis. The main catchment selection criteria were: (i) a minimum of 300 non-missing monthly runoff values over the 30-year period for all datasets, (ii) complete dataset consistency, and (iii) non-overlapping spatial coverage. Areas within the BSDB that did not meet these criteria across all datasets were excluded from this synthesis. The non-overlapping requirement aimed to prevent redundancy in data representation and avoid over-representing the same hydrological signals in the aggregated statistics.
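As a minimal sketch of how selection criterion (i) could be checked, assuming each dataset supplies a 360-value monthly runoff series for a catchment with gaps marked as None or NaN, the non-missing values can simply be counted against the 300-value threshold. The function names and the synthetic series are illustrative assumptions, not part of the BHCD processing code.

```python
import math

MIN_VALID_MONTHS = 300  # threshold stated in criterion (i)

def count_valid(monthly_runoff):
    """Count entries that are neither None nor NaN."""
    return sum(
        1 for v in monthly_runoff
        if v is not None and not math.isnan(v)
    )

def meets_runoff_criterion(monthly_runoff, min_valid=MIN_VALID_MONTHS):
    """True if the series has at least `min_valid` non-missing monthly values."""
    return count_valid(monthly_runoff) >= min_valid

def meets_criterion_in_all(series_by_dataset, min_valid=MIN_VALID_MONTHS):
    """True only if every dataset's series passes the check."""
    return all(
        meets_runoff_criterion(s, min_valid)
        for s in series_by_dataset.values()
    )

# Synthetic 360-month series (hypothetical values, not BHCD data).
series_fail = [1.0] * 290 + [float("nan")] * 70  # 290 valid months
series_pass = [1.0] * 320 + [None] * 40          # 320 valid months
```

Requiring the check to pass in every dataset mirrors the statement that areas failing the criteria in any dataset were excluded.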
Additionally, priority was given to selecting the largest possible catchments that met all other inclusion criteria, thereby maximizing the spatial data coverage while maintaining data integrity and consistency.
The BHCD, as also globally the GHCD, includes four comparative datasets: (i) Obs, which synthesizes in-situ observational data for runoff (R) (Do et al., 2018a; Gudmundsson et al., 2018a) and precipitation (P) (Schneider et al., 2016) with modelled data for the associated average annual evapotranspiration (ET), based on the simple model ET ≈ P - R that assumes negligible average annual water storage change (DS ≈ 0); (ii) Mixed, which, for direct comparison with this simple ET model and DS assumption, synthesizes the same observational R and P data as Obs but takes ET from the global model GLEAM (Martens et al., 2017; Miralles et al., 2011), from which the implied water storage change can be calculated as DS = P - ET - R; and corresponding terrestrial water data extracted from the global reanalysis products (iii) GLDAS (Beaudoing & Rodell, 2019; Rodell et al., 2004) and (iv) ERA5 (Hersbach et al., 2017), each of which provides a complete set of model-based data for P, ET, and R, from which the implied storage change can also be calculated as DS = P - ET - R.
By comparing these different datasets, the BHCD enables a comprehensive assessment of important hydro-climatic dataset consistencies, inconsistencies, and uncertainties for the BSDB region. This can support hydrological, coastal-marine, and climate studies, revealing important hydro-climatic and societal relationships, impacts, and feedbacks across the catchments included in the BHCD. Use of the BHCD can facilitate identification of spatial and temporal patterns and of key data gaps that need to be bridged within the BSDB, as well as distinction of consistency, realism, and accuracy among the comparative datasets.
The BHCD is also a resource for calibration and validation of hydrological, climate, and related coastal-marine models regarding the multiple catchments included in this data synthesis across the Baltic region.
While hydrological studies often consider nested catchments (i.e., smaller sub-catchments within, and as such partly overlapping with, larger catchments), the requirement of non-overlapping catchments with the largest possible spatial coverage was used for the BHCD to prevent redundancy in data representation and to avoid over-representing, in the aggregated statistics, the same hydrological signals for small sub-catchments inside larger catchments.
For each of the four comparative datasets, the BHCD integrates the multiple observational and model-based data for the main water flux (P, R, ET) and storage-change (DS) variables catchment-wise to ensure consistent and comprehensive variable, spatial, and temporal hydro-climatic coverage across the datasets. The data sources in Obs and Mixed include observational P data from the "Global Precipitation Climatology Centre (GPCC-V7)" (Schneider et al., 2016) and R data from the "Global Streamflow Indices and Metadata (GSIM)" (Do et al., 2018a; Gudmundsson et al., 2018a). Mixed also includes model-based data for ET and soil moisture (SM) from the "Global Land Evaporation Amsterdam Model (GLEAM)" (Martens et al., 2017; Miralles et al., 2011), which combines satellite observations with its model algorithms.
The observational R data from GSIM in Obs and Mixed define the contributing catchments, which are included consistently in all comparative datasets of the BHCD. Data for air temperature (T) from GHCN-CAMS (Fan & Van den Dool, 2008) are also included in Obs and Mixed. In GLDAS and ERA5, the full sets of model-based T, P, R, ET, and SM data are obtained from global land surface modeling in the "Global Land Data Assimilation System (GLDAS)" (Beaudoing & Rodell, 2019; Rodell et al., 2004) and from global climate modeling in the "ECMWF Reanalysis 5th Generation (ERA5)" (Hersbach et al., 2017), respectively. A main point of including the different comparative datasets for the same hydro-climatic variables in the BHCD is to facilitate assessment of uncertainty ranges, confidence intervals, and sensitivities in further studies that use the data. That is, for the specific research purposes and catchment areas considered in each study, the results and implications of the different datasets in the BHCD can be directly compared, the consistency or divergence and uncertainty ranges between datasets can be determined, and the dependence on and sensitivity of results and implications to dataset choices can be assessed.
The BHCD uses catchment-wise water balance closure (DS = P - ET - R) as a fundamental key to facilitate further investigation of the terrestrial water system state and of the consistency or divergence, uncertainty ranges, and realism of the comparative datasets (Lehmann et al., 2022; Berghuijs et al., 2019; Bring et al., 2015).
In the BHCD, the water balance closure in Obs is assumed to yield negligible storage change (DS ≈ 0), with ET thereby determined as ET ≈ P - R, while the other comparative datasets include calculated DS = P - ET - R based on their available, more elaborately modelled ET data. Obtaining consistent, high-resolution DS data across numerous catchments of various scales around the world is generally a major challenge. For some areas, however, a reliable combination of ground-based and remote sensing observation data may be available for DS, such that ET can alternatively be estimated from the catchment-wise water balance as ET = P - R - DS (Bhattarai et al., 2019). For such areas, both the independent DS data and the associated calculated ET data can be compared with the corresponding DS and ET data in the BHCD for further assessment of consistency or divergence, uncertainty ranges, and realism across datasets.
In the BHCD, the temporal changes in SM (DSM) are also calculated for each dataset, so that further studies using the data can compare and check the internal dataset consistency in the change directions of DS and DSM. That is, DSM is an integral part of, and should be expected to change in the same direction as, the total water storage change DS in each catchment (Destouni and Verrot, 2014). For Obs, however, in which DS = 0 by assumption, comparison with DSM is not meaningful.
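The closure relations and the direction check described above can be sketched as follows. This is a minimal illustration under stated assumptions: all fluxes are long-term catchment averages in the same units (e.g., mm/year), the function names are hypothetical, and the numbers are made up rather than taken from the BHCD.

```python
def storage_change(p, et, r):
    """Implied storage change from water balance closure: DS = P - ET - R."""
    return p - et - r

def et_from_balance(p, r, ds=0.0):
    """ET = P - R - DS; with ds = 0 this is the Obs assumption ET ~ P - R."""
    return p - r - ds

def same_direction(ds, dsm):
    """True if DS and DSM have the same sign (two zeros count as consistent).

    Only signs are compared, since DS (e.g., mm/year) and DSM
    (fraction per year) are expressed in different units.
    """
    return (ds > 0) == (dsm > 0) and (ds < 0) == (dsm < 0)

# Illustrative long-term averages in mm/year (hypothetical values).
p, r = 650.0, 250.0
et_obs = et_from_balance(p, r)             # Obs-style ET with DS = 0
ds_obs = storage_change(p, et_obs, r)      # zero by construction
et_model = 430.0                           # hypothetical modelled ET
ds_model = storage_change(p, et_model, r)  # negative: implied depletion
```

In this example the modelled ET exceeds P - R, so the implied DS is negative; a positive DSM for the same catchment and dataset would then indicate an internal inconsistency.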
There is also no related set of ground-measured SM data to consistently include in the Obs dataset.
The measured stream discharge data from GSIM define the 69 non-overlapping catchments consistently included in all BHCD datasets. They also determine catchment-average R in the Obs and Mixed datasets and enable ET calculation in Obs (ET = P - R). Global gridded datasets provide the remaining variables, extracted within each catchment's hydrological boundaries (i.e., water divides). Aggregated catchment values were generated by spatial interpolation, with area-weighted averaging applied to grid cells intersecting catchment boundaries. Data were processed at the finest consistently available temporal resolution (monthly) to produce catchment-average time series. Additionally, annual and long-term averages were derived to support analysis of longer-term change trends.
To compare total average DS, as derived from water balance closure, with DSM, we separately calculated DSM for each catchment, expressed in fractional units per year, consistent with the relative area-normalized SM values (e.g., mm/mm). The calculation was based on average SM over a moving 3-year window, with DSM quantified as the change from one 3-year window to the next. The choice of a 3-year window allows for a balanced assessment of inter-annual variations while also capturing longer-term trends in DSM over the 30-year data period, as needed to facilitate meaningful comparisons with the corresponding trends in total DS. The comparison between DS and DSM serves as an independent test of the internal consistency and realism of the DS results implied by each dataset. This comparison is relevant and important because soil moisture and its changes are part of the subsurface water system and are directly hydraulically linked to the groundwater storage dynamics (Destouni & Verrot, 2014).
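One possible reading of the moving-window DSM calculation is sketched below. It assumes annual catchment-average SM values, a 3-year window that advances one year at a time, and DSM taken as the difference between consecutive window means, which gives a change per year. Whether the windows in the BHCD overlap in this way is an assumption here, and the function names and data are illustrative only.

```python
def rolling_mean(values, window=3):
    """Means over consecutive windows of `window` values, stepping by one."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]

def dsm_series(annual_sm, window=3):
    """Change in window-mean SM from each window to the next."""
    means = rolling_mean(annual_sm, window)
    return [later - earlier for earlier, later in zip(means, means[1:])]

# Made-up annual catchment-average SM values (fraction, e.g., mm/mm).
annual_sm = [0.30, 0.31, 0.32, 0.33, 0.34, 0.35]
dsm = dsm_series(annual_sm)     # three window-to-window changes
mean_dsm = sum(dsm) / len(dsm)  # long-term average DSM (fraction per year)
```

With 30 annual values this approach would give 28 window means and 27 window-to-window changes, whose average sign can then be compared with that of the long-term average DS.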
Given that groundwater holds the largest share of liquid freshwater on Earth (Oki & Kanae, 2006) and extends beneath the entire land surface area of each catchment (and the world), groundwater storage changes commonly dominate total DS over the surface water storage changes in lakes, wetlands, and streams, which extend over only a small fraction of the land surface area.
The schematic flowchart in Figure 2 provides an overview of the data processing steps involved in developing the BHCD, summarizing the approach used to create this Baltic multi-catchment and multi-dataset synthesis (Zarei & Destouni, 2025).
The comparative datasets in the BHCD provide time series of monthly and annual average values for each included variable, along with their long-term averages over the 30-year climatological period 1980-2010. Only the Obs dataset does not include monthly ET and DS time series, as its assumptions of ET ≈ P - R and DS ≈ 0 are only physically meaningful when averaged over at least a full year, rather than on a monthly basis. The 1980-2010 period follows the World Meteorological Organization (WMO) recommendation that 30 years of data should be used to represent climatic conditions (World Meteorological Organization, 2017); the BHCD can be further extended as more recent data become available and climatic reference periods change.
All variables in the datasets are provided as catchment-average values. In addition to the catchment-average water flux and total storage-change variables (P, ET, R, DS), the datasets also include catchment-average T, SM, and DSM data, along with data for the DS-implied catchment-average cumulative water level change (CWLC) over the total 30-year period, the catchment-characteristic long-term aridity index PET/P (where PET is the fully temperature-dependent potential evapotranspiration), and the flux partitioning indices of long-term ET/P and R/P.
The CWLC data facilitate important checks of realism in the DS implications of the different datasets for the various catchments; only Obs does not include CWLC data, since its DS ≈ 0 assumption inherently implies CWLC = 0, and this specific DS assumption is instead tested by comparison with the DS implications that emerge as realistic from the other datasets. The calculated DS data and their comparison with the corresponding DSM data in the BHCD can be used to reveal possible important internal inconsistencies within a dataset (in terms of different storage change directions implied for DS than observed in DSM), as well as divergent storage change implications and uncertainty ranges between the comparative datasets. Moreover, based on the average DS rate (mm/year) obtained from each dataset, the BHCD provides a calculated corresponding average CWLC example as an additional indicator of dataset realism. The CWLC indicator is calculated as the product of average DS and an average porosity example of 0.3 for the geological formations that contain the groundwater beneath the entire land surface area of each catchment.
Globally, Zarei and Destouni (2024b) have illustrated the importance of such reality checks, especially for the Mixed and ERA5 datasets, for which CWLC values emerge as clearly unrealistic, with catchment-average groundwater level drops or rises of ±100 m and ±50 m, respectively, for some catchments around the global land area; for ERA5 in particular, large and unrealistic water level drops (continuous drying) are seen for catchments in the southern hemisphere. Actual catchment-scale porosity values can of course vary both within and between catchments depending on geological conditions. The CWLC calculated for a typical porosity value of 0.3 in the BHCD is thus not a specific hydrogeological quantification but a comparative indicator for broad-scale assessment of DS realism across the datasets.
Moreover, the pronounced differences observed between datasets for DS and DSM already highlight significant inherent uncertainty, which the BHCD is designed to expose rather than obscure through over-precision. Future applications of the BHCD can integrate statistical modeling approaches to further quantify and explore uncertainty propagation.
The relationship of ET/P versus the aridity index PET/P also facilitates further checks of dataset realism. Plotting this relationship for the various catchments in Budyko space (Koppa et al., 2021; Langbein, 1949) is a widely used approach to characterizing terrestrial water conditions (Berghuijs et al., 2014; Roderick & Farquhar, 2011; Wang et al., 2016; Xu et al., 2013).
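A Budyko-space realism check of this kind can be sketched in a few lines, assuming long-term catchment averages of P, ET, and PET in the same units. The sketch computes the two indices and flags cases where long-term ET/P exceeds 1, which cannot be sustained from precipitation alone. Function names and numbers are illustrative assumptions, not BHCD values.

```python
def budyko_indices(p, et, pet):
    """Return (aridity index PET/P, evaporative index ET/P)."""
    if p <= 0:
        raise ValueError("long-term average P must be positive")
    return pet / p, et / p

def exceeds_water_limit(p, et, tol=0.0):
    """True if long-term ET/P is above 1 (plus an optional tolerance)."""
    if p <= 0:
        raise ValueError("long-term average P must be positive")
    return et / p > 1.0 + tol

# Hypothetical long-term averages in mm/year for two catchments.
plausible = {"p": 600.0, "et": 450.0, "pet": 540.0}
suspect = {"p": 500.0, "et": 560.0, "pet": 700.0}

aridity, evaporative = budyko_indices(**plausible)
flag_plausible = exceeds_water_limit(plausible["p"], plausible["et"])
flag_suspect = exceeds_water_limit(suspect["p"], suspect["et"])
```

A flagged catchment has ET greater than P, so with non-negative R its implied DS = P - ET - R is necessarily negative.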
Globally, Zarei and Destouni (2024b) have shown that the Mixed and ERA5 datasets yield average ET/P that is considerably greater than the theoretical upper Budyko limit of long-term average ET/P ≤ 1 for many catchments around the world. This reflects an unrealistic water balance closure in these datasets, implying that a considerable amount of extra water, beyond that provided by P minus the part going to feed R, is needed to feed the large modelled ET fluxes; this extra water must then come from continuous water storage depletion (negative average DS), which is indeed also what the Mixed and ERA5 datasets imply for some catchments around the world, and ERA5 particularly so for southern hemisphere catchments (Zarei and Destouni, 2024b).
Folder 1 in the BHCD contains the catchment polygon shapefiles, which were used to extract data from the global datasets and subsequently aggregate them over each Baltic catchment to generate the associated catchment-average variable time series. These catchment polygons were sourced from GSIM (Do et al., 2018b; Gudmundsson et al., 2018b) and have been renamed to align with the naming conventions used in the BHCD. A csv file named 'Catchment_Info.csv' in Folder 1 lists the BHCD catchment names, their corresponding names in GSIM, the country where the catchment outlet (hydrometric station) is located, and the catchment area in km² as reported in GSIM (Do et al., 2018b; Gudmundsson et al., 2018b).
Folders 2-5 contain catchment-average monthly and annual time series for the variables P, ET, R, DS, SM, and T across the 69 non-overlapping study catchments within the BSDB for each dataset in the BHCD: Obs, Mixed, GLDAS, and ERA5. The Obs dataset includes only annual time series and long-term average values for ET and DS, and no associated ground observation-based SM data. The data are provided as .csv files, organized separately for each variable and catchment.
Each dataset folder contains: (i) an Annual folder that includes annual time series data; (ii) a Monthly folder that includes monthly time series data; (iii) a "DatasetX_AnnualDataSummary.csv" file that provides a concise summary of all data for the 69 non-overlapping catchments across the comparative datasets, including catchment-wise long-term average values for P, R, ET, DS, SM, DSM, T, and PET, the relative indices PET/P, ET/P, and R/P, and the DS-implied average CWLC for all catchments, together with associated catchment metadata (catchment names, country of the catchment outlet, latitude and longitude of the catchment outlet location, and catchment area in km²); and (iv) a "Readme_Data Columns and Variable Units.txt" file that contains comprehensive details about the variables in the Annual and Monthly folders, including their data source, origin, units of measurement, and column names in the csv time series files. For the SM variable, the readme file also specifies the root-zone depth of the soil moisture profile used in the associated datasets Mixed (based on GLEAM), ERA5, and GLDAS. Users of the BHCD are recommended to read the readme file before using the variable time series in the Annual and Monthly folders.
The BHCD synthesis facilitates assessment of important agreements and discrepancies, and associated uncertainty ranges, in the catchment-wise water balance closure and hydro-climatic conditions implied by the different comparative datasets. Understanding the reasons behind the dataset similarities, divergences, and uncertainties can significantly enhance research on the freshwater flux and storage change conditions contributing to and implying related changes for the Baltic Sea. Users should note that the comparative datasets are not fully independent, but share some data and methods, with key distinctions that can help further studies identify the causes of discrepancies.
For example, the Obs and Mixed datasets differ specifically in their ET modeling, while GLDAS and ERA5 variations stem from process representations beyond atmospheric forcing. Overall, the BHCD enables further research into the causes and significance of the dataset agreements, discrepancies, and uncertainties for the water fluxes, storage changes, and their balances in the BSDB, and the associated implications for the Baltic Sea. It further allows users to determine the reliability of the comparative datasets for specific catchments and scales of interest, and thereby identify cases that require additional validation against independent data sources. Furthermore, the water balance checks and derived implication measures in the BHCD can be further tested using available ground-measured and satellite data for independent comparative estimates of catchment-average DS, CWLC, and ET/P. Such testing can enhance confidence in a dataset or highlight areas in particular need of further investigation.
Figure 1. The catchments in the Baltic Hydro-Climatic Data synthesis and their distribution around the Baltic Sea Drainage Basin. Map showing the location of the 69 non-overlapping catchments (yellow fields) and the Baltic Sea Drainage Basin (grey fields). Catchment boundaries derived from GSIM (Do et al., 2018b; Gudmundsson et al., 2018b). Map created by the authors using ArcGIS Pro.
Figure 2. Schematic illustration of the process and steps for creating the Baltic Hydro-Climatic Data synthesis.
Keywords: Multiple datasets, Baltic Sea, catchment, Water balance closure, water fluxes, Subsurface and surface water storage changes
Received: 18 Mar 2025; Accepted: 29 May 2025.
Copyright: © 2025 Zarei and Destouni. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Mohanna Zarei, Department of Physical Geography, Faculty of Science, Stockholm University, Stockholm, SE-106 91, Stockholm, Sweden
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.