DATA REPORT article
Front. Environ. Sci.
Sec. Interdisciplinary Climate Studies
Volume 13 - 2025 | doi: 10.3389/fenvs.2025.1646528
This article is part of the Research Topic: Impacts of Climate Change (CC) on territory and environment: materials and methods for evaluating the various CC-induced hazards and actions for reducing the consequences on the communities.
A hydro-climatic data synthesis for Sub-Saharan Africa: Facilitating water balance closure with different comparative datasets
Provisionally accepted
- 1Stockholm University, Stockholm, Sweden
- 2Kungliga Tekniska Hogskolan, Stockholm, Sweden
- 3Stellenbosch Institute for Advanced Study, Matieland, South Africa
Freshwater resources are highly sensitive to the impacts of climate change, often resulting in decreased availability and degraded quality of water, e.g., for drinking and irrigation, with far-reaching consequences for ecosystems and human societies (Abbasnia et al., 2019; Solangi et al., 2020; Zimmermann & Neu, 2022). Water-related extreme events such as floods, droughts, heatwaves, and wildfires may also become increasingly common, and continue to pose significant threats to lives, livelihoods, and environmental systems (Gulzar et al., 2021; McGregor et al., 2005). Among natural hazards, floods and droughts remain the most lethal and economically destructive, impairing ecosystems, damaging infrastructure, and destabilizing water availability (Wijkman & Timberlake, 2021). Understanding, monitoring, and accurately predicting such changes in hydrological conditions under climate change depend fundamentally on data. A range of datasets spanning in-situ measurements, satellite observations, and model outputs is available to explore the effects of climate change on water flux and storage change dynamics. Depending on disciplinary or operational needs, researchers and decision-makers select datasets based on their specific methodological frameworks and their sectoral and regional focus (Yang et al., 2019; Zhang et al., 2016). However, inconsistencies among datasets remain a major challenge (Bring et al., 2015; Ghajarnia et al., 2021; Zarei & Destouni, 2024a), highlighting the need to critically assess the reliability of different datasets, identify discrepancies between them, and determine which (if any) accurately reflect the regional hydrological reality under climate change.
The growing availability of regional and global hydro-climatic datasets has improved access to quantitative data for water cycle variables. However, datasets representing entire catchments and facilitating complete water balance closure remain scarce.
The limitation of observation-based data is particularly evident for the lateral (horizontal) runoff fluxes (R), which are essential for catchment-wise water balance closure. Such observation-based runoff data are derived from monitored stream discharges, which integrate the hydrological runoff over each contributing catchment and directly determine the catchment-average runoff by dividing the monitored discharge (volumetric flow rate) by the catchment area. The vertical fluxes of precipitation (P) and evapotranspiration (ET) have differing levels of observational coverage. Precipitation data are more accessible, supported by relatively extensive meteorological station networks that permit spatial interpolation or extrapolation to estimate catchment-average P. Direct measurements of ET are far more limited, due to the sparse availability of local flux towers, necessitating reliance on model-based estimations of ET at whole-catchment scales.
In terrestrial water and climate research, significant geographical gaps also persist, for example, for South America and Africa, where large populations both influence and depend heavily on terrestrial freshwater (Zarei & Destouni, 2024b). In many parts of Africa, the research gaps may to a large degree stem from limited data availability (Ndehedehe, 2019). The research and data gaps are particularly critical in regions like Sub-Saharan Africa (SSA), where water-related vulnerabilities to climate change and related natural hazards may be amplified by limited adaptive societal capacity and infrastructure. Despite these needs, comprehensive and openly accessible datasets with adequate spatial and temporal coverage to facilitate assessment of catchment-wise water balance closure and its changes under ongoing climate change remain scarce over SSA (Sutcliffe & Parks, 1999; Falkenmark, 1989).
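As an illustration of the discharge-to-runoff conversion described above for observation-based runoff data, the following minimal sketch divides a discharge volume by the catchment area to obtain a catchment-average runoff depth. The function name, the units (m³/s, km², mm), and the example numbers are illustrative assumptions and are not taken from the SSA-HCD processing code.

```python
SECONDS_PER_DAY = 86_400


def runoff_depth_mm(discharge_m3_s: float, area_km2: float, seconds: float) -> float:
    """Catchment-average runoff depth (mm) accumulated over `seconds`.

    Hypothetical helper: mean discharge (m3/s) times duration gives a
    volume, which is divided by the catchment area to give a depth.
    """
    if area_km2 <= 0:
        raise ValueError("catchment area must be positive")
    volume_m3 = discharge_m3_s * seconds   # total volume passing the station
    area_m2 = area_km2 * 1e6               # km2 -> m2
    return volume_m3 / area_m2 * 1000.0    # m -> mm


# Illustrative numbers only: 100 m3/s over a 30-day month, 10,000 km2 catchment
r_month = runoff_depth_mm(100.0, 10_000.0, 30 * SECONDS_PER_DAY)
print(round(r_month, 2))
```

Expressing R as a depth per unit time puts it in the same units as catchment-average P and ET, which is what allows the three fluxes to be combined in a water balance.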
These persistent challenges in data reliability, catchment-scale modeling, and water balance closure across the region (Moyers et al., 2023; Banda et al., 2022) reinforce the need for harmonized, quality-checked datasets that bridge multiple data sources and improve consistency for hydrological assessments in SSA. Such datasets are critical to identify and quantify the impacts of climate change on freshwater fluxes and storages across the region. High-quality, openly available data serve as a foundation for improving scientific understanding and tackling these challenges (Gudmundsson et al., 2017), not least over a region as ecologically and socially diverse as SSA (Turyasingura et al., 2022).
Sub-Saharan Africa is one of the most densely populated regions in the world and faces some of the most unstable and unequal accessibility conditions for freshwater resources (Turyasingura et al., 2022). Water bodies in the region play a critical role in supporting local livelihoods, not only for drinking and irrigation, but also for activities such as fishing (Gopal et al., 2022), handicraft production (Onyena & Sam, 2020), and agricultural practices like mulching (Milder et al., 2011; Yahaya et al., 2022). The SSA region comprises 49 countries, spans approximately 24.3 million km², and stretches across four time zones (Aryeetey-Attoh & McDade, 1997). Covering over 15% of the Earth's land surface and extending into all four hemispheres, the region is predominantly located between the Tropics of Cancer and Capricorn, subjecting much of it to tropical climatic influences.
Nonetheless, SSA exhibits considerable climatic, hydrological, and socio-environmental diversity, with substantial spatial and temporal variability in water availability (Africa, 2018).
While existing data resources such as the Global Runoff Data Centre (GRDC) and HydroSHEDS provide streamflow (GRDC, 2020) and hydrologically conditioned mapping data (Lehner et al., 2008), respectively, they do not offer a harmonized synthesis of the multiple hydro-climatic variables needed for catchment-wise water balance closure and integrated hydro-climatic assessment. To address this critical need, we present a Sub-Saharan Africa Hydro-Climatic Data (SSA-HCD) synthesis. The SSA-HCD is extracted from the Global Hydro-Climatic Data (GHCD) compilation developed by Zarei & Destouni (2024a) and provides continuous, openly accessible, quality-checked hydro-climatic data time series for the same 127 major, non-overlapping hydrological catchments across SSA, as represented by four different (types of) datasets for the 30-year period from 1980 to 2010 (Figure 1). Unlike the GRDC, which provides observational runoff data with limited spatial and temporal completeness in SSA (GRDC, 2020), and HydroSHEDS, which focuses on static hydrographic features (Lehner et al., 2008), the SSA-HCD offers a harmonized, multi-variable dataset with time-resolved hydro-climatic variables and built-in diagnostic indicators for assessing catchment-wise water balance closure across SSA. Furthermore, while the analogous GHCD synthesis applies globally (Zarei & Destouni, 2024a), the SSA-HCD has a specific application focus on SSA (Figure 1) for all quantitative comparisons, quality checks, and implications for additional calculated variables of the included comparative datasets (see the outline of the comparisons, checks, and additional variable calculations in Figure 2).
The quantification steps and procedures are analogous for the SSA-HCD and the GHCD, but the actual quantification results depend on and differ between different region and scale applications. Consequently, the SSA-HCD results differ from those of the GHCD.
The development of the SSA-HCD aims at several core objectives with a focus on the SSA region, including: (i) to support consistent evaluation of observational, satellite and model-based, and reanalysis-based estimates of freshwater fluxes and storage dynamics; (ii) to facilitate examination of water balance closure at the catchment level under the associated changing climatic conditions; and (iii) to facilitate deciphering of key spatial and temporal patterns of hydro-climatic dynamics, and of the consistency and discrepancy in these dynamics across different datasets. Moreover, the SSA-HCD provides a foundation for further uses of the synthesized data to address important research questions for the region, comparatively between the harmonized included datasets. For example: (1) In what ways do observational and model-derived datasets differ in capturing interactions among major hydro-climatic variables in and across the SSA catchments? (2) If the dataset implications diverge, which of the divergent representations is more credible or physically plausible? The SSA-HCD dataset also offers a platform for investigating broader research issues across various spatiotemporal scales within SSA, such as: (3) tracking patterns of variability and change in water extremes and conditions, and analyzing their primary climatic and anthropogenic drivers; and (4) assessing the impacts of these water variations on sectors such as agriculture, water management, and ecosystem functioning.
Furthermore, the SSA-HCD serves as a resource for: (5) model calibration and performance evaluation from catchment to regional level for SSA; and (6) quantification of uncertainties and identification of key data limitations, as emerging from conflicting dataset representations of hydro-climatic processes in the region. The 127 catchments included in the SSA-HCD collectively represent approximately 5.3 million km² of land area across SSA. As for the global GHCD synthesis, the selection of these catchments was based on a set of stringent criteria to ensure harmonized and comprehensive spatiotemporal data coverage, open accessibility, and comparability across all four datasets. The primary inclusion criteria were: a minimum of 300 non-missing monthly runoff observations over the 30-year period for all data sources, internal consistency across all included hydro-climatic variables, and non-overlapping spatial boundaries. Catchment areas within SSA that did not meet these conditions across all datasets were excluded from the final compilation. The non-overlapping criterion was important for avoiding data redundancy and preventing multiple representations of the same hydrological signals in aggregate analyses over the whole region. Moreover, priority was given to the largest possible non-overlapping catchments that satisfied all other data availability and temporal extent criteria, to maximize the spatial coverage of the SSA-HCD while upholding standards of data quality, integrity, and consistency.
The SSA-HCD facilitates assessment of the main dataset consistencies and differences for SSA. Such comparative analyses can enhance research on the hydrological impacts of climate change by clarifying the robustness of implications and uncovering critical uncertainties in the hydro-climatic relationships, impacts, and feedbacks implied by different types of observation and model-based data for the region's diverse catchments.
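The data-coverage criterion for catchment inclusion stated above (at least 300 non-missing monthly runoff values within the 30-year period) can be illustrated with the following minimal sketch. The representation of missing months as `None` or NaN and the function names are assumptions made for illustration; the other inclusion criteria (internal consistency and non-overlapping boundaries) are not covered here.

```python
import math

MIN_VALID_MONTHS = 300  # threshold stated in the text (out of 360 months)


def count_valid(monthly_runoff):
    """Count months with a usable value (assumes gaps are None or NaN)."""
    return sum(
        1 for value in monthly_runoff
        if value is not None and not math.isnan(value)
    )


def meets_coverage(monthly_runoff, min_valid=MIN_VALID_MONTHS):
    """True if the series has at least `min_valid` non-missing months."""
    return count_valid(monthly_runoff) >= min_valid


# Illustrative 30-year series (360 months) with 60 missing months
series = [12.5] * 300 + [None] * 30 + [float("nan")] * 30
print(count_valid(series), meets_coverage(series))
```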
Furthermore, the SSA-HCD can be used for calibration and validation of various hydro-climatic models, supporting efforts to improve simulation accuracy and predictive capacity across the hydrological catchments of this region. The SSA-HCD synthesis can also be used as a practical tool to improve representations of regional water systems in models and assessments related to regional water management, and to plans and preparations for droughts and floods.
Furthermore, limited availability of reliable data is often a key barrier to relevant research, policy, and decision-making. To help overcome this barrier, the SSA-HCD supports water-related research and applications, e.g., for water resource planning, climate risk assessments, and development of adaptation strategies.
Materials and methods
In consistency with the global GHCD compilation, the SSA-HCD incorporates four comparative datasets that enable cross-evaluation of climate change impacts on the regional freshwater fluxes and water balance components (Supplementary Information (SI)-Table S1). (I) One dataset is named Obs and includes in-situ observational data for air temperature (T) from GHCN-CAMS (Fan & Van den Dool, 2008), precipitation (P) from the "Global Precipitation Climatology Centre (GPCC-V7)" (Schneider et al., 2016), and runoff (R) from the "Global Streamflow Indices and Metadata (GSIM)" (Do et al., 2018a; Gudmundsson et al., 2018b); based on the P and R data, Obs further includes estimates of average annual evapotranspiration (ET) using the water balance approximation ET ≈ P - R, under the simplifying assumption of negligible long-term water storage change (DS = P - R - ET ≈ 0).
(II) A second dataset is named Mixed and includes the same observational data for T, P and R as Obs, combined with modelled ET data from the "Global Land Evaporation Amsterdam Model (GLEAM)" (Martens et al., 2017; Miralles et al., 2011), which uses satellite observations for its ET modeling and also provides corresponding soil moisture (SM) data for the SSA-HCD. The observational R data from GSIM in Obs and Mixed directly define the contributing catchments, so that the same catchments can be included consistently in all comparative observation and model-based datasets of the SSA-HCD. (III) A third dataset is GLDAS, a global reanalysis product that provides a full set of model-based data for T, P, ET, R and SM from the "Global Land Data Assimilation System (GLDAS)" (Beaudoing & Rodell, 2019; Rodell et al., 2004). (IV) Finally, an additional global reanalysis dataset (ERA5) is included and offers full model-based time series for all basic hydro-climatic variables T, P, ET, R and SM from the "ECMWF Reanalysis 5th Generation (ERA5)" (Hersbach et al., 2017). For Mixed, GLDAS, and ERA5, the SSA-HCD further includes calculated data for storage change (DS = P - ET - R) based on the modelled ET data provided in these datasets.
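The two water balance uses described above, ET ≈ P - R in Obs (assuming DS ≈ 0) and DS = P - ET - R in Mixed, GLDAS, and ERA5, can be sketched as follows. The function names and the numerical values are purely illustrative assumptions, not SSA-HCD data; all fluxes are taken to be in the same units (e.g., mm per year).

```python
def et_from_water_balance(p: float, r: float) -> float:
    """Obs-type estimate: ET ~ P - R, assuming negligible storage change."""
    return p - r


def storage_change(p: float, et: float, r: float) -> float:
    """Mixed/GLDAS/ERA5-type residual: DS = P - ET - R."""
    return p - et - r


# Illustrative long-term average values for one catchment (mm/yr)
p_obs, r_obs = 800.0, 200.0
et_model = 550.0  # hypothetical modelled ET, as in the Mixed dataset

et_obs = et_from_water_balance(p_obs, r_obs)      # 600.0
ds_obs = storage_change(p_obs, et_obs, r_obs)     # 0.0 by construction
ds_mixed = storage_change(p_obs, et_model, r_obs) # 50.0
print(et_obs, ds_obs, ds_mixed)
```

With identical P and R, any difference in DS between Obs and Mixed follows directly from the difference between the water-balance ET and the modelled ET.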
Catchment-wise water balance closure (comparatively as P - ET - R = DS in Mixed, GLDAS, and ERA5, and approximately assumed as P - ET - R ≈ 0 in Obs) is key for investigating the regional water system state (in terms of R, ET, and DS) under the different climatic conditions of temperature (T) and precipitation (P) across the region, and the consistency or divergence of the different datasets in representing the regional climate and water state relationships (Berghuijs et al., 2014; Bring et al., 2015; Lehmann et al., 2022).
In principle, if consistent observations of water storage change (DS) were widely available from ground-based and/or remote sensing sources, checked and accurately calibrated against ground data, ET could be estimated instead of DS from the catchment-scale water balance as ET = P - R - DS (Bhattarai et al., 2019). However, availability of such DS data with sufficient spatiotemporal resolution across diverse catchments remains a substantial challenge. To further assess internal dataset consistency, the temporal changes in soil moisture (DSM) were also computed for each dataset and compared with the DS implied by the same dataset (SI-Note S1), given that DSM is a component of the total DS and both are therefore expected to vary in the same direction (Destouni & Verrot, 2014).
As stated, the 127 non-overlapping catchments included in the SSA-HCD are determined by the stream discharge measurement locations in the Global Streamflow Indices and Metadata (GSIM) archive. These discharge data also form the basis for determining catchment-average R in both the Obs and Mixed datasets, and enable the estimation of ET in Obs as ET ≈ P - R. For the remaining variables, gridded datasets were used, with data extracted according to the hydrological boundaries of each catchment.
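The internal consistency check described above, comparing the direction of soil moisture change (DSM) with that of the total storage change (DS) within one dataset, might be sketched as below. The exact procedure is given in SI-Note S1 and is not reproduced here; the differencing of SM between consecutive time steps, the alignment of DS with the later step, and the agreement metric are all assumptions made for illustration.

```python
def soil_moisture_change(sm_series):
    """DSM between consecutive time steps: SM[t] - SM[t-1]."""
    return [later - earlier for earlier, later in zip(sm_series, sm_series[1:])]


def direction_agreement(ds_series, dsm_series):
    """Fraction of time steps where DS and DSM have the same sign.

    Steps where either value is exactly zero are skipped, since they
    carry no direction. Returns None if no step can be compared.
    """
    pairs = [
        (ds, dsm) for ds, dsm in zip(ds_series, dsm_series)
        if ds != 0 and dsm != 0
    ]
    if not pairs:
        return None
    same = sum(1 for ds, dsm in pairs if (ds > 0) == (dsm > 0))
    return same / len(pairs)


# Illustrative values only (mm): SM at four time steps, DS at the last three
sm = [100.0, 110.0, 105.0, 120.0]
dsm = soil_moisture_change(sm)
ds = [20.0, -8.0, -3.0]
print(dsm, direction_agreement(ds, dsm))
```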
To compute catchment-average values, consistent with the catchment-average R data, spatial interpolation was applied using an area-weighted averaging approach for all grid cells intersecting the boundaries (topographically determined water divide) of each catchment. The data processing was conducted at the finest consistent temporal resolution available (monthly) across all hydro-climatic variables to generate continuous time series of monthly variable values for each catchment. In addition, corresponding annual and long-term average variable values were calculated for each catchment to facilitate analyses of interannual variability and longer-term hydro-climatic trends in and across the catchments.
The schematic flowchart in Figure 2 presents an overview of the data processing steps used to create the multi-catchment, multi-dataset SSA-HCD (Zarei & Destouni, 2025b). As summarized visually in Figure 2, the processing steps in the SSA-HCD development are consistent with and include the same data quality and consistency requirements, checks, and assurances as the analogous published global data synthesis GHCD (Zarei & Destouni, 2024a) and the BHCD for the Baltic region (Zarei & Destouni, 2025a). For more detailed descriptions of the different steps and associated requirements, checks, and assurances of data quality and consistency, we refer to the more extensive GHCD paper (Zarei & Destouni, 2024a). The comparative datasets included in the SSA-HCD provide time series of monthly and annual average values for each hydro-climatic variable, along with their long-term averages over the 30-year climatological reference period. This period aligns with the World Meteorological Organization (WMO) recommendation to use 30-year intervals for representing climatic norms (World Meteorological Organization, 2017); the SSA-HCD can be further extended as more recent data become available and climatic reference periods change.
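The area-weighted averaging of intersecting grid cells and the aggregation of monthly values to annual values, both described above, can be illustrated with the following minimal sketch. The inputs (per-cell values and the area of each cell that overlaps the catchment) are assumed to be precomputed, and the function names are hypothetical; the actual SSA-HCD processing chain is summarized in Figure 2 and may differ in detail.

```python
def area_weighted_mean(cell_values, overlap_areas_km2):
    """Catchment average weighted by each grid cell's overlap area."""
    if len(cell_values) != len(overlap_areas_km2):
        raise ValueError("values and areas must have the same length")
    total_area = sum(overlap_areas_km2)
    if total_area <= 0:
        raise ValueError("total overlap area must be positive")
    weighted = sum(v * a for v, a in zip(cell_values, overlap_areas_km2))
    return weighted / total_area


def annual_totals(monthly_values):
    """Sum consecutive blocks of 12 monthly flux values (e.g., P, ET, R).

    For a state variable such as T, a mean per block would be used instead.
    """
    if len(monthly_values) % 12 != 0:
        raise ValueError("expected whole years of monthly data")
    return [
        sum(monthly_values[i:i + 12])
        for i in range(0, len(monthly_values), 12)
    ]


# Illustrative values: three cells, the third overlapping twice as much area
p_catchment = area_weighted_mean([10.0, 20.0, 40.0], [1.0, 1.0, 2.0])
p_annual = annual_totals([10.0] * 12 + [20.0] * 12)
print(p_catchment, p_annual)
```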
The only exception in terms of temporal resolution is the Obs dataset, which does not include monthly time series for ET and DS. This is because the assumption of ET ≈ P - R and DS ≈ 0 in Obs is only physically meaningful for the annual or longer-term average (and not the monthly) water balance. This assumption (DS ≈ 0) is commonly used in large-scale water balance assessments over multi-annual and longer time periods (e.g., Althoff & Destouni, 2023; Bhattarai et al., 2019). Average water storage change (DS) has been shown to often be small over whole catchments (Jaramillo et al., 2013) and larger scales (Zarei & Destouni, 2024a) relative to the corresponding average evapotranspiration flux, which this assumption is used to calculate as ET ≈ P - R because ET is not measured over such large scales. All dataset variables are further provided as catchment-average values. In addition to the primary climatic air temperature (T) and precipitation (P) data and the corresponding hydrological flux and storage-change data (for ET, R, DS, and DSM), the SSA-HCD also includes data for the catchment-average cumulative water level change (CWLC) implied by DS over the full 30-year period (SI-Note S2 and associated Figure S1), as well as the long-term aridity index (PET/P, where PET is potential evapotranspiration) and the flux partitioning ratios ET/P and R/P (SI-Note S3 and associated Figure S2).
Folder 1 of the SSA-HCD contains the catchment polygon shapefiles used to extract the data from the corresponding global datasets and aggregate them over each SSA catchment to produce catchment-average time series of the hydro-climatic variables. These polygons were sourced from the Global Streamflow Indices and Metadata archive (Do et al., 2018b; Gudmundsson et al., 2018a) and have been renamed to match the naming conventions adopted in the SSA-HCD.
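The derived indicators mentioned above (CWLC, the aridity index PET/P, and the partitioning ratios ET/P and R/P) can be sketched as follows. The precise CWLC definition is given in SI-Note S2 and is not reproduced here; treating it as a running sum of annual DS values is an assumption made for illustration, as are the function names and example numbers.

```python
from itertools import accumulate


def cumulative_water_level_change(ds_annual):
    """Running sum of annual DS (assumed CWLC form), same units as DS."""
    return list(accumulate(ds_annual))


def long_term_ratios(p, et, r, pet):
    """Aridity index and flux partitioning ratios from long-term averages."""
    if p <= 0:
        raise ValueError("long-term precipitation must be positive")
    return {"PET/P": pet / p, "ET/P": et / p, "R/P": r / p}


# Illustrative values only (mm/yr)
cwlc = cumulative_water_level_change([10.0, -5.0, 20.0])
ratios = long_term_ratios(p=1000.0, et=600.0, r=400.0, pet=1500.0)
print(cwlc, ratios)
```

When DS ≈ 0, as assumed in Obs, ET/P and R/P sum to approximately one; in the other datasets the departure of this sum from one reflects the long-term DS relative to P.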
A corresponding csv file, 'Catchment_Info.csv,' also located in Folder 1, provides essential metadata, including the SSA-HCD catchment names, their original GSIM identifiers, the country in which each catchment's outlet (hydrometric station) is located, and the reported catchment area in km² (as per GSIM). In GSIM, each gauged catchment is linked to a specific river station, with an associated country designation based on the station's location, regardless of whether the full catchment area spans multiple countries. For instance, catchments labeled under "Namibia" include the Konkiep River (Bethanien station), Omaruru River (Etemba station), Abu-Huab River (Rooiberg station), Hoanib River (Sesfontein station), Kwando River (Kongola station), Zambezi River (Katima Mulilo station), Kunene River (Ruacana station), and Omuhonga River (Ombuka station). Although several of these rivers are transboundary and thus extend into countries such as Angola and Zambia, they are categorized under "Namibia" in GSIM because the monitoring station is located in Namibian territory.
Folders 2-5 of the SSA-HCD contain catchment-average monthly and annual time series for the hydro-climatic variables air temperature (T), precipitation (P), evapotranspiration (ET), runoff (R), total water storage change (DS), and soil moisture (SM) across the 127 non-overlapping study catchments within SSA, for each of the four datasets: Obs, Mixed, GLDAS, and ERA5. The Obs dataset includes only annual time series and long-term average values for ET and DS, and does not include any corresponding soil moisture data. All data are provided in csv format, organized by variable and catchment.
Each dataset folder contains: (a) an annual folder with annual time series data, (b) a monthly folder with monthly time series data, (c) a summary file titled DatasetX_AnnualDataSummary.csv, which compiles long-term average values for P, R, ET, DS, SM, DSM, T, and PET across all 127 catchments, along with derived indices (PET/P, ET/P, R/P), CWLC values, and associated catchment metadata (name, outlet country, outlet latitude/longitude, and area in km²), and (d) a Readme_Data Columns and Variable Units.txt file, providing comprehensive documentation on all variables, including data sources, origins, units, and corresponding column headers in the time series files. For SM, the readme file also includes information about the root-zone depth used from the soil moisture profiles of the Mixed (GLEAM-based), ERA5, and GLDAS datasets. Users are strongly encouraged to consult the readme file prior to working with the data time series in the Annual and Monthly folders to ensure correct interpretation and usage.
While data availability constraints have influenced the selection of hydro-climatic study regions in past research, these constraints also underscore the need to start prioritizing underrepresented areas in new investigations. The SSA-HCD provided here opens new avenues to bridge the data gaps for SSA. The SSA-HCD shows both agreements and differences in the hydro-climatic variables and the catchment-wise water balances they imply based on the different datasets. Further study of these data and their implications can provide new insights into the reasons for dataset divergence and advance our understanding of the impacts of climate change on freshwater fluxes and storage change dynamics in SSA.
Use of the SSA-HCD can enable a wide range of hydro-climatic research applications, such as: quantifying long-term groundwater depletion through analysis of DS and DSM trends, evaluating flood and drought hazards based on precipitation and runoff variations and how they are modulated by evapotranspiration and temperature, comparing and analyzing how different datasets represent these extreme events, supporting seasonal forecasting and early warning systems for the extreme events, validating hydrological and land surface models, and assessing climate change impacts on water availability at catchment and larger scales. These applications can target multiple stakeholder groups, including: hydrologists and water resource scientists for evaluating water balance closure and hydro-climatic interactions, climate modelers and data assimilation experts for benchmarking land surface models and assessing data realism, regional planners and applied researchers for drought trend assessment, flood risk modeling, and water resource planning in data-scarce contexts, and policy and decision-support communities who may rely on catchment-scale data to guide investment and adaptation planning under climate variability.
It is important for users to recognize that the datasets included in the SSA-HCD are not fully independent, as they share some underlying data and methodologies. However, each dataset involves distinct assumptions and modeling approaches that can help reveal the reasons for inconsistencies between the datasets. For instance, differences between the Obs and Mixed datasets depend primarily on their alternative methods of ET estimation, while differences between GLDAS and ERA5 stem from their different model representations of hydro-climatic interaction processes on land, beyond the similar atmospheric forcing of these datasets. The SSA-HCD synthesis enables valuable hydro-climatic analysis across SSA, subject to various limitations inherited from the source datasets.
For instance, GPCC precipitation data are affected by sparse gauge coverage in parts of SSA, leading to uncertainties in interpolated data values. Similarly, GSIM streamflow data, though extensively quality-controlled, are also spatially limited and include associated data gaps and station-specific uncertainties. The reanalysis datasets ERA5 and GLDAS offer full spatial coverage over SSA, but have been primarily calibrated using data from other regions with denser observational infrastructure, which may introduce important biases when applied to the SSA region. These underlying dataset limitations are not corrected in the SSA-HCD. The four included datasets, however, collectively support consistent cross-comparisons that can identify important result convergence and robustness, as well as key result differences and uncertainties between the datasets.
Keywords: Hydro-climatic data, Climate-water interplay, sub-Saharan Africa, Water flux and storage changes, catchments, water balance
Received: 13 Jun 2025; Accepted: 10 Sep 2025.
Copyright: © 2025 Zarei and Destouni. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Mohanna Zarei, Stockholm University, Stockholm, Sweden
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.