Integrated framework for hydrologic modelling in data-sparse watersheds and climate change impact on projected green and blue water sustainability

Climate and hydrologic hazards threaten the temporal and spatial distribution of watersheds' water resources, necessitating planning for sustainable resilience and adaptation. Hydrologic modelling has emerged as a potential means of understanding watershed responses to projected climate change, and a prediction model that can deliver actionable information is necessary, but such a model requires basin-scale observations for calibration if it is to reliably predict basin-scale water resource hazards. Such data are not always available in watersheds with inadequate ground-based observations. However, satellite-based evapotranspiration (ET) data, coupled with machine learning feature selection as a data refinement process, have made integrated water balance modelling widely regarded as a viable alternative for improving watershed modelling in data-sparse regions. This study developed a robust hydrologic modelling framework that adequately calibrates the models and provides accurate behavioural solutions for all model responses. The framework was applied to the four sub-basins that form the larger Lake Chad basin. The model results were then used to assess the dynamic changes in projected blue and green water resource sustainability in response to climate change in one of the sub-basins. The findings indicate that hydrologic fluxes can be simulated accurately with varying degrees of acceptability, with R² and NSE values in the ranges of 0.69-0.88 and 0.45-0.77 for calibration and 0.69-0.79 and 0.34-0.63 for validation, respectively, and captured within a satisfactory uncertainty range of P-factor and R-factor values of 0.68-0.93 and 0.73-1.31, respectively, in 83%, 67%, 85.7%, and 81.3% of the sub-watersheds based


Introduction
Water resource planning must find ways to achieve judicious and effective use of water, particularly in light of growing populations, climate change, and depleting water supplies (Novoa et al., 2019). Water is the cornerstone of community development because it provides a wide range of ecological functions. Sound planning enables its efficient, fair, and sustainable distribution in order to eradicate poverty, promote economic development, and protect the environment (Hu et al., 2016).
Human use has altered the rate at which water is stored, and how long it remains, in various reservoirs, including surface water and groundwater, seas, the atmosphere, snow, and ice (Keys et al., 2016). Consequently, attaining water sustainability, defined as meeting everyone's present water needs without compromising future supply while advancing societal goals and preserving the environment, remains one of the greatest challenges worldwide (Hu et al., 2016; Chouchane et al., 2018). Many administrative authorities have made the sustainable management of water resources a top priority to ensure that all residents and economic sectors have access to sufficient water of the right quality and quantity (Martinsen et al., 2019; Tortajada et al., 2019). Water sustainability in a basin can only be achieved if the hydrological, ecological, biological, and chemical processes of its ecosystems are sustained while an equitable and effective water supply is provided over time (Pfister et al., 2009; Wang et al., 2016). The water footprint (WF) concept addresses these needs by providing an assessment of water resources that accounts for natural variability and usage across sectors (Hejazi et al., 2014).
The WF is a multi-dimensional indicator that reveals the characteristics of anthropogenic stresses on water supplies and the amount of freshwater consumed. It offers insights into water-related challenges, aids in understanding present patterns of water allocation across different river basin sectors (Muratoglu et al., 2022), and enables decision-makers to exploit the substantial water use data supplied by the WF technique to improve water management, identify hotspots, and develop appropriate responses to change (Pellicer-Martínez and Martínez-Paz, 2018). The approach is well suited to comparing water resources across administrative boundaries in terms of both quantity and quality (Li et al., 2018).
Based on the hydrological processes and types of storage involved, the freshwater cycle can be separated into two categories: blue water and green water. Green water is the portion of precipitation that infiltrates the soil and becomes soil moisture, or temporarily remains on the ground or vegetation surface, and subsequently evaporates or transpires back into the atmosphere. Blue water is precipitation that accumulates in aquifers, lakes, and reservoirs and flows over or below the land surface (Rockström et al., 2009; Rodrigues et al., 2014). According to Hoekstra et al. (2011), the water footprint concept includes the consumption of both blue and green water by various sectors. In this concept, the green water footprint (GWF) is the estimated amount of green water required and used by plants (i.e., evapotranspiration (ET) from cropland and pastureland, frequently referred to as productive vapour flows), whereas the blue water footprint represents the consumptive use of freshwater resources from rivers, lakes, and overland flow.
Hydrologic models developed for various temporal and spatial scales have become increasingly complex; as a result, using the WF as a sustainability indicator is essential for determining environmental limits on water consumption (Shrestha et al., 2017). This indicator is particularly important in regions susceptible to water variability, such as basins with Mediterranean and tropical climates, where irrigation demand rises when precipitation decreases, limiting runoff and downstream flows (Novoa et al., 2019).
Previous studies have demonstrated that extreme hydrological events are increasing in frequency and severity as global climate change intensifies, creating new problems for managing water resources and the regional water cycle (Vicente-Serrano et al., 2017; Tabari, 2020). Changes in climate patterns and their possible effects on water resources have drawn global attention. However, future climate change estimates (e.g., changes in temperature and precipitation) are uncertain, which makes it difficult for planning authorities to decide on appropriate adaptation measures (Dessai and Hulme, 2007; Gosling and Arnell, 2016). The main sources of these uncertainties are differences in the initialisations and parameterisations that climate models use to represent physical processes, as well as in downscaling methods (Zhuang et al., 2016). It has been posited that water resources are vulnerable to these uncertainties and that their future state is difficult to anticipate precisely in a changing environment. It is therefore crucial to develop water management plans that account for this complexity and uncertainty during a period of global climate change (Wang et al., 2016).
Multiple climate models have been used within evaluation frameworks to identify effective strategies for managing basin water resources under climate change; the results showed that these plans are extremely vulnerable to climatic changes. Other studies have concluded that water resources in various regions are sensitive to climate change and that the relative influence varies significantly around the world. Accumulating evidence shows that the sources and types of uncertainty affect the selection of adaptation strategies (Dessai and Hulme, 2007; Arnell et al., 2011; Refsgaard et al., 2013; Cai et al., 2015; Tzabiras et al., 2016; Sun et al., 2017). Identifying effective corporate strategies and policy actions (e.g., climate adaptation, resilience, and mitigation measures) requires uncertainty analysis. When stakeholders, decision-makers, and researchers are aware of the sources, types, and characteristics of uncertainty, their trust in scientific analysis increases (Gabbert et al., 2010; Kirchner et al., 2021). Accounting for uncertainty is typically required for publishing model-generated quantitative assessments and is regarded as good modelling practice (Troost et al., 2015).
Observational data are the foundation of our understanding of environmental systems, but their scarcity and inconsistency limit their use in research and practical applications. Among other factors, the accuracy of atmospheric data is crucial for the validity of hydro-meteorological and climatological investigations (Zandler et al., 2019). The non-linearity of hydrological modelling processes can reduce or amplify flaws in the input rainfall data, leading to a good or poor depiction of hydrological responses and, consequently, to inadequate water resource policies and adaptation measures (Maggioni and Massari, 2018). To improve the representation of spatiotemporal processes, distributed observational datasets must be used to inform and assess distributed hydrological models designed for large-watershed forecasts (Baroni et al., 2019; Ocio et al., 2019). In this context, determining whether meteorological data are adequate and coherent enough to reproduce basin-scale hydrology accurately is a prerequisite for choosing data for water resource management (Laiti et al., 2018). Integrated modelling is useful in many areas of global climate change research; in this article, we define integrated modelling as an interdisciplinary technique that links accurately curated empirical data with mathematical models founded on disciplinary concepts to present a more thorough and precise picture of interactions between people and their environment (Moss et al., 2010; Laniak et al., 2013).
Uncertainty can arise and accumulate within any chosen modelling framework, which makes it a significant problem for integrated modelling. Existing integrated modelling frameworks (IMFs) deal with uncertainty in two main ways. The first is using scenarios to capture the uncertainty of future changes, i.e., providing several alternative descriptions of how the future might look that are internally consistent with projections (Reilly and Willenbockel, 2010; Mitter et al., 2019). The second relies on coordinated contributions from research teams, in which hydrologic model outputs are compared with observations using key performance indicators recommended across the scientific community, and these comparisons are used to address the uncertainty arising from applying alternative datasets to the designed models (Elliott et al., 2014; Folberth et al., 2019). The full identification and tracking of uncertainties in integrated modelling, i.e., how uncertainty propagates from climate models into hydrologic modelling in data-sparse regions, has received very little attention (Holzkämper et al., 2015; Karner et al., 2019; Mitter and Schmid, 2019). Such analysis was acknowledged early in the development of integrated modelling as an important research question, particularly for the propagation of uncertainty from land use optimisation models to the construction of hydrologic models.
In connection with this effort, studies that provide new methodologies to define and understand the various hydrologic model processes and the relationships between hydrological variables are important (Bierkens et al., 2015). Continually refining the depiction of hydrologic processes in model design has also been emphasised as one of the major scientific challenges (Clark et al., 2015). The hydrologic community has agreed on the need for additional datasets, along with associated signature measurements, to improve the representation of key physical processes in models (Clark et al., 2016). Of greater concern is that climate models still struggle to replicate important climate processes accurately. While temperature projections are similar across climate models and are considered more reliable, precipitation estimates vary widely, and the large variability in general circulation model (GCM) outputs means that future hydro-meteorological conditions can only be predicted with considerable uncertainty.
Creating an integrated modelling framework often requires modifying individual models, improving the connections between them, and applying the framework to specific research questions, so that its outcomes can be relied upon for basin-scale assessments of water security, sustainability, and related applications that support better water policy decisions in response to projected climate change. A prediction model that can deliver actionable information is necessary for a meaningful understanding of basin weather patterns and their future trends, based on feature extraction from historical datasets using artificial intelligence to track water resource indicators (Kratzert et al., 2018; Ali et al., 2020). In this study, we created a robust framework to address the difficulties of modelling in areas with little or sparse data, including appropriate ways to use alternative research datasets to evaluate models and considerations for data uncertainty and incompatibility between models and measurements. The framework integrates a machine learning technique, the Boruta random forest (BRF) optimiser, with a hydrologic model, the Soil and Water Assessment Tool (SWAT), to refine the data input process and build a reliable model for basin water resource assessment. The methodology is applied to four sub-watersheds with variable morphological properties that make up the Lake Chad hydrologic basin in Sub-Saharan Africa.
The objective is to provide a novel pathway to increase transparency and improve the communication of uncertainty in long-term water balance models for data-sparse watersheds, which have not been adequately studied, in an easily understood way and without compromising scientific accuracy. The framework is intended to be general and adaptable enough, within the allowable uncertainty band, to be used in other basins with comparable modelling problems. Finally, the integrated modelling framework provides a crucial link between hydrology and human activities at the local watershed level; it is used to assess and monitor the projected blue and green water resources and their sustainability, including their dynamic changes from the baseline, in response to climate change at annual and monthly timescales in the Yobe-Komadugu sub-watershed.
2 Case study area and data

Case study area
The Lake Chad basin, with an estimated area of over 2,500,000 km², is one of the largest endorheic basins in the world (Coe and Foley, 2001; Gao et al., 2011). It lies between the Sahara and the Sudano-Sahelian zone of West Africa, between latitudes 5.2°N and 25.3°N and longitudes 6.9°E and 24.5°E (Figure 1). The basin receives most of its annual rainfall between July and September. The region is known to be particularly susceptible to climate change, which frequently results in severe drought and water shortages (Ndehedehe et al., 2018), and the lake is a freshwater source for livestock grazing, fish farming, and other socioeconomic activities (Buma et al., 2016). The major contributors of discharge to the lake are the Chari River (~90%), with a mean annual streamflow of 860 m³/s between 1960 and 2013, and the Yobe River (~2-5%), with a mean annual streamflow of 18 m³/s between 1961 and 2013 (Lemoalle, 2014). Other rivers supplying between 1% and 2% of the inflow are the Gubio, Yedseram, Ngadda, and El-Beid. However, a few rivers, such as the Batha River and others in the Saharan zone, have no outlet to Lake Chad (Figure 1). Precipitation in the basin varies geographically and seasonally between <100 and 1,500 mm/yr (Nkiaka et al., 2018).

MODIS evapotranspiration
Observed streamflow data in the basin are poor and inadequate, with many missing records, which undermines confidence in the temporal and spatial outputs of hydrologic modelling across the entire basin. As an alternative, satellite-derived MODIS-NASA land surface evapotranspiration data, which capture spatial variability well, were extracted at a monthly timescale by overlaying 1.0 × 1.0 grids on the basin map. A total of 100 observation points (Figure 1) were generated and aggregated into 59 simulation sub-basin points across the entire watershed, as recommended by Abbaspour et al. (2019), and the data were divided into 1983-1998 for calibration and 1999-2006 for validation of the models. Taking into account the spatial distribution of actual evapotranspiration (AET) throughout the watershed may yield a more accurate hydrologic model simulation. The MODIS-NASA ET data also account for factors such as plant transpiration and soil moisture evaporation (Autovino et al., 2016).
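The point-to-sub-basin aggregation and the calibration/validation split described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' workflow: it assumes each of the 100 ET points has already been assigned a sub-basin ID (the `point_subbasin` array, the function names, and the synthetic ET values are all hypothetical) and that aggregation is a simple arithmetic mean of the monthly ET of the points falling in each sub-basin.

```python
import numpy as np

def aggregate_to_subbasins(et, point_subbasin):
    """Average monthly ET of observation points into one series per sub-basin.

    et             : array (n_points, n_months), monthly ET in mm
    point_subbasin : array (n_points,), hypothetical sub-basin ID of each point
    Returns (sorted unique sub-basin IDs, array (n_subbasins, n_months)).
    """
    ids = np.unique(point_subbasin)
    series = np.vstack([et[point_subbasin == sid].mean(axis=0) for sid in ids])
    return ids, series

def split_periods(years, series, cal=(1983, 1998), val=(1999, 2006)):
    """Split monthly columns into calibration and validation periods (inclusive years)."""
    years = np.asarray(years)
    cal_mask = (years >= cal[0]) & (years <= cal[1])
    val_mask = (years >= val[0]) & (years <= val[1])
    return series[:, cal_mask], series[:, val_mask]

# Synthetic example: 100 points with monthly values for 1983-2006 (24 years x 12 months)
years = np.repeat(np.arange(1983, 2007), 12)
rng = np.random.default_rng(0)
et = rng.uniform(0.0, 120.0, size=(100, years.size))
point_subbasin = np.arange(100) % 59  # hypothetical assignment covering 59 sub-basins
ids, series = aggregate_to_subbasins(et, point_subbasin)
cal_et, val_et = split_periods(years, series)
print(series.shape, cal_et.shape, val_et.shape)  # (59, 288) (59, 192) (59, 96)
```

A real application would replace the modulo assignment with a spatial join of the ET points against the delineated sub-basin polygons.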

Digital elevation model and soil, land use, and land cover data
The watershed was delineated with the ArcSWAT program in the ArcMap 10.8 environment using basin topography from the ASTER Global Digital Elevation Model version 3, with a spatial resolution of 30 m. The soil data were obtained from the Harmonized World Soil Database (HWSD) (Table 1), with a 1 km resolution, developed by the Food and Agriculture Organization (FAO) and partner research centres (Abbaspour et al., 2019). Land use and land cover data were obtained from the European Space Agency (Table 1) GlobCover initiative, which produced global composite land cover maps from observations of the 300 m MERIS sensor onboard the ENVISAT satellite mission. The GlobCover map contains 23 land cover types (Bontemps et al., 2011). The description, resolution, and source of each dataset are shown in Table 1, and the links to the sources can be found in Abbaspour et al. (2019).

Climate data
The gridded precipitation and temperature data used here, recommended by a previous study (Lawal et al., 2021), have a daily time step from 1979 to 2011 and were extracted at a 1° × 1° grid resolution. They were used to pre-process general circulation model outputs for the baseline (1979-2011) and projected (2021-2080) climate change scenarios under two shared socioeconomic pathways, SSP2(4.5) and SSP5(8.5), which are based on carbon dioxide emission scenarios and supported by the World Climate Research Programme through the ESGF database. The data can be obtained from the source provided in Table 1.

Pre-processing of input data
The dataset required for the integrated modelling framework must be checked and prepared to fit the model specifications so that the model's hydrologic variables are output efficiently and accurately. The primary input data are pre-processed to depict the watershed's land management and vegetation properties.
(Table note: the bold values are the totals of basin area, sub-basins, and hydrologic response units (HRUs) for the entire basin, respectively.)

Climate models, downscaling, and bias correction
The general circulation models (GCMs) used in this assessment were an ensemble of four Coupled Model Intercomparison Project Phase 6 (CMIP6) models (MPI-ESM1.2-LR, INM-CM4.8, MRI-ESM2.0, and INM-CM5.0), extracted at 210 data points (Figure 2A) for the historical period and the projected climate change scenarios (2021-2080) under two shared socioeconomic pathways, SSP2(4.5) and SSP5(8.5), corresponding to total radiative forcings of 4.5 and 8.5 W/m² in 2100 (approximately equal to CO₂-equivalent concentrations of 650 and 1,370 ppm, respectively). Because the climate model outputs contain biases, they had to be corrected before being used to project the future climate. The delta and quantile mapping methods were used to downscale and correct the known biases in the precipitation and temperature data, respectively, based on a study conducted in the basin using CPC and PGF gridded data in line with the study requirements (Lawal et al., 2023). The methods are non-parametric and correct the projected climate data based on point-wise empirical cumulative distribution functions. Such downscaling strategies have been found to substantially improve the shapes of the associated frequency distributions and to reduce systematic biases, including those in extreme-event indices, by approximately one order of magnitude (Themeßl et al., 2012).
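The text does not give the exact formulations of the two corrections, so the following is only a sketch of the method families it names, under common textbook assumptions: a delta (change-factor) method that perturbs the observed series by the change in the GCM mean (a multiplicative form, as often used for precipitation, and an additive form, as often used for temperature), and empirical quantile mapping, which passes each GCM value through the historical GCM empirical CDF and then through the inverse observed empirical CDF. The function names and the (i - 0.5)/n plotting positions are illustrative assumptions.

```python
import numpy as np

def delta_multiplicative(obs_hist, gcm_hist, gcm_fut):
    """Scale observations by the ratio of future to historical GCM means."""
    return np.asarray(obs_hist, float) * (np.mean(gcm_fut) / np.mean(gcm_hist))

def delta_additive(obs_hist, gcm_hist, gcm_fut):
    """Shift observations by the difference between future and historical GCM means."""
    return np.asarray(obs_hist, float) + (np.mean(gcm_fut) - np.mean(gcm_hist))

def _ecdf_nodes(sample):
    """Sorted sample and its (i - 0.5) / n plotting positions."""
    s = np.sort(np.asarray(sample, dtype=float))
    p = (np.arange(1, s.size + 1) - 0.5) / s.size
    return s, p

def quantile_map(obs_hist, gcm_hist, gcm_values):
    """Empirical quantile mapping: x -> F_obs^-1(F_gcm_hist(x)).

    Values outside the historical GCM range are clamped to the end quantiles
    (np.interp's constant extrapolation), a simplification of this sketch.
    """
    gcm_s, gcm_p = _ecdf_nodes(gcm_hist)
    obs_s, obs_p = _ecdf_nodes(obs_hist)
    prob = np.interp(gcm_values, gcm_s, gcm_p)  # non-exceedance probability under the GCM
    return np.interp(prob, obs_p, obs_s)        # value with the same probability in observations

# Example: a GCM that is uniformly 2 units too warm
obs = np.linspace(20.0, 30.0, 101)
gcm_hist = obs + 2.0
print(quantile_map(obs, gcm_hist, np.array([25.0])))  # maps 25.0 back to about 23.0
```

Quantile mapping corrects the whole distribution, not just the mean, which is why it is generally preferred where extremes matter; the delta method preserves the observed variability and only transfers the GCM's change signal.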

Land use, soil, and DEM data
The soil characteristics for the entire watershed were extracted from the HWSD dataset and include two soil layers (0-30 cm and 30-100 cm depths), the available water capacity, and the bulk density, along with most of the other soil information required by the SWAT model. The primary soil classes that make up the watershed are clay, loam, sand, clay-loam, sandy-clay, loamy-sand, sandy-loam, sandy-clay-loam, and rock (Figure 2B). The watershed's land use and land cover data were clipped to the sub-watershed extent and grouped into six land use classes compatible with the SWAT model (Figure 2C): artificial area (URMD) 0.013%, barren land (BARR) 52.873%, agricultural land (AGRL) 3.743%, forest land (FRST) 15.636%, vegetation (PAST) 27.842%, and water bodies (WATR) 0.166%. Basin elevation information is crucial for extracting the topographic features of the terrain required for hydrological research. The 30 m spatial resolution digital elevation model (DEM) was extracted (Figure 2A) and projected to the Universal Transverse Mercator coordinate system to aid delineation of the watershed boundary.

Integrated modelling framework
We combined a machine learning optimiser (BRF) with the SWAT model to refine the input of baseline and projected climate data, reducing input uncertainties in the modelling process (i.e., technical uncertainties), enhancing the simulation, and improving confidence in the model output for a reliable assessment of basin-scale hydrologic features. This approach is intended to further lower the risk of misinterpreting climate signals and to improve adaptation assessments. Our goal is an integrated modelling framework that satisfies these criteria for evaluating how projected regional water balance changes under climate scenarios affect the sustainability of green and blue water in data-sparse regions under uncertainty. The two integrated processes are discussed briefly in the following sub-sections, and a schematic overview is shown in Figure 3.

Boruta random forest optimizer
The Boruta feature selection method was developed as a wrapper around the random forest algorithm to identify the important predictors. The Z-score of each input predictor's importance is calculated and compared with those of shadow features, i.e., randomly shuffled copies of the predictors. The distribution of these Z-scores establishes the key predictors (Kursa and Rudnicki, 2010). The method iteratively builds models using an all-relevant feature selection strategy that ranks the salient features of the model and the residual features according to the optimisation-determined factors (Kursa, 2016). It is an effective feature selection technique that simplifies the classification of high-dimensional data. Information gain is used to gauge each feature's contribution and establish its association using a novel extension of balanced information gain. This is particularly important when analysing large volumes of data to achieve high generalisation accuracy.
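To make the shadow-feature comparison concrete, the sketch below implements a simplified Boruta-style loop with NumPy only. It is not the Boruta package and not the authors' configuration. To stay dependency-free it replaces random-forest importance with the absolute standardized least-squares coefficient (an explicit stand-in), records a "hit" whenever a real predictor beats the best shadow feature, and confirms predictors whose hit count is significantly above chance using a Bonferroni-corrected one-sided binomial test. Boruta's tentative/rejected bookkeeping and iterative removal of rejected features are omitted, and all function names are hypothetical.

```python
import math
import numpy as np

def importance(X, y):
    """Stand-in importance (NOT random-forest importance): |standardized OLS coefficient|."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    ys = (y - y.mean()) / y.std()
    coef, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
    return np.abs(coef)

def binom_upper_tail(k, n, p=0.5):
    """P(K >= k) for K ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def boruta_like(X, y, n_iter=100, alpha=0.05, seed=0):
    """Return one confirmed (True) / not confirmed (False) flag per column of X."""
    rng = np.random.default_rng(seed)
    n_feat = X.shape[1]
    hits = np.zeros(n_feat, dtype=int)
    for _ in range(n_iter):
        # Shadow features: each column shuffled independently, breaking any link with y
        shadows = np.column_stack([rng.permutation(X[:, j]) for j in range(n_feat)])
        imp = importance(np.hstack([X, shadows]), y)
        hits += imp[:n_feat] > imp[n_feat:].max()  # a hit = beating the best shadow
    # Confirm when hits are significantly above the 50% expected by chance
    return [binom_upper_tail(int(h), n_iter) < alpha / n_feat for h in hits]

# Synthetic example: the target depends on predictors 0 and 1 only
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 5))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.1 * rng.normal(size=300)
decisions = boruta_like(X, y, n_iter=50)
print(decisions)
```

In the study's setting the columns of `X` would be the GCM series at one grid point and `y` the observed gridded series; a faithful implementation would use random-forest importance, as the original Boruta method does.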
The methodology for optimising the input dataset is described in Lawal et al. (2023). The proposed strategy addresses potential shortcomings of conventional modelling methodologies, such as their inability to analyse stochastic aspects, complex input variable features, and interrelated climatic and hydrological properties, which restricts their ability to capture crucial temporal behaviour (Adamowski et al., 2012). As demonstrated in Lawal et al. (2023), the BRF feature selection technique retains climatic signals by filtering out redundant downscaled GCMs that could reduce the ability of the selected ensembles to represent basin-scale hydrologic features accurately, such as extreme events (return periods of floods and droughts) and their trends and magnitudes. Thus, integrating feature extraction algorithms with SWAT modelling may yield improved calibrated and validated water balance models for reliable and accurate prediction of baseline and projected hydrologic features in data-sparse watersheds for water security assessment.

SWAT hydrologic model
SWAT is a semi-distributed, continuous time-step hydrological model. It is used to study water quality (sediment load and nutrient flow), water quantity (streamflow, evapotranspiration, water yield, aquifer recharge, etc.), and crop growth processes in different watersheds (Veettil and Mishra, 2018; Gesualdo et al., 2019). Surface runoff in the model is generated using the SCS curve number method, and ET is estimated by one of several approaches, such as the Priestley-Taylor, Penman-Monteith, or Hargreaves methods, chosen according to the available data and basin conditions. All flow variables are evaluated using the mass balance equation of the model system (Swain et al., 2020). SWAT's simulation of hydrological processes is conceptualised by the water balance equation

$$SW_t = SW_o + \sum_{i=1}^{t} \left( P_i - Q_i - ET_i - G_i - R_i \right)$$

where $SW_t$ is the final soil water content (mm), $SW_o$ is the initial soil water content, $t$ is the time in days, $P_i$ denotes precipitation, $Q_i$ denotes surface runoff, $ET_i$ denotes evapotranspiration, $G_i$ is the subsurface flow from the soil profile, and $R_i$ is the return flow on the $i$th day, all in mm. SWAT analyses each hydrologic response unit (HRU), a subdivision of a sub-basin with a homogeneous combination of soil and vegetation types, to estimate the water available in each sub-basin's main channel for a given time step. The water is then routed through the river network and subsurface systems to the basin outlet. SWAT model optimisation and analysis are often performed using SWAT-CUP (SWAT Calibration and Uncertainty Programs) (Abbaspour et al., 2015). The Sequential Uncertainty Fitting version 2 (SUFI-2) algorithm, a stochastic optimisation procedure within the SWAT-CUP interface, was used to adjust independent parameter sets sampled by Latin hypercube sampling (LHS). Global or one-factor-at-a-time sensitivity analysis was applied during calibration and validation. Model performance was evaluated with the coefficient of determination (R²) and the Nash-Sutcliffe efficiency (NSE):

$$R^2 = \frac{\left[\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})\right]^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}$$

$$NSE = 1 - \frac{\sum_{i=1}^{n} (x_i - y_i)^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

where $\bar{x}$ is the mean of the observed values, $x_i$ is the $i$th observation, $y_i$ is the modelled value for the $i$th observation, $\bar{y}$ is the mean of the simulated values, and $n$ is the total number of observations.
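For reference, the sketch below computes R² (as the squared Pearson correlation of observed and simulated values) and NSE, together with the SUFI-2 uncertainty measures reported in the abstract. It assumes the usual SWAT-CUP definitions: the 95PPU band is taken from the 2.5% and 97.5% levels of an ensemble of simulations (e.g., from LHS parameter sets), the P-factor is the fraction of observations inside that band, and the R-factor is the mean band width divided by the standard deviation of the observations (population form assumed here). Function names are illustrative.

```python
import numpy as np

def r_squared(obs, sim):
    """Coefficient of determination as the squared Pearson correlation of obs and sim."""
    xo = np.asarray(obs, float) - np.mean(obs)
    ys = np.asarray(sim, float) - np.mean(sim)
    return (xo @ ys) ** 2 / ((xo @ xo) * (ys @ ys))

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - sum of squared errors / spread of obs about its mean."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def ppu95(ensemble):
    """2.5% and 97.5% levels across simulations (rows = runs, columns = time steps)."""
    ensemble = np.asarray(ensemble, float)
    return np.percentile(ensemble, 2.5, axis=0), np.percentile(ensemble, 97.5, axis=0)

def p_factor(obs, lower, upper):
    """Fraction of observations bracketed by the 95PPU band."""
    obs = np.asarray(obs, float)
    return np.mean((obs >= lower) & (obs <= upper))

def r_factor(obs, lower, upper):
    """Mean 95PPU band width divided by the standard deviation of the observations."""
    return np.mean(np.asarray(upper, float) - np.asarray(lower, float)) / np.std(obs)

# Small example with hypothetical monthly ET (mm)
obs = [42.0, 55.0, 61.0, 48.0]
sim = [40.0, 57.0, 58.0, 50.0]
print(round(r_squared(obs, sim), 3), round(nse(obs, sim), 3))
```

An NSE of 1 indicates a perfect match, 0 means the model is no better than the observed mean, and the P-factor and R-factor trade off against each other: a wider band brackets more observations but raises the R-factor.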

Integrated model simulation
The model was set up by importing the DEM into the ArcSWAT interface and delineating the watershed boundary. The basin was then divided into four major watersheds based on climatic zones: the Yobe-Komadugu, Magay-Ngadda, Chari-Logone, and Bodou-Dillia sub-basins (Figure 1). The main river networks and tributaries were generated using a threshold drainage area of 3,000 km², and all were connected to Lake Chad. The HRU thresholds for soil type, land use, and slope were set at 15% to retain the characteristic land use features, with slope classes of 0%-2%, 2%-8%, 8%-15%, and >15%. The catchment was discretised into 315 sub-basins, subdivided into 1,702 HRUs (Table 2).
Frontiers in Environmental Science frontiersin.org
The Boruta random forest filter was integrated to optimise the climate dataset used in this study. Initially, 16 GCM datasets at a daily time step were passed through the algorithm, with the observed gridded data as the target and the GCM datasets as the predictor features at each of the 210 grid points considered, to screen and extract the significant input features (GCMs). In the optimisation process, the predictor and target features are considered statistically significant if their lagged values delay them. The algorithm computes Z-scores for all input predictors and for the shadow attributes derived from them. At each grid point, an input feature is deemed important if and only if its feature importance score (Z-score) is greater than those of the shadow attributes after 500 iterations.
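The 15% HRU threshold and the slope classes described above can be illustrated with the sketch below. ArcSWAT's multiple-HRU option eliminates minor land uses, soils, and slope classes that fall below user-defined thresholds and reapportions their area to the remaining classes; this sketch shows only one level of that idea and is an assumption-laden simplification, not ArcSWAT code. Areas are plain dictionaries (hypothetical values), removed area is redistributed in proportion to the retained classes, and slopes lying exactly on a class break are assigned to the lower class.

```python
import numpy as np

SLOPE_BREAKS = [2.0, 8.0, 15.0]  # percent; classes 0-2, 2-8, 8-15, >15

def slope_class(slope_pct):
    """Class index 0..3; a value exactly on a break goes to the lower class."""
    return np.digitize(slope_pct, SLOPE_BREAKS, right=True)

def apply_threshold(areas, threshold_pct=15.0):
    """Drop classes below threshold_pct of the total area and reapportion
    their area proportionally among the retained classes."""
    total = sum(areas.values())
    kept = {k: a for k, a in areas.items() if 100.0 * a / total >= threshold_pct}
    if not kept:  # guard: keep the dominant class if every class is below threshold
        k_max = max(areas, key=areas.get)
        kept = {k_max: areas[k_max]}
    scale = total / sum(kept.values())
    return {k: a * scale for k, a in kept.items()}

# Hypothetical land use areas (km^2) in one sub-basin
landuse = {"AGRL": 60.0, "PAST": 30.0, "FRST": 10.0}
print(apply_threshold(landuse))  # FRST (10%) is dropped; its area goes to AGRL and PAST
print(slope_class(np.array([1.0, 2.0, 5.0, 12.0, 30.0])))  # [0 0 1 2 3]
```

In ArcSWAT the same elimination is applied hierarchically (land use within the sub-basin, soil within each land use, slope within each soil), so the total modelled area is preserved at every level.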
An ensemble of the four best GCMs at each grid point was formed for the baseline (1979-2011) and for the projected scenarios SSP2(4.5) and SSP5(8.5) in two time slices, 2021-2050 and 2051-2080, and integrated into the hydrologic model. The optimisation process screens the antecedent lagged memory within the datasets (GCM inputs) so that time series arising from meteorological factors can be correlated without misrepresenting the basin's climate features. Owing to the lack of observed data such as wind speed, solar radiation, relative humidity, and reservoir operations, default model values were retained and the influence of reservoirs was neglected. The temperature-based Hargreaves method was selected in the model for simulating evapotranspiration to avoid dependence on these unavailable weather data.
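The per-grid-point ensemble step can be illustrated as follows. The text does not state how the four GCMs are ranked or combined, so this sketch assumes they are the four with the highest importance scores (e.g., Boruta Z-scores) and that the ensemble is an unweighted mean; both choices, the function name, and the example values are hypothetical.

```python
import numpy as np

def top_k_ensemble(gcm_series, importance, k=4):
    """Unweighted mean of the k most important GCMs at one grid point.

    gcm_series : array (n_gcms, n_days), bias-corrected GCM values
    importance : array (n_gcms,), importance scores (e.g. Boruta Z-scores)
    Returns (indices of the selected GCMs, ensemble-mean series).
    """
    order = np.argsort(importance)[::-1][:k]  # highest scores first
    return order, np.asarray(gcm_series, float)[order].mean(axis=0)

# Six hypothetical GCMs with three daily values each
series = np.arange(18, dtype=float).reshape(6, 3)
scores = np.array([0.1, 5.0, 3.0, 4.0, 0.2, 2.0])
chosen, ensemble = top_k_ensemble(series, scores)
print(chosen, ensemble)  # selects GCM indices 1, 3, 2, 5; ensemble mean 8.25, 9.25, 10.25
```

Repeating this at each of the 210 grid points, separately for the baseline and each SSP time slice, would yield gridded ensemble forcing that can then be assigned to the SWAT weather stations.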

Model calibration, validation, and uncertainty analysis
The four watershed models were optimised using the SUFI-2 algorithm against the observed ET data extracted at 100 points and re-aggregated into 59 test points based on the delineated watershed boundaries, with a balanced spatial distribution covering the entire basin to increase confidence in the model output. The primary

Assessment of water footprint environmental sustainability
The output of the hydrologic model was used to evaluate the impact of climate change on the spatial and temporal variations in the environmental sustainability of the green and blue water footprints of the Yobe-Komadugu watershed. The watershed is dominated by agricultural land and spans the two climatic extremes of the basin. The sustainability index (Supporting Material) is a summary index that assesses the sustainability of water resource systems (da Cunha e Silva et al., 2022). It can be used to calculate sustainability for water consumers and to determine changes in sustainability by comparing the index across proposed water policies (de O. Vieira and Sandoval-Solis, 2018). The sustainability index helps decision-makers identify policies that will maintain or enhance the basin's desired future water management characteristics (Sandoval-Solis et al., 2011).
Geographic hotspots that could lead to water resource conflict were identified by defining the environmental sustainability of blue and green water at the basin scale relative to freshwater provision levels (the threshold of water available for human use). To evaluate environmental sustainability, we used a sustainability index (Supporting Material) that compares each sub-basin's WF with its corresponding water availability (WA) in terms of the water footprint concept, as demonstrated in Liu et al. (2020):

$$SI_{i,j}^{blue} = \frac{WA_{blue} - WF_{blue}}{WA_{blue}}, \qquad SI_{i,j}^{green} = \frac{WA_{green} - WF_{green}}{WA_{green}}$$

Here, $SI_{i,j}^{blue}$ and $SI_{i,j}^{green}$ are the indices that define blue and green water environmental sustainability in sub-basin $i$ at time $j$, and $WF_{blue}$, $WF_{green}$, $WA_{blue}$, and $WA_{green}$ are the blue and green water footprints and availabilities, respectively. When the blue and green water footprints exceed availability, i.e., $SI_{i,j}^{blue} < 0$ and $SI_{i,j}^{green} < 0$, the water footprint in the sub-basin is unsustainable because human water use violates ecosystem needs and environmental flow requirements (Hoekstra et al., 2011). We categorise the green and blue water sustainability thresholds into extremely (ES) (0.75 ≤ SI ≤ 1), highly (HS) (0.5 ≤ SI < 0.75), and moderately (MS) (0.0 ≤ SI < 0.5) sustainable indices, which are referred to as viable water security points, and extremely (EU) (SI < −1), highly (HU) (−1 ≤ SI < −0.5), and moderately (MU) (−0.5 ≤ SI < 0.0) unsustainable indices, which are identified as high-, medium-, and low-risk geographic water security hotspots, respectively.
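The classification above can be applied mechanically once the index is computed. This sketch assumes the index takes the form SI = (WA - WF)/WA used in the water footprint literature, which is consistent with the text (SI < 0 exactly when the footprint exceeds availability, and SI ≤ 1 for non-negative footprints); the function names and the example footprint/availability pairs are hypothetical.

```python
def sustainability_index(wf, wa):
    """SI = (WA - WF) / WA; negative when the footprint exceeds availability."""
    if wa <= 0:
        raise ValueError("water availability must be positive")
    return (wa - wf) / wa

def classify_si(si):
    """Map an SI value to the six classes defined in the text."""
    if si >= 0.75:
        return "ES"  # extremely sustainable
    if si >= 0.5:
        return "HS"  # highly sustainable
    if si >= 0.0:
        return "MS"  # moderately sustainable
    if si >= -0.5:
        return "MU"  # moderately unsustainable: low-risk hotspot
    if si >= -1.0:
        return "HU"  # highly unsustainable: medium-risk hotspot
    return "EU"      # extremely unsustainable: high-risk hotspot

# Hypothetical annual footprint vs availability (mm) for four sub-basins
for wf, wa in [(25.0, 100.0), (60.0, 100.0), (150.0, 100.0), (300.0, 100.0)]:
    si = sustainability_index(wf, wa)
    print(f"WF={wf:.0f} WA={wa:.0f} SI={si:.2f} {classify_si(si)}")
```

The checks run from the top class downwards, so each branch only needs its lower bound, which matches the half-open intervals in the text.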

Blue water footprint and availability assessment
Blue water was derived from the output of the modelling framework (Figure 3). Blue water flow (BWF) is the sum of water yield (WYLD) and groundwater storage. WYLD is the net amount of water that leaves the HRU and enters the main channel, and groundwater storage is the difference between aquifer recharge (GW_RCHG) and the groundwater contribution to main-channel flow (GW_Q) (Rodrigues et al., 2014). Basin blue water security is evaluated with sustainability indicators based on the blue water footprint, or on restricting water abstraction so that absolute environmental demand is satisfied, i.e., accounting for both abstraction (demand) and consumption (withdrawal minus return flow). The blue water footprint was defined as the water appropriated or consumed by different sectors at the river basin scale. The spatial distribution of water use was determined from sectoral water demand information (Table 3) and 1-km gridded world population density data (CIESIN) for the baseline and projected future periods, consistent with the CO2 emission scenarios of the middle-of-the-road (SSP2) and fossil-fuelled development (SSP5) pathways, available at http://sedac.ciesin.columbia.edu/gpw (Balk et al., 2006; Jones and O'Neill, 2016). A conservative value of 92 L/capita/day was used to quantify absolute basic water consumption for the domestic blue water footprint, corresponding to the minimum target during stringent water restrictions (Crouch et al., 2021). This approach was adopted here and can be applied to basins where actual sectoral water demand information cannot be established or is inadequate for long-term water security assessment at the basin scale.
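As a minimal sketch of the blue water flow definition above, the function below adds SWAT's water yield to the groundwater storage term, taken as recharge minus the groundwater flow to the channel. It assumes all three quantities are already aggregated to the same sub-basin and time step in mm; the argument names mirror the SWAT output variables (WYLD, GW_RCHG, GW_Q), but reading them from SWAT output files is not shown.

```python
def blue_water_flow(wyld_mm: float, gw_rchg_mm: float, gw_q_mm: float) -> float:
    """Blue water flow (mm) = WYLD + groundwater storage.

    Groundwater storage is taken as aquifer recharge (GW_RCHG) minus the
    groundwater contribution to main-channel flow (GW_Q). All inputs are
    assumed to share the same sub-basin, time step, and unit (mm).
    """
    groundwater_storage_mm = gw_rchg_mm - gw_q_mm
    return wyld_mm + groundwater_storage_mm


# Illustrative (made-up) values for one sub-basin and year.
print(blue_water_flow(wyld_mm=120.0, gw_rchg_mm=40.0, gw_q_mm=25.0))  # 135.0
```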
The basin's annual blue water footprint was determined as a function of the proportion of sectoral water use ($C_i$), the sub-basin area ($A_b$, km²), the long-term mean population density ($P_d$, persons per km²), and the per capita water use ($Q_w$, L/capita/day), over the $n$ sectors that use the freshwater resources. Owing to inadequate data, monthly variations in the blue water footprint could not be accounted for in the monthly-scale assessment of blue water sustainability.
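The footprint equation itself is not reproduced in this text. As a hedged sketch only, the function below assumes each sector's footprint is its factor C_i multiplied by the annual demand implied by A_b, P_d, and Q_w (population = A_b × P_d), summed over the n sectors, with litres per day converted to cubic metres per year. The exact form and units of the authors' equation may differ. The default Q_w of 92 L/capita/day is the domestic value cited above; the `sector_factors` values are hypothetical.

```python
from typing import Sequence

DAYS_PER_YEAR = 365
LITRES_PER_M3 = 1000.0


def blue_water_footprint_m3_per_year(
    area_km2: float,
    pop_density_per_km2: float,
    sector_factors: Sequence[float],
    per_capita_l_per_day: float = 92.0,
) -> float:
    """Hypothetical form: WF_blue = sum_i C_i * A_b * P_d * Q_w, in m^3/year."""
    population = area_km2 * pop_density_per_km2               # A_b * P_d
    daily_litres = population * per_capita_l_per_day          # * Q_w
    annual_m3 = daily_litres * DAYS_PER_YEAR / LITRES_PER_M3  # L/day -> m^3/year
    return sum(c_i * annual_m3 for c_i in sector_factors)


# 100 km^2 at 50 persons/km^2, domestic sector only (C = 1.0, hypothetical).
print(blue_water_footprint_m3_per_year(100.0, 50.0, [1.0]))  # 167900.0
```

Adding a second hypothetical sector with C = 0.5 would raise the total by half the domestic volume, which is the behaviour this assumed summation implies.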
The blue water availability was estimated as proposed in Hoekstra et al. (2011), where $WA^{blue}$ is the proportion of natural runoff (streamflow) in each sub-basin that can safely be made available for consumptive use, as shown in Eq. 7:

$$WA^{blue} = Q - EFR \qquad (7)$$

where $Q$ is the long-term natural runoff (streamflow) of the sub-basin (m³/s) and $EFR$ is the environmental flow requirement needed to maintain a healthy river ecosystem. EFR was estimated using the presumptive standard proposed in Richter et al. (2012), under which 80% of the long-term mean monthly natural runoff is reserved for the environment, so that the remaining 20% can be considered appropriate for withdrawal.
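The availability term can be sketched in the same way. The function below assumes, per the presumptive standard described above, that 80% of mean natural runoff is reserved as EFR (so WA_blue = 0.2 Q). The conversion from a mean flow in m³/s to an annual volume uses a 365-day year and is an illustrative choice, not necessarily the study's.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def blue_water_availability(q_m3s: float, efr_fraction: float = 0.8) -> float:
    """WA_blue = Q - EFR, with EFR = efr_fraction * Q (same units as Q)."""
    if not 0.0 <= efr_fraction <= 1.0:
        raise ValueError("efr_fraction must lie in [0, 1]")
    efr = efr_fraction * q_m3s
    return q_m3s - efr


def annual_volume_m3(flow_m3s: float) -> float:
    """Convert a mean flow rate (m^3/s) into an annual volume (m^3)."""
    return flow_m3s * SECONDS_PER_YEAR


# Illustrative sub-basin with a long-term mean natural runoff of 10 m^3/s.
wa = blue_water_availability(10.0)
print(f"WA_blue = {wa:.2f} m^3/s = {annual_volume_m3(wa):,.0f} m^3/year")
```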

Green water footprint and availability assessment
Green water has two components: green water flow (withdrawal) and green water storage (availability). In the HRU output of the SWAT model, green water withdrawal corresponds to actual evapotranspiration and is defined as the green water footprint (Rodrigues et al., 2014; Veettil and Mishra, 2016). The soil moisture, or green water storage (GWS), that can support crop development and soil evapotranspiration, represented by the soil water (SW) content, is referred to as "green water availability." It was obtained from the SWAT model output and used in the water sustainability assessment (Abbaspour et al., 2015; Veettil and Mishra, 2018).

Calibration and validation of the integrated model
The model optimisation process is challenging and, to a certain extent, subjective in complex hydrology, especially in regions with inadequate multi-variable observed data. We therefore aimed to produce a model whose simulation reflects the natural conditions of the watershed. As a first step, we integrated the Boruta random forest feature selection approach as an interface to assess and filter out redundant downscaled GCM data across the 210 selected grid points of the entire watershed. According to Lawal et al. (2023), this procedure was required to improve and preserve the internal variability of climate data signals, which may be affected by reparameterisation, and to use the right number of GCM ensemble members capable of representing the complex interactions within hydrologic models. It also helps ensure that all uncertainty ranges (conceptual model, input data, and parameters) are mapped onto and bracket most of the observed data within the accepted range of uncertainty (Abbaspour et al., 2007). Together, these steps support an accurate understanding of long-term changes in baseline and projected watershed hydrology, especially in data-sparse and climate-sensitive regions that have not been adequately studied.
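The core idea behind Boruta, comparing each candidate feature against randomly permuted "shadow" copies, can be illustrated briefly. The sketch below is a simplified stand-in rather than Boruta itself: it scores features by absolute Pearson correlation with a target instead of random-forest importance, and it applies a fixed hit-rate threshold instead of Boruta's statistical test. The inputs are hypothetical, e.g., `X` holding candidate downscaled GCM series (one column per candidate) and `y` a reference series; NumPy is the only dependency.

```python
import numpy as np


def abs_correlation(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Absolute Pearson correlation of each column of X with y."""
    Xz = (X - X.mean(axis=0)) / X.std(axis=0)
    yz = (y - y.mean()) / y.std()
    return np.abs(Xz.T @ yz) / len(y)


def shadow_feature_screen(X, y, n_iter=100, hit_rate=0.9, seed=0):
    """Keep features whose score beats the best shadow score in >= hit_rate of iterations.

    Simplified, correlation-based variant of Boruta's shadow-feature test.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    real_scores = abs_correlation(X, y)
    hits = np.zeros(X.shape[1], dtype=int)
    for _ in range(n_iter):
        # Permute each column independently: shadows keep each feature's
        # distribution but lose any relationship with y.
        shadows = np.column_stack([rng.permutation(col) for col in X.T])
        hits += real_scores > abs_correlation(shadows, y).max()
    return hits / n_iter >= hit_rate


# Hypothetical data: two informative columns followed by three noise columns.
rng = np.random.default_rng(42)
signal = rng.normal(size=300)
X = np.column_stack([signal, signal + 0.5 * rng.normal(size=300), rng.normal(size=(300, 3))])
y = 2.0 * signal + 0.3 * rng.normal(size=300)
print(shadow_feature_screen(X, y))
```

The informative columns are retained; noise columns are usually rejected, although with finite samples one can occasionally pass, which is what Boruta's iterative statistical test is designed to control.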
The one-at-a-time sensitivity analysis adopted for the preselection of sensitive model parameters was used here, partly because previous hydrologic studies of watersheds with similar features around the world used different observed data for calibration and validation (Abbaspour et al., 2017; 2015; López et al., 2017; Jiang et al., 2020), and partly because of variations in watershed features and the need for a homogeneous representation of evapotranspiration characteristics across the wet, moderate, and dry years of the available data period. The built-in sensitivity analysis tool of SWAT-CUP, using the SUFI-2 algorithm, identified 19 parameters in the four sub-watersheds analysed, with the different levels of sensitivity outlined in Table 4; this may reflect variations in land use and land cover, terrain, and slope across the watershed. The global sensitivity analysis of the calibration process across the four sub-watersheds indicated that the combination of parameters rendered some of them less sensitive in the simulation run. We therefore categorised parameter sensitivity by the p-value of the model run as highly sensitive (**) for p = 0, moderately sensitive (*) for 0 < p ≤ 10⁻⁵, and less sensitive for p > 10⁻⁵. Under this threshold, the SCS runoff curve number for average moisture conditions (CN2.mgt), moist bulk density (SOL_BD().sol), saturated hydraulic conductivity (SOL_K().sol), and the soil evaporation compensation factor (ESCO.hru) are the most important modelling parameters in the entire watershed, as shown in Table 4.
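The three-level sensitivity rule above translates directly into a small helper. The p-values below are hypothetical placeholders chosen to exercise each class; they are not the Table 4 results.

```python
def sensitivity_class(p_value: float) -> str:
    """Classify a parameter's sensitivity from its p-value, using the thresholds in the text."""
    if p_value < 0.0:
        raise ValueError("p-values cannot be negative")
    if p_value == 0.0:
        return "highly sensitive (**)"
    if p_value <= 1e-5:
        return "moderately sensitive (*)"
    return "less sensitive"


# Hypothetical p-values for three SWAT parameters (not the study's results).
example_p_values = {"CN2.mgt": 0.0, "SOL_K().sol": 3e-6, "GW_DELAY.gw": 0.2}
for name, p in example_p_values.items():
    print(f"{name}: {sensitivity_class(p)}")
```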
Other important sensitive parameters identified in the sub-watershed modelling process are marked with a single asterisk. They differ across the sub-watersheds, possibly because distinct morphological features change the hydrologic behaviour. The optimised ranges of the sensitive parameters also differed among sub-watersheds; this lack of uniqueness is characteristic of hydrologic model calibration. Abbaspour et al. (2009) supported this point, noting that if one model fitting the measurements exists, there will be numerous such models with different parameter ranges.
The performance of the sub-watershed simulations (Figures 4A-D) was calculated from the observed and "best" simulated monthly actual evapotranspiration values of the objective function at the 59 measured points distributed across the basin. The calibrated and validated results, expressed by the coefficient of determination (R²) and the Nash-Sutcliffe efficiency (NSE), were R² = 0.69-0.88 and NSE = 0.45-0.77 for calibration and R² = 0.62-0.79 and NSE = 0.34-0.63 for validation across all the watersheds. Moreover, many of the model results fell within a satisfactory uncertainty range, with P-factor and R-factor values of 0.68-0.93 and 0.73-1.31 in 83%, 67%, 85.7%, and 81.3% of the sub-basins of the four sub-watersheds, respectively. A few sub-basins had poor simulated output, with R² and NSE values as low as 0.25 and 0.14, respectively, although they still represented the data uncertainty band well, with encouraging P-factor and R-factor values of 0.53-0.78 and 1.21-1.95, respectively. Even regions with better objective function values had difficulty matching the peak values of the observed evapotranspiration. This may be due to simplification of the model through reaggregation of land use features and to inadequate data on important basin-scale processes, such as reservoir operations, dams, water transfers, and irrigation. Such issues are generally classed as technical modelling uncertainties and natural heterogeneity in the hydrologic modelling process, as corroborated by Schuol et al. (2008) and Abbaspour et al. (2015). Nevertheless, our results are generally realistic for the basin-scale assessment of water-related hazards.
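For readers reproducing these statistics outside SWAT-CUP, the sketch below computes NSE, R² (as the squared Pearson correlation), and the SUFI-2 P-factor and R-factor from observed values and a simulated ensemble. It follows the common definitions in Abbaspour et al. (2007): the 95PPU band is taken from the 2.5th and 97.5th percentiles of the ensemble, the P-factor is the fraction of observations inside the band, and the R-factor is the mean band width divided by the standard deviation of the observations. Using the population standard deviation (NumPy's default) is an assumption; SWAT-CUP may differ in such details.

```python
import numpy as np


def nse(obs, sim) -> float:
    """Nash-Sutcliffe efficiency: 1 - sum of squared errors / sum of squared deviations."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return float(1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2))


def r_squared(obs, sim) -> float:
    """Coefficient of determination as the squared Pearson correlation."""
    r = np.corrcoef(obs, sim)[0, 1]
    return float(r ** 2)


def ppu95(ensemble):
    """Lower and upper 95PPU bounds from an ensemble shaped (n_runs, n_times)."""
    lower, upper = np.percentile(np.asarray(ensemble, dtype=float), [2.5, 97.5], axis=0)
    return lower, upper


def p_factor(obs, lower, upper) -> float:
    """Fraction of observations bracketed by the 95PPU band."""
    obs = np.asarray(obs, dtype=float)
    return float(np.mean((obs >= lower) & (obs <= upper)))


def r_factor(obs, lower, upper) -> float:
    """Mean 95PPU band width divided by the standard deviation of the observations."""
    width = np.mean(np.asarray(upper, dtype=float) - np.asarray(lower, dtype=float))
    return float(width / np.std(np.asarray(obs, dtype=float)))


# Toy series (not study data).
obs = [1.0, 2.0, 3.0, 4.0]
sim = [1.0, 2.0, 3.0, 5.0]
lower = [0.5, 2.5, 2.5, 3.5]
upper = [1.5, 3.0, 3.5, 4.5]
print(nse(obs, sim), r_squared(obs, sim), p_factor(obs, lower, upper), r_factor(obs, lower, upper))
```

In practice, `lower` and `upper` would come from `ppu95` applied to the behavioural simulations of a SUFI-2 iteration rather than being written by hand.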
A likely reason for the large variability in NSE across the four sub-watersheds is that calibration used actual evapotranspiration only. This limitation could be reduced substantially by incorporating additional observation datasets into distributed calibration schemes where reliable data are available (Kunnath-Poovakka et al., 2016; Rajib et al., 2016). However, Koppa et al. (2019) argued that, in multivariate calibration, no limits of acceptability or error thresholds assess a model's ability to reproduce all the included water balance components simultaneously.
The results presented here improve on earlier studies by Faramarzi et al. (2013) and Schuol et al. (2008), which used a stand-alone SWAT model driven directly by climate data. Those studies represented the Lake Chad portion of their domain poorly, with a large uncertainty range, a correlation greater than 0.6 at only 38% of the calibration points, and poor NSE values of 0-0.2 in the larger Lake Chad region. This may be attributed to climate data with coarse resolution and distorted signals of watershed features, in which complex orography and land-sea distribution were not accounted for; such data may cause local variations in basin water balance outputs and affect projected climate change assessments. Our results show wider spatial coverage of good P-factor and R-factor values than the previous study, which reported a P-factor of 0.6 or higher at only 61% and an R-factor of 1.5 at only 69% of the basin area. However, the model variables and parameters adopted for calibration differ, and those studies were conducted at a wider scale. The optimisation approach used here, which incorporates machine learning into the integrated modelling strategy, could reduce large model uncertainty propagation and offer a new direction for modelling in data-sparse regions with variable morphological features, providing high-value water resource information at the local basin scale to drive sustainable water policy decisions.

Assessment of climate change impact on projected green and blue water resources
Assessing the impact of climate change on the spatial and temporal distribution of blue and green water resources at the sub-watershed level is important for providing water authorities with the information needed for decision support. Confidence in the model output was reinforced by investigating variations in the projected mean changes in the near […] 5.
The projected changes in precipitation indicated an increasing trend, with annual shifts of 7.1% and 7.40% in the near future and 13.25% and 27.68% in the far future, associated with increased warming of average temperature by 0.69 ± 0.15 °C and 0.89 ± 0.11 °C between 2021 and 2050 and by 1.17 ± 0.22 °C and 1.78 ± 0.24 °C between 2051 and 2080 for SSP2(4.5) and SSP5(8.5), respectively. These projections are consistent with the findings of previous studies (Vizy et al., 2013; Sylla et al., 2016; Almazroui et al., 2020), and the projected changes may be linked to variability and changes in West African Monsoon features, such as the intensity and location of the African easterly waves and jets, monsoon flows, and integrated moisture flux divergence (Teichmann et al., 2013; Mariotti et al., 2014; Sylla et al., 2015).
Figures 6-11 display the changes from the baseline in the spatial and temporal distributions of the projected green and blue water components of the Yobe-Komadugu watershed under the two climate change emission scenarios, SSP2(4.5) and SSP5(8.5), for the near-future (2021-2050) and far-future (2051-2080) periods. The watershed was chosen because it is characterised by climate extremes. The most recent notable events were a heavy windstorm in April 2022 and a downpour in May 2022 that affected approximately 180 communities and caused the loss of lives, food, buildings, livestock, and farmlands (SEMA, 2022). In addition, the watershed is an important agricultural production region and a main contributor of water to the larger Lake Chad, so understanding its hydrologic variability and its present and future water footprint sustainability will support adequate river basin planning and management. The delineated watershed boundary and the sub-basins are shown in Figure 5.

Spatial and temporal variations in green water flow under different climate change scenarios
The hydrological cycle is expected to intensify due to increased rainfall and a warmer atmosphere, as evidenced by the projected increase in atmospheric temperature as a result of CO2 emissions, which indicates greater evaporative demand and increased GWF […] (Figures 6B,D), accounting for 6.0% and 8.0% relative to the baseline period. For SSP5(8.5), spatial changes in mean GWF showed projected increases to 418.75 mm and 457.86 mm (Figures 6C,E), corresponding to mean increases of 6.4% and 16.34% relative to the baseline period in the near- and far-future time slices, respectively. A few exceptions with contrasting GWF behaviour, where declining GWF was predicted, were noted, especially in the downstream sub-basins (20, 21, and 25-30) of the watershed.
Analysis of the mean monthly GWF in the near (2021-2050) and far (2051-2080) future relative to the baseline (1982-2011) period (Figure 7) showed a consistent projected increase between the spring and summer months of 12.95%-33.54% and 5.93%-31.02% in the near future (Figure 7A) and of 23.25%-65.76% and 26.39%-87.43% in the far future (Figure 7B) for SSP2(4.5) and SSP5(8.5), respectively. In contrast, GWF is generally projected to decline sharply across the basin in autumn and winter, by approximately 53.38% and 54.10% in the near future and 54.72% and 36.0% in the far future for SSP2(4.5) and SSP5(8.5), respectively. The enhanced projected GWF may be related to increased temperature in the tropical regions between April and September owing to higher CO2 emission concentrations. The projected increase in precipitation events also enhances vegetation cover and actual plant transpiration.

Spatial and temporal variations in green water storage under different climate change scenarios
The mean annual GWS, represented by the time-varying soil moisture conditions, showed a substantial projected decline in all sub-basins of the watershed, from a baseline annual average of 341.89 mm (Figure 8A) to 324.79 mm and 302.43 mm for SSP2(4.5) (Figures 8B, D) and 299.45 mm and 293.45 mm for SSP5(8.5) (Figures 8C, E). These values correspond to projected declines in basin GWS of 4.99% and 11.54% for SSP2(4.5) and 12.41% and 14.17% for SSP5(8.5) in the near- and far-future periods, respectively. This decline may be connected to heavy overexploitation of groundwater for irrigation, which further lowers the water table in the basin; rising surface air temperature could also affect soil water flow regimes by increasing evaporative demand.
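The percentage declines quoted for GWS follow from relative-change arithmetic against the 341.89 mm baseline. The short check below recomputes them from the annual values given in the text; the dictionary keys are only labels.

```python
def percent_change(baseline: float, projected: float) -> float:
    """Relative change from the baseline, in percent (negative = decline)."""
    return 100.0 * (projected - baseline) / baseline


BASELINE_GWS_MM = 341.89  # mean annual GWS, baseline period (Figure 8A)
PROJECTED_GWS_MM = {
    ("SSP2(4.5)", "2021-2050"): 324.79,
    ("SSP2(4.5)", "2051-2080"): 302.43,
    ("SSP5(8.5)", "2021-2050"): 299.45,
    ("SSP5(8.5)", "2051-2080"): 293.45,
}

for (scenario, period), gws in PROJECTED_GWS_MM.items():
    change = percent_change(BASELINE_GWS_MM, gws)
    print(f"{scenario} {period}: {change:.2f}%")
```

This reproduces the reported declines of 11.54%, 12.41%, and 14.17%; the SSP2(4.5) near-future case evaluates to 5.00% against the reported 4.99%, a difference presumably due to rounding of the underlying values.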
Analysis of the mean monthly GWS in the near and far future relative to the baseline scenario (Figure 9) showed a consistent projected decline in most months, of 2.75%-44.11% (mean rate 6.95 mm/month) and 24.97%-69.99% (mean rate 15.65 mm/month) in 2021-2050 (Figure 9A) and of 5.47%-54.45% (mean rate 6.28 mm/month) and 26.47%-70.59% (mean rate 15.85 mm/month) in 2051-2080 (Figure 9B) for SSP2(4.5) and SSP5(8.5), respectively. The exception is the monsoon season, with projected increases in GWS across the basin of approximately 53.28% (18.58 mm/month) and 60.02% (20.71 mm/month) in 2021-2050 and 15.87% (5.72 mm/month) and 56.83% (23.91 mm/month) in 2051-2080 under the two emission scenarios, respectively. The projected monsoon increase is generally significant between July and September and is associated with high rainfall intensities and interannual seasonal variability, as corroborated by Almazroui et al. (2020).

Spatial and temporal variations in blue water flow in different climate change scenarios
The climate change impact on spatial and temporal variations in blue water flow was quantified at the sub-basin level. BWF showed high variability, with distinct dynamics in the upstream and downstream parts of the watershed. For example, in 2021-2050, BWF is projected to decline at mean annual rates of 38.9 mm/year and 37.25 mm/year in the upstream sub-basins (1-13, 15, 18, and 22) and to increase by 54.66 mm/year and 55.27 mm/year in the downstream sub-basins (14, 16, 17, 19-21, and 23-30) (Figures 10B,C) for SSP2(4.5) and SSP5(8.5), respectively, relative to the baseline period (1982-2011), which had a mean annual BWF of 37.83 mm for the entire basin (Figure 10A). The same pattern holds in the far future, but with smaller declines from the baseline of 25.98 mm/year and 29.69 mm/year upstream and larger increases of 77.23 mm/year and 98.97 mm/year downstream in 2051-2080 (Figures 10D,E) for the SSP2(4.5) and SSP5(8.5) emission scenarios, respectively. Over the entire watershed, BWF is projected to increase from the baseline by 2.85 mm/year and 4.76 mm/year in 2021-2050 and by 20.21 mm/year and 52.01 mm/year in 2051-2080 for the two CO2 emission scenarios, respectively.
Analysis of the mean monthly BWF in the near and far future relative to the baseline scenario (Figure 11) showed that projected declines prevail in the winter and spring months (Figures 11A, B), when precipitation is absent or sub-optimal in the tropical regions. In contrast, the summer and autumn months showed projected increases in BWF relative to the baseline period, which may be associated with increased monsoon rainfall events and intensities; these intensify wet extremes and lengthen dry spells by shortening the Sahel rainy season, as predicted in previous studies (Sarr, 2012; Sylla et al., 2016; Almazroui et al., 2020).
The projected declines in BWF are 0.17-4.88 mm/month and 0.17-6.42 mm/month in 2021-2050 (Figure 11A) and 0.13-6.0 mm/month and 0.07-5.49 mm/month in 2051-2080 (Figure 11B) for SSP2(4.5) and SSP5(8.5), respectively. In contrast, BWF generally increases sharply across the basin in the monsoon season, especially in August, with projected increases of up to 4.76 and 4.96 mm/month in 2021-2050 and 11.66 and 23.8 mm/month in 2051-2080 under the two emission scenarios. These sharp monthly changes in BWF are consistent with the significant increase in heavy rainfall events and the changes in seasonality that have exacerbated frequent weather extremes, i.e., flooding and droughts, in the Sahel region (Boko et al., 2007; Niang et al., 2014).

Climate change impact on and socioeconomic drivers of spatial variation in projected green and blue water sustainability
Green and blue water sustainability was determined at the sub-basin scale for the baseline (1982-2011) and for projected changes in the near (2021-2050) and far (2051-2080) future under the two CO2 emission scenarios using the sustainability index, as shown by the spatial maps in Figures 12 and 13, respectively. In the baseline period, green water was moderately to extremely sustainable in seven sub-basins, accounting for 16.50% of the watershed area, with sustainability index (SI) values of 0.19-0.3, 0.5-0.71, and 0.81-1.0 in sub-basins 10 and 27, 7 and 26, and 1, 2, and 8, respectively (Figure 12A). The remainder of the watershed showed low green water sustainability, and sub-basins 3, 20, and 24 were high-risk geographic hotspots. The favourable SI of the upstream sub-basins may be due to their land use and land cover, a mixture of sparse vegetation and bare land with low green water footprints owing to little or no rainfed agriculture.
The projected green water sustainability indicated a 1-2-fold shift in the sustainability threshold across the basin, with a steady to sharp decline in the area with favourable green water sustainability, from 16.50% of the watershed in the baseline to 15.9% for SSP2(4.5) (Figure 12B) and 0% for SSP5(8.5) (Figure 12C) in 2021-2050, and to 1.86% for SSP2(4.5) (Figure 12D) and 0% for SSP5(8.5) (Figure 12E) in 2051-2080. The geographic hotspots (SI < −0.5) are generally situated upstream in all scenarios, suggesting that climate change may push green water more strongly toward highly to extremely unsustainable status. This is evident from the continuous increase in green water flow and decrease in green water storage, a combination that may increase humidity and affect the timing, spatial pattern, and intensity of rainfall in a basin, as suggested by Du et al. (2018). In addition, as CO2 concentrations rise, plants use water more efficiently in photosynthesis, an effect associated with CO2 fertilisation (Donohue et al., 2017).
The baseline blue water sustainability of the watershed (Figure 13A) was assessed as 15.61% HS, 5.4% MS, 15.65% MU, and 63.34% HU-EU (potential blue water geographic hotspots) of the watershed area. High blue water sustainability predominates upstream; however, sub-basins 1, 2, and 6 are highly unsustainable, which may be related to the absence of viable stream channels and the high evaporative demand of this semi-arid area, which experiences severe drought events and high interannual rainfall variability due to the migration of the Intertropical Convergence Zone (ITCZ) (Thompson and Polet, 2000).
Accounting for climate change and changes in socioeconomic activities, the projected blue water sustainability showed further increases in blue water geographic hotspots, to 71.53% and 75.38% of the watershed area in 2021-2050 (Figures 13B, C) and 73.51% and 76.35% in 2051-2080 (Figures 13D, E) for SSP2(4.5) and SSP5(8.5), respectively. In the blue water security hotspot regions, our model results show SI values ranging from −0.5 to as low as −16.58 under both SSP2(4.5) and SSP5(8.5). The continued unviability of blue water may be caused by major river systems drying up and by reduced flows resulting from overuse of groundwater and surface water for intensive irrigation. These patterns appear to scale with the SSP emission scenarios, reflecting the strong correlation between anthropogenic GHG emissions and potential environmental impacts, as corroborated by Adeyeri et al. (2019). Some of the sub-basins with viable blue water sustainability contain interconnected large streams forming the Komadugu-Yobe and Komadugu-Gana river sub-systems, which support different ecological processes and socioeconomic activities, such as fish production, pastoralism, and forest regeneration, on which over 20 million people in the basin depend. The continued decline in sustainable blue water is a concern for local and national strategic freshwater management plans and a threat to diplomatic relationships among the countries that share the basin.

Climate change impact on and socioeconomic drivers of the temporal variability of projected changes in green and blue water sustainability
Green and blue water sustainability assessment at the local basin scale requires an understanding of the monthly pattern of freshwater circulation to improve and stabilise basin ecosystems. Figures 14A-E show heat maps of the severity of the baseline and projected monthly changes in green water sustainability across the 30 sub-basins of the watershed. Green water is more sustainable in the pre- and post-monsoon months, with indices of 0.15-0.95 (Figure 14A), whereas in all climate change scenarios its status consistently shifts from moderately unsustainable to highly and extremely unsustainable in the monsoon months between April and June, indicating a transition to potential geographic water sustainability hotspots (Figures 14B-E).
However, a gradual shift toward favourable green water sustainability emerges in July-August, with 23% (Figure 14B) and 80% (Figure 14C) in the near future and 16.7% (Figure 14D) and 70% (Figure 14E) in the far future for the SSP2(4.5) and SSP5(8.5) emission scenarios, respectively, across the entire watershed; this may be connected to the sudden projected increase in rainfall events and intensities in the semi-arid climate. The baseline blue water sustainability (Figure 15A) indicated that upstream sub-basins 1-19 had moderate to high blue water sustainability in the monsoon months of May-September, with indices of 0.47-0.98, whereas downstream sub-basins 20-30 were generally potential geographic blue water sustainability hotspots, showing highly to extremely unsustainable status 92.2% of the time during the monsoon period, with indices of −0.78 to −4.2. Moreover, our analysis of monthly blue water availability indicated that the environmental flow requirement to maintain a healthy aquatic ecosystem cannot be met in 60.3% of the months in the baseline period. These months generally fall in the low-flow period between November and March, which should be classed as a "no abstraction period" during which streams across the basin are protected.
The unsustainable blue water status may be closely related to heavy exploitation of groundwater and surface water for domestic and agricultural (irrigation) use, and to high rates of surface water evaporation and plant transpiration driven by increased surface air temperature, which have reduced runoff contributions to Lake Chad and caused it to shrink; this is consistent with the findings of Lemoalle et al. (2012) and Zhu et al. (2017). The model output for blue water sustainability showed projected increases in the "no abstraction period" to 74.4% (Figure 15B) and 65% (Figure 15C) in the near future and 66.11% (Figure 15D) and 65.3% (Figure 15E) in the far future for the CO2 emission scenarios SSP2(4.5) and SSP5(8.5), respectively.
These projections indicate that the gradual increase in precipitation may directly affect the sustainability of green water resources: in the monsoon months of July-August, green water status is projected to change from MU to MS (Figures 14C, E). However, blue water sustainability tends to degrade relative to the baseline in all emission scenarios considered, and favourable blue water status may only be achieved by enforcing regulations that restrict intensive groundwater withdrawal, especially during low-flow periods, and by exploring innovative river basin water conservation strategies. According to the anticipated changes in green and blue water sustainability, more than half of the watershed will be ecologically fragile. Without prompt action by water authorities to improve ecological resilience and adaptation and to reduce the shrinkage of wetlands and of Lake Chad in the face of changing climate and socioeconomic activities, some regions' freshwater sustainability hotspots may pass the point of recovery, making restoration difficult.

Discussion
This study has demonstrated that introducing data pruning through machine learning into the hydrologic modelling process, as an enhancement of previous frameworks, can be beneficial, especially in data-sparse watersheds where climatological dynamics are highly variable and vulnerable to climate change. Although it is difficult to identify the model stage that contributes most of the overall uncertainty, as noted by Martin et al. (2020), the sources of uncertainty in climate impact modelling for water resource assessment include GHG concentration pathways (scenario uncertainty), GCM parameterisation (input uncertainty), hydrologic model internal processes (structural uncertainty), and their interlinkages. According to Schewe et al. (2014), their importance in regional or local applications may vary across different parts of the globe. This study prioritised managing scenario and input uncertainty because these are generally classed as medium to high contributors to error propagation (Fang et al., 2018), and managing them can compensate for the limitations of inadequate data and prevent amplification of structural model uncertainty. These uncertainties will nevertheless remain and need to be addressed in practice through appropriate management actions, improved process understanding, and adequate use of expert feedback on parameter calibration and model sensitivity mapping to narrow their effect in impact studies, as corroborated by Smith et al. (2018).
The calibration statistics showed satisfactory agreement between the simulated and observed data, although performance varied across the sub-watersheds. A better uncertainty range (Figures 4A-D), with wider spatial coverage represented by the P-factor and R-factor values in the basin, was achieved relative to the regionalisation modelling approach adopted in previous studies (Schuol et al., 2008; Faramarzi et al., 2013), which tended to be unrealistic and produced unsatisfactory performance and uncertainty ranges. That approach has often produced contradictory results, as suggested by Oudin et al. (2008), which may misrepresent water resource dynamics and lead to policy decisions that undermine resilience planning, especially in watersheds characterised by high climate variability.
The spatial variation in green and blue water flow and storage in this study (Section 3.2) is considerably intensified downstream. This may be attributed to the presence of arable land, high population density, and intense agricultural activity. The temporal trends in monthly changes are typical of semi-arid regions, as corroborated by Muratoglu et al. (2022) for the Euphrates. The projected unsustainable green and blue water status calls for a review of water use policies, as suggested by Li et al. (2020). Awareness programmes are especially needed to encourage local farmers to adopt farming practices and strategies that lessen pressure on green water resources and prevent further intensification of current water conflicts in the basin. Without such measures, great uncertainty over how water availability and demand will change in the future will persist (Sorg et al., 2014). Appropriate environmental flow regulations are also critical for improving the projected blue water sustainability status, especially in critical periods of the year, as suggested in previous studies (e.g., Hejazi et al., 2014).
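The monthly comparisons discussed here, reported for 2021-2050 and 2051-2080, amount to comparing the mean value for each calendar month between a baseline and a projection period. The following minimal Python sketch shows that computation. It assumes each monthly series is paired with calendar-month indices (1-12). The function names and toy series are hypothetical, and the study's own baseline period and aggregation steps are not reproduced here.

```python
import numpy as np


def mean_by_month(values, months):
    """Average a monthly series by calendar month (1-12)."""
    values, months = np.asarray(values, float), np.asarray(months)
    return np.array([values[months == m].mean() for m in range(1, 13)])


def percent_change_by_month(base_vals, base_months, fut_vals, fut_months):
    """Relative change (%) of the future mean monthly value against the baseline mean."""
    base = mean_by_month(base_vals, base_months)
    fut = mean_by_month(fut_vals, fut_months)
    return 100.0 * (fut - base) / base


# Hypothetical 30-year monthly series; the projection is the baseline scaled by 1.1
months = np.tile(np.arange(1, 13), 30)
baseline = 10.0 + months
projected = 1.1 * baseline
print(percent_change_by_month(baseline, months, projected, months))
```

A month with a zero baseline mean, such as a dry-season month with no blue water flow, yields an infinite or undefined percentage. Absolute changes may be more informative for such months.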

Conclusion and future work
In this study, we developed a framework that integrates machine learning-based Boruta (random forest) feature selection, as an input data refining process, with process-based SWAT hydrologic models to optimise the calibration process. Model parameter solutions were accepted or rejected against a set error threshold. This tested whether models developed with this framework can simultaneously improve baseline and future climate projections and accurately simulate water balance components in watersheds with insufficient ground-based data, which is necessary for successful and reliable hydrologic modelling at the local scale.
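Boruta selects features by repeatedly fitting a random forest on the original inputs together with "shadow" copies whose values are shuffled. It keeps only those inputs that beat the best shadow more often than chance would allow. The following Python sketch shows a simplified, single-pass version of this idea using scikit-learn's `RandomForestRegressor`. It is an illustration under stated assumptions, not the study's implementation. It omits parts of the full algorithm, namely dropping rejected features between rounds and correcting for multiple testing. The function names, iteration count, and synthetic data are hypothetical.

```python
import math

import numpy as np
from sklearn.ensemble import RandomForestRegressor


def binom_tail_ge(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(int(k), n + 1))


def boruta_select(X, y, n_iter=20, alpha=0.05, seed=0):
    """Simplified Boruta: count how often each feature's random forest importance
    exceeds the largest importance among shuffled shadow features."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    hits = np.zeros(n_features, dtype=int)
    for it in range(n_iter):
        # Shadow features: each column shuffled independently, breaking any link to y
        shadows = np.column_stack([rng.permutation(X[:, j]) for j in range(n_features)])
        model = RandomForestRegressor(n_estimators=100, random_state=seed + it)
        model.fit(np.hstack([X, shadows]), y)
        imp = model.feature_importances_
        hits += imp[:n_features] > imp[n_features:].max()
    confirmed, rejected, tentative = [], [], []
    for j, h in enumerate(hits):
        if binom_tail_ge(h, n_iter) < alpha:             # significantly more hits than chance
            confirmed.append(j)
        elif binom_tail_ge(n_iter - h, n_iter) < alpha:  # significantly fewer (symmetric for p = 0.5)
            rejected.append(j)
        else:
            tentative.append(j)
    return confirmed, rejected, tentative


# Hypothetical predictors: only columns 0 and 1 drive the response
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 5))
y = 3 * X[:, 0] + 2 * X[:, 1] + rng.normal(scale=0.5, size=300)
confirmed, rejected, tentative = boruta_select(X, y)
print(confirmed, rejected, tentative)
```

In a workflow like the one described, the candidate columns might be, for example, gridded precipitation or temperature products. Only the confirmed inputs would then be passed on to the hydrologic model.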
We applied the framework to four sub-watersheds with distinct morphological properties that together form the larger Lake Chad basin. We found that the model simulates the hydrologic fluxes of ET with varying degrees of acceptability. While ET can be simulated accurately, there are significant trade-offs in parameter sensitivity ranges in the calibration process across the sub-watersheds. The key findings and conclusions of the research are summarised in this section. The integrated hydrologic modelling process in this study can reliably represent the spatiotemporal distribution of watershed hydrology, irrespective of the different morphological characteristics of the four sub-watersheds. It can also reduce uncertainty from the input data (e.g., precipitation and temperature), which are the main drivers of water balance models. The feature selection mechanism can keep uncertainty propagation in the data input process within acceptable thresholds. It also provides ensembles whose projections are reliable and consistent with previous studies for water security assessment.
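The acceptance or rejection of parameter solutions against an error threshold, described above, can be sketched as a filter over an ensemble of model runs. This minimal Python example uses NSE as the error measure and a threshold of 0.5, both chosen only for illustration. The study's actual objective function and threshold are not stated in this section. The toy model (runoff proportional to forcing) and all names are hypothetical.

```python
import numpy as np


def nse(obs, sim):
    """Nash-Sutcliffe efficiency of one simulation against observations."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)


def behavioural_sets(params, sims, obs, threshold=0.5):
    """Keep parameter sets whose simulation reaches the NSE threshold.

    params: (n_runs, n_params); sims: (n_runs, n_timesteps).
    Returns the accepted parameter sets, their scores, and the (min, max) range
    spanned by each accepted parameter (None if nothing is accepted).
    """
    params = np.asarray(params, float)
    scores = np.array([nse(obs, s) for s in np.asarray(sims, float)])
    keep = scores >= threshold
    accepted = params[keep]
    if not keep.any():
        return accepted, scores[keep], None
    return accepted, scores[keep], (accepted.min(axis=0), accepted.max(axis=0))


# Hypothetical one-parameter model: simulated runoff = a * forcing, true a = 2
forcing = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
obs = 2.0 * forcing
a = np.linspace(0.0, 4.0, 41).reshape(-1, 1)  # candidate parameter values
sims = a * forcing                             # one simulated series per candidate
accepted, scores, (lo, hi) = behavioural_sets(a, sims, obs)
print(len(accepted), lo, hi)
```

The spread of the accepted parameter values gives a simple indication of how tightly the observations constrain each parameter. This offers one way to compare sensitivity trade-offs between sub-watersheds.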
Green water is the dominant freshwater component across the basin relative to blue water. Climate change may be a significant factor influencing spatial and temporal changes in the projected green water sustainability status. The combination of socioeconomic drivers and climate change may significantly affect the projected blue water sustainability status across the basin. High GWF (ET), high temperature, and the flat terrain of the Yobe-Komadugu watershed may affect the spatial distribution of projected natural runoff. Consequently, the projected blue water footprint exceeds blue water availability, and human water use can only be met by consuming environmental flows, resulting in the degradation of rivers and groundwater potential. Additionally, given the WF hotspots identified in this study, future research should quantify new, appropriate water abstraction targets and their impact on blue water to help improve water management at the local river basin scale. Blue water has a higher opportunity cost because it is a potential input to many supply chains for emerging industries other than agriculture.
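The sustainability conditions described above can be sketched with the scarcity ratios commonly used in water footprint assessment. Blue water availability is natural runoff minus the environmental flow requirement, and blue water scarcity is the blue water footprint divided by that availability. Green water availability is total green water flow (ET) minus the ET from land reserved for nature or otherwise unproductive, and green water scarcity is the green water footprint divided by it. In this Python sketch, the environmental flow requirement is assumed to be a fixed 80% of natural runoff, purely as an illustrative default; the study's own values are not given in this section. A ratio above 1 is labelled unsustainable, matching the case above in which the blue water footprint exceeds blue water availability. All names and numbers are hypothetical.

```python
import numpy as np


def blue_water_scarcity(blue_wf, natural_runoff, efr_fraction=0.8):
    """Blue water footprint divided by blue water availability.

    Availability = natural runoff minus the environmental flow requirement,
    assumed here to be a fixed fraction (efr_fraction) of natural runoff.
    """
    availability = np.asarray(natural_runoff, float) * (1.0 - efr_fraction)
    return np.asarray(blue_wf, float) / availability


def green_water_scarcity(green_wf, green_et, et_reserved):
    """Green water footprint divided by green water availability.

    Availability = total green ET minus ET from land reserved for nature or unproductive.
    """
    availability = np.asarray(green_et, float) - np.asarray(et_reserved, float)
    return np.asarray(green_wf, float) / availability


def status(scarcity):
    """Label each value; a ratio above 1 means use exceeds what is available."""
    return np.where(np.asarray(scarcity) > 1.0, "unsustainable", "sustainable")


# Hypothetical monthly values (same volume units throughout)
blue = blue_water_scarcity([10.0, 15.0], [100.0, 50.0])
print(blue, status(blue))
```

Months or cells with zero availability produce infinite (or, with a zero footprint, undefined) ratios and should be screened before mapping.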
We should also point out that the results and conclusions reported in this study are based on specific configurations of the model parameters, input dataset, reference data, and hydrologic model. The established modelling framework, however, is independent of model and data type and may be used to evaluate simulated hydrologic state variables and fluxes at the scale of small watersheds. Nevertheless, the study has some obvious limitations. It does not consider the effects of watershed management practices such as irrigation withdrawals and reservoir regulation, owing to the unavailability of data. Nor does it consider future changes in terrain and slope, which will be significant factors governing the hydrologic response to land use and land cover changes.
This research focuses on blue and green water sustainability. However, the current work should be extended to grey water assessment by developing innovative methods and building observation datasets that further extend model calibration and validation. This would increase confidence in the hydrologic variable outputs required to reliably quantify the grey water footprint and sustainability. These outputs support the management of wastewater discharge and of fertiliser and pesticide application for water pollution control. The water sustainability index and status could be useful for developing effective local river basin policies and regulations. Future work should also address some of the identified limitations by incorporating more water balance variables into the optimisation process and analysing their effects on the general trade-offs in modelling accuracy.
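A grey water extension could build on the grey water footprint definition commonly used in water footprint assessment. This footprint is the volume of freshwater needed to dilute a pollutant load to the ambient water quality standard. It is computed as the load divided by the difference between the maximum acceptable concentration and the natural background concentration. For a diffuse source such as fertiliser, the load is often estimated as a leaching fraction times the application rate. The water pollution level then compares the total grey water footprint with actual runoff, and a value above 1 indicates that the waste assimilation capacity has been fully consumed. The Python sketch below assumes consistent units (kg and m³), and its names and values are hypothetical; it is not part of the present study.

```python
def grey_water_footprint(application_rate, leaching_fraction, c_max, c_nat):
    """Grey water footprint (m^3 per time) needed to assimilate a diffuse pollutant load.

    application_rate: kg of pollutant applied per time step
    leaching_fraction: share of the applied mass reaching water bodies (0-1)
    c_max, c_nat: maximum acceptable and natural concentrations (kg/m^3)
    """
    if c_max <= c_nat:
        raise ValueError("c_max must exceed c_nat")
    load = leaching_fraction * application_rate  # kg reaching water per time step
    return load / (c_max - c_nat)


def water_pollution_level(total_grey_wf, actual_runoff):
    """Total grey water footprint divided by actual runoff; > 1 means capacity is exceeded."""
    return total_grey_wf / actual_runoff


# Hypothetical example: 1000 kg N applied, 10% leached, standard 10 mg/L (0.01 kg/m^3)
gwf = grey_water_footprint(1000.0, 0.1, 0.01, 0.0)
print(gwf, water_pollution_level(gwf, 20000.0))
```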

FIGURE 2 Description of morphological data in the study: (A) digital elevation model and meteorological points, (B) soil types, and (C) land use and cover data.

FIGURE 3 Schematic overview of the integrated SWAT and BRF modelling framework (IMF) for reliable water balance modelling in data-sparse regions.

FIGURE 6 Changes in the spatial distribution of annual green water flow in the Yobe-Komadugu watershed.

FIGURE 7 Changes in the temporal distribution of mean monthly green water flow during (A) 2021-2050 and (B) 2051-2080 in the Yobe-Komadugu watershed.

FIGURE 8 Changes in the spatial distribution of green water storage in the Yobe-Komadugu watershed.

FIGURE 9 Changes in the temporal distribution of the mean monthly green water storage during (A) 2021-2050 and (B) 2051-2080 in the Yobe-Komadugu watershed.

FIGURE 10 Changes in the spatial distribution of blue water flow in the Yobe-Komadugu watershed.

FIGURE 11 Changes in the temporal distribution of the mean monthly blue water flow during (A) 2021-2050 and (B) 2051-2080 in the Yobe-Komadugu watershed.

FIGURE 12 Spatial risk map of changes in baseline and projected green water environmental sustainability in the Yobe-Komadugu watershed.

FIGURE 13 Spatial risk map of changes in baseline and projected blue water environmental sustainability in the Yobe-Komadugu watershed.

FIGURE 14 Heat map showing temporal changes in mean monthly baseline and projected green water environmental sustainability in the Yobe-Komadugu watershed.

FIGURE 15 Heat map showing temporal changes in mean monthly baseline and projected blue water environmental sustainability in the Yobe-Komadugu watershed.

TABLE 1
Input data required for hydrologic model development.

TABLE 3
Sectoral water use information in Chad and Nigeria.

TABLE 4
Model sensitive parameters, ranges, and best-fitted values at sub-watersheds.

TABLE 5
Median of the projected changes in annual precipitation and temperature in the Yobe-Komadugu watershed.