- 1 Department of Civil and Environmental Engineering, Old Dominion University, Norfolk, VA, United States
- 2 Department of Engineering Technology, Old Dominion University, Norfolk, VA, United States
Bridge-pier scour is a leading cause of flood-induced bridge failure, yet practice still lacks transparent, physics-informed tools that link data-driven prediction with design guidance. This study develops an interpretable, physics-aware machine-learning framework to predict equilibrium scour depth and translate those predictions into actionable strategies for flood-resilient infrastructure. Using the 2014 U.S. Geological Survey Pier-Scour Database (569 laboratory cases), five models were trained and evaluated with K-fold cross-validation: Gradient Boosting, AdaBoost (Tree), XGBoost, Gaussian Process (RBF kernel), and Kernel Ridge (polynomial). Model performance was assessed using R2, RMSE, and MAE. Gradient Boosting performed best, achieving training and testing R2 of 0.99 and 0.96, a near-ideal parity fit, and consistent accuracy across folds. Interpretability is provided by SHAP, whose attributions align with hydraulics; the pier width normal to flow accounts for 70.6% of the total importance in predicting scour depth. Predicted scour is mapped to four scenario envelopes that capture rare, peak, and sustained hydraulic extremes and yield clear design checks for flood resilience. A physics-based imputation scheme for sediment critical velocity and duration of flow is integrated in the framework so that missing inputs are handled in a hydraulically consistent way. The developed models are deployed in an interactive web app, allowing practitioners to obtain code-free scour predictions across all learners. Applied to the Knik River bridge and benchmarked against related work, the framework improves accuracy and provides actionable margins for design verification, maintenance prioritization, retrofit planning, emergency response, and transparent risk communication.
1 Introduction
Bridge failures severely disrupt national transportation networks and pose serious threats to hydraulic infrastructure and human life. Beyond the immediate injuries and fatalities, the loss of service can significantly hinder economic growth (Cook et al., 2015; Diaz et al., 2009). Historical examples include the 1907 Quebec Bridge collapse, which killed 75 workers during construction (Pearson and Delatte, 2006), and the 1967 Silver Bridge failure, which claimed 46 lives in service (Harik et al., 1990). More recent incidents include the 2007 Tuojiang Bridge collapse during construction, which caused 64 deaths, 22 injuries, and an estimated direct economic loss of 39.7 million yuan (Tang and Huang, 2024), and the 2018 failure of the Morandi (Polcevera) Viaduct in Italy, which resulted in 43 deaths and approximately 100 million yuan in losses (Morgese et al., 2020). According to Zhang et al. (2022), natural hazards account for more than 50% of all bridge failures. The increasing frequency and intensity of natural hazards tied to climate change, along with global population growth and urbanization, are amplifying risks to civil infrastructure worldwide.
Among these hazards, flooding remains the leading cause of bridge damage and failure (Argyroudis and Mitoulis, 2021; Ismael et al., 2024). Fu et al. (2012) and Xu et al. (2016) analyzed Chinese bridge collapses from 2000–2012 and found that nearly 46% were caused by floods. Similarly, a survey identified over 500 U.S. bridge failures between 1989 and 2000, with 48.31% attributed to flooding (Wardhana and Hadipriono, 2003). NOAA (2015) reported a 612% rise in hydraulic damage rates compared with the 1960s, reflecting the growing exposure of bridges to extreme hydrologic events. Hydraulic bridge failures are primarily driven by scour, flooding, or ice-floe actions, and the associated risk perceptions in construction and infrastructure settings affect how these hazards are managed (Ismael and Shealy, 2018). Because intensifying extreme rainfall and flood events associated with climate change are direct drivers of hydraulic failures, precipitation-related risks to bridge safety and performance are expected to grow (Nasr et al., 2021), highlighting how uncertainty and risk-based decision processes influence infrastructure vulnerability (Shealy et al., 2017). The AASHTO LRFD Bridge Design Specifications state that most bridge failures in the U.S. and elsewhere are due to scour, the dominant hydraulic effect on bridges (AASHTO, 1998). Scour develops during floods when high-velocity flows increase near-bed shear stresses that mobilize and remove sediment around foundations (USGS, 2016). Specifically, scour-related failures due to flooding are estimated to account for about 60% of bridge collapses nationwide (Wang et al., 2017), highlighting the need to explicitly incorporate scour effects into bridge design and evaluation.
Local scour refers to sediment erosion and transport that develop around hydraulic and marine structures under flowing water. When flow encounters a pier, streamlines separate and form a complex three-dimensional flow field characterized by a horseshoe vortex at the upstream face, a downward jet, and wake vortices downstream (Chen H. et al., 2025). These vortices significantly increase local bed shear stress relative to the approach flow, triggering sediment entrainment and progressive degradation of the bed topography (Ma et al., 2024). The resulting scour can undermine structural stability and has been linked to several major bridge collapses (Yang et al., 2018). As scour deepens, the exposed length of the foundation increases, reducing lateral stiffness, decreasing buckling resistance, and lowering the overall factor of safety (Anisha et al., 2022). Advancing the mechanistic understanding and quantification of local scour is therefore essential for ensuring the safe operation of hydraulic infrastructure and for designing bridges that can withstand extreme flood events.
In parallel, artificial intelligence (AI) and machine learning (ML) have emerged as transformative tools across engineering disciplines, driven by advances in data availability and computational power. Machine learning, a core branch of AI, can autonomously detect complex patterns in high-dimensional data and improve performance without explicit programming (Rahman and Chavan, 2025). ML has demonstrated success across diverse domains, including engineering education (Ismael, 2023), medical diagnostics (Asif et al., 2025), accounting (Magazzino and Haroon, 2025), chemistry (Seal et al., 2025), and civil engineering (Khatir et al., 2025). Its growing popularity in civil and hydraulic engineering is largely due to its ability to process large, heterogeneous datasets and model nonlinear relationships that are difficult to capture using traditional empirical or analytical methods (Aldoseri et al., 2023).
Recent developments in hydraulic engineering have accelerated the use of soft computing and ML algorithms for scour prediction (Akib et al., 2014). Pal et al. (2012) applied an M5 model tree to multi-dimensional datasets and achieved performance comparable to a back-propagation neural network, outperforming empirical formulas. Cheng et al. (2015) coupled an Evolutionary RBF Neural Network with an Artificial Bee Colony optimization algorithm, producing higher accuracy than both AI baselines and conventional equations. Choi et al. (2017) employed an adaptive neuro-fuzzy inference system (ANFIS) using five key variables (flow depth, pier width, critical velocity, sediment size, and mean velocity), achieving strong predictions of equilibrium scour depth compared with artificial neural networks and empirical relations.
Although these studies have advanced scour prediction, a critical gap remains in understanding the mechanistic importance and interaction of input variables, particularly under extreme hydraulic forcing. Most existing models focus on predictive accuracy but provide limited insight into the physical significance of features or their relationships with governing hydraulic laws. This study directly addresses that gap by integrating physics-informed constraints into data-driven modeling and applying interpretable machine learning techniques to reveal variable importance and dependencies consistent with hydraulic theory. It also implements rigorous generalization assessment through K-fold cross-validation, supplemented by hold-out testing and targeted stress tests, to demonstrate accuracy and robustness in both training and testing regimes.
To translate recent advances in machine learning into engineering practice, this study develops a physics-informed framework for predicting bridge-pier scour depth. It further operationalizes these predictions into a flood-resilient design tool by defining extreme-condition scenario envelopes that capture rare, peak, and sustained hydraulic events. The framework quantifies the mechanistic relevance of key hydraulic variables through SHAP (SHapley Additive exPlanations) to ensure that data-driven predictions align with physical laws and interpretable scour mechanisms. The framework is demonstrated using the Knik River bridge piers in southcentral Alaska to estimate credible upper bounds on scour and to evaluate structural sufficiency under flood conditions. This approach enables early-phase design to incorporate defensible maximum scour depths, supporting more flood-resilient, cost-effective, and sustainable infrastructure. For existing assets, the same toolchain yields a quantitative scour-based flood-risk criterion to prioritize monitoring, proactive maintenance, or decommissioning where warranted. This approach enhances predictive reliability, explains why specific drivers matter, and bridges data-driven inference with first-principles hydraulics to support decision-oriented design, monitoring, and maintenance for flood-resilient and sustainable infrastructure.
2 Materials and methods
Figure 1 presents a comprehensive summary of the methodology implemented in this investigation. It details the entire workflow, beginning with data collection and preprocessing, and proceeding through successive stages of model training, validation, interpretability analysis, and scenario generation. This schematic representation clarifies the sequential relationships among each methodological component and supports reproducibility of the study’s approach.
2.1 Data sources
The dataset used in this study is the 2014 U.S. Geological Survey Pier-Scour Database (PSDb-2014), compiled by Benedict and Caldwell (2014) through a systematic literature review of published laboratory and field measurements of pier-scour. For the present analysis, the laboratory subset consisting of 569 data points was utilized. These data encompass a wide range of hydrologic, sediment, and geometric conditions, representing diverse flow regimes and experimental configurations. The dataset is widely regarded as a benchmark source for scour prediction studies and provides a reliable foundation for reproducible model development and comparison with prior research.
2.2 Variables and notation
The equilibrium scour depth at a bridge pier is governed by three categories of parameters: (i) bed-material properties, (ii) water inflow conditions, and (iii) pier geometry (Pizarro et al., 2020). In alignment with this framework, the principal variables adopted in this study include the pier width normal to flow (bn), approach flow velocity (Vo), sediment critical velocity (Vc), approach flow depth (yo), median sediment size (D50), geometric standard deviation of the sediment-size distribution (σg), and the duration of flow/scouring (T). Accordingly, the equilibrium scour depth at the pier (ys) can be expressed as a function of these variables, as shown in Equation 1.
ys = f(bn, Vo, Vc, yo, D50, σg, T) (1)
2.3 Statistical evaluation of the dataset
The histograms in Figures 2A–H illustrate the distributional features summarized in Table 1, revealing the effective range of each variable. The pier width
Figure 2. Histograms of input and output features: (A) Pier width normal to flow bn (ft), (B) Approach flow velocity Vo (ft/s), (C) Sediment critical velocity Vc (ft/s), (D) Approach flow depth yo (ft), (E) Median sediment size D50 (mm), (F) Geometric standard deviation of the sediment-size distribution σg, (G) Duration of flow T (min), and (H) Scour depth at the pier ys (ft).
Given the extensive positive skew across several variables (
The Pearson correlation heatmap (Figure 3) indicates two dominant patterns among the study variables. The critical velocity
Figure 3. Pearson correlation heatmap for the laboratory dataset showing linear correlations among input variables and output feature ys (ft).
2.4 Preprocessing
All variables were standardized with consistent symbols and units, including bn (ft), Vo (ft/s), Vc (ft/s), yo (ft), D50 (mm), σg (−), T (min), and ys (ft) by trimming headers and mapping aliases; columns were then forced to numeric with placeholders treated as NaN (Jacobson et al., 2024; Kang, 2013; Peng et al., 2023). For the laboratory dataset, rows containing missing values were removed to ensure that descriptive statistics and model training were based exclusively on observed measurements. The resulting cleaned dataset was then stored and used to develop the scour-depth prediction models (Kang, 2013; Peng et al., 2023).
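The coercion and row-removal steps described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the study's actual code (which relied on pandas). The placeholder tokens, the column key `sigma_g`, and the two sample rows are assumptions made for the example and are not taken from PSDb-2014.

```python
import math

# Hypothetical placeholder tokens; the text only states that placeholders were treated as NaN.
PLACEHOLDERS = {"", "na", "n/a", "nan", "--", "-999"}

def to_number(raw):
    """Coerce a raw cell to float; placeholders and unparseable text become NaN."""
    text = str(raw).strip()
    if text.lower() in PLACEHOLDERS:
        return math.nan
    try:
        return float(text)
    except ValueError:
        return math.nan

def clean_rows(rows, columns):
    """Coerce the listed columns to float and drop rows with any missing value."""
    cleaned = []
    for row in rows:
        values = {col: to_number(row.get(col)) for col in columns}
        if not any(math.isnan(v) for v in values.values()):
            cleaned.append(values)
    return cleaned

COLUMNS = ["bn", "Vo", "Vc", "yo", "D50", "sigma_g", "T", "ys"]
raw_rows = [  # illustrative values only
    {"bn": "0.5", "Vo": "1.2", "Vc": "1.5", "yo": "0.8",
     "D50": "0.9", "sigma_g": "1.3", "T": "600", "ys": "0.4"},
    {"bn": "0.3", "Vo": "n/a", "Vc": "1.1", "yo": "0.6",
     "D50": "0.5", "sigma_g": "1.2", "T": "480", "ys": "0.2"},
]
cleaned = clean_rows(raw_rows, COLUMNS)  # the second row is dropped (missing Vo)
```

Dropping incomplete rows, rather than imputing them, matches the stated intent that laboratory statistics and training rely only on observed measurements.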
Because the compiled measurements originated from multiple laboratories, the raw distributions contain outliers and scale heterogeneity. We therefore applied normalization and standardization (feature scaling) before modeling to improve numerical conditioning and robustness. As evident in Table 1 and the histograms in Figures 2A–H, the dataset is imbalanced; certain value ranges are over-represented while others are limited or absent, which can bias learning and increase the risk of overfitting (Charilaou and Battat, 2022). The adopted preprocessing workflow, including consistent labeling, type coercion, outlier-aware scaling, and missing-data handling, was therefore implemented to reduce these effects and enhance generalization performance in the predictive modeling stage.
For the Knik River piers, all records were retained, and units were harmonized across variables. A new hydraulic descriptor, the Froude number, was engineered as given in Equation 2 (Chow, 1959).
where
Two variables frequently missing from the records, Vc and T, were not statistically imputed. Instead, they were estimated using physics-integrated formulas within three defined hydraulic scenarios (baseline, worst-case, and extreme-quantile) to enable risk-aware model evaluation. For modeling, tree-based algorithms operated on raw-scale variables, whereas kernel-based algorithms applied in-pipeline standardization. All cleaned and scenario-augmented datasets were archived to ensure full transparency and reproducibility.
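For the engineered Froude number mentioned above, a minimal sketch is given here. It assumes the standard open-channel definition Fr = Vo / sqrt(g·yo), with the approach velocity and approach depth in U.S. customary units; the exact form of the study's Equation 2 is not reproduced in this text, so the choice of variables is an assumption.

```python
import math

G_FT_S2 = 32.174  # standard gravitational acceleration in ft/s^2

def froude_number(velocity_ft_s, depth_ft, g=G_FT_S2):
    """Froude number Fr = V / sqrt(g * y) for approach velocity V and flow depth y."""
    if depth_ft <= 0:
        raise ValueError("flow depth must be positive")
    return velocity_ft_s / math.sqrt(g * depth_ft)

# Illustrative values only: a 4 ft deep approach flow moving at 3 ft/s.
fr = froude_number(3.0, 4.0)  # subcritical, since Fr < 1
```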
2.5 Machine learning models
2.5.1 Gradient boosting
Gradient Boosting formulates supervised learning as a process of functional gradient descent, building a strong predictor by sequentially adding weak learners (typically shallow decision trees) fit to the negative gradients of a specified loss function (Friedman, 2001). This framework unifies boosting methods across differentiable objectives for both regression and classification tasks. Model generalization is regulated through shrinkage, tree-depth constraints, subsampling, and early stopping, which collectively limit variance and overfitting while allowing the model to capture nonlinear relationships and higher-order interactions with minimal feature engineering (Friedman, 2002).
Rooted in the principle of iteratively focusing on difficult-to-predict samples (Freund and Schapire, 1995), Gradient Boosting remains one of the most effective algorithms for structured tabular data where accuracy and robustness are critical. In this study, Gradient Boosting was selected as a primary benchmark due to its balance between interpretability and predictive power, making it well suited for capturing the nonlinear hydraulic interactions underlying bridge-pier scour.
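The functional-gradient view described in this subsection can be illustrated with a small from-scratch sketch. It assumes squared-error loss (so the negative gradient is simply the residual), a single input feature, and one-split regression stumps as weak learners. It is a teaching example on synthetic data and does not represent the tuned scikit-learn model used in the study.

```python
def fit_stump(x, residuals):
    """Best single-split regression stump on a 1-D feature (least squares).

    Assumes x contains at least two distinct values.
    """
    best = None
    distinct = sorted(set(x))
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2.0
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]

def predict_stump(stump, xi):
    threshold, left_mean, right_mean = stump
    return left_mean if xi <= threshold else right_mean

def fit_gradient_boosting(x, y, n_rounds=50, learning_rate=0.1):
    """Sequentially fit stumps to residuals, shrinking each step by learning_rate."""
    base = sum(y) / len(y)
    fitted = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # For the loss 0.5 * (y - F)^2 the negative gradient equals y - F.
        residuals = [yi - fi for yi, fi in zip(y, fitted)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        fitted = [fi + learning_rate * predict_stump(stump, xi)
                  for xi, fi in zip(x, fitted)]
    return {"base": base, "stumps": stumps, "learning_rate": learning_rate}

def predict(model, xi):
    step = sum(predict_stump(s, xi) for s in model["stumps"])
    return model["base"] + model["learning_rate"] * step

def mse(y, y_hat):
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)

# Synthetic nonlinear target, for illustration only.
x = [0.1 * i for i in range(1, 21)]
y = [xi ** 0.6 for xi in x]
model = fit_gradient_boosting(x, y)
baseline_mse = mse(y, [model["base"]] * len(y))
boosted_mse = mse(y, [predict(model, xi) for xi in x])
```

The `learning_rate` argument plays the role of shrinkage: smaller values need more rounds but reduce the influence of any single weak learner.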
2.5.2 AdaBoost
AdaBoost (adaptive boosting) constructs a strong predictive model by sequentially combining multiple weak learners, typically shallow decision trees. Schapire (1990) demonstrated that boosting weak learners can yield a strong classifier. AdaBoost trains an initial tree, then reweights the training samples, reducing weights on correctly classified points and increasing weights on misclassified ones, so subsequent trees focus on the hard cases (Cao et al., 2013). This error-driven reweighting and retraining repeats over multiple rounds, and the final predictor is a weighted sum of all weak learners, with weights reflecting each learner’s accuracy, which together improve performance iteratively (Schapire, 2013).
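A single reweighting round, as described above, might look like the sketch below. It follows the classic discrete AdaBoost update for binary classification; the regression variant used for scour depth applies a different loss-based update, so this illustrates the reweighting idea only, under that stated assumption.

```python
import math

def adaboost_reweight(weights, misclassified):
    """One discrete-AdaBoost round: return (learner weight alpha, new sample weights)."""
    error = sum(w for w, miss in zip(weights, misclassified) if miss)
    if not 0.0 < error < 0.5:
        raise ValueError("weak learner needs a weighted error in (0, 0.5)")
    alpha = 0.5 * math.log((1.0 - error) / error)
    # Misclassified samples are scaled up, correctly classified ones scaled down.
    updated = [w * math.exp(alpha if miss else -alpha)
               for w, miss in zip(weights, misclassified)]
    total = sum(updated)
    return alpha, [w / total for w in updated]

# Five equally weighted samples; the third one was misclassified.
alpha, new_weights = adaboost_reweight([0.2] * 5, [False, False, True, False, False])
```

After normalization, the misclassified sample carries half of the total weight, which is why the next weak learner concentrates on it.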
2.5.3 XGBoost
XGBoost is a state-of-the-art gradient-boosting framework that ensembles regression trees to deliver efficient and accurate prediction on structured tabular data (Liang et al., 2020). It builds trees sequentially, with each new tree fitted to the residuals of the current model to minimize the overall objective, i.e., an additive, gradient-descent interpretation of boosting (Jin and Agrawal, 2003). The training process employs a second-order (Newton) approximation of the loss function with explicit L1/L2 regularization on leaf weights, supports stochastic subsampling of rows and columns, and is optimized for parallel and distributed computation. Learning typically terminates after a predefined number of trees or once additional iterations provide negligible improvement (Shahani et al., 2021).
XGBoost has been widely adopted for its combination of speed, accuracy, and scalability across engineering and scientific applications (Liang et al., 2020). In this study it was used to evaluate performance improvements gained from regularization and second-order optimization, and to test the model’s capability to capture complex nonlinear interactions among hydraulic and geometric variables influencing scour depth.
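To make the second-order step concrete, the sketch below computes the regularized Newton leaf weight w* = −G / (H + λ), where G and H are the sums of first and second derivatives of the loss over the samples in a leaf. It assumes squared-error loss and L2 regularization only (the L1 term is omitted for brevity); the function name and sample values are hypothetical.

```python
def newton_leaf_weight(y_true, y_pred, reg_lambda=1.0):
    """Optimal leaf weight -G / (H + lambda) for the loss 0.5 * (y_pred - y_true)^2."""
    gradients = [p - t for t, p in zip(y_true, y_pred)]  # first derivatives
    hessians = [1.0] * len(y_true)                        # second derivatives
    return -sum(gradients) / (sum(hessians) + reg_lambda)

# Three samples in one leaf, with a current prediction of zero for each.
w_regularized = newton_leaf_weight([1.0, 2.0, 3.0], [0.0, 0.0, 0.0], reg_lambda=1.0)
w_unregularized = newton_leaf_weight([1.0, 2.0, 3.0], [0.0, 0.0, 0.0], reg_lambda=0.0)
```

With λ = 0 the leaf weight reduces to the mean residual; a positive λ shrinks it toward zero, which is the regularizing effect referred to above.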
2.5.4 Gaussian process regression (RBF kernel)
Gaussian Process Regression (GPR) is a Bayesian, nonparametric modeling approach that defines a prior directly over functions and produces both point predictions and calibrated uncertainty estimates.
Using the radial basis function (RBF) kernel, GPR encodes smooth, stationary relationships in the data. Also, its key hyperparameters (signal variance, length scale, noise level) are typically learned by maximizing the marginal likelihood, yielding models that balance data fit with complexity automatically (Rasmussen and Williams, 2005). RBF–GPR is well-suited to moderate-sized datasets where uncertainty quantification matters, though exact training scales cubically with the number of samples; sparse and variational methods mitigate this cost while preserving accuracy (Neal, 1996). These properties make RBF–GPR a principled baseline for nonlinear regression and a robust comparator to tree-based ensembles in scientific and engineering prediction tasks. In this study, GPR provides both high-fidelity predictions and interpretable uncertainty bounds essential for assessing confidence in scour-depth estimation under variable hydraulic conditions.
2.5.5 Kernel ridge regression (polynomial kernel)
Kernel Ridge Regression (KRR) combines ridge regression’s L2 regularization with the kernel trick, providing a convex, closed-form solution in the dual space. This formulation enables the learning of nonlinear relationships without explicitly generating polynomial features (Gammermann, 2000). KRR captures interaction terms up to a specified degree using a polynomial (Poly) kernel, producing a global, smoothly varying fit with power-law extrapolation tendencies. Model control is achieved through the ridge penalty and kernel hyperparameters (degree, scale, offset), which are typically optimized via cross-validation. KRR is particularly effective for small- to mid-size datasets because it is stable, non-iterative, and avoids local minima. However, its dense
2.6 Model development
After preprocessing, the standardized laboratory dataset was randomly partitioned into 80% training and 20% testing subsets, following recommendations by Bichri et al. (2024). A fixed seed ensured full reproducibility. To better preserve the empirical distribution of the target variable (ys), stratified random splitting was performed using binned scour depths, ensuring that both training and test sets captured the complete range of scour magnitudes and covariate combinations. This strategy mitigates selection bias and supports more reliable model generalization (Kapoor and Narayanan, 2023).
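A stratified 80/20 split on binned scour depths could be sketched as below. The equal-count binning, the number of bins, and the seed value are assumptions made for illustration; the study does not report these details here, and the targets are synthetic.

```python
import random

def stratified_split(targets, test_fraction=0.2, n_bins=5, seed=42):
    """Return (train_idx, test_idx), drawing test rows within equal-count target bins."""
    order = sorted(range(len(targets)), key=lambda i: targets[i])
    bin_size = -(-len(order) // n_bins)  # ceiling division
    rng = random.Random(seed)            # fixed seed for reproducibility
    train_idx, test_idx = [], []
    for start in range(0, len(order), bin_size):
        members = order[start:start + bin_size]
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test_idx.extend(members[:n_test])
        train_idx.extend(members[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Synthetic, strictly increasing scour depths (ft), for illustration only.
targets = [0.05 * i for i in range(100)]
train_idx, test_idx = stratified_split(targets)
```

Because the test rows are drawn separately inside each bin, both subsets cover shallow and deep scour cases in similar proportions.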
Model training and any preprocessing operations that could introduce data leakage (e.g., scaling for kernel-based methods) were encapsulated within scikit-learn pipelines and fitted exclusively on the training data. The held-out test data were never used during training, hyperparameter tuning, or preprocessing (Wu et al., 2025). This setup provides an unbiased assessment of out-of-sample performance while ensuring that both splits represent diverse hydraulic, geometric, and sediment conditions (Hameed et al., 2025).
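The leak-safe principle above amounts to estimating scaling statistics from the training rows only and reusing them unchanged on the test rows. The sketch below shows the idea with a small hand-written scaler; it mimics what a scikit-learn pipeline does internally but is not the study's code, and the class name and values are hypothetical.

```python
import statistics

class TrainOnlyScaler:
    """Standardize a single feature using statistics learned from training data only."""

    def fit(self, column):
        self.mean_ = statistics.fmean(column)
        # Population standard deviation; fall back to 1.0 for a constant column.
        self.scale_ = statistics.pstdev(column) or 1.0
        return self

    def transform(self, column):
        return [(v - self.mean_) / self.scale_ for v in column]

train_feature = [1.0, 2.0, 3.0, 4.0]  # illustrative values only
test_feature = [10.0]

scaler = TrainOnlyScaler().fit(train_feature)  # the test data is never seen here
train_scaled = scaler.transform(train_feature)
test_scaled = scaler.transform(test_feature)   # reuses the training mean and scale
```

Fitting the scaler on the combined data would let the test value of 10.0 shift the mean and scale, which is exactly the leakage the pipelines are meant to prevent.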
Model development and analysis were performed in Google Colab (Python 3.12.12, Linux-6.6.105+), using NumPy 2.0.2, pandas 2.2.2, scikit-learn 1.6.1, SHAP 0.50.0, and Matplotlib 3.10.0. A total of five machine-learning models were developed and benchmarked, with their hyperparameters optimized using Random Search (RandomizedSearchCV, 50 random trials). The models were Gradient Boosting, AdaBoost (tree-based), XGBoost, Gaussian Process Regression with an RBF kernel, and Kernel Ridge Regression with a polynomial kernel. Each model was implemented within a leak-safe pipeline: tree ensembles (Gradient Boosting, AdaBoost with shallow trees, and XGBoost) used raw feature scales, whereas kernel-based methods (GPR with RBF and KRR with Poly) applied StandardScaler to the inputs. The five models, their algorithmic families, and principal hyperparameters are summarized in Table 2.
2.7 Model performance evaluation
Model performance was assessed using three commonly adopted statistical indicators: the coefficient of determination (R2), root mean square error (RMSE) and mean absolute error (MAE) evaluated for both training and testing subsets to verify generalization and detect potential overfitting. These performance metrics are widely used in the literature for validating predictive models (Khajavi et al., 2025; Madurwar et al., 2025; Ramujee and Praseeda, 2025).
The coefficient of determination (R2) quantifies the proportion of variance in the observed data explained by the model, ranging from −
Together, these three indicators provide a balanced evaluation framework: R2 reflects the model’s explanatory power, RMSE captures overall prediction accuracy with sensitivity to large errors, and MAE represents typical error magnitude and robustness to outliers. The mathematical formulations for these evaluation metrics are presented below in Equations 3–5.
where
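The three indicators can be computed directly from paired observed and predicted scour depths. The sketch below uses the standard textbook definitions (R2 as one minus the ratio of residual to total sum of squares, RMSE as the root of the mean squared error, MAE as the mean absolute error); these are assumed to match the study's Equations 3–5, which are not reproduced here. The sample values are illustrative.

```python
import math

def r2_score(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def rmse(observed, predicted):
    """Root mean square error; penalizes large errors more heavily."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))

def mae(observed, predicted):
    """Mean absolute error; the typical error magnitude."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)

observed = [1.0, 2.0, 3.0, 4.0]   # illustrative scour depths (ft)
predicted = [1.5, 2.0, 2.5, 4.0]

scores = (r2_score(observed, predicted), rmse(observed, predicted), mae(observed, predicted))
```

Predicting the observed mean for every sample gives R2 = 0, and predictions worse than that give negative values, which is why R2 has no lower bound.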
2.8 K-fold cross validation
K-Fold cross-validation partitions the available training data into K equally sized, non-overlapping folds and performs K repeated training and validation cycles. In each cycle, one fold serves as the validation set while the remaining (K-1) folds constitute the temporary training subset (Teodorescu and Obreja Braşoveanu, 2025; Wilimitis and Walsh, 2023). This rotation ensures that every observation is used for validation exactly once and for training (K-1) times, producing a distribution of performance scores that reflects model sensitivity to data variability and the bias–variance trade-off (Kapoor and Narayanan, 2023).
To estimate out-of-sample performance and minimize split-specific bias, 5-fold shuffled cross-validation was applied on the training set only (representing 80% of the total data), consistent with the approach used by Al-Shamasneh et al. (2025). The training data were stratified by scour depth ranges so that each fold maintained a similar distribution of the response variable, ensuring balanced representation across folds. A fixed random seed was employed to ensure repeatability.
For each fold, models were trained on 80% of the training data and validated on the remaining 20%. All preprocessing steps, including scaling for kernel-based methods, were encapsulated within the scikit-learn pipelines to prevent data leakage. Performance was recorded as R2 and RMSE for each fold, allowing for a consistent comparison of model stability and predictive accuracy across the five folds.
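The fold rotation can be sketched as a simple index generator. This version shuffles with a fixed seed but does not stratify by scour depth, so it is a simplification of the procedure described above. The sample count of 455 is an assumed figure, roughly 80% of the 569 laboratory cases, and the seed is arbitrary.

```python
import random

def kfold_indices(n_samples, k=5, seed=42):
    """Yield (training_idx, validation_idx) pairs for shuffled K-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k near-equal, non-overlapping folds
    for i, validation in enumerate(folds):
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation

splits = list(kfold_indices(455, k=5))
fold_sizes = [(len(train), len(val)) for train, val in splits]
```

In practice, the pipeline would be refitted on each `training` index set and scored on the matching `validation` set, giving one R2 and one RMSE value per fold.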
2.9 Sensitivity analysis
Sensitivity analysis examines how variations in model inputs influence the predicted output. In this study, it was applied to quantify how changes in key hydraulic, geometric, and sediment variables affect the predicted equilibrium scour depth (ys). The analysis employed SHAP (SHapley Additive exPlanations), a principled framework for both global and local model interpretability based on cooperative game theory.
SHAP treats each input feature as a “player” in a game whose contribution to the prediction is computed as the weighted average of its marginal effects across all possible feature combinations. The resulting Shapley values satisfy the desirable properties of local accuracy, missingness, and consistency, making SHAP a unique and theoretically grounded additive explanation model (Aas et al., 2021). This framework unifies numerous prior feature-attribution methods and provides both local explanations (case-specific contributions that increase or decrease predictions) and global explanations through mean absolute SHAP values aggregated across samples.
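The averaging over feature combinations can be made explicit with a brute-force Shapley computation on a toy model. The sketch assumes that an "absent" feature is replaced by a fixed baseline value, which is a simplification of SHAP's expectation over background data. The model, inputs, and baseline are hypothetical and unrelated to the scour predictors.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values by enumerating every subset of the other features."""
    n = len(x)

    def value(subset):
        # Features in the subset take their actual value, the rest the baseline.
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

def toy_model(features):
    """Hypothetical model: one additive term plus one pairwise interaction."""
    return 2.0 * features[0] + features[1] * features[2]

x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
```

The attributions sum to the difference between the prediction at `x` and at the baseline (local accuracy), and the interaction term is split equally between the two features that form it. Enumeration scales exponentially with the number of features, which is why tree-specific algorithms are used in practice.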
TreeSHAP extends this formulation to decision trees and ensembles, enabling polynomial-time computation of exact Shapley values while also accounting for feature interaction effects. It provides practical visualization tools such as mean SHAP bar charts, beeswarm summaries, and dependence plots that capture local-to-global behavior without compromising fidelity to model predictions (Lundberg et al., 2020; Lundberg and Lee, 2018).
Overall, SHAP provides a theoretically sound and practically effective bridge between per-instance reasoning and global sensitivity of complex models. When applied with consideration for inter-variable dependence and supported by complementary diagnostics, it enables transparent, reproducible insights into how hydraulic, geometric, and sediment parameters collectively influence predicted scour (Alasmari et al., 2025).
2.10 Extreme-condition scour scenarios for flood resilience
Bridge scour arises from the interaction of hydraulic intensity, event duration, sediment mobility, and pier geometry. Field datasets rarely span the full range of conditions that govern safety, and key drivers such as the critical velocity for bed mobility (Vc) and the effective duration of mobility (T) are often missing (Belmokhtar et al., 2025; Shanmugam et al., 2025). A scenario framework allows us to impute defensible values where observations are incomplete by implementing physics-informed integration into the models. The framework further propagates credible extremes through a physics-guided predictor of scour depth (ys) and summarizes risk per pier by taking the envelope across scenarios. Scenarios convert patchy records into decision-ready evidence about plausible and upper-bound scour. To operationalize the analysis, this study considers four application scenarios, Q99, WC-VcT, WC-Flow + Base, and WC-Flow + VcT.
The Knik River bridge in southcentral Alaska is examined as a case study to estimate credible upper bounds on pier scour and to cross-validate structural sufficiency under forecast and design-flood conditions. The proposed methodology estimates extreme pier scour under rare and persistent flood conditions by developing four physically consistent, pier-specific scenarios: Q99, WC-Flow, WC-VcT, and WC-Flow + VcT. Scour depth is predicted after recalculating the mobility (sediment motion) threshold velocity and event duration to reflect changes in hydraulic and sediment states. Using this setup, we can fold Vc and T into the scenario envelope via physics-informed learning model integration and validate them in cases with missing Vc or T values. This technique shows the model can both predict pier scour depth and impute the missing parameters in a physics-consistent way. The process begins with standardized inputs for each pier, including
Values of
Where
and
Values of duration of flow (T) are estimated from an advection-based time scale modified by the mobility ratio, using
where
The event duration is adjusted accordingly using
Sediment characteristics are forced toward their upper distribution tails, and
While applying smaller adjustments to
with
For each pier, the maximum estimated scour depth across all scenarios,
By applying this methodology, all scenarios remain physically realistic. The resulting scour envelopes reveal whether the peak intensity of flow or prolonged duration dominates pier scour risk for each pier of the Knik River bridge. The resulting pier-specific values are summarized in Table 3 (A), Vc (ft/s) by pier across each scenario, and Table 3 (B), T (min) by pier across each scenario. Together, these tables show how mobility thresholds drop and durations lengthen as scenarios intensify. These patterns directly amplify predicted scour and help flag piers most vulnerable under extreme, long-lasting floods.
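Taking the envelope across scenarios reduces to a per-pier maximum, as in the sketch below. The scenario labels follow those named in the text, while the pier identifiers and scour depths are invented placeholders, not results for the Knik River bridge.

```python
def scour_envelope(predictions):
    """Map each pier to (maximum predicted ys, governing scenario)."""
    envelope = {}
    for pier, by_scenario in predictions.items():
        governing = max(by_scenario, key=by_scenario.get)
        envelope[pier] = (by_scenario[governing], governing)
    return envelope

# Hypothetical predicted scour depths (ft) per pier and scenario.
predictions = {
    "Pier 1": {"Q99": 6.1, "WC-Flow": 7.4, "WC-VcT": 6.8, "WC-Flow + VcT": 8.0},
    "Pier 2": {"Q99": 5.2, "WC-Flow": 5.9, "WC-VcT": 6.3, "WC-Flow + VcT": 6.0},
}
envelope = scour_envelope(predictions)
```

Recording the governing scenario alongside the maximum indicates whether flow intensity or prolonged duration controls the upper bound at each pier.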
3 Results
3.1 Model performance
The developed models were evaluated based on the performance indicators, including R2, RMSE, and MAE, which were also implemented by other researchers for ML model performance evaluation (Khoshvaght et al., 2025; Koçak, 2025; Shobayo et al., 2025). A higher R2 (closer to 1) means the model explains more of the variation and fits the data better (Mamudu et al., 2025). Lower RMSE and MAE reflect higher accuracy, and values of 0 correspond to a perfectly fitting model (Chai and Draxler, 2014).
Across models, performance is uniformly strong, with tree ensembles leading, as illustrated in Figures 4A–E. Gradient Boosting generalizes best: it has the highest fit-line R2 (0.98), and its parity plot shows predictions clustered tightly around the 45° line (Figure 4A). The best-fit line nearly overlaps the parity line over most of the range, indicating minimal systematic bias, with only a few high-value points pulling slightly above the diagonal. AdaBoost (Tree) closely mirrors this behavior, with a fit-line R2 of 0.97, a similarly tidy spread, and a best-fit line that tracks the diagonal with small deviations at the upper end (Figure 4B). The XGBoost parity plot is largely well aligned; its best-fit line remains close to the 1:1 line but is influenced by a handful of larger residuals at high targets, which explains the higher test RMSE. Kernel methods provide stable baselines with smooth behavior. Gaussian Process (RBF) and Kernel Ridge (Poly), with fit-line R2 values of 0.97 and 0.95, respectively, maintain compact clouds near the origin and a gradual, orderly dispersion as values grow. In both plots, the best-fit line sits just below the parity line at higher observed values, reflecting mild underprediction in the extreme range while remaining well calibrated through the bulk of the data.
Figure 4. Parity plots comparing observed versus predicted pier-scour depth ys (ft) for five models: (A) Gradient Boosting, (B) AdaBoost (tree-based), (C) XGBoost, (D) Gaussian Process Regression (RBF), and (E) Kernel Ridge Regression (poly). Blue markers denote training samples and red markers testing samples; the dashed line indicates the ideal 1:1 relationship and the solid line the best-fit regression, with fitted equations and (R2) values shown in each panel.
Moreover, to assess overfitting and underfitting, the gap between training and testing metrics should be small across all performance indicators (Aliferis and Simon, 2024; Emmert-Streib and Dehmer, 2019). Table 4 summarizes the training and testing performance of the developed models. Gradient Boosting and AdaBoost (Tree) exhibit small training–testing gaps and a high testing R2 (0.96), with only modest increases in RMSE and MAE, indicating good generalization. XGBoost shows slight signs of overfitting: its training metrics are near-perfect (R2 = 0.999), and its performance deteriorates more on the testing set (R2 = 0.939, with higher RMSE and MAE). The resulting difference in R2 between training and testing is 0.06, indicating that the model still performs very well overall, with only a small number of test samples acting as outliers that fall noticeably below the 1:1 parity line. The kernel methods, Gaussian Process (RBF) and Kernel Ridge (Poly), yield the largest test errors and the lowest test R2 (0.927–0.929), reflecting slightly weaker generalization. Overall, the parity plots and the summary table agree that the predictions are well centered with a tight residual structure for most targets, and that the best-fit lines lie close to the ideal diagonal across models, most notably for Gradient Boosting and AdaBoost.
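The training–testing gap can be checked with simple arithmetic. The sketch below uses only the R2 values quoted in the text (0.99 and 0.96 for Gradient Boosting, 0.999 and 0.939 for XGBoost); the helper name `r2_gap` is an illustrative assumption, and no pass/fail threshold is implied.

```python
def r2_gap(train_r2, test_r2):
    """Training minus testing R2; a larger gap suggests more overfitting."""
    return round(train_r2 - test_r2, 3)

# (train R2, test R2) as quoted in the text
reported = {
    "Gradient Boosting": (0.99, 0.96),
    "XGBoost": (0.999, 0.939),
}

gaps = {name: r2_gap(train, test) for name, (train, test) in reported.items()}
for name, gap in sorted(gaps.items(), key=lambda item: item[1]):
    print(f"{name}: train-test R2 gap = {gap}")
```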
3.2 Residuals comparison of the developed models in training and testing
Residual plots (observed minus predicted) are used to screen for structural non-linearity and outliers, and to compare behavior on training versus testing data as a check on generalization (Kumar et al., 2025; Sharma et al., 2025). Checking residuals on both the training and test sets helps separate true signal from overfitting; models that generalize well show small, random-looking residuals in both. Figures 5A–E depict the residual histograms for all five models. The distributions are tightly centered near zero, with substantial overlap between the training (blue) and testing (red) curves, indicating that none of the models is markedly overfitting. Gradient Boosting shows a compact, symmetric spread around zero, suggesting low bias and stable variance. AdaBoost (Tree) and Gaussian Process (RBF) exhibit a mild right-shift of the test curve (slightly positive bias), but the displacement is small relative to their overall dispersion. XGBoost produces a narrow core with light right tails, implying good central accuracy with a few underpredicted cases. Kernel Ridge (Poly) has slightly broader tails than the boosting models, though its training–testing overlap remains high. Overall, the distributions are unimodal and roughly symmetric, residual magnitudes are modest, and the similarity of the train and test shapes supports good generalization across the range of scour depths represented in the data.
Figure 5. Residual histograms for pier-scour depth predictions ys (ft) over the laboratory dataset for five models: (A) Gradient Boosting, (B) AdaBoost (tree-based), (C) XGBoost, (D) Gaussian Process Regression (RBF), and (E) Kernel Ridge Regression (poly). Residuals are defined as observed minus predicted
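A residual histogram of the kind shown in Figure 5 can be sketched as follows. The residual definition (observed minus predicted) follows the caption; the bin width, helper names, and sample values are assumptions made for illustration only.

```python
import math
from collections import Counter

def residuals(observed, predicted):
    """Residuals defined as observed minus predicted."""
    return [o - p for o, p in zip(observed, predicted)]

def histogram(values, bin_width):
    """Count values per bin; each key is the left edge of a bin."""
    counts = Counter(math.floor(v / bin_width) for v in values)
    return {round(k * bin_width, 10): counts[k] for k in sorted(counts)}

def mean_bias(values):
    """Mean residual; positive means the model underpredicts on average."""
    return sum(values) / len(values)

# Made-up scour depths in ft, for illustration only
y_obs = [1.0, 2.0, 3.0, 4.0]
y_pred = [0.95, 2.15, 2.9, 3.75]

res = residuals(y_obs, y_pred)
hist = histogram(res, bin_width=0.2)
bias = mean_bias(res)
print(hist, round(bias, 4))
```

Running the same functions separately on the training and testing subsets gives the two overlaid distributions that are compared in each panel.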
For the Gradient Boosting model, residuals
3.3 Observed vs. predicted strength of models across all the data points
The actual vs. predicted plots in Figures 6A–E show how well each model tracks measured scour depths across the whole sample index, a diagnostic also used in other studies (Nandi and Das, 2025b; Showkat et al., 2025). Overall, alignment is strong for Gradient Boosting and AdaBoost, whose test traces stay close to the observed values with only minor deviations in the upper tail, indicating good generalization from low to high values. XGBoost follows the pattern well but shows the overfitting tendency noted earlier: training points are reproduced almost exactly, whereas a few large (ys) test spikes are slightly underpredicted. The kernel approaches, Gaussian Process (RBF) and Kernel Ridge (Poly), reproduce the central range stably while smoothing the highest peaks more than the boosting models, which is consistent with their slightly higher test errors. Across all the models, errors are concentrated at the largest (ys) events, as is expected in hydraulics, where extremes are infrequent and difficult to learn, whereas most mid-range depths are well captured (McInerney et al., 2020).
Figure 6. Actual versus predicted pier-scour depth ys (ft) for the laboratory dataset under five models: (A) Gradient Boosting, (B) AdaBoost (tree-based), (C) XGBoost, (D) Gaussian Process Regression (RBF), and (E) Kernel Ridge Regression (poly). In each panel, green lines show observed and predicted values for the training set and purple lines for the testing set, with inset boxes reporting the corresponding train and test (R2).
3.4 Cross-validation of performance by K-fold validation
The current study utilizes K-fold for cross-validation to estimate out-of-sample performance by splitting the data into
3.5 Global drivers of pier-scour depth
To identify the global drivers of pier-scour depth, SHAP (Tree SHAP) was applied to the trained models to decompose predictions into additive feature contributions and aggregate them across the dataset. This yields a ranked importance profile, supported by beeswarm and dependence plots that highlight which hydraulic, geometric, and sediment variables most consistently increase or decrease predicted scour. SHAP is now widely used to quantify global feature importance and has become a leading tool for model interpretation, with recent studies showing it provides consistent, dataset-wide insights into how predictors drive outputs (Cappelli et al., 2023; Cappelli and Grimaldi, 2023; Mushtaq et al., 2024).
Figures 7A, B depict the SHAP feature-importance and SHAP beeswarm plots, which jointly demonstrate that the pier width normal to flow bn (ft) is by far the dominant predictor of scour depth ys (ft), accounting for 70.6% of the model’s overall importance. Flow speed Vo (ft/s), with 9.8%, and approach depth yo (ft), with 7.5%, offer the next highest contributions, followed by event duration T (min) with 4.3% and sediment gradation (σg) with 4.0%. Critical velocity (Vc) and sediment size (D50), on the other hand, have a minor global influence of 2.3% and 1.4%, respectively. Moreover, the beeswarm plot in Figure 7B clarifies directionality and nonlinearity at the observation level. Large bn values (warm points to the right) consistently increase predicted ys, while small bn values reduce it, an effect that is strong and monotonic, as also reported by Baranwal and Das (2024a) and Fuladipanah et al. (2023). Likewise, higher Vo tends to shift predictions upward and lower Vo tends to shift them downward, in line with the established understanding that faster approach flow promotes scour. The effect of yo is more moderate and slightly nonlinear; higher depths generally push ys upward, but with a visible spread that suggests interactions with Vo and bn. Longer T shows a mild positive trend (more exposure leads to more scour growth), consistent with Melville and Chiew (1999). Sediment gradation (σg) exhibits mixed local effects (both signs), consistent with gradation influencing scour depth and with the conclusions of Mir et al. (2018). In contrast, higher Vc typically reduces predicted scour (points with high Vc cluster on the negative SHAP side), reflecting that a bed that requires larger velocities to mobilize is less prone to scour under the same forcing, as reported by Arneson et al. (2012). D50 effects are small and mostly negative in this sample, indicating limited incremental predictive power once bn, Vo, and yo are known.
Taken together, Figures 7A, B indicate a physically consistent hierarchy: geometry (bn) dominates, hydraulics (Vo, yo, T) provide substantial but secondary control, and mobility metrics (σg, Vc, D50) modulate scour at the margins. The tight, mostly one-sided SHAP pattern for bn and Vo also suggests that the model learned stable, interpretable relationships rather than relying on spurious interactions.
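Percentage shares of the kind quoted above can be obtained by normalizing the mean absolute SHAP value of each feature. The sketch below assumes that convention (it is the one used by common SHAP bar plots, but the text does not state how the 70.6% figure was computed), and the small SHAP matrix is invented for illustration.

```python
def shap_importance_shares(shap_rows, feature_names):
    """Convert per-observation SHAP values into percentage shares.

    shap_rows: one list per observation, one SHAP value per feature.
    Importance of a feature = mean absolute SHAP value over observations.
    """
    n = len(shap_rows)
    mean_abs = [
        sum(abs(row[j]) for row in shap_rows) / n
        for j in range(len(feature_names))
    ]
    total = sum(mean_abs)
    return {name: 100.0 * m / total for name, m in zip(feature_names, mean_abs)}

# Hypothetical SHAP values (ft) for three observations and three features
features = ["bn", "Vo", "yo"]
shap_rows = [
    [0.8, -0.1, 0.1],
    [-0.6, 0.2, -0.1],
    [0.4, 0.0, 0.1],
]

shares = shap_importance_shares(shap_rows, features)
for name, pct in shares.items():
    print(f"{name}: {pct:.1f}% of total importance")
```

Because absolute values are averaged, a feature whose contributions are large but of mixed sign still ranks as important; the direction of its effect has to be read from the beeswarm plot.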
Figure 7. SHAP-based interpretation of the Gradient Boosting model for pier-scour depth ys (ft) using the laboratory dataset. (A) Global SHAP feature-importance bar plot showing each input’s share of total importance. (B) SHAP beeswarm plot, where point position gives SHAP value (impact on ys) and color indicates feature value (blue = low, red = high). (C) Scatter plot of Pearson correlation with (ys) versus SHAP importance, with the diagonal line marking agreement between correlation- and SHAP-based rankings.
To evaluate the consistency between classical correlation analysis and interpretable feature importance, a feature-correlation versus SHAP comparison plot is given in Figure 7C. The horizontal axis reports the Pearson correlation between each input and the target (ys), and the vertical axis reports the global SHAP importance as a percentage of the total contribution. Overall, the two measures are consistent, with (bn) exhibiting both the highest correlation and the largest SHAP importance, confirming its dominant influence on the model predictions. Variables such as (yo) and (T) show moderate positive correlations and intermediate SHAP contributions, whereas weakly correlated inputs such as (D50) and (Vc) contribute only marginally. The gradation parameter (σg) displays a slightly negative correlation with (ys) but only modest SHAP importance, indicating that its effect is small and predominantly inverse. Overall, this comparison indicates that the Gradient Boosting model’s learned importance structure is broadly aligned with the underlying statistical relationships in the data.
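The horizontal axis of such a comparison is the ordinary Pearson coefficient. A minimal sketch follows; the function name `pearson_r` and the paired values of bn and ys are illustrative assumptions, not data from the study.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)  # undefined if either input is constant

# Made-up pier widths bn (ft) and scour depths ys (ft)
bn = [0.5, 1.0, 1.5, 2.0]
ys = [0.7, 1.2, 1.9, 2.4]
r_bn = pearson_r(bn, ys)
print(f"Pearson r(bn, ys) = {r_bn:.3f}")
```

Pearson correlation captures only linear, marginal association, so agreement with SHAP rankings is a consistency check rather than a validation of the nonlinear effects the model has learned.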
3.6 Local interpretation of pier-scour depth
Figures 8A–D present SHAP dependence plots, where the x-axis corresponds to the value of the input feature and the y-axis to its SHAP value, representing the feature’s marginal contribution to the predicted scour depth (ys). Point colors encode the value of a secondary feature (as indicated by the accompanying color bar), with cooler tones representing lower values and warmer tones higher values. Systematic color gradients that coincide with changes in SHAP values highlight potential interaction effects between the two features. Within these plots, pier width (bn) is the dominant predictor. SHAP values increase almost monotonically with (bn), as shown in Figure 8A, implying larger expected scour for wider piers (Fuladipanah et al., 2023). Approach flow velocity (Vo) exerts a strong positive effect that saturates at higher speeds, whereas approach depth (yo) contributes positively but with greater dispersion, consistent with a secondary role modulated by geometry and hydraulics, as also reported by Dong et al. (2025). Event duration (T) shows a threshold response: SHAP values increase rapidly from short to moderate durations and then plateau, indicating diminishing marginal effects for long exposures. The color overlay clarifies interactions: higher (Vc) and larger (bn) elevate the SHAP contributions of (Vo) and (yo), while smaller (D50) (finer sediment) aligns with larger SHAP values for (T), meaning that duration is more consequential on easily mobilized beds. Together, these patterns support the importance order (bn > Vo > yo > T) and reveal nonlinear, partially saturating responses shaped jointly by geometry, hydraulics, and sediment properties.
Figure 8. Panels (A–D) show SHAP dependence plots for the four most influential variables: (A) Pier width normal to flow bn (ft), (B) Approach flow velocity Vo (ft/s), (C) Approach flow depth yo (ft), and (D) Duration of flow T (min). Each point corresponds to one observation, with the x-axis giving the feature value and the y-axis giving its SHAP value, representing the feature’s contribution to the predicted scour depth ys (ft).
3.7 Generated scenarios assessment
This section evaluates the generated scenarios, comparing predicted scour responses across stress-tested combinations of hydraulic, geometric, and sediment conditions to identify sensitivity patterns, potential worst cases, and the robustness of model conclusions. The predicted scour depth at Knik River bridge piers across all scenarios is presented in Table 6, with the frequency histograms in Figure 9A showing a clear ordering across scenarios. Among all the scenarios, the combined worst case (WC-Flow + VcT) dominates the upper range with a right-shifted distribution and a long upper tail, indicating both higher typical scour and more frequent extremes (several peaks around 4.7–4.9 ft). The WC-Flow scenario is still severe but lies slightly below the combined case, with a distribution centered at smaller depths and a shorter upper tail, consistent with peak hydrodynamic forcing without the extra scour development from extended duration. The WC-VcT scenario concentrates at lower depths with less spread, meaning longer periods above the mobility threshold increase scour, but the increases are smaller than those produced by peak-flow events. The Q99 scenario clusters narrowly around 4.0–4.3 ft and typically lies 0.3–0.6 ft below WC-Flow + VcT, marking a rare-but-plausible benchmark distinct from the engineered worst case.
Figure 9. Scenario-based predictions of pier-scour depth ys (ft) for seven case-study piers under four hydraulic/sediment scenarios (Q99, WC-Flow, WC-VcT, WC-Flow + VcT) from the Gradient Boosting model. (A) Overlapping histograms of predicted ys (ft) for each scenario, showing shifts in distribution. (B) Boxplots of ys (ft) by scenario, with boxes for the interquartile range, whiskers for the full range, circles for outliers, and triangles for the mean. (C) Bar chart of ys (ft) versus pier ID, comparing scenario-specific scour depth at each pier.
The summary statistics plot in Figure 9B illustrates the distributions of scour depth across the scenarios. Scenario WC-Flow + VcT yields the highest medians (3.8–4.3 ft) and the widest interquartile ranges (IQRs; 1.2–1.6 ft), confirming both elevated typical scour and greater variability. Scenario WC-Flow shows slightly lower medians (3.4–3.9 ft) and narrower IQRs (0.9–1.3 ft). In many cases, its medians differ from those of the combined scenario by only 0.1–0.3 ft, indicating sites where peak velocity is the primary driver. Scenario WC-VcT, on the other hand, produces distinctly lower medians (2.0–2.5 ft) with tighter IQRs (0.6–0.9 ft), highlighting a milder central tendency despite the role of event duration (T). Scenario Q99 remains high but stable, reinforcing its use as a decision threshold separating climatological extremes from design-envelope stress tests.
The per-pier comparison in Figure 9C reveals heterogeneous sensitivity. At several piers, WC-Flow nearly matches WC-Flow + VcT (differences of 0.1–0.3 ft), signaling locations where peak flow alone governs risk and which may therefore warrant rapid-response triggers tied to rising velocity. At other piers, a larger gap between WC-Flow + VcT and WC-Flow signals a strong duration (Vc, T) effect; these sites are more vulnerable to extended floods or multi-storm sequences and warrant long-term monitoring and additional protective measures. In contrast, where WC-VcT matches or exceeds the other scenarios, the duration of strong flow matters more than the magnitude of the single peak: prolonged periods of flow strong enough to move sediment can produce more scour even when the peak is not the largest. Tracking how long the flow remains above the sediment-mobility threshold is therefore essential, and this can be operationalized using the scenario envelope.
Across the figures, WC-Flow + VcT consistently produces the largest and most variable scour depths, WC-Flow is a close second at many piers, WC-VcT is lower but still consequential for duration-sensitive sites, and Q99 offers a compact, high-end benchmark below the engineered worst case. Together, these results support a tiered risk triage: scale design envelopes to the combined worst case, deploy rapid-trigger monitoring where peak flow dominates, and prioritize duration-aware mitigation where persistence drives risk. These results show that the proposed framework remains accurate and interpretable across diverse hydraulic conditions and scenario stress tests.
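The tiered triage described above can be expressed as a simple per-pier rule. The sketch below is one possible reading, not the study's procedure: the 0.3 ft tolerance is taken from the upper end of the 0.1–0.3 ft difference quoted in the text, the comparison of WC-VcT against WC-Flow is an assumption, and the pier IDs and predicted depths are hypothetical.

```python
def governing_mechanism(pred, tol_ft=0.3):
    """Classify a pier from its scenario predictions (scour depth in ft).

    pred: dict with keys "WC-Flow", "WC-VcT" and "WC-Flow + VcT".
    tol_ft: gap below which WC-Flow is treated as matching the combined case.
    """
    if pred["WC-VcT"] >= pred["WC-Flow"]:
        return "duration-governed"     # persistence matters more than the peak
    if pred["WC-Flow + VcT"] - pred["WC-Flow"] <= tol_ft:
        return "peak-dominated"        # peak flow alone nearly reaches the envelope
    return "duration-sensitive"        # extended exposure adds substantial scour

# Hypothetical predictions for three piers (not values from Table 6)
piers = {
    "P1": {"Q99": 4.1, "WC-Flow": 4.6, "WC-VcT": 3.0, "WC-Flow + VcT": 4.8},
    "P2": {"Q99": 4.0, "WC-Flow": 3.9, "WC-VcT": 3.2, "WC-Flow + VcT": 4.7},
    "P3": {"Q99": 3.8, "WC-Flow": 3.5, "WC-VcT": 3.6, "WC-Flow + VcT": 4.0},
}

triage = {pier: governing_mechanism(pred) for pier, pred in piers.items()}
for pier, label in triage.items():
    print(pier, label, "| design envelope:", piers[pier]["WC-Flow + VcT"], "ft")
```

In practice the tolerance would be chosen relative to the model's test error, since gaps smaller than the prediction uncertainty cannot reliably separate the mechanisms.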
3.8 Practical tool
To facilitate practical implementation, an interactive bridge pier scour prediction application was developed and made available on the Hugging Face platform. The tool combines five calibrated models: Gradient Boosting, AdaBoost, XGBoost, Gaussian Process Regression, and Kernel Ridge Regression, in a unified interface. Users enter seven required input variables, including pier width normal to flow (bn), approach flow velocity (Vo), sediment critical velocity (Vc), approach flow depth (yo), median sediment size (D50), geometric standard deviation of the sediment-size distribution (σg), and the duration of flow/scouring (T), all constrained to their allowable ranges. For each set of input values, the application provides the predicted scour depth from every model, allowing direct comparison of model outputs. All trained model files as well as the Python scripts for model training, evaluation, and SHAP-based interpretation are included in the Hugging Face repository. This structure promotes accessibility for practitioners, guarantees transparency and reproducibility for researchers, and is fully usable without any prior knowledge of coding. The link to the developed application, model scripts, and SHAP analysis code is provided in the data availability section of the manuscript.
4 Discussion
4.1 Comparison with related work
Table 7 provides a quantitative comparison of the present models with established bridge-pier scour predictors spanning various methodological scopes. Among the developed models, the tree-based Gradient Boosting model generalizes best, with the highest testing R2 (0.959) and the lowest test RMSE (0.145). AdaBoost offers closely comparable performance (test R2 0.958, RMSE 0.146). Relative to the cylindrical-pier models reported in Fuladipanah et al. (2023), namely MARS, GEP, and the M5 model tree, the present tree-based regressors achieve higher testing R2 (testing R2 for MARS, GEP, and M5 were 0.917, 0.872, and 0.698; RMSE were 0.090, 0.114, and 0.284, respectively). Similarly, among the complex-pier models developed by Tien Bui et al. (2020), the most accurate (ANN) reached R2 = 0.91 in training but dropped to 0.82 in testing. Overall, the current boosted tree approaches yield a more balanced combination of predictive performance and model simplicity for predicting bridge-pier scour.
The SHAP analysis in this study reveals a clear hierarchy among predictors. Geometry, especially the width parameter
A distinguishing feature of the present study is the application of a comprehensive scenario envelope (Q99, WC-Flow, WC-VcT, WC-Flow + VcT) for stress testing, which supplies robust, decision-ready outputs that are not present in previous related work. Taken together, the high out-of-sample accuracy of the ML techniques, their minimal training–testing performance gaps, and the scenario-driven stress testing set this approach apart from prior studies, while aligning outcomes with core hydraulic principles and practical engineering expectations. Moreover, the present study developed an interactive practical tool for bridge pier–scour prediction, enabling direct use of the trained models by practitioners. As summarized in Table 7, the existing bridge-scour literature does not offer a comparable implementation-oriented tool alongside its modeling frameworks.
4.2 Engineering implementation
Scour depth is the lowering of the riverbed around a bridge pier and is a key parameter for bridge resilience. During severe floods, the stage and velocity rise quickly, forming intense downflow, horseshoe, and wake vortices around the pier. These vortices concentrate shear, mobilize the surrounding sediments, and carry them away, exposing foundations. If the scour depth grows beyond design limits, the pier’s capacity is reduced, and structural failure can occur (Lee and Hong, 2019). This makes scour depth central to flood-resilient design and operations. It must be monitored in a timely manner, with procedures to track scour growth and issue early warnings when thresholds are approached. Historically, scour has been the leading cause of bridge failures, with flood-driven high stages and velocities acting as the primary driver of scour development, which is the focus of the current study. Floods cannot be prevented, but their impacts can be managed. At the design stage, predicting the scour depth for the site from hydraulic conditions and credible flood forecasts is crucial. In operation, the priorities are to identify piers at higher risk, schedule timely strengthening or protection, and, when necessary, temporarily remove a bridge from service to avoid catastrophic outcomes.
The current study goes beyond building a single predictor of pier-scour depth and undertakes a full model development, validation, and interpretation cycle aimed at practical deployment. Multiple machine learners were trained and tuned, and their performance was quantified on both training and testing splits using complementary metrics (R2, RMSE, MAE) to expose accuracy, bias, and dispersion (Rana et al., 2025). To guard against optimistic estimates tied to a particular split, the study ran K-fold cross-validation, summarizing fold-wise scores and variability. The cross-validation framework provided the metrics needed to diagnose overfitting (high train/low test, large fold variance) and underfitting (uniformly low scores), rather than relying on a single, whole-dataset score (White and Power, 2023). Residual and parity plots were reviewed alongside the metrics to verify that errors were pattern-free and that upper-tail behavior was understood. The study also emphasized model insights, not only predictions. The employed framework aggregated explanations across the fitted models and generated SHAP analyses to rank the global importance of the hydraulic, geometric, and sediment variables for predicting scour depth and to probe local dependence and interactions, e.g., how the effect of approach velocity changes with pier width or initial depth for a specific pier or duration (Nandi and Das, 2025a). These explanations link the learned relationships to hydraulics, help identify regime-dependent behavior (peak-dominated vs. duration-sensitive), and provide actionable levers for design and operations.
Moreover, the workflow was designed for field use. Starting from curated USGS records, the study applied targeted feature engineering (including physics-informed estimates of missing drivers such as Vc and T), trained and validated models with transparent checks, and then wrapped the results in a scenario envelope that supports forecasting, triage of at-risk piers, and targeted monitoring or reinforcement. The emphasis is on reproducible, decision-ready outputs rather than record-keeping alone. Furthermore, the study outlines a forward path: incorporating additional toolkits (uncertainty quantification, conformal prediction, dynamic (Vo/Vc) exposure metrics), expanding site-specific scenarios as data improve, and deepening physics–ML integration to better represent near-threshold mobility and extreme events. Taken together, the approach demonstrates how machine learning and AI can be applied systematically and responsibly to strengthen flood-resilient bridge design and operations, while leaving clear hooks for future refinement.
To check whether the hydraulic structures remain adequate under forecast and design flood conditions, the current study used the Knik River bridge piers, located in southcentral Alaska, as a case study to establish conservative upper limits on pier scour. Scenario analysis translates predictive insights into actionable strategies for flood-resilient bridge design (Kosič et al., 2023). Each pier is tested against four critical stress scenarios: Q99 (representing rare high extremes of Vo, yo, D50, σg), WC-Flow (short, intense floods), WC-VcT (events with low Vc and long T), and WC-Flow + VcT (the most conservative envelope). For each scenario, input variations trigger re-computation of Vc and T, ensuring dynamic consistency in predictions. Design recommendations typically emerge in two forms. Sites where peak flows dominate are best managed with measures that dissipate velocity or split flow, such as guide banks and pier upgrades, whereas sites sensitive to duration require countermeasures focused on resisting prolonged flood exposure, such as strengthening the piers, deeper cutoffs, and toe protection (Lagasse et al., 2001). The WC-Flow + VcT scenario provides an upper benchmark for intervention: piers with predicted ys above established safety limits under this scenario should be prioritized for structural upgrades or heightened flood monitoring. This risk-based framework allows authorities and practitioners to allocate engineering resources strategically by identifying the governing scour scenario for each pier. Sites vulnerable to peak flows and sites sensitive to event duration can be managed with scenario-tailored toolkits, optimizing expenditures and maximizing risk reduction. Importantly, over-design can be avoided where existing resilience is sufficient, as the scenario analysis clarifies which piers require immediate strengthening and which can be safely monitored without strengthening.
This targeted approach provides a robust pathway for enhancing bridge safety and longevity, especially under future flood uncertainty.
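As an illustration of how a percentile-based scenario such as Q99 might be assembled, the sketch below takes the 99th percentile of each listed input across a set of records. The interpolation rule, the helper names, the key name `sigma_g`, and the synthetic records are all assumptions; the study's own scenario construction may differ.

```python
def percentile(values, q):
    """Percentile (0 <= q <= 100) with linear interpolation between order statistics."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

def q99_inputs(records, variables=("Vo", "yo", "D50", "sigma_g")):
    """Build a Q99-style input set: the 99th percentile of each variable."""
    return {v: percentile([r[v] for r in records], 99) for v in variables}

# Synthetic placeholder records (not the USGS data)
records = [
    {"Vo": 1.0 + 0.05 * i, "yo": 0.5 + 0.02 * i,
     "D50": 0.001 + 0.0001 * i, "sigma_g": 1.2 + 0.01 * i}
    for i in range(101)
]

scenario = q99_inputs(records)
print({k: round(v, 4) for k, v in scenario.items()})
```

Taking each variable's percentile independently ignores correlations between inputs, so the resulting combination can be rarer than any single 99th-percentile event.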
The study also uses a physics-informed scenario envelope to handle missing inputs and to turn forecasts into actionable risk. When key variables are unavailable at a site, most often the critical velocity (Vc) and the effective duration of mobility (T), they can be computed from established hydraulic relations that are embedded inside the scenario framework. As a result, each scenario (e.g., peak-flow, duration-focused, combined worst case, or Q99) carries internally consistent values of (Vo), (yo), (Vc), and (T), allowing the model to estimate scour even where records are incomplete. Using forecast flood stage and velocity, the scenario envelope identifies which piers will face high intensity (Vo/Vc) and long exposure (large T), ranks them by predicted scour, and triggers early warnings for high-risk assets.
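One widely used relation for estimating a missing critical velocity is the HEC-18 form Vc = Ku · y^(1/6) · D50^(1/3), with Ku = 11.17 in U.S. customary units (Arneson et al., 2012). The sketch below uses it purely as an example; this section does not state which relation the framework applies for Vc, and no formula for T is given here, so T is not imputed in the sketch. Function names and sample inputs are hypothetical.

```python
def critical_velocity_ft_s(depth_ft, d50_ft, ku=11.17):
    """Critical velocity for bed-material movement, Vc = Ku * y^(1/6) * D50^(1/3).

    depth_ft: approach flow depth yo (ft); d50_ft: median grain size D50 (ft).
    ku = 11.17 for U.S. customary units (6.19 for SI, with metres).
    """
    if depth_ft <= 0 or d50_ft <= 0:
        raise ValueError("depth and D50 must be positive")
    return ku * depth_ft ** (1 / 6) * d50_ft ** (1 / 3)

def flow_intensity(vo_ft_s, vc_ft_s):
    """Flow intensity Vo/Vc; values below 1 indicate clear-water conditions."""
    return vo_ft_s / vc_ft_s

# Hypothetical site values: yo = 10 ft, D50 = 0.01 ft, Vo = 5 ft/s
vc = critical_velocity_ft_s(10.0, 0.01)
intensity = flow_intensity(5.0, vc)
regime = "live-bed" if intensity >= 1.0 else "clear-water"
print(f"Vc = {vc:.2f} ft/s, Vo/Vc = {intensity:.2f} ({regime})")
```

Recomputing Vc whenever yo or D50 changes is what keeps a scenario internally consistent, since an imputed Vc left fixed while depth is raised would understate the bed's resistance.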
The approach is transparent, safe, and sustainable because it ties imputation to physics rather than ad hoc guesswork, remains usable when monitoring gaps exist, and scales naturally to real-time operations. In this way, the study contributes to flood resilience by combining publicly available hydrologic data, physics-based estimation of missing drivers, and scenario-based analytics to guide timely protection, monitoring, and communication before hazardous conditions develop.
4.3 Limitations and scope
The models are based solely on laboratory data, and while laboratory experiments cover many scenarios, caution is needed when applying these results to field conditions. Factors like scale effects, armoring, debris or ice impacts, and live-bed sediment movement are less represented in lab settings compared to real-world environments. For the Knik River site, certain input variables, especially Vc and T, were either estimated or set according to scenario definitions, which introduces uncertainty to the predictions. The Q99 scenario draws on percentile ranks within the dataset, not on a hydrological basis; more realistic predictions could be achieved by linking these scenarios to basin-specific flood frequency and sediment mobility models. Lastly, the relatively limited variation in D50 and σg data restricts the model’s ability to fully explore the influence of sediment gradation.
4.4 Future work
Field calibration and validation at bridges with high-quality monitoring should be prioritized, along with Bayesian or bootstrap uncertainty quantification for
5 Conclusion
This study presents a transparent, physics-aware toolchain for predicting bridge-pier scour depth (ys) and turning those predictions into clear guidance for flood-resilient design and operations. Trained on the PSDb-2014 laboratory data, all the developed models, including Gradient Boosting, AdaBoost, XGBoost, Kernel Ridge (Poly), and Gaussian Process (RBF), performed well in both training and testing. Specifically, the tree ensemble with Gradient Boosting generalizes well, having training and testing R2 values of 0.99 and 0.96, respectively. Moreover, the tree ensemble models showed small, well-behaved residuals, which means they track measured scour depth closely in unseen cases. Also, the Gradient Boosting parity fit line
The generalization of the developed models was assessed using 5-fold cross-validation within an external training–testing framework, where all data processing and physics-informed updates were performed strictly within each training fold to prevent data leakage. This procedure produced consistent, low-error outcomes across the folds, which closely aligned with the results on the held-out test data, indicating that the models are robust and suitable for real-world applications. Across folds, performance ranked Gradient Boosting > XGBoost > AdaBoost (Tree) > Kernel Ridge (Poly) > Gaussian Process (RBF), reinforcing confidence in the algorithms’ robustness. To enhance the interpretability of model predictions and verify their physical consistency, SHAP analysis was employed to quantify the contributions of each input variable to the predicted scour depth (ys). The results clearly indicate that bridge pier width (bn) is by far the most influential factor, accounting for 70.6% of the total SHAP importance, followed by approach flow velocity (Vo) at 9.8% and approach depth (yo) at 7.5%. Event duration (T) gains relevance in cases of prolonged exposure, while sediment gradation (σg), critical velocity (Vc), and median sediment size (D50) exert smaller, yet still interpretable, effects on scour outcomes. Moreover, the present study developed an interactive practical tool for bridge pier–scour prediction, allowing practitioners to directly use the trained models without requiring coding expertise.
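The leakage-free fold procedure can be sketched as follows: preprocessing statistics are estimated from the training rows of each fold and then reused on that fold's held-out rows. This is a generic illustration with standardization standing in for the study's processing steps; the helper names and the placeholder feature values are assumptions, while the 569 cases and 5 folds follow the text.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for a shuffled K-fold split."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k folds of nearly equal size
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx

def fit_standardizer(values):
    """Return (mean, std) estimated from the given values only."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, (var ** 0.5) or 1.0  # fall back to 1.0 for a constant feature

# 569 laboratory cases as in the dataset; feature values are placeholders
n_cases = 569
feature = [0.1 + 0.01 * i for i in range(n_cases)]

fold_sizes = []
for train_idx, test_idx in k_fold_indices(n_cases, k=5):
    mean, std = fit_standardizer([feature[j] for j in train_idx])  # training rows only
    test_scaled = [(feature[j] - mean) / std for j in test_idx]    # reuse training statistics
    fold_sizes.append((len(train_idx), len(test_idx)))

print(fold_sizes)
```

Fitting the scaler on the full dataset before splitting would let information from the held-out rows influence training, which is the leakage the fold-wise procedure avoids.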
The framework enforces physics-informed updates to critical velocity and event duration, maintaining physical realism and preventing model drift under hydraulic extremes. Applied to the Knik River bridge piers, the study categorized extreme conditions into four scenario envelopes that capture rare, peak, and sustained flood events to guide flood-resilient design and risk management. The combined worst-case (WC-Flow + VcT) typically sets the upper design bound, WC-Flow governs peak-driven risk, WC-VcT addresses long-duration vulnerabilities, and Q99 provides a realistic rare-event benchmark. Additionally, the framework validates that missing input data can be effectively handled using realistic imputation without compromising modeling integrity. These envelopes support risk-based triage and practical action in extreme flood events. Where peak flow dominates, rapid-trigger monitoring and velocity-reducing measures (e.g., flow deflectors, local armoring) should be prioritized. Where duration governs scour, duration-resistant countermeasures (e.g., toe protection, deeper cutoffs, improved embedment) become more effective. The framework, therefore, offers a direct path from data and models to asset-level decisions: screen piers with the envelope, identify the governing mechanism (intensity vs. duration), and select fit-for-purpose measures. Because the models are interpretable, stakeholders can audit why a site is flagged (e.g., large bn and high Vo) and trace the effect of each input on predicted scour depth. This systematic, physics-consistent approach supports flood-resilient design decisions, maintenance prioritization, retrofit planning, emergency response, and clear risk communication.
Data availability statement
The interactive app tool, together with all trained model files and the Python scripts used for SHAP analysis, is available at the provided link (https://huggingface.co/spaces/Adilkhan01/Scour).
Author contributions
AK: Formal Analysis, Data curation, Methodology, Conceptualization, Visualization, Software, Writing – original draft, Investigation. DI: Conceptualization, Writing – review and editing, Supervision, Resources, Project administration, Validation.
Funding
The author(s) declared that financial support was not received for this work and/or its publication.
Conflict of interest
The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The author(s) declared that generative AI was not used in the creation of this manuscript.
Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
References
Aas, K., Jullum, M., and Løland, A. (2021). Explaining individual predictions when features are dependent: more accurate approximations to shapley values. Artif. Intell. 298, 103502. doi:10.1016/j.artint.2021.103502
Akib, S., Mohammadhassani, M., and Jahangirzadeh, A. (2014). Application of ANFIS and LR in prediction of scour depth in bridges. Comput. Fluids 91, 77–86. doi:10.1016/j.compfluid.2013.12.004
Al-Shamasneh, A. R., Karim, F. K., Mahmoodzadeh, A., Alghamdi, A., Alqahtani, A., Alsubai, S., et al. (2025). High-Fidelity machine learning framework for fracture energy prediction in fiber-reinforced concrete. Comput. Model. Eng. and Sci. 144, 1573–1606. doi:10.32604/cmes.2025.068887
Alasmari, S. M., Sakly, H., Kraiem, N., and Algarni, A. (2025). Phishing detection in IoT: an integrated CNN-LSTM framework with explainable AI and LLM-enhanced analysis. Discov. Internet Things 5, 102. doi:10.1007/s43926-025-00202-9
Aldoseri, A., Al-Khalifa, K. N., and Hamouda, A. M. (2023). Re-Thinking data strategy and integration for artificial intelligence: concepts, opportunities, and challenges. Appl. Sci. 13, 7082. doi:10.3390/app13127082
Aliferis, C., and Simon, G. (2024). Overfitting, underfitting and general model overconfidence and under-performance pitfalls and best practices in machine learning and AI. 477–524. doi:10.1007/978-3-031-39355-6_10
Anisha, A., Jacob, A., Davis, R., and Mangalathu, S. (2022). Fragility functions for highway RC bridge under various flood scenarios. Eng. Struct. 260, 114244. doi:10.1016/j.engstruct.2022.114244
Arachchige, C. N. P. G., and Prendergast, L. A. (2024). Confidence intervals for median absolute deviations. Commun. Stat. Simul. Comput. 55, 1–10. doi:10.1080/03610918.2024.2376198
Argyroudis, S. A., and Mitoulis, S. A. (2021). Vulnerability of bridges to individual and multiple hazards-floods and earthquakes. Reliab Eng. Syst. Saf. 210, 107564. doi:10.1016/j.ress.2021.107564
Arneson, L. A., Zevenbergen, L. W., and Lagasse, P. F.P.E.C. (2012). Evaluating scour at bridges. Fifth.
Asif, S., Wenhui, Y., Ur-Rehman, S., Ul-ain, Q., Amjad, K., Yueyang, Y., et al. (2025). Advancements and prospects of machine learning in medical diagnostics: unveiling the future of diagnostic precision. Archives Comput. Methods Eng. 32, 853–883. doi:10.1007/s11831-024-10148-w
Baranwal, A., and Das, B. S. (2024a). Scouring around bridge pier: a comprehensive analysis of scour depth predictive equations for clear-water and live-bed scouring conditions. AQUA — Water Infrastructure, Ecosyst. Soc. 73, 424–452. doi:10.2166/aqua.2024.235
Baranwal, A., and Das, B. S. (2024b). Live-Bed scour depth modelling around the Bridge pier using ANN-PSO, ANFIS, MARS, and M5Tree. Water Resour. Manag. 38, 4555–4587. doi:10.1007/s11269-024-03879-9
Belmokhtar, M., Schmidt, F., Chevalier, C., and Ture Savadkoohi, A. (2025). Monitoring of a bridge experiencing scour using frequency domain decomposition mixed with DBSCAN algorithm: unsupervised modal analysis of output-only system. Archives Civ. Mech. Eng. 25, 193. doi:10.1007/s43452-025-01235-1
Benedict, S. T., and Caldwell, A. W. (2014). A pier-scour database: 2,427 field and laboratory measurements of pier scour. Data Ser. doi:10.3133/ds845
Bentegri, H., Rabehi, M., Kherfane, S., Nahool, T. A., Rabehi, A., Guermoui, M., et al. (2025). Assessment of compressive strength of eco-concrete reinforced using machine learning tools. Sci. Rep. 15, 5017. doi:10.1038/s41598-025-89530-y
Bichri, H., Chergui, A., and Hain, M. (2024). Investigating the impact of train/Test split ratio on the performance of pre-trained models with custom datasets. Int. J. Adv. Comput. Sci. Appl. 15. doi:10.14569/IJACSA.2024.0150235
Brandimarte, L., Paron, P., and Baldassarre, G. D. (2012). Bridge pier scour: a review of processes, measurements and estimates. Environ. Eng. Manag. J. 11, 975–989. doi:10.30638/eemj.2012.121
Cao, Y., Miao, Q.-G., Liu, J.-C., and Gao, L. (2013). Advance and prospects of AdaBoost algorithm. Acta Autom. Sin. 39, 745–758. doi:10.1016/S1874-1029(13)60052-X
Cappelli, F., and Grimaldi, S. (2023). Feature importance measures for hydrological applications: insights from a virtual experiment. Stoch. Environ. Res. Risk Assess. 37, 4921–4939. doi:10.1007/s00477-023-02545-7
Cappelli, F., Tauro, F., Apollonio, C., Petroselli, A., Borgonovo, E., and Grimaldi, S. (2023). Feature importance measures to dissect the role of sub-basins in shaping the catchment hydrological response: a proof of concept. Stoch. Environ. Res. Risk Assess. 37, 1247–1264. doi:10.1007/s00477-022-02332-w
Chai, T., and Draxler, R. R. (2014). Root mean square error (RMSE) or mean absolute error (MAE)? – arguments against avoiding RMSE in the literature. Geosci. Model Dev. 7, 1247–1250. doi:10.5194/gmd-7-1247-2014
Charilaou, P., and Battat, R. (2022). Machine learning models and over-fitting considerations. World J. Gastroenterol. 28, 605–607. doi:10.3748/wjg.v28.i5.605
Chen, F., Yang, W., Liu, F., Zhu, L., and Sun, Z. (2025). Experimental Study of sediment incipient velocity and scouring in submarine cable burial areas. Water (Basel) 17, 1310. doi:10.3390/w17091310
Chen, H., Zhang, J., Zhang, P., Guo, Y., Ji, Y., and Fu, R. (2025). Large eddy simulation of the flow field characteristics around a jacket foundation under unidirectional flow actions. Ocean. Eng. 317, 120057. doi:10.1016/j.oceaneng.2024.120057
Cheng, M.-Y., Cao, M.-T., and Wu, Y.-W. (2015). Predicting equilibrium scour depth at Bridge piers using evolutionary radial basis function neural network. J. Comput. Civ. Eng. 29, 04014070. doi:10.1061/(ASCE)CP.1943-5487.0000380
Choi, S.-U., Choi, B., and Lee, S. (2017). Prediction of local scour around bridge piers using the ANFIS method. Neural Comput. Appl. 28, 335–344. doi:10.1007/s00521-015-2062-1
Cook, W., Barr, P. J., and Halling, M. W. (2015). Bridge failure rate. J. Perform. Constr. Facil. 29, 04014080. doi:10.1061/(ASCE)CF.1943-5509.0000571
de Lange, S. I., Niesten, I., van de Veen, S. H. J., Baas, J. H., Lammers, J., Waldschläger, K., et al. (2024). Fine sediment in mixed sand-silt environments impacts bedform geometry by altering sediment mobility. Water Resour. Res. 60, e2024WR037065. doi:10.1029/2024WR037065
Diaz, E. E. M., Moreno, F. N., and Mohammadi, J. (2009). Investigation of common causes of Bridge collapse in Colombia. Pract. Periodical Struct. Des. Constr. 14, 194–200. doi:10.1061/(ASCE)SC.1943-5576.0000006
Dong, H., Li, Z., and Sun, Z. (2025). Study on the mechanism of local scour around Bridge piers. J. Mar. Sci. Eng. 13, 1021. doi:10.3390/jmse13061021
Emami, H., Azarnavid, B., Raeisi Isa-Abadi, A., and Fardi, M. (2025). An efficient ensemble learning model for time-dependent scour depth estimation. J. Supercomput. 81 (15), 1378. doi:10.1007/s11227-025-07856-w
Eini, N., Bateni, S. M., Jun, C., Heggy, E., and Band, S. S. (2023). Estimation and interpretation of equilibrium scour depth around circular bridge piers by using optimized XGBoost and SHAP. Eng. Appl. Comput. Fluid Mech. 17 (1). doi:10.1080/19942060.2023.2244558
Emmert-Streib, F., and Dehmer, M. (2019). Evaluation of regression models: model assessment, model selection and generalization error. Mach. Learn Knowl. Extr. 1, 521–551. doi:10.3390/make1010032
Freund, Y., and Schapire, R. E. (1995). A desicion-theoretic generalization of on-line learning and an application to boosting. 23–37. doi:10.1007/3-540-59119-2_166
Friedman, J. H. (2001). Greedy function approximation: a gradient boosting machine. Ann. Statistics 29. doi:10.1214/aos/1013203451
Friedman, J. H. (2002). Stochastic gradient boosting. Comput. Stat. Data Anal. 38, 367–378. doi:10.1016/S0167-9473(01)00065-2
Fu, Z., Ji, B., Cheng, M., and Maeno, H. (2012). “Statistical analysis of the causes of Bridge collapse in China,” in Forensic Engineering 2012 (Reston, VA: American Society of Civil Engineers), 75–83. doi:10.1061/9780784412640.009
Fuladipanah, M., Hazi, M. A., and Kisi, O. (2023). An in-depth comparative analysis of data-driven and classic regression models for scour depth prediction around cylindrical bridge piers. Appl. Water Sci. 13, 231. doi:10.1007/s13201-023-02022-0
Gammermann, A. (2000). Support vector machine learning algorithm and transduction. Comput. Stat. 15, 31–39. doi:10.1007/s001800050034
Habal, A. H. Y., and Benbouras, M. A. (2025). California bearing ratio and compaction parameters prediction using advanced hybrid machine learning methods. Asian J. Civ. Eng. 26, 121–146. doi:10.1007/s42107-024-01179-6
Hameed, M. M., Alomar, M. K., Razali, S. F. M., and Salem, A. (2025). Integrated approach of extreme learning machines and locally weighted linear regression for improved discharge coefficient prediction. Sci. Rep. 15, 21761. doi:10.1038/s41598-025-03812-z
Harik, I. E., Shaaban, A. M., Gesund, H., Valli, G. Y. S., and Wang, S. T. (1990). United States Bridge Failures, 1951–1988. J. Perform. Constr. Facil. 4, 272–277. doi:10.1061/(ASCE)0887-3828(1990)4:4(272)
Hassan, W. H., and Jalal, H. K. (2021). Prediction of the depth of local scouring at a bridge pier using a gene expression programming method. SN Appl. Sci. 3, 159. doi:10.1007/s42452-020-04124-9
Hastie, T., Tibshirani, R., and Friedman, J. (2009). The elements of statistical learning. New York, New York, NY: Springer. doi:10.1007/978-0-387-84858-7
Ismael, D. (2023). Enhancing online Hands-On learning in engineering education: student perceptions and recommendations paper presented at 2023 ASEE annual conference and exposition. Baltimore, Maryland. doi:10.18260/1-2--43358
Ismael, D., and Shealy, T. (2018). Sustainable construction risk perceptions in the Kuwaiti construction industry. Sustainability 10, 1854. doi:10.3390/su10061854
Ismael, D., Hutton, N., Erten-Unal, M., Considine, C., Vandecar-Burdin, T., Davis, C., et al. (2024). Community-Centric approaches to coastal hazard assessment and management in southside Norfolk, Virginia, USA. Atmosphere 15, 372. doi:10.3390/atmos15030372
Jacobson, L. P., Parker, C. B., Cella, D., Mroczek, D. K., Lester, B. M., Smith, P. B., et al. (2024). Approaches to protocol standardization and data harmonization in the ECHO-wide cohort study. Pediatr. Res. 95, 1726–1733. doi:10.1038/s41390-024-03039-0
Jamal, A. S., and Ahmed, A. N. (2025). Estimating compressive strength of high-performance concrete using different machine learning approaches. Alexandria Eng. J. 114, 256–265. doi:10.1016/j.aej.2024.11.084
Jin, R., and Agrawal, G. (2003). “Communication and memory efficient parallel decision tree construction,” in Proceedings of the 2003 SIAM international conference on data mining (Philadelphia, PA: Society for Industrial and Applied Mathematics), 119–129. doi:10.1137/1.9781611972733.11
Julien, P. Y. (2010). Erosion and sedimentation. Cambridge University Press. doi:10.1017/CBO9780511806049
Kang, H. (2013). The prevention and handling of the missing data. Korean J. Anesthesiol. 64, 402–406. doi:10.4097/kjae.2013.64.5.402
Kapoor, S., and Narayanan, A. (2023). Leakage and the reproducibility crisis in machine-learning-based science. Patterns 4, 100804. doi:10.1016/j.patter.2023.100804
Khajavi, H., Rastgoo, A., and Masoumi, F. (2025). Sensitivity assessment and comparative analysis of machine learning and numerical models for predicting dam break-induced water levels. Iran. J. Sci. Technol. Trans. Civ. Eng. doi:10.1007/s40996-025-02003-0
Khatir, A., Capozucca, R., Khatir, S., Magagnini, E., Le Thanh, C., and Riahi, M. K. (2025). Advancements and emerging trends in integrating machine learning and deep learning for SHM in mechanical and civil engineering: a comprehensive review. J. Braz. Soc. Mech. Sci. Eng. 47, 419. doi:10.1007/s40430-025-05697-5
Khoshvaght, H., Permala, R. R., Razmjou, A., and Khiadani, M. (2025). A critical review on selecting performance evaluation metrics for supervised machine learning models in wastewater quality prediction. J. Environ. Chem. Eng. 13, 119675. doi:10.1016/j.jece.2025.119675
Koçak, E. (2025). Comprehensive evaluation of machine learning models for real-world air quality prediction and health risk assessment by AirQ+. Earth Sci. Inf. 18, 447. doi:10.1007/s12145-025-01941-7
Kosič, M., Prendergast, L. J., and Anžlin, A. (2023). Analysis of the response of a roadway bridge under extreme flooding-related events: scour and debris-loading. Eng. Struct. 279, 115607. doi:10.1016/j.engstruct.2023.115607
Kumar, A., Sen, S., and Sinha, S. (2025). Machine learning based prediction models for the compressive strength of high-volume fly ash concrete reinforced with silica fume. Asian J. Civ. Eng. 26, 1683–1701. doi:10.1007/s42107-025-01277-z
Lee, S. O., and Hong, S. H. (2019). Turbulence characteristics before and after scour upstream of a scaled-down Bridge pier model. Water (Basel) 11, 1900. doi:10.3390/w11091900
Liang, W., Luo, S., Zhao, G., and Wu, H. (2020). Predicting hard rock pillar stability using GBDT, XGBoost, and LightGBM algorithms. Mathematics 8, 765. doi:10.3390/math8050765
Lundberg, S. M., Erion, G., Chen, H., DeGrave, A., Prutkin, J. M., Nair, B., et al. (2020). From local explanations to global understanding with explainable AI for trees. Nat. Mach. Intell. 2, 56–67. doi:10.1038/s42256-019-0138-9
Ma, H., Zhang, S., Li, B., and Huang, W. (2024). Local scour around the monopile based on the CFD-DEM method: experimental and numerical study. Comput. Geotech. 168, 106117. doi:10.1016/j.compgeo.2024.106117
Madurwar, K., Basem, A., Nikhade, A., Azher, A. A., Khedker, S., Hadi, A. A., et al. (2025). SHAP-enhanced hybrid PSO-ensemble framework models for interpretable prediction of UHPC compressive strength. Asian J. Civ. Eng. doi:10.1007/s42107-025-01540-3
Magazzino, C., and Haroon, M. (2025). The interrelation among environmental quality, public accounts, and macroeconomic fundamentals: an analysis of OECD countries using machine learning techniques. Environ. Dev. 54, 101175. doi:10.1016/j.envdev.2025.101175
Mamudu, L., Aldrees, A., Dan’azumi, S., and Yahaya, A. (2025). Evaluating the predictive accuracy of some regression models and artificial neural networks in streamflow forecasting (a case study of the Kaduna River, Northwest Nigeria). Model Earth Syst. Environ. 11, 125. doi:10.1007/s40808-025-02296-0
McInerney, D., Thyer, M., Kavetski, D., Laugesen, R., Tuteja, N., and Kuczera, G. (2020). Multi-temporal hydrological residual error modeling for seamless subseasonal streamflow forecasting. Water Resour. Res. 56, e2019WR026979. doi:10.1029/2019WR026979
Melville, B. W., and Chiew, Y.-M. (1999). Time scale for local scour at bridge piers. J. Hydraulic Eng. 125, 59–65. doi:10.1061/(ASCE)0733-9429(1999)125:1(59)
Mir, B. H., Lone, M. A., Bhat, J. A., and Rather, N. A. (2018). Effect of gradation of bed material on local scour depth. Geotechnical Geol. Eng. 36, 2505–2516. doi:10.1007/s10706-018-0479-x
Mohammed, S. H., Hussein, L. B., and Mohammed, A. S. (2025). Sensitivity-based prediction of self-compacting concrete strength using hybrid modeling techniques. Asian J. Civ. Eng. 26, 3485–3506. doi:10.1007/s42107-025-01383-y
Morgese, M., Ansari, F., Domaneschi, M., and Cimellaro, G. P. (2020). Post-collapse analysis of Morandi’s Polcevera viaduct in Genoa Italy. J. Civ. Struct. Health Monit. 10, 69–85. doi:10.1007/s13349-019-00370-7
Mushtaq, H., Akhtar, T., Hashmi, M.Z. ur R., Masood, A., and Saeed, F. (2024). Hydrologic interpretation of machine learning models for 10-daily streamflow simulation in climate sensitive upper Indus catchments. Theor. Appl. Climatol. 155, 5525–5542. doi:10.1007/s00704-024-04932-8
Nandi, B., and Das, S. (2025a). Prediction of maximum scour around circular Bridge piers using semi-empirical and machine learning models. Water (Basel) 17, 2610. doi:10.3390/w17172610
Nandi, B., and Das, S. (2025b). Predicting Max scour depths near two-pier groups using ensemble machine-learning models and visualizing feature importance with partial dependence plots and SHAP. J. Comput. Civ. Eng. 39, 04025007. doi:10.1061/JCCEE5.CPENG-6150
Nasr, A., Björnsson, I., Honfi, D., Larsson Ivanov, O., Johansson, J., and Kjellström, E. (2021). A review of the potential impacts of climate change on the safety and performance of bridges. Sustain Resilient Infrastruct. 6, 192–212. doi:10.1080/23789689.2019.1593003
Neal, R. M. (1996). “Bayesian learning for neural networks,” in Lecture notes in statistics. New York, New York, NY: Springer. doi:10.1007/978-1-4612-0745-0
Pal, M., Singh, N. K., and Tiwari, N. K. (2012). M5 model tree for pier scour prediction using field dataset. KSCE J. Civ. Eng. 16, 1079–1084. doi:10.1007/s12205-012-1472-1
Pearson, C., and Delatte, N. (2006). Collapse of the Quebec Bridge, 1907. J. Perform. Constr. Facil. 20, 84–91. doi:10.1061/(ASCE)0887-3828(2006)20:1(84)
Peng, J., Hahn, J., and Huang, K.-W. (2023). Handling missing values in information systems research: a review of methods and assumptions. Inf. Syst. Res. 34, 5–26. doi:10.1287/isre.2022.1104
Piraei, R., Niazkar, M., Cislaghi, A., Afzali, S. H., and Mohammadi, A. (2025). Enhancing prediction of equilibrium scour depth around bridge piers using staking machine learning models. Earth Syst. Environ. 9 (3), 1669–1689. doi:10.1007/s41748-025-00722-y
Pizarro, A., Manfreda, S., and Tubaldi, E. (2020). The science behind scour at Bridge foundations: a review. Water (Basel) 12, 374. doi:10.3390/w12020374
Rahman, F., and Chavan, R. (2025). Machine learning application in prediction of scour around Bridge piers: a comprehensive review. Archives Comput. Methods Eng. 32, 1299–1322. doi:10.1007/s11831-024-10167-7
Ramujee, K., and Praseeda, D. (2025). Prediction of compressive strength of geopolymer concrete using optimised machine learning algorithms. Asian J. Civ. Eng. doi:10.1007/s42107-025-01541-2
Rana, M. S., Hossain, M. M., and Li, F. (2025). Comparative analysis of machine learning models for predicting the compressive strength of ultra-high-performance steel fiber reinforced concrete. J. Eng. Res. 13, 3051–3069. doi:10.1016/j.jer.2025.01.004
Rasmussen, C. E., and Williams, C. K. I. (2005). Gaussian processes for machine learning. The MIT Press. doi:10.7551/mitpress/3206.001.0001
Schapire, R. E. (1990). The strength of weak learnability. Mach. Learn. 5, 197–227. doi:10.1007/BF00116037
Schapire, R. E. (2013). “Explaining AdaBoost,” in Empirical inference (Berlin, Heidelberg: Springer), 37–52. doi:10.1007/978-3-642-41136-6_5
Seal, S., Mahale, M., García-Ortegón, M., Joshi, C. K., Hosseini-Gerami, L., Beatson, A., et al. (2025). Machine learning for toxicity prediction using chemical structures: pillars for success in the real world. Chem. Res. Toxicol. 38, 759–807. doi:10.1021/acs.chemrestox.5c00033
Shahani, N. M., Kamran, M., Zheng, X., Liu, C., and Guo, X. (2021). Application of gradient boosting machine learning algorithms to predict uniaxial compressive strength of soft sedimentary rocks at Thar coalfield. Adv. Civ. Eng. 2021, 2565488. doi:10.1155/2021/2565488
Shanmugam, N. S., Chen, S.-E., Tang, W., Chavan, V. S., Diemer, J., Allan, C., et al. (2025). Spatial interpolation of Bridge scour point cloud data using ordinary kriging method. J. Perform. Constr. Facil. 39, 06024002. doi:10.1061/JPCFEV.CFENG-4218
Sharma, S., Garg, A., and Shukla, B. K. (2025). Machine learning-based prediction of flexural strength in graphene-enhanced RC beams using ANN, GPR, REP tree, and AR_M5P models. Asian J. Civ. Eng. 26, 2991–3006. doi:10.1007/s42107-025-01355-2
Shealy, T., Ismael, D., Hartmann, A., and van, B. M. (2017). “Removing certainty from the equation: using choice architecture to increase awareness of risk in engineering design decision making,” in 15th Engineering Project Organization Conference with 5th International Megaprojects Workshop. Stanford, United States: EPOS.
Shobayo, O., Adeyemi-Longe, S., Popoola, O., and Okoyeigbo, O. (2025). A comparative analysis of machine learning and deep learning techniques for accurate market price forecasting. Analytics 4, 5. doi:10.3390/analytics4010005
Showkat, R., Jalal, F. E., and Babu, G. L. S. (2025). Estimation of soil water characteristic curve using machine-learning algorithms and its application in embankment response. J. Comput. Civ. Eng. 39, 04025012. doi:10.1061/JCCEE5.CPENG-6062
Tang, D., and Huang, M. (2024). The sustainable development of bridges in China: collapse cause analysis, existing management dilemmas and potential solutions. Buildings 14, 419. doi:10.3390/buildings14020419
Teodorescu, V., and Obreja Braşoveanu, L. (2025). Assessing the validity of k-Fold cross-validation for model selection: evidence from bankruptcy prediction using random Forest and XGBoost. Computation 13, 127. doi:10.3390/computation13050127
Tien Bui, D., Shirzadi, A., Amini, A., Shahabi, H., Al-Ansari, N., Hamidi, S., et al. (2020). A hybrid intelligence approach to enhance the prediction accuracy of local scour depth at complex Bridge piers. Sustainability 12, 1063. doi:10.3390/su12031063
USGS (2016). EarthWord - Scour. Available online at: https://www.usgs.gov/news/science-snippet/earthword-scour.
van Rijn, L. C. (1984). Sediment transport, part III: bed forms and alluvial roughness. J. Hydraulic Eng. 110, 1733–1754. doi:10.1061/(ASCE)0733-9429(1984)110:12(1733)
Wang, C., Yu, X., and Liang, F. (2017). A review of bridge scour: mechanism, estimation, monitoring and countermeasures. Nat. Hazards 87, 1881–1906. doi:10.1007/s11069-017-2842-2
Wardhana, K., and Hadipriono, F. C. (2003). Analysis of recent Bridge failures in the United States. J. Perform. Constr. Facil. 17, 144–150. doi:10.1061/(ASCE)0887-3828(2003)17:3(144)
White, J., and Power, S. D. (2023). k-Fold cross-validation can significantly over-estimate true classification accuracy in common EEG-Based passive BCI experimental designs: an empirical investigation. Sensors 23, 6077. doi:10.3390/s23136077
Wilimitis, D., and Walsh, C. G. (2023). Practical considerations and applied examples of cross-validation for model development and evaluation in health care: tutorial. JMIR AI 2, e49023. doi:10.2196/49023
Wu, J., You, H., Sun, B., and Du, J. (2025). LLM-Driven pareto-optimal multi-mode reinforcement learning for adaptive UAV navigation in urban wind environments. IEEE Access 13, 163550–163570. doi:10.1109/ACCESS.2025.3611336
Xu, F. Y., Zhang, M. J., Wang, L., and Zhang, J. R. (2016). Recent highway Bridge collapses in China: review and discussion. J. Perform. Constr. Facil. 30, 04016030. doi:10.1061/(ASCE)CF.1943-5509.0000884
Xu, C., Wen, Q., Li, P., Liu, H., and Huang, Z. (2025). An experimental Study of wave-induced local scour at a dual-function OWC-Pile breakwater. China Ocean. Eng. 39, 1097–1111. doi:10.1007/s13344-025-0086-6
Yang, Y., Qi, M., Li, J., and Ma, X. (2018). Evolution of hydrodynamic characteristics with scour hole Developing around a Pile Group. Water (Basel) 10, 1632. doi:10.3390/w10111632
Yang, T., Gallagher, C. M., and McMahan, C. S. (2019). A robust regression methodology via M-estimation. Commun. Stat. Theory Methods 48, 1092–1107. doi:10.1080/03610926.2018.1423698
Zhang, G., Liu, Y., Liu, J., Lan, S., and Yang, J. (2022). Causes and statistical characteristics of bridge failures: a review. J. Traffic Transp. Eng. Engl. Ed. 9, 388–406. doi:10.1016/j.jtte.2021.12.003
Keywords: flood-resilient design, scenario-based risk assessment, infrastructure safety, machine learning applications, predictive modeling
Citation: Khan A and Ismael D (2026) Interpretable machine learning for bridge-pier scour prediction and flood resilience. Front. Built Environ. 11:1731114. doi: 10.3389/fbuil.2025.1731114
Received: 23 October 2025; Accepted: 30 December 2025; Published: 30 January 2026.
Edited by: Rocio L. Segura, Polytechnique Montréal, Canada
Reviewed by: Azam Abdollahi, Santa Clara University, United States; Yusuf Uzun, Necmettin Erbakan University, Türkiye
Copyright © 2026 Khan and Ismael. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Dalya Ismael, dismael@odu.edu
†ORCID: Adil Khan, orcid.org/0000-0001-5027-5190; Dalya Ismael, orcid.org/0009-0003-7410-3045