- 1 School of Earth Sciences and Engineering, Hohai University, Nanjing, China
- 2 Department of Chemical Engineering, Ahmadu Bello University, Zaria, Nigeria
The precise prediction of petrophysical properties in tight reservoirs is essential for accurate reservoir characterization but remains impeded by significant lithological heterogeneity and complex, nonlinear relationships among well-log features. To address this, we propose a robust and interpretable machine learning framework that synergizes a stacked ensemble architecture with a post hoc physics-informed refinement step for predicting porosity and water saturation. The methodology employs a multi-stage process: (1) model-specific recursive feature elimination with cross-validation (RFECV) to identify optimal feature subsets; (2) a hybrid Genetic Algorithm–Particle Swarm Optimization (GA–PSO) strategy for efficient hyperparameter tuning; and (3) a stacked ensemble integrating Random Forest (RF), LightGBM, and CatBoost, with a Ridge regression meta-learner. We evaluate two configurations: hyperparameter optimization alone (Hybrid_Hyper_XGB) and joint optimization of hyperparameters and stacking weights (Stacked_Hybrid_Full). The superior Stacked_Hybrid_Full model is further enhanced by a post hoc physics-based refinement, where priors derived from the Wyllie time-average equation augmented with density-neutron crossplots and the Archie-Simandoux model are blended as soft regularizers, ensuring geological consistency without retraining. Comprehensive validation demonstrates that the physics-informed Stacked_Hybrid_Full model achieves superior performance, with R² values exceeding 0.91 for porosity and 0.83 for water saturation. Depth-resolved analysis confirms a significant reduction in prediction error and improved capture of structural features, particularly within laminated and low-porosity intervals. Model interpretability, probed via SHapley Additive exPlanations (SHAP), identifies permeability, resistivity, gamma ray, and shear velocity as the dominant predictive features and elucidates nontrivial interaction effects aligned with petrophysical principles.
This work presents a transferable workflow that successfully bridges data-driven prediction with physical plausibility. The framework significantly enhances predictive robustness and model transparency for petrophysical characterization in heterogeneous tight reservoirs, offering substantial practical utility for reservoir evaluation in unconventional plays.
1 Introduction
Unconventional reservoirs, including shale gas, tight oil, and coalbed methane formations, have emerged as vital contributors to global energy security, accounting for a substantial portion of the world’s available hydrocarbon resources. Accurate prediction of petrophysical properties in unconventional reservoirs is essential for adequate reservoir characterization and optimizing exploration strategies (Zou et al., 2013). However, these petrophysical properties, derived from well log data, exhibit complex nonlinear relationships driven by geological heterogeneity (Yang and Zou, 2019).
Predicting reservoir properties in tight oil formations remains particularly challenging due to inherent complexities, including spatial heterogeneity and anisotropy, which complicate the application of conventional petrophysical and geophysical methodologies (Yang et al., 2016). Unlike traditional petrophysical models, machine learning (ML) algorithms are capable of independently identifying hidden, nonlinear relationships between input features and target outputs, thereby demonstrating superior performance in modeling complex systems (Bai et al., 2022; Dong et al., 2023; Sang et al., 2022; Wood, 2022). Recent studies have shown that ensemble machine learning (EML) models, particularly those based on decision tree architectures, achieve predictive performance comparable to that of deep learning models when applied to tabular datasets (Shwartz-Ziv and Armon, 2021). Notably, the architecture and hyperparameters of EML models can be efficiently optimized by using advanced heuristic techniques (Akande et al., 2017; Gu et al., 2021a; Salem et al., 2022).
Given that well-logging datasets are inherently tabular, EML models are well-suited for predictive tasks such as estimating reservoir properties, often outperforming conventional petrophysical approaches (Bai et al., 2022; Gu et al., 2022; Wang et al., 2020). These models effectively capture complex, nonlinear interdependencies among well-log variables, which is particularly beneficial for characterizing unconventional reservoirs (Abbas et al., 2023; Al-Mudhafar, 2015; Anifowose et al., 2019). Among the most widely employed EML algorithms for reservoir property prediction are Random Forest (RF), Extreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM) (Al-Mudhafar, 2020; Al-Mudhafar and Wood, 2022).
Despite their predictive power, EML models are often criticized for their limited interpretability, frequently being referred to as “black-box” models, which is an issue that poses barriers to their acceptance in industrial applications (Adadi and Berrada, 2018; Lipton, 2016; Murdoch et al., 2019). To address this issue, recent developments in artificial intelligence have prioritized the advancement of explainable machine learning techniques, collectively referred to as Explainable Artificial Intelligence (XAI). XAI methodologies facilitate transparency and foster trust by bridging the gap between model outputs and human interpretability (Barredo Arrieta et al., 2020; Loh et al., 2022). Among these, Shapley Additive Explanations (SHAP) has emerged as a prominent tool, offering both global and local interpretability insights (Feng et al., 2021; Kavzoglu and Teke, 2022; Markus et al., 2025).
Recent advances in metaheuristic optimization and feature selection have increased the efficiency of machine learning in geoscience (Nssibi et al., 2023; Selvam et al., 2024; Zhang et al., 2025). Hybrid approaches such as Genetic Algorithm-Particle Swarm Optimization (GA-PSO) and Particle Swarm Optimization-XGBoost have improved the prediction of permeability and lithology in tight sandstones through hyperparameter optimization and feature selection (Gu et al., 2021a; Gu et al., 2021b; Sheykhinasab et al., 2023). PSO–XGBoost models combined with SHAP have yielded accurate and interpretable permeability predictions (Liu and Liu, 2022), and a novel simulated annealing–genetic algorithm hybrid (SA–GA–XGBoost) has been used for logging-based permeability estimation in carbonates (Huang et al., 2025). SHAP-based interpretability has been used to explain permeability predictions (Feng et al., 2024; Mohammadian et al., 2022; Zhang et al., 2024) and shear wave velocity estimations (Zhang et al., 2023), highlighting its effectiveness in identifying key influential features and enhancing trust in ML-derived insights. Nevertheless, most studies treat hybrid optimization and explainability separately and focus on individual algorithms (LightGBM, XGBoost) rather than an integrated, interpretable ensemble design.
Physics-informed machine learning (PIML) integrates governing physical laws such as rock-physics relationships and fluid-flow principles directly into neural network architectures to improve generalization, reduce overfitting, and enhance physical interpretability (Shao et al., 2024). PIML has been applied to upscale permeability from core to reservoir scales using time-lapse geo-electrical data, improving characterization of subsurface fluid dynamics (Sakar et al., 2024). Probabilistic PIML formulations have also strengthened seismic petrophysical inversion by incorporating wave-physics constraints, yielding more reliable porosity estimates from seismic attributes (Khassaf et al., 2025). Advances in multi-scale fracture analysis and hybrid optimization further emphasize the importance of interpretable ML approaches in tight reservoirs, where bedding-parallel fractures exert strong control on permeability (Su et al., 2025; Wen et al., 2025). Similarly, peridynamic simulations of rock deformation demonstrate how nonlinear mechanical behavior can be integrated into PIML frameworks for improved porosity and saturation prediction under heterogeneous overburden conditions (Li et al., 2025; Tian et al., 2025). PIML approaches that incorporate petrophysical constraints such as Gassmann’s equations, Archie’s law, and geological priors for organic-rich intervals have shown substantial improvements in predicting reservoir properties in tight and unconventional formations (Abid et al., 2025; Gai et al., 2025; Pothana and Ling, 2025; Shao et al., 2024). These methods enhance robustness across heterogeneous systems, reducing prediction errors in low-porosity zones and during dynamic processes such as waterflooding (Mabiala et al., 2025).
EML models are increasingly utilized for petrophysical prediction, leveraging optimized hyperparameter tuning to enhance performance. Unlike previous studies in which optimization and stacking are decoupled, this work performs simultaneous GA-PSO optimization of both base-model hyperparameters and stacking weights, followed by a post hoc refinement step applied to the stacked ensemble predictions. In this step, a prior computed from well logs, using the Wyllie time-average equation augmented with density-neutron adjustments for porosity and an Archie-Simandoux hybrid for water saturation, serves as a soft regularizer. The prior constrains the ensemble outputs through weighted blending tuned by cross-validation, ensuring physical consistency without retraining the models or risking data leakage. The proposed framework integrates model-specific recursive feature elimination with cross-validation (RFECV) for robust feature selection, a GA-PSO approach for simultaneous tuning of hyperparameters and EML stacking weights, a stacked ensemble model employing a Ridge regression meta-learner, and physics-informed blending to inject geological priors. To evaluate the framework’s efficiency, three labelled scenarios are investigated: (i) Stacked_Hybrid_Full, which optimizes both hyperparameters and stacking weights augmented with physics-informed constraints; (ii) Hybrid_Hyper_XGB, which optimizes only base model hyperparameters with equal-weight averaging; and (iii) Baseline_XGB, a reference model with fixed hyperparameters. To enhance interpretability, SHAP analysis is employed to quantify the contributions of individual features and stacked models, providing geological insights.
2 Overview of study area and data preparation
The Ordos Basin, a major hydrocarbon-rich basin in northern China, comprises six principal tectonic units: the Yimeng Uplift, Western Margin Thrust Belt, Tianhuan Depression, Yishan Slope, Weibei Uplift, and Jinxi Flexural Fold Belt. During the Late Triassic, progressive tectonic closure drove a basinwide transition from shallow-marine to predominantly lacustrine conditions (Ji et al., 2022). Within this framework, the Yanchang Formation, particularly the Chang 7 Member of muddy shales with thin sandstone interbeds, records maximum lacustrine expansion and hosts kerogen-rich, thermally mature successions central to unconventional shale-oil prospectivity (Shi et al., 2022). The study area lies at the confluence of sediment supply from the southwestern and northeastern basin margins, where hydrocarbons in tight sandstones and associated source rocks exhibit minimal lateral migration, indicating largely in-situ accumulation. These accumulations include high-quality shale oils and tight oils, with notable enrichment in the Changqing Oilfield (Liu et al., 2022).
Multi-well log datasets from 10 boreholes are integrated into a unified structure with consistent formatting and identifiers. The wells are distributed within the same work area, spanning lateral facies transitions from proximal delta-front sand-dominated successions to more distal lacustrine mud-rich intervals. Therefore, the wells share the same geological conditions and lithological characteristics at the target formation. This spatial spread ensures that the training dataset captures representative heterogeneity of the Chang 7 tight reservoirs (Liu et al., 2022; Yang et al., 2016). Preprocessing involves imputing missing values through linear interpolation and backward filling, removing duplicate and non-informative columns, and normalizing features to zero mean and unit variance for numerical stability. This yields 5,697 reservoir data points, with sample counts varying across wells. To assess generalizability, three well-covered wells serve as the independent test set, while the remaining 3,864 samples from the other seven wells train the ensemble models. Figure 1 illustrates a representative well log, highlighting depth-dependent variability in elastic and petrophysical properties.
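The imputation and normalization steps described above can be sketched as follows. This is a minimal illustration and not the authors' code: the function names and the sample density values are hypothetical, and holding the last valid value across a trailing gap is an assumption, since the text specifies only linear interpolation and backward filling.

```python
import math

def interpolate_then_backfill(values):
    """Fill None gaps: linear interpolation inside, nearest valid value at the ends."""
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    if not known:
        raise ValueError("column has no valid samples")
    # Interior gaps: straight line between the two neighbouring valid samples.
    for left, right in zip(known, known[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    # Leading gap: backward fill from the first valid sample.
    for i in range(known[0]):
        filled[i] = filled[known[0]]
    # Trailing gap: hold the last valid sample (assumption, not stated in the text).
    for i in range(known[-1] + 1, len(filled)):
        filled[i] = filled[known[-1]]
    return filled

def standardize(values):
    """Scale a column to zero mean and unit (population) variance."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0:
        raise ValueError("constant column cannot be standardized")
    return [(v - mean) / std for v in values]

# Hypothetical density log (g/cm^3) with missing samples.
density = [None, 2.50, None, None, 2.62, 2.60, None]
filled = interpolate_then_backfill(density)
scaled = standardize(filled)
```

In a multi-well setting, one reasonable choice is to compute the mean and standard deviation on the training wells only and reuse them for the test wells, so that no test-well statistics enter the scaling.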
Figure 1. Well log A (a) Vp (P-wave velocity), (b) Vs (S-wave velocity), (c) density, (d) porosity, (e) water saturation, and (f) shale volume.
To augment the geophysical relevance of the dataset, rock physics-derived attributes including Vp/Vs ratio, acoustic impedance (
3 Methodology
This section presents a comprehensive hybrid machine learning framework to predict porosity and water saturation in unconventional reservoirs using well log data. The methodology integrates hybrid feature selection and ensemble learning enhanced by metaheuristic optimization and interpretable machine learning techniques.
3.1 Feature selection
The implementation of RFECV requires the selection of a machine learning algorithm as the base estimator, which is responsible for generating feature importance rankings used during the recursive elimination process (Chang et al., 2020). Hence, feature selection is conducted by using RFECV tailored to three base learners: RF, LightGBM, and CatBoost. Each model undergoes independent selection cycles by using 5-fold cross-validation to identify the most informative predictors. We quantified feature selection stability using Jaccard similarity between cross-validation folds and selection frequency analysis. Features selected in ≥80% of folds were considered stable. Unified feature sets are derived by aggregating SHAP importance across all models, selecting the top k features where k represents the median optimal feature count across individual models. The Jaccard similarity between the feature subsets $S_i$ and $S_j$ selected in CV folds $i$ and $j$ is $J(S_i, S_j) = |S_i \cap S_j| / |S_i \cup S_j|$.
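The stability measures named above, namely pairwise Jaccard similarity between folds and the 80% selection-frequency threshold, can be illustrated with a short sketch. The per-fold selections and feature names below are hypothetical placeholders for the subsets RFECV would return, and averaging over all fold pairs is an assumption about how the mean Jaccard is formed.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two feature subsets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def mean_pairwise_jaccard(fold_selections):
    """Average Jaccard similarity over all pairs of folds (assumed aggregation)."""
    pairs = list(combinations(fold_selections, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

def stable_features(fold_selections, threshold=0.8):
    """Features selected in at least `threshold` of the folds."""
    n_folds = len(fold_selections)
    counts = {}
    for selected in fold_selections:
        for name in set(selected):
            counts[name] = counts.get(name, 0) + 1
    return sorted(f for f, c in counts.items() if c / n_folds >= threshold)

# Hypothetical selections from five CV folds (feature names are illustrative).
folds = [
    {"Vs", "RHOB", "PR", "GR"},
    {"Vs", "RHOB", "PR", "RT"},
    {"Vs", "RHOB", "PR", "GR"},
    {"Vs", "RHOB", "GR", "RT"},
    {"Vs", "RHOB", "PR", "GR", "RT"},
]
stability = mean_pairwise_jaccard(folds)
stable = stable_features(folds)
```

With five folds, the 80% threshold means a feature must appear in at least four folds to count as stable.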
3.2 Hybrid optimization
3.2.1 Model scenario
To evaluate the impact of hybrid optimization and model integration strategies, three scenarios are implemented:
a. Stacked_Hybrid_Full: Focuses on optimizing both base model hyperparameters and stacking weights through a meta-model that learns optimal blending.
b. Hybrid_Hyper_XGB: Optimizes only the hyperparameters of the base models, followed by a simple equal-weight averaging ensemble without explicit weight optimization.
c. Baseline_XGB: A reference model with fixed, non-optimized hyperparameters (e.g., n_estimators = 100, max_depth = 4, learning_rate = 0.05).
3.2.2 Optimization framework
The optimization framework utilizes a hybrid GA-PSO approach to tune hyperparameters for base models (RF, LightGBM, CatBoost) and, in the case of Stacked_Hybrid_Full, stacking weights. This method integrates the global search efficiency of genetic algorithms with the local refinement precision of particle swarm optimization, facilitating effective exploration of the hyperparameter space. The defined hyperparameter ranges are as follows:
• RF: Number of trees (n_estimators, 50–300), maximum depth (max_depth, 3–15), minimum samples to split (min_samples_split, 2–15), minimum samples per leaf (min_samples_leaf, 1–10).
• LightGBM: Number of trees (n_estimators, 50–300), maximum depth (max_depth, 3–15), learning rate (learning_rate, 0.01–0.2), feature fraction (feature_fraction, 0.5–0.8), bagging fraction (bagging_fraction, 0.5–0.8), minimum data in leaf (min_data_in_leaf, 10–50).
• CatBoost: Number of iterations (iterations, 50–300), depth (depth, 3–15), learning rate (learning_rate, 0.01–0.2), L2 regularization (l2_leaf_reg, 1–10), border count (border_count, 32–255).
This yields a total of 15 hyperparameters across the three base models. For the Stacked_Hybrid_Full scenario, an additional three dimensions are included to optimize the stacking weights, enhancing the ensemble’s adaptability.
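One way to expose this search space to a population-based optimizer is to let each particle live in the unit hypercube and decode it into model parameters. The sketch below follows the ranges listed above; the encoding itself, the treatment of l2_leaf_reg as continuous, and the normalization of the three weight dimensions to sum to one are assumptions for illustration.

```python
# (model, parameter, low, high, is_integer) for the 15 tuned hyperparameters.
SEARCH_SPACE = [
    ("rf", "n_estimators", 50, 300, True),
    ("rf", "max_depth", 3, 15, True),
    ("rf", "min_samples_split", 2, 15, True),
    ("rf", "min_samples_leaf", 1, 10, True),
    ("lgbm", "n_estimators", 50, 300, True),
    ("lgbm", "max_depth", 3, 15, True),
    ("lgbm", "learning_rate", 0.01, 0.2, False),
    ("lgbm", "feature_fraction", 0.5, 0.8, False),
    ("lgbm", "bagging_fraction", 0.5, 0.8, False),
    ("lgbm", "min_data_in_leaf", 10, 50, True),
    ("catboost", "iterations", 50, 300, True),
    ("catboost", "depth", 3, 15, True),
    ("catboost", "learning_rate", 0.01, 0.2, False),
    ("catboost", "l2_leaf_reg", 1, 10, False),  # treated as continuous (assumption)
    ("catboost", "border_count", 32, 255, True),
]
N_WEIGHTS = 3  # extra dimensions used only by Stacked_Hybrid_Full

def _clip01(u):
    return min(max(u, 0.0), 1.0)

def decode(position, with_weights=True):
    """Map a particle in [0, 1]^d to per-model parameters and optional weights."""
    expected = len(SEARCH_SPACE) + (N_WEIGHTS if with_weights else 0)
    if len(position) != expected:
        raise ValueError(f"expected {expected} dimensions, got {len(position)}")
    params = {}
    for u, (model, name, low, high, is_int) in zip(position, SEARCH_SPACE):
        value = low + _clip01(u) * (high - low)
        params.setdefault(model, {})[name] = int(round(value)) if is_int else value
    weights = None
    if with_weights:
        raw = [_clip01(u) for u in position[len(SEARCH_SPACE):]]
        total = sum(raw)
        # Normalizing to a convex combination is an assumption of this sketch.
        weights = [r / total for r in raw] if total > 0 else [1.0 / N_WEIGHTS] * N_WEIGHTS
    return params, weights

params, weights = decode([0.5] * 15 + [0.2, 0.2, 0.6])
```

Under this encoding, Hybrid_Hyper_XGB would search 15 dimensions and Stacked_Hybrid_Full 18.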
3.2.3 GA-PSO hybrid optimization
Hyperparameters for base models (RF, LightGBM and CatBoost) and stacking weights for Stacked_Hybrid_Full are optimized using a hybrid GA-PSO approach. The optimization population consists of N = 10 particles, evolved over G = 15 generations. Each particle
For t = 1, …, 15, where w is the inertia weight,
The GA component applies crossover (Equation 3):
and mutation with probability
The fitness function minimizes the 10-fold cross-validated mean squared error (MSE) using Equation 5:
This optimization is applied to Stacked_Hybrid_Full and Hybrid_Hyper_XGB, each with its scenario-specific search space, while Baseline_XGB uses fixed hyperparameters.
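A compact sketch of a GA-PSO hybrid of the kind described in this subsection is given below. It is not the authors' implementation: the standard PSO velocity update, arithmetic crossover, Gaussian mutation, the coefficient values (w, c1, c2, p_mut), and the rule of replacing the worse half of the swarm are all assumptions. Only the population size (10) and generation count (15) come from the text, and a toy quadratic stands in for the cross-validated MSE.

```python
import random

def _clip01(v):
    return min(max(v, 0.0), 1.0)

def ga_pso_minimize(fitness, dim, n_particles=10, n_generations=15,
                    w=0.7, c1=1.5, c2=1.5, p_mut=0.1, seed=0):
    """Minimize `fitness` over [0, 1]^dim with a PSO step followed by a GA step."""
    rng = random.Random(seed)
    pos = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = min(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    history = [gbest_fit]
    for _ in range(n_generations):
        # PSO step: inertia + cognitive pull (pbest) + social pull (gbest).
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = _clip01(pos[i][d] + vel[i][d])
        # GA step: the worse half is replaced by mutated crossovers of the better half.
        fits = [fitness(p) for p in pos]
        order = sorted(range(n_particles), key=fits.__getitem__)
        elite, rest = order[:n_particles // 2], order[n_particles // 2:]
        for i in rest:
            a, b = pos[rng.choice(elite)], pos[rng.choice(elite)]
            lam = rng.random()
            child = [lam * x + (1 - lam) * y for x, y in zip(a, b)]
            child = [_clip01(x + rng.gauss(0, 0.1)) if rng.random() < p_mut else x
                     for x in child]
            pos[i], fits[i] = child, fitness(child)
        # Update personal and global bests.
        for i in range(n_particles):
            if fits[i] < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fits[i]
            if fits[i] < gbest_fit:
                gbest, gbest_fit = pos[i][:], fits[i]
        history.append(gbest_fit)
    return gbest, gbest_fit, history

# Toy objective standing in for the cross-validated MSE (hypothetical).
def toy_mse(x):
    return sum((xi - 0.3) ** 2 for xi in x)

best, best_fit, history = ga_pso_minimize(toy_mse, dim=18)
```

In the actual workflow, the fitness call would decode the particle into hyperparameters (and stacking weights for Stacked_Hybrid_Full), fit the models under cross-validation, and return the mean MSE, which makes each evaluation far more expensive than this toy function.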
3.2.4 Convergence monitoring and stability assessment
To ensure robust optimization convergence and address optimization stability, we implemented comprehensive monitoring mechanisms (Equations 6–8):
where w = 4, and
where G = 15, N = 5 final generations. Generation sufficiency validation is assessed through marginal gains
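The monitoring quantities mentioned here (a window of 4 generations, stability over the final 5 generations, and marginal gains) can be illustrated as follows. The exact definitions in Equations 6–8 are not reproduced; the formulas, the tolerance, and the R² trajectory below are assumptions chosen only to show the idea.

```python
import statistics

def convergence_generation(best_r2, window=4, tol=1e-4):
    """First generation (1-based) whose gain over the previous `window`
    generations falls below `tol`; None if that never happens."""
    for idx in range(window, len(best_r2)):
        if best_r2[idx] - best_r2[idx - window] < tol:
            return idx + 1
    return None

def final_stability(best_r2, last_n=5):
    """Population standard deviation of the best R² over the final generations."""
    return statistics.pstdev(best_r2[-last_n:])

def marginal_gains(best_r2):
    """Generation-to-generation improvement in the best R²."""
    return [b - a for a, b in zip(best_r2, best_r2[1:])]

# Hypothetical best-R² trajectory over 15 generations.
r2_history = [0.8800, 0.8860, 0.8900, 0.8925, 0.8940] + [0.8945] * 10
converged_at = convergence_generation(r2_history)
stability = final_stability(r2_history)
gains = marginal_gains(r2_history)
```

A small final-generation standard deviation together with vanishing marginal gains is the pattern one would look for before concluding that the generation budget is sufficient.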
3.2.5 Stacking and ensemble modeling
The Stacked_Hybrid_Full scenario integrates enhanced stacking techniques to optimize both hyperparameters and weights. Base learner predictions
forming the stacking matrix
solved as
The derived weights β provide critical interpretability insights; for porosity estimation, optimal α = 1.0 yields β = [−0.032, 0.331, 0.701] for [RF, LightGBM, CatBoost], indicating CatBoost’s dominant role (70.1%) in capturing complex porosity relationships. For water saturation, stronger regularization (α = 10.0) produces β = [−0.169, 0.552, 0.617], reflecting balanced contributions between CatBoost and LightGBM for saturation physics. Negative weights function as error-correction mechanisms, downweighting models prone to systematic biases while maintaining ensemble diversity. This approach reduces overfitting (via CV), enhances diversity, and improves interpretability through the geologically meaningful linear coefficients β.
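To make the role of the Ridge meta-learner concrete, the sketch below solves the usual closed form β = (ZᵀZ + αI)⁻¹Zᵀy on a small stacking matrix whose columns are base-model predictions. It assumes no intercept and uses a hypothetical toy matrix; the authors' exact formulation and data are not reproduced.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ridge_stack_weights(Z, y, alpha):
    """Ridge coefficients for stacking matrix Z (rows = samples, cols = models).
    No intercept is fitted (assumption of this sketch)."""
    k = len(Z[0])
    gram = [[sum(row[i] * row[j] for row in Z) + (alpha if i == j else 0.0)
             for j in range(k)] for i in range(k)]
    rhs = [sum(row[i] * t for row, t in zip(Z, y)) for i in range(k)]
    return solve_linear(gram, rhs)

# Hypothetical out-of-fold predictions from [RF, LightGBM, CatBoost].
Z = [[1.0, 2.0, 0.5], [2.0, 1.0, 1.5], [3.0, 4.0, 2.0],
     [4.0, 3.0, 3.5], [5.0, 6.0, 4.0]]
y = [0.2 * a + 0.3 * b + 0.5 * c for a, b, c in Z]

beta_free = ridge_stack_weights(Z, y, alpha=0.0)   # recovers the generating weights
beta_reg = ridge_stack_weights(Z, y, alpha=10.0)   # shrunk toward zero
```

Because the coefficients are unconstrained, a Ridge meta-learner can assign a negative weight to a base model, which is consistent with the negative RF coefficients reported above.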
The Hybrid_Hyper_XGB scenario optimizes base model hyperparameters
for M = 3 (RF, LightGBM and CatBoost). The ensemble prediction uses equal-weight averaging (Equation 12):
lacking a meta-model, which distinguishes it from Stacked_Hybrid_Full and preserves its focus on hyperparameter optimization alone.
The Baseline_XGB scenario serves as a reference with fixed hyperparameters
3.2.6 Physics-informed enhancement
A physics-informed hybrid ensemble is introduced through a post hoc refinement step that constrains the stacked ensemble
where
We fit a monotone map g (Equation 14) (isotonic or robust linear, constrained increasing) so that
where
Apply g to obtain the porosity prior on any set used only as a prior in the blend (Equation 15):
With the porosity and true resistivity
The values (a = 1, m = 2, n = 2) are standard for water-wet, low-clay sandstones, as validated in Ordos Basin studies (e.g., average a = 0.98 ± 0.05, m = 1.95 ± 0.1, n = 2.05 ± 0.08 from core-log calibration in Chang 7) (Yang et al., 2016). These assume a clean quartz matrix with Rw from spontaneous potential logs, avoiding Simandoux corrections for minor shaliness (Vsh < 30%). Training/validation use the original porosity (dataset label) inside Archie to build
Since the Stacked ensemble prediction is
Here,
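The physics priors and the blending step in this subsection can be sketched in a few lines. The Wyllie time-average and Archie forms below are the textbook ones; the matrix and fluid transit times, the sample inputs, and the blending weight are hypothetical, and the density-neutron adjustment and the monotone map g are omitted for brevity.

```python
def wyllie_porosity(dt_log, dt_matrix=55.5, dt_fluid=189.0):
    """Wyllie time-average porosity from sonic transit time (us/ft).
    Default matrix and fluid values are common textbook choices (assumption)."""
    phi = (dt_log - dt_matrix) / (dt_fluid - dt_matrix)
    return min(max(phi, 0.0), 1.0)

def archie_sw(phi, rt, rw, a=1.0, m=2.0, n=2.0):
    """Archie water saturation with the a, m, n values quoted in the text."""
    if phi <= 0 or rt <= 0 or rw <= 0:
        raise ValueError("phi, rt and rw must be positive")
    sw = ((a * rw) / (phi ** m * rt)) ** (1.0 / n)
    return min(sw, 1.0)

def blend(ensemble_pred, physics_prior, lam):
    """Soft constraint: convex combination of ensemble output and physics prior.
    `lam` would be tuned by cross-validation; its value here is hypothetical."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return (1.0 - lam) * ensemble_pred + lam * physics_prior

phi_prior = wyllie_porosity(70.0)                 # hypothetical sonic reading
sw_prior = archie_sw(phi=0.10, rt=20.0, rw=0.05)  # hypothetical log values
phi_final = blend(ensemble_pred=0.08, physics_prior=0.10, lam=0.25)
```

Because the blend acts only on predictions, the base models and the meta-learner stay untouched, which is what allows the refinement to be applied without retraining.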
3.3 SHAP interpretability analysis
SHAP, introduced by Lundberg and Lee (2017), is a unified framework designed to interpret predictions from complex machine learning models. To overcome the interpretability limitations of black-box models, SHAP constructs a simplified surrogate model that approximates the behavior of the original predictive model. This surrogate decomposes the output into additive contributions from each input feature, facilitating transparent insight into the reasoning behind individual predictions. SHAP analysis, a core component of the methodology, quantifies feature contributions for the base models (RF, LightGBM, CatBoost) within Stacked_Hybrid_Full, providing insights into geological drivers (e.g., Vp, Vs, Vp/Vs). SHAP values are computed by using the TreeExplainer for each base model, enabling interpretation of individual model contributions to the stacked ensemble. For a given base model (m), the prediction for a sample
where
where (S) is a subset of the feature indices excluding (j), and
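The additive decomposition behind SHAP can be checked on a toy problem by computing exact Shapley values, using the classical weighting |S|!(|F| − |S| − 1)!/|F|! over subsets S that exclude feature j. The sketch below is a brute-force illustration, not TreeExplainer: the toy model, the sample values, and the choice of replacing absent features with a fixed baseline are all assumptions.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values for one sample; absent features take baseline values."""
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for j in range(n):
        others = [i for i in range(n) if i != j]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for s in combinations(others, size):
                with_j = value(set(s) | {j})
                without_j = value(set(s))
                total += weight * (with_j - without_j)
        phi.append(total)
    return phi

# Hypothetical porosity model with one interaction term (Vs x density).
def toy_model(z):
    vp, vs, rhob = z
    return 0.30 - 0.04 * vp - 0.02 * vs * rhob

x = [4.2, 2.4, 2.55]         # hypothetical sample: Vp, Vs (km/s), density (g/cm^3)
baseline = [4.0, 2.3, 2.50]  # hypothetical reference point
phi = shapley_values(toy_model, x, baseline)
```

The contributions sum to the difference between the prediction at x and at the baseline, which is the additivity property that makes per-feature attributions comparable across samples. The cost grows exponentially with the number of features, which is why tree-specific algorithms are used in practice.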
3.4 Workflow overview
Figure 2 illustrates the proposed framework for predicting porosity and water saturation in unconventional reservoirs by using well log data. The methodology integrates robust data preprocessing, including mean imputation and robust scaling, with a multi-stage machine learning pipeline. The workflow employs a hybrid GA-PSO to simultaneously tune hyperparameters of base models (RF, LightGBM, CatBoost) and stacking weights. The pipeline constructs a stacked ensemble with a Ridge regression meta-learner and a post hoc physics-informed refinement, incorporating bias correction and regularization to address overestimation. Performance is assessed via 5-fold cross-validation, per-well metrics (R², RMSE), and leave-one-well-out CV. SHAP analysis quantifies the contributions of individual features and base models, enhancing model interpretability and providing geological insights.
4 Results
4.1 Feature selection analysis
RFECV revealed clear model-dependent variability across the ensemble learners for both porosity and water saturation prediction targets (Figures 3, 4). The optimal number of features ranged between 8 and 11 depending on the model, reflecting differences in feature interaction complexity captured by each algorithm. For the porosity target, RF achieved optimal performance with 10 features, CatBoost with eight features, and LightGBM with 11 features (Figure 3). Similarly, for the water saturation target, the optimal subsets comprised eight features (Random Forest), 11 features (CatBoost), and nine features (LightGBM) (Figure 4).
Figure 3. RFECV analysis with confidence intervals for porosity, (a) RF, (b) CatBoost and (c) LightGBM.
Figure 4. RFECV analysis with confidence intervals for water saturation, (a) RF, (b) CatBoost and (c) LightGBM.
Feature selection stability analysis across five cross-validation folds indicated that Random Forest exhibited the highest robustness (mean Jaccard = 0.86 ± 0.05), followed by LightGBM (0.62 ± 0.08) and CatBoost (0.36 ± 0.12). Pairwise fold comparisons for Random Forest exceeded 0.80 in 85% of cases, highlighting its ability to consistently identify key predictors (Figures 5a,d for porosity; Figures 6a,d for water saturation). Core geophysical attributes including Vs, Poisson’s ratio, and ρ displayed selection frequencies above 80% across all models (Figure 5b for porosity; Figure 6b for water saturation). These attributes form the backbone of elastic and petrophysical interdependencies, making them consistently favored by the models. In contrast, derived and engineered attributes such as Vp/Vs showed intermediate selection frequencies (20%–60%) and strong model-specific variability (Figure 5c for porosity; Figure 6c for water saturation).
Figure 5. Feature selection stability analysis for porosity. (a) Mean Jaccard similarity across models, (b) heatmap of feature selection frequency per fold (color scale 0–1), (c) histogram of optimal feature counts per model and (d) bar chart comparing stability metrics (mean Jaccard, STD, min Jaccard).
Figure 6. Feature selection stability analysis for water saturation. (a) Mean Jaccard similarity across models, (b) heatmap of feature selection frequency per fold (color scale 0–1), (c) histogram of optimal feature counts per model and (d) bar chart comparing stability metrics (mean Jaccard, STD, min Jaccard).
4.2 GA-PSO optimization convergence
The hybrid GA-PSO optimization demonstrated robust convergence across all modeling scenarios (Figure 7). For porosity, Hybrid_Hyper_XGB converged at generation 7 (46.7% of the 15 generations) with excellent stability, and Stacked_Hybrid_Full converged at generation 5 (33.3%) with high stability. For water saturation, Hybrid_Hyper_XGB converged at generation 8 (53.3%) with perfect stability, and Stacked_Hybrid_Full converged at generation 10 (66.7%) with good stability.
Figure 7. Convergence metrics, (a) convergence generation analysis, (b) optimization marginal gains, (c) R² stability in final generations and (d) relative performance improvement.
Hyperparameter trajectories (Figure 8) indicate consistent stabilization trends for n_estimators (220–250), max_depth (8–9) and learning_rate (0.07) after generation 6.
Figure 8. Hyperparameter trajectories, (a) n_estimators evolution, (b) max_depth evolution, (c) learning rate evolution and (d) convergence speed comparison.
R² evolution profiles (Figure 9) highlight that both porosity and water saturation targets achieved monotonic improvements during optimization. The Stacked_Hybrid_Full model maintained higher average and best R² values at each generation, outperforming Hybrid_Hyper_XGB by +0.002–0.003 in porosity and +0.004 in water saturation. Relative performance improvements reached 0.22% (porosity) and 0.29% (water saturation), with stable R² standard deviations < 2.4 × 10⁻⁴ in final generations (Table 1).
Figure 9. R² evolution profiles, (a) porosity best R² evolution, (b) porosity average R² evolution, (c) water saturation best R² evolution and (d) water saturation average R² evolution.
As shown in Table 1, marginal R² gains fell below 0.0002 beyond generation 10, indicating that 15 generations suffice to balance exploration and exploitation. Stacked_Hybrid_Full converged faster than Hybrid_Hyper_XGB for porosity (generation 5 versus 7) but later for water saturation (generation 10 versus 8), while delivering higher R² stability in the final iterations (Figure 9).
4.3 Learning behaviour and cross-well validation on training data
To understand the impact of training data size on model performance and stability, learning curve analysis is performed for each target property. The learning curves are generated by progressively increasing the number of training samples from the combined set of seven wells and computing the mean cross-validation R² score at each increment. Learning curves (Figure 10) reveal that all models improve with increased training data, plateauing around 3,500 samples. Stacked_Hybrid_Full maintained the highest R² across all sample sizes, showing better sample efficiency and stability.
Leave-One-Well-Out (LOWO) cross-validation was employed on the seven-well training dataset to quantify model robustness and check for overfitting. This technique systematically holds out data from one well at a time for validation while training on the remaining wells, ensuring the model’s performance is assessed on entirely unseen spatial/geological contexts. This mitigates overfitting by preventing the model from memorizing training-specific patterns and promotes generalization across diverse well conditions. Figures 11a,b demonstrate the LOWO-validated performance for porosity and water saturation predictions. These dual-axis bar charts show R² and RMSE for each held-out well. Hatched bars indicate the Hybrid_Hyper_XGB scenario; solid bars indicate Stacked_Hybrid_Full. Consistently high R² (>0.8 for most wells in porosity; variable but generally >0.7 in water saturation) and low RMSE (e.g., <1.2 for porosity across wells) confirm robust generalization without signs of overfitting, such as inflated training metrics degrading on validation.
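The LOWO splitting logic described above amounts to grouping samples by well and holding out one group at a time. A minimal sketch follows; the well identifiers and sample counts are hypothetical.

```python
def leave_one_well_out(well_ids):
    """Yield (held_out_well, train_indices, valid_indices), one fold per well."""
    for held_out in sorted(set(well_ids)):
        train = [i for i, w in enumerate(well_ids) if w != held_out]
        valid = [i for i, w in enumerate(well_ids) if w == held_out]
        yield held_out, train, valid

# Hypothetical well label for each depth sample in a seven-well training set.
well_ids = (["W1"] * 3 + ["W2"] * 2 + ["W3"] * 4 + ["W4"] * 2
            + ["W5"] * 3 + ["W6"] * 2 + ["W7"] * 4)
folds = list(leave_one_well_out(well_ids))
```

Each fold trains on six wells and validates on the seventh, so no depth sample from the validation well can leak into training through neighbouring samples of the same well.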
Figure 11. Cross-well performance evaluation on training data (a) porosity and (b) water saturation.
4.4 Depth-wise well-log predictions
To quantify the impact of the post hoc physics constraints, we performed an ablation comparing the pure ensemble predictions before blending (Stacked_Hybrid_Pure) against the physics-constrained version (Stacked_Hybrid_Full). The metrics (Table 2) show that blending boosts R² by 2%–4% (e.g., 0.8945 to 0.826 for porosity aggregate; 95% CI [0.88, 0.90]) and reduces MAE by 10%–15% relative to the pure ensemble, with narrower residuals in low-porosity (<5%) zones (Figure 12). The blended model also outperforms the direct Wyllie/Archie baselines (Traditional; R² = 0.47/0.58) while stabilizing variance, indicating that soft constraints mitigate overfitting without restricting high-fidelity predictions. Depth overlays (Figure 13) highlight improved tracking of shaly transitions, supporting geological consistency. These gains underscore the post hoc approach as a scalable PIML bridge for tight reservoirs.
Figure 12. Porosity predictions comparing Stacked_Hybrid_Full, Stacked_Hybrid_Pure and Traditional ablations for wells (a) 1, (b) 2 and (c) 3.
Figure 13. Water saturation predictions comparing Stacked_Hybrid_Full, Stacked_Hybrid_Pure and Traditional ablations for wells (a) 1, (b) 2 and (c) 3.
With physics refinement confirmed, we now compare scenarios to isolate the impact of the optimization strategy. For porosity (Figure 14), Stacked_Hybrid_Full predictions track actual values closely across depths (1,920–2,020 m in (a), 2,060–2,140 m in (b) and (c)), capturing heterogeneities such as porosity spikes at ∼1,980 m and ∼2,120 m. Hybrid_Hyper_XGB and Baseline_XGB overestimate low-porosity zones, leading to smoothed profiles that overlook fine-scale variations. Water saturation profiles (Figure 15) show similar Stacked_Hybrid_Full superiority, with accurate reproduction of saturation gradients (e.g., sharp transitions at ∼2,100–2,120 m), whereas baseline models introduce artifacts in high-saturation layers.
Figure 14. Porosity predictions comparing Stacked_Hybrid_Full, Hybrid_Hyper_XGB and Baseline models for wells (a) 1, (b) 2, (c) 3.
Figure 15. Water saturation predictions comparing Stacked_Hybrid_Full, Hybrid_Hyper_XGB and Baseline models for wells (a) 1, (b) 2, (c) 3.
4.5 Model performance evaluation
Scatter plots of predicted versus actual values for porosity (Figure 16) and water saturation (Figure 17) across the three independent test wells demonstrate the Stacked_Hybrid_Full model’s superior performance. For porosity, Stacked_Hybrid_Full achieves R² values of 0.783, 0.901, and 0.860 (a–c), with points closely aligned along the 1:1 line (0%–14% range), outperforming Hybrid_Hyper_XGB (R² = 0.586, 0.600, 0.823) and Baseline_XGB (R² = 0.272, 0.562, 0.576), though all models show increased scatter at <4% porosity. For water saturation, Stacked_Hybrid_Full yields R² values of 0.902, 0.823, and 0.927, aligning tightly with actual values (30%–100% range), surpassing Hybrid_Hyper_XGB (R² = 0.744, 0.673, 0.740) and Baseline_XGB (R² = 0.604, 0.331, 0.541), with greater deviations in the 50%–80% range. Stacked_Hybrid_Full’s hybrid stacking enhances its ability to model nonlinear reservoir properties, while Hybrid_Hyper_XGB and Baseline_XGB’s limitations suggest a need for improved handling of heterogeneity.
Table 3 shows the R², RMSE, and MAE for each model (Stacked_Hybrid_Full, Hybrid_Hyper_XGB, Baseline_XGB) across all three wells. The error matrix includes both absolute (RMSE, MAE) and relative (R²) metrics, to provide a complete picture of model performance.
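For reference, the three metrics can be computed from their standard definitions as in the sketch below. The sample values are hypothetical and are not taken from the test wells.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return R², RMSE and MAE for paired true and predicted values."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)               # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
    }

# Hypothetical porosity values (%) for four depth samples.
metrics = regression_metrics([4.0, 6.0, 8.0, 10.0], [5.0, 6.0, 7.0, 10.0])
```

RMSE and MAE are expressed in the units of the target, whereas R² is dimensionless, so a well with a narrow porosity range can show a low R² even when its absolute errors are small.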
Residual histograms (Figures 18, 19) show that Stacked_Hybrid_Full has narrow, symmetric error distributions centered near zero, suggesting minimal bias. Hybrid_Hyper_XGB shows moderate spread, while Baseline_XGB residuals are skewed with heavier tails.
Residual-versus-depth scatter plots (Figures 20, 21) test for depth-dependent biases. Stacked_Hybrid_Full residuals cluster randomly around zero without trends, affirming model independence from depth. Hybrid_Hyper_XGB and Baseline_XGB display funnelling (increasing spread at greater depths) and slight positive biases below 2,100 m, potentially linked to unmodeled geological factors.
4.6 Model interpretability analysis using SHAP
For transparency and interpretability in the ensemble prediction pipeline, SHAP analysis is performed exclusively on the Stacked_Hybrid_Full scenario. This model is selected for interpretability as it exhibits superior generalization performance and structural fidelity across all evaluation metrics.
Bar charts of mean SHAP values and beeswarm plots assess the significance and directional influence of input features on porosity and water saturation predictions. Global SHAP feature importance (Figures 22a, 23a) identifies permeability, shale volume, and depth as the dominant predictors for porosity, and permeability, resistivity, and depth for water saturation. Beeswarm plots (Figures 22b, 23b) show how feature value ranges influence predictions: high permeability and low shale volume increase porosity, while low resistivity and high permeability raise water saturation.
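Global importance in SHAP bar charts is commonly summarized as the mean absolute SHAP value per feature. The sketch below shows that aggregation, assuming the SHAP values have already been computed and stored as a samples-by-features matrix. The matrix entries and feature labels are made up for illustration, and the exact convention used in the study is not restated here.

```python
def global_shap_importance(shap_values, feature_names):
    """Rank features by mean absolute SHAP value, largest first."""
    n_samples = len(shap_values)
    importance = {}
    for j, name in enumerate(feature_names):
        importance[name] = sum(abs(row[j]) for row in shap_values) / n_samples
    return sorted(importance.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical SHAP matrix: one row per depth sample, one column per feature.
features = ["permeability", "shale_volume", "depth"]
shap_matrix = [
    [0.8, -0.3, 0.1],
    [-0.6, 0.4, -0.1],
    [0.7, -0.2, 0.2],
]
ranking = global_shap_importance(shap_matrix, features)
```

Taking absolute values before averaging prevents positive and negative contributions from cancelling. The beeswarm plot then shows the sign information that this summary discards.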
Figure 23. (a) Bar chart of the mean SHAP values and (b) Beeswarm summary plots for water saturation.
Interaction plots (Figures 24, 25) reveal coupled lithology–fluid effects. For porosity, permeability interacts strongly with gamma ray, shale volume, and density; for water saturation, interactions are strongest between permeability and V_S, resistivity, and shale volume. These interactions highlight the multi-factor controls on reservoir properties, the model’s ability to capture nonlinear relationships, and the value of integrating multiple logs.
Figure 24. Interaction plots for porosity. (a) Permeability with gamma ray, (b) shale volume with density, (c) depth with permeability, and (d) density with permeability.
Figure 25. Interaction plots for water saturation. (a) Permeability with V_S, (b) resistivity with permeability, (c) depth with shale volume, and (d) gamma ray with permeability.
Collectively, this interpretability analysis validates the scientific robustness of the stacked ensemble. By leveraging SHAP analysis, the framework not only delivers superior prediction accuracy but also provides geologically meaningful explanations for the observed trends. Such transparency is vital for reservoir characterization, risk-informed decision-making, and field development planning.
5 Discussion
This study presents a PIML framework for predicting petrophysical properties in tight reservoirs. The results demonstrate a robust integration of feature selection, model optimization, and physical principles, leading to significant improvements in predictive accuracy and geological consistency.
RFECV identified model-dependent optimal feature subsets, typically comprising 8–11 features for predicting porosity and water saturation. The RF algorithm demonstrated superior selection stability, evidenced by a mean Jaccard similarity of 0.86 ± 0.05, whereas CatBoost exhibited higher variability (0.36 ± 0.12). Core petrophysical attributes, namely V_S, Poisson’s ratio, and ρ, were consistently selected with a frequency exceeding 80%, underscoring their fundamental relationship with the target properties.
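Selection stability of this kind can be quantified by comparing the feature subsets chosen in different runs. The sketch below assumes stability is measured as the mean pairwise Jaccard similarity across subsets, together with a per-feature selection frequency. How the runs are defined (for example, cross-validation folds or resampled datasets) is an assumption, and the subsets and log mnemonics are illustrative.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two feature sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def mean_pairwise_jaccard(subsets):
    """Average Jaccard similarity over all pairs of selected subsets."""
    pairs = list(combinations(subsets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

def selection_frequency(subsets):
    """Fraction of runs in which each feature was selected."""
    counts = {}
    for subset in subsets:
        for feature in set(subset):
            counts[feature] = counts.get(feature, 0) + 1
    return {f: c / len(subsets) for f, c in counts.items()}

# Hypothetical subsets from three selection runs (names are illustrative).
runs = [
    {"VS", "PR", "RHOB", "GR"},
    {"VS", "PR", "RHOB", "RT"},
    {"VS", "PR", "RHOB", "GR"},
]
stability = mean_pairwise_jaccard(runs)
frequency = selection_frequency(runs)
```

A value near 1 means nearly identical subsets across runs, while values well below 0.5 indicate that the selected features change substantially from run to run.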
Hyperparameter optimization via the GA-PSO hybrid algorithm converged efficiently, requiring only 5–10 generations for the Stacked_Hybrid_Full model and 7–8 for the Hybrid_Hyper_XGB model. This process stabilized key parameters within narrow, effective ranges (n_estimators: 220–250; max_depth: 8–9; learning_rate: 0.07). The marginal improvement in R² fell below 0.0002 after the 10th generation, with the Stacked_Hybrid_Full model achieving a final performance gain of +0.002–0.004 over other ensembles. Learning curve analysis confirmed the sample efficiency of the approach, with model performance plateauing at approximately 3,500 training samples.
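To illustrate the general idea of a two-phase GA-PSO search, the sketch below runs a few genetic-algorithm generations for exploration and then hands the population to a particle swarm for refinement. It is a toy version under stated assumptions and does not reproduce the study's implementation. The objective is a quadratic stand-in for a validation loss. The second parameter (`subsample`), the bounds, the operators (blend crossover, Gaussian mutation, elitism), and the coefficients are all hypothetical choices.

```python
import random

def objective(x):
    """Toy stand-in for a validation loss; minimum 0 at (0.07, 0.8)."""
    return (x[0] - 0.07) ** 2 + (x[1] - 0.8) ** 2

def clip(x, bounds):
    """Project a candidate back into the search box."""
    return [min(max(v, lo), hi) for v, (lo, hi) in zip(x, bounds)]

def ga_phase(pop, bounds, generations, rng):
    """Exploration: elitism, blend crossover, Gaussian mutation."""
    for _ in range(generations):
        pop.sort(key=objective)
        elite = pop[: len(pop) // 2]          # keep the better half unchanged
        children = []
        while len(elite) + len(children) < len(pop):
            p1, p2 = rng.sample(elite, 2)
            w = rng.random()
            child = [w * a + (1 - w) * b for a, b in zip(p1, p2)]
            child = [v + rng.gauss(0, 0.05 * (hi - lo))
                     for v, (lo, hi) in zip(child, bounds)]
            children.append(clip(child, bounds))
        pop = elite + children
    return pop

def pso_phase(pop, bounds, iterations, rng, inertia=0.6, c1=1.5, c2=1.5):
    """Refinement: standard global-best particle swarm update."""
    vel = [[0.0] * len(bounds) for _ in pop]
    pbest = [p[:] for p in pop]
    gbest = min(pbest, key=objective)[:]
    for _ in range(iterations):
        for i, x in enumerate(pop):
            for d in range(len(bounds)):
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - x[d])
                             + c2 * rng.random() * (gbest[d] - x[d]))
            pop[i] = clip([a + v for a, v in zip(x, vel[i])], bounds)
            if objective(pop[i]) < objective(pbest[i]):
                pbest[i] = pop[i][:]
        gbest = min(pbest + [gbest], key=objective)[:]
    return gbest

def ga_pso(bounds, pop_size=20, ga_gens=5, pso_iters=30, seed=0):
    """Run the GA phase, then seed the PSO phase with its population."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    start_best = min(objective(p) for p in pop)
    pop = ga_phase(pop, bounds, ga_gens, rng)
    return pso_phase(pop, bounds, pso_iters, rng), start_best

bounds = [(0.01, 0.3), (0.5, 1.0)]   # hypothetical: learning_rate, subsample
best, start_best = ga_pso(bounds)
```

Because the GA phase keeps its elite and the PSO phase only ever replaces the global best with a better point, the returned solution is never worse than the best member of the initial population. In a real tuning run, `objective` would be replaced by a cross-validated model score.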
Cross-well validation affirmed the robustness of the developed models. For porosity prediction, the Stacked_Hybrid_Full model consistently excelled (R²: 0.835–0.969; minimum RMSE: 0.61%), with the exception of one well where the Hybrid_Hyper_XGB model performed best. Predictions for water saturation exhibited greater inter-well variability, with a notably high RMSE in Well 5, highlighting the persistent challenges in modeling fluid saturation within highly heterogeneous intervals.
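One common way to realize cross-well validation is a leave-one-well-out split, in which every sample from a given well is held out together so that no depth samples from the test well leak into training. Whether the study used exactly this scheme is an assumption. The sketch below only shows the index bookkeeping, with hypothetical well labels.

```python
def leave_one_well_out(well_ids):
    """Yield (held_out_well, train_indices, test_indices) for each distinct well."""
    for well in sorted(set(well_ids)):
        train = [i for i, w in enumerate(well_ids) if w != well]
        test = [i for i, w in enumerate(well_ids) if w == well]
        yield well, train, test

# Hypothetical well label for each depth sample in the dataset.
well_ids = ["W1", "W1", "W2", "W2", "W2", "W3"]
folds = list(leave_one_well_out(well_ids))
```

Grouping by well rather than shuffling individual samples gives a more honest estimate of performance on an undrilled location, because adjacent depth samples within a well are strongly correlated.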
A post hoc physics-refinement step was implemented, in which log-derived physical estimates from Wyllie’s equation (porosity) and the Archie-Simandoux model (water saturation) were blended with the ML predictions as soft regularizers. This scalable PIML step yielded a 2%–4% gain in R² and a 10%–15% reduction in MAE, without requiring model retraining. The resulting depth profiles demonstrated a superior capacity to capture subsurface heterogeneity, outperforming traditional methods by 50%–90% in accuracy while ensuring geological consistency.
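A minimal sketch of such a post hoc blend is given below, assuming the refinement is a convex combination of the ML prediction and a physics prior. The blending weight, the matrix and fluid transit times (typical textbook values for sandstone and fresh water), and the Archie parameters are illustrative assumptions. For brevity the clean-sand Archie equation stands in for the Archie-Simandoux model, which additionally includes a shale term.

```python
def wyllie_porosity(dt_log, dt_matrix=55.5, dt_fluid=189.0):
    """Wyllie time-average porosity (fraction); transit times in us/ft."""
    phi = (dt_log - dt_matrix) / (dt_fluid - dt_matrix)
    return min(max(phi, 0.0), 1.0)

def archie_sw(rt, phi, rw=0.05, a=1.0, m=2.0, n=2.0):
    """Archie water saturation (fraction), clipped to [0, 1]."""
    if phi <= 0 or rt <= 0:
        raise ValueError("porosity and resistivity must be positive")
    sw = (a * rw / (phi ** m * rt)) ** (1.0 / n)
    return min(max(sw, 0.0), 1.0)

def blend(ml_pred, physics_prior, weight=0.2):
    """Convex combination: weight=0 keeps the ML value, weight=1 the prior."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return (1.0 - weight) * ml_pred + weight * physics_prior

# Illustrative single-depth example (values are made up).
phi_prior = wyllie_porosity(dt_log=68.85)          # sonic-derived porosity prior
sw_prior = archie_sw(rt=20.0, phi=0.10)            # resistivity-derived Sw prior
phi_refined = blend(ml_pred=0.12, physics_prior=phi_prior)
sw_refined = blend(ml_pred=0.60, physics_prior=sw_prior)
```

Because the blend acts only on model outputs, it can be applied to any trained model without retraining. The weight controls how strongly predictions are pulled toward the physical estimate.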
Analysis of test-well predictions confirmed the superiority of the Stacked_Hybrid_Full model. Residual scatter plots were symmetric and showed no systematic bias with depth, in contrast to baseline models which exhibited significant skewness and funnel-shaped error distributions.
SHAP analysis provided model interpretability, identifying permeability, shale volume, and depth as the primary drivers for porosity, which are consistent with mechanical compaction principles. For water saturation, permeability, resistivity, and depth were the dominant features, aligning with the theoretical foundations of Archie’s law. Beeswarm plots and interaction analysis further revealed directional trends (e.g., low shale volume increasing porosity) and feature couplings (e.g., permeability–gamma ray), thereby validating the synergistic use of multiple well logs.
6 Conclusion
This study has introduced and validated a robust, interpretable ensemble learning framework for the prediction of porosity and water saturation in complex tight reservoirs. The methodology directly addresses the persistent challenges of pronounced lithological heterogeneity and strong nonlinear feature interactions by integrating a structured machine learning pipeline with foundational petrophysical principles. The core of the framework leverages a stacked ensemble architecture, synergizing RF, LightGBM, and CatBoost as base learners with a Ridge regression meta-learner.
RFECV identified compact, model-specific feature subsets, with core attributes such as V_S, Poisson’s ratio, and ρ consistently selected, which underscores the model’s inherent alignment with established petrophysical relationships.
The dual-phase GA-PSO strategy effectively combined the global exploration capability of Genetic Algorithms with the local refinement of Particle Swarm Optimization. This hybrid approach achieved rapid convergence within 5–10 generations, stabilizing optimal hyperparameters and yielding an efficient and diverse set of base learners for the stacking ensemble. The joint optimization of both hyperparameters and stacking weights in the Stacked_Hybrid_Full configuration proved critical, enabling it to consistently outperform the hyperparameter-tuned Hybrid_Hyper_XGB model. Cross-well validation confirmed its robustness, with the model achieving superior accuracy and demonstrating stable, unbiased residuals across diverse well conditions. A fundamental contribution is the post hoc integration of rock physics priors, using Wyllie’s equation for porosity and the Archie-Simandoux model for water saturation as soft regularizers. This scalable PIML step resulted in significant gains and improved the capture of heterogeneous depth profiles, outperforming conventional methods by 50%–90% without requiring retraining.
Furthermore, the framework provides critical interpretability through SHAP analysis, which quantitatively identified permeability, shale volume, and depth as primary drivers for porosity, and permeability, resistivity, and depth for water saturation. These findings are in direct agreement with mechanical compaction theory and Archie’s law. The analysis further revealed specific feature interactions (e.g., permeability-gamma ray coupling), validating the model’s ability to capture the multi-log synergies essential for accurate petrophysical characterization.
While this framework demonstrates robust performance within the geological context of the studied basin, it could be applied, with minor adjustments, to tight clastic reservoirs of similar lithology in other areas. Its generalizability to formations with fundamentally different lithology or petrophysical characteristics (e.g., carbonates or gas hydrates) requires further validation. Consequently, a key objective for future work is to extend and adapt this methodology for application across diverse tectonic units and reservoir types, which will involve domain adaptation techniques and the integration of additional, domain-specific physical laws.
Data availability statement
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.
Author contributions
UM: Validation, Conceptualization, Software, Methodology, Resources, Visualization, Funding acquisition, Formal Analysis, Writing – original draft, Data curation. JB: Validation, Methodology, Writing – review and editing, Supervision, Data curation, Funding acquisition, Resources, Visualization. MA: Methodology, Visualization, Validation, Writing – review and editing. FR: Methodology, Data curation, Writing – review and editing, Validation. EO: Software, Writing – review and editing, Data curation, Visualization.
Funding
The authors declare that financial support was received for the research and/or publication of this article. This work is supported by the Fundamental Research Funds for Central Universities of Hohai University (Grant no. B240201039) and the National Natural Science Foundation of China (Grant no. 42174161).
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The authors declare that no Generative AI was used in the creation of this manuscript.
Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
References
Abbas, M. A., Al-Mudhafar, W. J., and Wood, D. A. (2023). Improving permeability prediction in carbonate reservoirs through gradient boosting hyperparameter tuning. Earth Sci. Inf. 16 (4), 3417–3432. doi:10.1007/s12145-023-01099-0
Abid, M., Ba, J., Markus, U. I., Tariq, Z., and Ali, S. H. (2025). Modified approach to estimate effective porosity using density and neutron logging data in conventional and unconventional reservoirs. J. Appl. Geophys. 233, 105571. doi:10.1016/J.JAPPGEO.2024.105571
Adadi, A., and Berrada, M. (2018). Peeking inside the black-box: a survey on explainable artificial intelligence (XAI). IEEE Access 6, 52138–52160. doi:10.1109/ACCESS.2018.2870052
Akande, K. O., Owolabi, T. O., Olatunji, S. O., and AbdulRaheem, A. A. (2017). A hybrid particle swarm optimization and support vector regression model for modelling permeability prediction of hydrocarbon reservoir. J. Petroleum Sci. Eng. 150, 43–53. doi:10.1016/J.PETROL.2016.11.033
Al-Mudhafar, W. (2015). Integrating bayesian model averaging for uncertainty reduction in permeability modeling. Proc. Annu. Offshore Technol. Conf. 1, 33–52. doi:10.4043/25646-MS
Al-Mudhafar, W. J. (2020). “Integrating electrofacies and well logging data into regression and machine learning approaches for improved permeability estimation in a carbonate reservoir in a giant southern Iraqi oil field,” in Proceedings of the annual offshore technology conference, 2020-May. doi:10.4043/30763-MS
Al-Mudhafar, W. J., and Wood, D. A. (2022). “Tree-based ensemble algorithms for lithofacies classification and permeability prediction in heterogeneous carbonate reservoirs,” in Proceedings of the annual offshore technology conference. doi:10.4043/31780-MS
Anifowose, F., Abdulraheem, A., and Al-Shuhail, A. (2019). A parametric study of machine learning techniques in petroleum reservoir permeability prediction by integrating seismic attributes and wireline data. J. Petroleum Sci. Eng. 176, 762–774. doi:10.1016/J.PETROL.2019.01.110
Bai, Y., Tan, M., Cao, H., Tang, J., and Liang, Z. (2022). Intelligent classification of carbonate reservoir quality using multisource geophysical logging and seismic data. IEEE Trans. Geoscience Remote Sens. 60, 1–12. doi:10.1109/TGRS.2022.3140790
Barredo Arrieta, A., Díaz-Rodríguez, N., Del Ser, J., Bennetot, A., Tabik, S., Barbado, A., et al. (2020). Explainable artificial intelligence (XAI): concepts, taxonomies, opportunities and challenges toward responsible AI. Inf. Fusion 58, 82–115. doi:10.1016/J.INFFUS.2019.12.012
Chang, J., Li, J., Kang, Y., Lv, W., Xu, T., Li, Z., et al. (2020). Unsupervised domain adaptation using maximum mean discrepancy optimization for lithology identification. Geophysics, 86, ID19–ID30. doi:10.1190/geo2020-0391.1
Dong, S. Q., Sun, Y. M., Xu, T., Zeng, L. B., Du, X. Y., Yang, X., et al. (2023). How to improve machine learning models for lithofacies identification by practical and novel ensemble strategy and principles. Petroleum Sci. 20 (2), 733–752. doi:10.1016/j.petsci.2022.09.006
Feng, D.-C., Wang, W.-J., Mangalathu, S., and Taciroglu, E. (2021). Interpretable XGBoost-SHAP machine-learning model for shear strength prediction of squat RC walls. J. Struct. Eng. 147 (11), 04021173. doi:10.1061/(ASCE)ST.1943-541X.0003115
Feng, P., Wang, R., Sun, J., Yan, W., Chi, P., and Luo, X. (2024). An interpretable ensemble machine-learning workflow for permeability predictions in tight sandstone reservoirs using logging data. Geophysics 89 (5), MR265–MR280. doi:10.1190/GEO2023-0657.1
Gai, J., Jiang, W., Wang, T., Su, X., Dong, C., Yang, E., et al. (2025). A hybrid physics-informed machine learning framework for water cut prediction in waterflooding reservoirs. Results Eng. 28, 107856. doi:10.1016/j.rineng.2025.107856
Gu, Y., Zhang, D., and Bao, Z. (2021a). A new data-driven predictor, PSO-XGBoost, used for permeability of tight sandstone reservoirs: a case study of member of Chang 4 + 5, Western Jiyuan Oilfield, Ordos Basin. J. Petroleum Sci. Eng. 199, 108350. doi:10.1016/j.petrol.2021.108350
Gu, Y., Bao, Z., and Zhang, D. (2021b). A smart predictor used for lithologies of tight sandstone reservoirs: a case study of member of Chang 4 + 5, Jiyuan Oilfield, Ordos Basin. Petroleum Sci. Technol. 39 (7–8), 175–195. doi:10.1080/10916466.2021.1881114
Gu, Y., Yang, Y., Gao, Y., Yan, S., Zhang, D., and Zhang, C. (2022). Data-driven estimation for permeability of simplex pore-throat reservoirs via an improved light gradient boosting machine: a demonstration of sand-mud profile, Ordos Basin, northern China. J. Petroleum Sci. Eng. 217, 110909. doi:10.1016/j.petrol.2022.110909
Huang, C., Zhu, X., Lu, M., Zhang, Y., and Yang, S. (2025). XGBoost algorithm optimized by simulated annealing genetic algrithm for permeability prediction modeling of carbonate reservoirs. Sci. Rep. 15 (1), 14882. doi:10.1038/S41598-025-99627-Z
Ji, X., Wang, H., Ge, Y., Liang, J., and Xu, X. (2022). Empirical mode decomposition-refined composite multiscale dispersion entropy analysis and its application to geophysical well log data. J. Petroleum Sci. Eng. 208, 109495. doi:10.1016/J.PETROL.2021.109495
Kavzoglu, T., and Teke, A. (2022). Predictive performances of ensemble machine learning algorithms in landslide susceptibility mapping using random forest, extreme gradient boosting (XGBoost) and natural gradient boosting (NGBoost). Arabian J. Sci. Eng. 47 (6), 7367–7385. doi:10.1007/S13369-022-06560-8/METRICS
Khassaf, A. K., Al-Hameed, Z. M., Al-Mohammedawi, N. R., Al-Mudhafar, W. J., Wood, D. A., Abbas, M. A., et al. (2025). “Physics-informed machine learning for enhanced permeability prediction in heterogeneous carbonate reservoirs,” in Proceedings of the annual offshore technology conference. doi:10.4043/35892-MS
Li, L., Chen, J., Gao, C., Zhou, Z., Li, M., Zhang, D., et al. (2025). Peridynamics simulation of hydraulic fracturing in three-dimensional fractured rock mass. Phys. Fluids 37 (7). doi:10.1063/5.0274871/3355519
Lipton, Z. C. (2016). The mythos of model interpretability. Commun. ACM 61 (10), 36–43. doi:10.1145/3233231
Liu, J. J., and Liu, J. C. (2022). Permeability predictions for tight sandstone reservoir using explainable machine learning and particle swarm optimization. Geofluids 2022, 1–15. doi:10.1155/2022/2263329
Liu, Q., Li, P., Jin, Z., Sun, Y., Hu, G., Zhu, D., et al. (2022). Organic-rich formation and hydrocarbon enrichment of lacustrine shale strata: a case study of Chang 7 member. Sci. China Earth Sci. 65 (1), 118–138. doi:10.1007/s11430-021-9819-y
Loh, H. W., Ooi, C. P., Seoni, S., Barua, P. D., Molinari, F., and Acharya, U. R. (2022). Application of explainable artificial intelligence for healthcare: a systematic review of the last decade (2011–2022). Comput. Methods Programs Biomed. 226, 107161. doi:10.1016/j.cmpb.2022.107161
Lundberg, S. M., and Lee, S. I. (2017). A unified approach to interpreting model predictions. Adv. Neural Inf. Process. Syst., 4766–4775. doi:10.48550/arXiv.1705.07874
Mabiala, A. P., Cai, Z., Kouassi, A. K. F., Zhang, H., Mwakipunda, G. C., and Mahamadou, A. S. (2025). Integrating advanced machine learning models for accurate prediction of porosity and permeability in fractured and Vuggy carbonate reservoirs: insights from the Tarim Basin, Northwestern, China. SPE J. 30 (06), 3307–3333. doi:10.2118/226198-PA
Markus, U. I., Ba, J., Abid, M., Faruwa, A. R., and Oli, I. C. (2025). Rock physics and machine learning for lithology identification and estimation of unconventional reservoir properties. Arabian J. Sci. Eng., 1–22. doi:10.1007/S13369-025-10101-4/METRICS
Mohammadian, E., Kheirollahi, M., Liu, B., Ostadhassan, M., and Sabet, M. (2022). A case study of petrophysical rock typing and permeability prediction using machine learning in a heterogenous carbonate reservoir in Iran. Sci. Rep. 12 (1), 1–15. doi:10.1038/s41598-022-08575-5
Murdoch, W. J., Singh, C., Kumbier, K., Abbasi-Asl, R., and Yu, B. (2019). Interpretable machine learning: definitions, methods, and applications. Proc. Natl. Acad. Sci. U. S. A. 116 (44), 22071–22080. doi:10.1073/pnas.1900654116
Nssibi, M., Manita, G., and Korbaa, O. (2023). Advances in nature-inspired metaheuristic optimization for feature selection problem: a comprehensive survey. Comput. Sci. Rev. 49, 100559. doi:10.1016/J.COSREV.2023.100559
Pothana, P., and Ling, K. (2025). Physics-integrated neural networks for improved mineral volumes and porosity estimation from geophysical well logs. Energy Geosci. 6 (2), 100410. doi:10.1016/J.ENGEOS.2025.100410
Sakar, C., Schwartz, N., and Moreno, Z. (2024). Physics-informed neural networks trained with time-lapse geo-electrical tomograms to estimate water saturation, permeability and petrophysical relations at heterogeneous soils. Water Resour. Res. 60 (8), e2024WR037672. doi:10.1029/2024wr037672
Salem, A. M., Yakoot, M. S., and Mahmoud, O. (2022). A novel machine learning model for autonomous analysis and diagnosis of well integrity failures in artificial-lift production systems. Adv. Geo-Energy Res. 6 (2), 123–142. doi:10.46690/AGER.2022.02.05
Sang, W., Yuan, S., Han, H., Liu, H., and Yu, Y. (2022). Porosity prediction using semi-supervised learning with biased well log data for improving estimation accuracy and reducing prediction uncertainty. Geophys. J. Int. 232 (2), 940–957. doi:10.1093/GJI/GGAC371
Selvam, R., Hiremath, P., Cs, S. K., Ramakrishna Bhat, R., Tomar, V., Bansal, M., et al. (2024). Metaheuristic algorithms for optimization: a brief review. Eng. Proc. 59 (1), 238. doi:10.3390/ENGPROC2023059238
Shao, R., Wang, H., and Xiao, L. (2024). Reservoir evaluation using petrophysics informed machine learning: a case study. Artif. Intell. Geosciences 5, 100070. doi:10.1016/J.AIIG.2024.100070
Sheykhinasab, A., Mohseni, A. A., Barahooie Bahari, A., Naruei, E., Davoodi, S., Aghaz, A., et al. (2023). Prediction of permeability of highly heterogeneous hydrocarbon reservoir from conventional petrophysical logs using optimized data-driven algorithms. J. Petroleum Explor. Prod. Technol. 13 (2), 661–689. doi:10.1007/s13202-022-01593-z
Shi, J., Zou, Y. R., Cai, Y. L., Zhan, Z. W., Sun, J. N., Liang, T., et al. (2022). Organic matter enrichment of the Chang 7 member in the Ordos Basin: insights from chemometrics and element geochemistry. Mar. Petroleum Geol. 135, 105404. doi:10.1016/j.marpetgeo.2021.105404
Shwartz-Ziv, R., and Armon, A. (2021). Tabular data: deep learning is not all you need. Inf. Fusion 81, 84–90. doi:10.1016/j.inffus.2021.11.011
Su, X., Zhu, R., Zhang, J., Liu, C., Gong, L., Jiang, X., et al. (2025). Multi-scale characterization and control factors of bedding-parallel fractures in continental shale reservoirs: insights from the Qingshankou Formation, Songliao Basin, China. Mar. Petroleum Geol. 182, 107580. doi:10.1016/j.marpetgeo.2025.107580
Tian, F., Liu, Z., Zhou, J., and Shao, J. (2025). Rock cracking simulation in tension and compression by peridynamics using a novel contact-friction model with a twin mesh and potential functions. J. Rock Mech. Geotechnical Eng. 17 (6), 3395–3419. doi:10.1016/J.JRMGE.2024.10.018
Wang, P., Chen, X., Wang, B., Li, J., and Dai, H. (2020). An improved method for lithology identification based on a hidden Markov model and random forests. Geophysics 85 (6), IM27–IM36. doi:10.1190/geo2020-0108.1
Wen, P., Wang, S., Li, J., Dong, K., Ren, Z., Li, Y., et al. (2025). Multiobjective optimization of a pressure maintaining ball valve structure based on RSM and NSGA-II. Sci. Rep. 15 (1), 21342. doi:10.1038/s41598-025-02158-w
Wood, D. A. (2022). Gamma-ray log derivative and volatility attributes assist facies characterization in clastic sedimentary sequences for formulaic and machine learning analysis. Adv. Geo-Energy Res. 6 (1), 69–85. doi:10.46690/ager.2022.01.06
Yang, Z., and Zou, C. (2019). “Exploring petroleum inside source kitchen”: connotation and prospects of source rock oil and gas. Petroleum Explor. Dev. 46 (1), 181–193. doi:10.1016/S1876-3804(19)30018-7
Yang, H., Li, S., and Liu, X. (2016). Characteristics and resource prospects of tight oil in Ordos Basin, China. Petroleum Res. 1 (1), 27–38. doi:10.1016/S2096-2495(17)30028-5
Zhang, T., Chai, H., Wang, H., Guo, T., Zhang, L., and Zhang, W. (2023). Interpretable machine learning model for shear wave estimation in a carbonate reservoir using LightGBM and SHAP: a case study in the Amu Darya right bank. Front. Earth Sci. 11. doi:10.3389/FEART.2023.1217384/FULL
Zhang, J., Ma, G., Yang, Z., Mei, J., Zhang, D., Zhou, W., et al. (2024). Knowledge extraction via machine learning guides a topology-based permeability prediction model. Water Resour. Res. 60 (7), e2024WR037124. doi:10.1029/2024WR037124
Zhang, R., Wang, J., Liu, C., Su, K., Ishibuchi, H., and Jin, Y. (2025). Synergistic integration of metaheuristics and machine learning: latest advances and emerging trends. Artif. Intell. Rev. 58 (9), 1–64. doi:10.1007/S10462-025-11266-Y
Zou, C. N., Yang, Z., Tao, S. Z., Yuan, X. J., Zhu, R. K., Hou, L. H., et al. (2013). Continuous hydrocarbon accumulation over a large area as a distinguishing characteristic of unconventional petroleum: the Ordos Basin, North-Central China. Earth-Science Rev. 126, 358–369. doi:10.1016/j.earscirev.2013.08.006
Keywords: petrophysical property prediction, tight reservoirs, stacked ensemble, GA-PSO optimization, physics-informed machine learning
Citation: Markus UI, Ba J, Abid M, Richard FA and Obadiah E (2026) Hybrid optimization of interpretable ensemble machine learning for petrophysical property prediction from well logs. Front. Earth Sci. 13:1721227. doi: 10.3389/feart.2025.1721227
Received: 09 October 2025; Accepted: 27 November 2025;
Published: 05 January 2026.
Edited by:
Soroush Abolfathi, University of Warwick, United Kingdom
Reviewed by:
Suhaib Umer Ilyas, Jeddah University, Saudi Arabia
Zhengzheng Cao, Henan Polytechnic University, China
Njitacke Tabekoueng Zeric, University of Buea, Cameroon
Copyright © 2026 Markus, Ba, Abid, Richard and Obadiah. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Jing Ba, jba@hhu.edu.cn
Uti Ikitsombika Markus1