ORIGINAL RESEARCH article
Front. Built Environ.
Sec. Construction Materials
Heterogeneity-Aware Stacked Machine Learning for Predicting High-Performance Concrete Compressive Strength
Oregon State University, Corvallis, United States
Abstract
Accurate prediction of compressive strength (CS) in high-performance concrete (HPC) is essential for optimizing mix design and ensuring structural reliability. Unlike conventional concrete, HPC incorporates low water-to-binder ratios, supplementary cementitious materials, and chemical admixtures, which introduce stronger nonlinear interactions and greater mix-design variability and thereby increase modeling complexity. Traditional empirical models often struggle to capture these coupled effects. This study develops a heterogeneity-aware machine learning (ML) framework based on stacked ensemble modeling to enhance prediction accuracy, robustness, and interpretability. The dataset comprises 1,525 HPC mix designs compiled from five independent source datasets drawn from peer-reviewed studies and reports, with eight mix and curing input features used to predict CS. Sixteen regression algorithms, including tree-based models, kernel methods, and neural networks, were implemented as base learners. Their out-of-fold predictions were used to train meta-learners, with Multiple Linear Regression (MLR), Elastic Net Regression (ENR), Support Vector Regression (SVR), Random Forest (RF), and Extreme Gradient Boosting (XGBoost) evaluated as alternatives. In 10-fold cross-validation, all meta-model configurations achieved high predictive accuracy (R² = 0.97–0.98). To assess generalization, we performed a heterogeneity-aware evaluation using source-aware grouped cross-validation, in which one entire source dataset is held out at a time to measure generalization under domain shift. The results revealed that models trained on mixed-source data can overestimate generalization and deteriorate when predicting unseen sources, a limitation not commonly evaluated in prior HPC studies. Stacking achieved accuracy comparable to that of the best single model while offering greater stability across heterogeneous datasets and reduced systematic error. SHapley Additive exPlanations (SHAP) analysis confirmed the dominant influence of XGBoost and Gradient Boosting (GB) while identifying key material parameters governing CS. The proposed framework supports practical engineering decision-making, including mix optimization and early-stage strength assessment, and offers a scalable, interpretable approach for data-driven HPC prediction.
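The two evaluation ideas in the abstract, training a meta-learner on out-of-fold base-learner predictions and holding out one entire source at a time, can be illustrated with a minimal scikit-learn sketch. This is not the authors' implementation: the data are synthetic stand-ins (five hypothetical sources, eight features), only three base learners are used instead of sixteen, the meta-learner is a plain linear regression, and all hyperparameters and helper names (`fit_stack`, `predict_stack`, `evaluate`) are assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import KFold, LeaveOneGroupOut, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the paper's dataset): 5 hypothetical sources,
# 8 features, with a per-source shift in both inputs and target.
n_sources, n_per_source, n_features = 5, 60, 8
X_parts, y_parts, g_parts = [], [], []
for s in range(n_sources):
    Xs = rng.normal(size=(n_per_source, n_features))
    Xs += rng.normal(scale=0.5, size=n_features)  # source-specific input shift
    ys = (40 + 8 * Xs[:, 0] - 5 * Xs[:, 1] + 3 * Xs[:, 0] * Xs[:, 2]
          + 2.0 * s + rng.normal(scale=2.0, size=n_per_source))
    X_parts.append(Xs)
    y_parts.append(ys)
    g_parts.append(np.full(n_per_source, s))
X = np.vstack(X_parts)
y = np.concatenate(y_parts)
groups = np.concatenate(g_parts)


def make_base_learners():
    """Three illustrative base learners (the study used sixteen)."""
    return [
        RandomForestRegressor(n_estimators=50, random_state=0),
        GradientBoostingRegressor(random_state=0),
        make_pipeline(StandardScaler(), SVR(C=10.0)),
    ]


def fit_stack(X_train, y_train):
    """Fit base learners and a linear meta-learner on out-of-fold predictions."""
    inner_cv = KFold(n_splits=5, shuffle=True, random_state=0)
    base = make_base_learners()
    # Each column holds one base learner's out-of-fold predictions, so the
    # meta-learner never sees a prediction made on that model's training rows.
    oof = np.column_stack(
        [cross_val_predict(m, X_train, y_train, cv=inner_cv) for m in base]
    )
    meta = LinearRegression().fit(oof, y_train)
    for m in base:
        m.fit(X_train, y_train)  # refit on all training rows for inference
    return base, meta


def predict_stack(base, meta, X_new):
    """Stack base-learner predictions and pass them to the meta-learner."""
    Z = np.column_stack([m.predict(X_new) for m in base])
    return meta.predict(Z)


def evaluate(splitter, groups=None):
    """Pooled R^2 of the stacked model under the given outer splitter."""
    y_pred = np.empty_like(y)
    for train_idx, test_idx in splitter.split(X, y, groups):
        base, meta = fit_stack(X[train_idx], y[train_idx])
        y_pred[test_idx] = predict_stack(base, meta, X[test_idx])
    return r2_score(y, y_pred)


# Mixed-source random folds versus leave-one-source-out folds.
r2_random = evaluate(KFold(n_splits=5, shuffle=True, random_state=0))
r2_grouped = evaluate(LeaveOneGroupOut(), groups=groups)
print(f"random K-fold R2: {r2_random:.3f}")
print(f"leave-one-source-out R2: {r2_grouped:.3f}")
```

On real multi-source data, a gap between the two scores would correspond to the domain-shift effect described above: random folds mix every source into both training and test sets, whereas grouped folds test only on a source the model has never seen.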
Summary
Keywords
Compressive strength prediction, ensemble learning, feature importance, high-performance concrete, interpretability, stacking
Received
18 January 2026
Accepted
18 February 2026
Copyright
© 2026 Samiadel and Soleimani. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Farahnaz Soleimani
Disclaimer
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.