ORIGINAL RESEARCH article

Front. Soil Sci.

Sec. Pedometrics

This article is part of the Research Topic: Advancing spatial prediction of soil properties using remotely sensed data and geospatial artificial intelligence (GeoAI): Challenges, opportunities, and future directions

Ensemble Machine Learning for Digital Mapping of Soil pH and Electrical Conductivity in the Andean Agroecosystem of Peru

Provisionally accepted
Carlos Carbajal llosa1*, Antony Barja2, Samuel Pizarro1
  • 1Instituto Nacional de Innovacion Agraria, La Molina, Peru
  • 2Universidad Nacional Mayor de San Marcos, Lima District, Peru

The final, formatted version of the article will be published soon.

Soil pH and electrical conductivity (EC) are crucial chemical properties in agricultural systems because they directly affect nutrient availability and microbial activity, yet the challenging environment of the Peruvian Andes has limited research on their estimation. This study aimed to develop an ensemble learning method to predict soil pH and EC in Andean agroecosystems from environmental predictors. We developed a heterogeneous ensemble learning approach that combines machine learning (ML) algorithms, including Support Vector Machine (SVM), Artificial Neural Network (ANN), Random Forest (RF), and Extreme Gradient Boosting (XGBoost), through simple and weighted averaging. The weighted ensemble assigns each model a weight based on its predictive accuracy, measured by R² from spatial cross-validation. Both properties show noticeable spatial patterns, with pH displaying greater spatial clustering than EC. Elevation was the most important predictor in the ML models for both properties. Ensemble models significantly outperformed individual models, with the weighted ensemble achieving R² > 0.93 and reducing RMSE by approximately 72%. Among standalone models, RF and XGBoost performed best for pH, while SVM performed best for EC. ANN models were the least effective. Uncertainty analysis indicated high confidence in pH predictions but moderate to high uncertainty in EC predictions, suggesting that EC is more challenging to predict. Ensemble models with optimized weighting provide robust and accurate mapping of spatially autocorrelated soil properties. The high-confidence pH maps are reliable for soil management decisions, while the EC predictions, though more uncertain, effectively identify priority areas for future sampling and investigation.
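The simple and weighted averaging described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract states only that weights are based on R² from spatial cross-validation, so the normalization used here (each weight proportional to its model's R², with negative scores clipped to zero) is an assumption, as are the function names `r2_score`, `r2_weights`, and `ensemble_predict` and all numeric values, which are hypothetical.

```python
import numpy as np


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def r2_weights(cv_r2):
    """Turn per-model cross-validated R² scores into weights that sum to 1.

    Assumption: weights are proportional to R²; negative scores are
    clipped to zero so a poor model cannot receive a negative weight.
    """
    scores = np.clip(np.asarray(cv_r2, dtype=float), 0.0, None)
    total = scores.sum()
    if total == 0.0:
        # No model has a positive score: fall back to equal weights.
        return np.full(scores.shape, 1.0 / scores.size)
    return scores / total


def ensemble_predict(predictions, weights=None):
    """Combine model predictions of shape (n_models, n_samples).

    With weights=None this is the simple average; otherwise it is the
    weighted average across models.
    """
    preds = np.asarray(predictions, dtype=float)
    if weights is None:
        return preds.mean(axis=0)
    return np.average(preds, axis=0, weights=weights)


# Hypothetical spatial cross-validation R² for SVM, ANN, RF, XGBoost.
cv_r2 = [0.50, 0.25, 0.75, 0.50]
weights = r2_weights(cv_r2)

# Hypothetical pH predictions from the four models at three locations.
model_preds = [
    [6.1, 7.0, 5.4],  # SVM
    [6.5, 7.4, 5.0],  # ANN
    [6.0, 6.9, 5.5],  # RF
    [6.2, 7.1, 5.3],  # XGBoost
]

simple_avg = ensemble_predict(model_preds)
weighted_avg = ensemble_predict(model_preds, weights)
```

In this sketch the weakest model still contributes to the weighted ensemble, only with less influence. Other weighting schemes (for example, rank-based weights or weights fitted by regression on held-out predictions) would be equally consistent with the abstract's wording.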

Keywords: ensemble learning, spatial machine learning, digital soil mapping, soil pH, electrical conductivity

Received: 26 Jul 2025; Accepted: 20 Oct 2025.

Copyright: © 2025 Carbajal llosa, Barja and Pizarro. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Carlos Carbajal llosa, cmcarbajal@gmail.com

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.