Ensemble machine learning for digital mapping of soil pH and electrical conductivity in the Andean agroecosystem of Peru

Carbajal-Llosa, Carlos; Barja, Antony; Pizarro, Samuel

doi:10.3389/fsoil.2025.1673628

ORIGINAL RESEARCH article

Front. Soil Sci., 06 November 2025

Sec. Pedometrics

Volume 5 - 2025 | https://doi.org/10.3389/fsoil.2025.1673628

This article is part of the Research Topic: Advancing spatial prediction of soil properties using remotely sensed data and geospatial artificial intelligence (GeoAI): Challenges, opportunities, and future directions

Carlos Carbajal-Llosa¹*

Antony Barja²

Samuel Pizarro³

¹Dirección de Servicios Estratégicos Agrarios, Instituto Nacional de Innovación Agraria (INIA), Lima, Peru
²Escuela Profesional de Ingeniería Geográfica, Facultad de Ingeniería Geológica, Minera, Metalúrgica y Geográfica, Universidad Nacional Mayor de San Marcos (UNMSM), Lima, Peru
³Dirección de Servicios Estratégicos Agrarios, Estación Experimental Agraria Santa Ana, Instituto Nacional de Innovación Agraria (INIA), Huancayo, Peru

In agricultural systems, soil pH and electrical conductivity (EC) are crucial chemical properties that directly affect nutrient availability and microbial activity, but the challenging environment of the Peruvian Andes has limited research on their estimation. This study aimed to develop an ensemble learning method to predict soil pH and EC in Andean agroecosystems using environmental predictors. By using simple and weighted averaging, we developed a heterogeneous ensemble learning approach that integrates machine learning (ML) algorithms, including Support Vector Machine (SVM), Artificial Neural Network (ANN), Random Forest (RF), and Extreme Gradient Boosting (XGBoost). The weighted ensemble assigns weights to models based on their predictive accuracy, measured by R² from spatial cross-validation. Spatial patterns are noticeable, and pH displays greater spatial clustering than EC. Elevation was the most important predictor in ML models for both parameters. Ensemble models significantly outperformed individual models, with the weighted ensemble achieving R² >0.93 and reducing RMSE by approximately 72%. Among standalone models, RF and XGBoost performed best for pH, while SVM performed the best for EC. ANN models were the least effective. Uncertainty analysis indicated high confidence in pH predictions but moderate to high uncertainty in EC predictions, suggesting that EC is more challenging to predict. Ensemble models with optimized weighting provide robust and accurate mapping of spatially autocorrelated soil properties. The high-confidence pH maps are reliable for soil management decisions, while EC predictions, though more uncertain, effectively identify priority areas for future sampling and investigation.

1 Introduction

Soil pH and electrical conductivity (EC) are fundamental physicochemical properties that exert a significant influence over agri-environmental systems. Soil pH, a measure of acidity or alkalinity, governs the solubility and bioavailability of essential nutrients and heavy metals, thereby shaping plant growth, microbial activity, and nutrient cycling (1–3). Salt-affected soils pose a significant threat to soil quality and agricultural productivity, particularly in arid and semi-arid regions, as they affect crop yields and soil health, thereby directly impacting food security (4, 5). EC serves as an indicator of soluble salt concentration and is widely used as a proxy for assessing soil salinity levels, with EC measurements being crucial for understanding soil-water-plant relationships (6, 7). Both properties are crucial for assessing soil fertility, crop suitability, and land degradation, particularly in the context of precision agriculture and sustainable land management.

The spatial variability of soil pH and EC is shaped by a complex interplay of factors, including parent material, land use, topography, climate, and human intervention (8, 9). Accurate mapping of their distribution is crucial for informing soil management strategies, optimizing fertilizer application, and mitigating the effects of salinity-related constraints. In recent years, digital soil mapping (DSM) has emerged as a powerful tool for characterizing such spatial heterogeneity, enabling data-driven decision-making at various spatial scales (10–12). This is especially relevant in the context of intensifying climate change and increasing anthropogenic pressures, both of which are altering key biophysical processes and productivity determinants across agricultural landscapes (13–15).

Machine learning (ML) algorithms have become integral to DSM due to their ability to model complex, nonlinear relationships between soil properties and environmental covariates (16, 17). Algorithms such as Random Forest (RF), Support Vector Machines (SVM), Decision Trees, and Gradient Boosting Machines (e.g., XGBoost) have been extensively validated and shown to outperform traditional geostatistical approaches in many settings (18, 19). For instance, Vandana et al. (20) demonstrated the superior performance of RF in predicting soil pH and EC using 202 surface samples (0–15 cm) and 14 environmental predictors, achieving high model accuracy (pH: RMSE = 0.014, R² = 0.81; EC: RMSE = 0.134, R² = 0.73). In Arctic regions, RF has demonstrated superior performance compared to K-Nearest Neighbor (KNN) and Cubist algorithms for predicting soil properties (21). Similarly, Bandak et al. (22) reported that Decision Tree models exhibited optimal performance for EC prediction over a 24,000-hectare area, identifying soil moisture, elevation, and vegetation indices as major predictors.

Recent studies have demonstrated the effectiveness of various ML approaches across different environments. Feature selection methods combined with machine learning have been shown to eliminate redundant features and effectively improve model performance. Notably, recursive feature elimination (RFE) combined with RF achieves the highest prediction accuracy for soil pH mapping (23). CatBoost models have emerged as particularly effective for predicting soil salinity, demonstrating superior performance compared to Random Forest and XGBoost models, with the ability to handle categorical data being a key advantage (24, 25). Additionally, on-the-go sensing using apparent electrical conductivity (EC) has proven to be a useful, efficient, and cost-effective surrogate to represent within-field soil spatial variability (26, 27).

While individual ML models offer strong predictive capabilities, ensemble approaches such as stacking and bagging have shown promise in further improving spatial accuracy and robustness (28, 29). Mishra et al. (30) emphasized the benefit of using median predictions from multiple ML algorithms to account for model uncertainty and enhance spatial precision. Ensemble strategies that integrate base and meta-learners, combined with rigorous validation methods such as nested cross-validation, have consistently ranked Random Forest among the best-performing models for predicting diverse soil properties (31, 32). Padarian et al. (33) further emphasized the capacity of ensemble models to capture complex, nonlinear relationships in the soil environment, a major advantage for DSM applications.

Ensemble learning with stacked generalization combines the results from multiple ML algorithms to develop an integrated mapping output with relatively stable performance, though this approach remains relatively uncommon in DSM (34, 35). Weighted model averaging approaches have demonstrated effectiveness in mountainous forested areas, with quantile regression forests achieving the best prediction performance in most cases. Model averaging outperformed individual models in several instances (36, 37). Recent applications of stacking ensemble models have demonstrated significant improvements in various environmental applications, including soil moisture prediction, water quality assessment, and agricultural yield forecasting (38, 39).

In mountainous regions, where soil formation is influenced by steep altitudinal gradients, diverse microclimates, and varied land use systems, the spatial prediction of soil properties is particularly challenging (40, 41). Mountain soils encounter several difficulties in digital mapping, including high local variability, non-linear relationships between environmental covariates and soil properties, and limited accessibility in complex topographical settings (42, 43). Despite these complexities, machine learning has shown promising results in regional applications. For example, Carbajal et al. (44) employed RF, SVM, XGBoost, and artificial neural networks (ANN) to predict soil organic carbon (SOC) in the Peruvian Andes, identifying pH as a key explanatory variable. Machine learning-based digital mapping in mountainous terrain has demonstrated significant success, with random forest regression models achieving high performance (R² = 0.80, 0.79, 0.72, and 0.84 for clay, sand, silt, and SOC, respectively) when integrating soil-forming factors of the scorpan model (45).

Although research on ML-based DSM is expanding, significant gaps in our understanding persist (46, 47). The pronounced microclimatic variability and topographic heterogeneity of Andean agroecosystems likely contribute to substantial spatial variation in both pH and EC. These conditions present both a methodological challenge and an opportunity to refine predictive modeling approaches for soil properties in mountainous landscapes (48). Studies have applied ML to predict soil properties in Andean regions, with pH and EC commonly identified as crucial factors in these models. However, no specific studies have used ensemble methods to predict the pH and EC in Andean soils. Most existing studies are concentrated on other soil properties and temperate or semi-arid regions, where soil-forming factors and land use pressures differ significantly from high Andean environments (49, 50). As such, the application of ensemble learning strategies for the DSM of soil pH and EC in Andean agroecosystems remains underexplored. Addressing this gap is essential for supporting precision agriculture, guiding soil conservation efforts, and promoting climate-resilient land management in one of the world’s most environmentally and agriculturally diverse regions (51, 52).

This study develops an ensemble method for predicting soil pH and electrical conductivity (EC) in Andean agroecosystems using environmental covariates. The key contributions are: (1) enhanced prediction accuracy through stacking ensemble approaches that integrate Random Forest (RF), Support Vector Machines (SVM), Artificial Neural Networks (ANN), and Extreme Gradient Boosting (XGBoost) algorithms; (2) improved spatial mapping in mountainous terrain with high heterogeneity by combining the complementary strengths of each base learner; and (3) demonstration of ensemble learning potential for digital soil mapping in precision agriculture and sustainable land management of Andean environments.

2 Materials and methods

2.1 Study area

The Alto Mantaro sub-basin (11°30’-12°04’S, 75°05’-75°49’W) encompasses 2,114.82 km² in the northwestern Mantaro River basin (Figure 1). According to the Warren Thornthwaite climate classification map provided by the Peruvian National Meteorology and Hydrology Service (SENAMHI), most of the area is characterized by a semi-dry, temperate, and humid climate throughout the year (53). Maximum temperatures range from 21 to 25°C, minimum temperatures from 7 to 11°C, and estimated annual precipitation from 700 to 2,000 mm. Leptosols are the most common soil type, characterized by their shallow depth, stony composition, poor development, and lack of distinct features. Formed on solid rock in mountainous regions, these soils are highly susceptible to erosion and have limited agricultural value, though they are used for extensive grazing (54). According to the Ecological and Economic Zoning of the Junín department, the predominant soil textures in the study area are sandy loam (34.87%), loam (12.75%), and clay loam (9.65%). The vegetation is primarily Andean grasslands, which account for over 50% of the total area. Moderately steep hills and mountains (15-25° slopes) constitute 26.70% of the area, and agricultural lands span 672.07 km² (31.77%) across the Mantaro Valley provinces of Jauja, Concepción, and Huancayo (55). The sub-basin encompasses 59 districts and 438 population centers, comprising 356,440 inhabitants and 89,345 dwellings (56).

Figure 1

Map of the Alto Mantaro Sub-basin in Peru, showing elevation, rivers, roads, and urban areas. Yellow dots indicate sampling points. An inset shows the location of the Mantaro Basin in Peru.

Figure 1. Study area indicating soil sampling locations and elevation features.

2.2 Soil sampling and laboratory analysis

Conditioned Latin hypercube sampling (cLHS) was used to determine 204 sampling points. The cLHS method is a stratified random sampling procedure based on covariate distributions: it optimizes the sampling scheme by minimizing an energy function that reflects how closely the selected sample approximates a Latin hypercube across all covariate distributions (57, 58). Spatial sampling points were determined using the clhs R package (59), incorporating elevation, slope, profile curvature, plan curvature, aspect, valley depth, topographic wetness index, topographic position index, land surface temperature, transformed soil-adjusted vegetation index, and accumulated cost as covariates. At each sampling location, samples were collected at a 30 cm depth and later analyzed for pH following the EPA guideline (USEPA 2004), while EC was determined with the saturation extract method (60) at the LABSAF (Soil, Water, and Foliar Laboratory) of the Santa Ana Agrarian Experimental Station.

2.3 Environmental predictors

Soil-forming factors, such as topography and vegetation, along with soil physical and moisture properties, are represented by a set of 21 environmental predictors. Three primary sources of spatial information were employed in this investigation: remote sensing imagery (RS), a digital elevation model (DEM), and soil properties derived from the SoilGrids database (16). Multiple spectral indices (Table 1) were computed from Landsat 8 OLI imagery spanning February through August 2023, which was acquired through the Google Earth Engine (GEE) platform (69). Utilizing SAGA software (70), we generated a suite of topographic predictors from DEM, including aspect, topographic position index (TPI), topographic ruggedness index (TRI), slope, and topographic wetness index (TWI). The study also included properties such as soil texture (clay, sand, silt), bulk density (BD), and volumetric water content (VWC) at three key moisture levels: field capacity (-33 kPa), permanent wilting point (-1500 kPa), and saturation (-10 kPa).

Table 1

Table 1. List of the indices derived from remote sensing used for spatial modeling.

2.4 Exploratory spatial data analysis

2.4.1 Spatial autocorrelation

Spatial autocorrelation measures the extent to which nearby data points in a geographic dataset are related to one another. It essentially assesses the probability of spatially adjacent values being similar. This phenomenon plays a crucial role in interpreting spatial patterns and relationships in data, significantly impacting environmental and geographical research, including soil science. We analyzed spatial autocorrelation in soil EC and pH data using Moran’s I (Equation 1) and Geary’s C (Equation 2) statistics.

$$ I = \frac{n}{W} \cdot \frac{\sum_{i = 1}^{n} \sum_{j = 1}^{n} w_{ij} (x_{i} - \bar{x}) (x_{j} - \bar{x})}{\sum_{i = 1}^{n} (x_{i} - \bar{x})^{2}} \tag{1} $$

$$ C = \frac{n - 1}{2W} \cdot \frac{\sum_{i = 1}^{n} \sum_{j = 1}^{n} w_{ij} (x_{i} - x_{j})^{2}}{\sum_{i = 1}^{n} (x_{i} - \bar{x})^{2}} \tag{2} $$

Where:

$n$ = number of observations.

$x_{i}, x_{j}$ = observed values at locations $i$ and $j$.

$\bar{x}$ = mean of all observations.

$w_{ij}$ = spatial weight between locations $i$ and $j$.

$W = \sum_{i = 1}^{n} \sum_{j = 1}^{n} w_{ij}$ = sum of all spatial weights.

Spatial autocorrelation in geographical data is often measured using Moran’s I and Geary’s C (71, 72). A Moran’s I value close to +1 signifies strong positive spatial autocorrelation, whereas values near -1 denote spatial dispersion. Geary’s C behaves inversely: values below 1 indicate positive spatial autocorrelation, so lower values reflect stronger clustering, while values above 1 indicate negative spatial autocorrelation. To define spatial neighbors, we used the spdep package (73). The Moran scatterplot visualizes the relationship between the value at each location and the values at neighboring locations. It indicates whether the data exhibit positive spatial autocorrelation (similar values cluster together), negative spatial autocorrelation (dissimilar values occur next to each other), or no autocorrelation (a random distribution). We used Local Indicators of Spatial Autocorrelation (LISA) cluster classification to map significant clusters: “High-High” (hotspots), “Low-Low” (cold spots), “High-Low” (spatial outliers), and “Low-High” (spatial outliers).
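As an illustration of Equations 1 and 2, the following minimal Python sketch computes both statistics from a dense weight matrix. It is not the authors' code (the study used the spdep R package); the function names, the `chain_weights` helper, and the toy values are assumptions made for this example only.

```python
from statistics import mean


def morans_i(x, w):
    """Moran's I (Equation 1) for values x and an n x n spatial weight matrix w."""
    n = len(x)
    x_bar = mean(x)
    dev = [xi - x_bar for xi in x]
    total_w = sum(sum(row) for row in w)  # W: sum of all spatial weights
    num = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / total_w) * (num / den)


def gearys_c(x, w):
    """Geary's C (Equation 2) for values x and an n x n spatial weight matrix w."""
    n = len(x)
    x_bar = mean(x)
    total_w = sum(sum(row) for row in w)
    num = sum(w[i][j] * (x[i] - x[j]) ** 2 for i in range(n) for j in range(n))
    den = sum((xi - x_bar) ** 2 for xi in x)
    return ((n - 1) / (2 * total_w)) * (num / den)


def chain_weights(n):
    """Hypothetical helper: binary weights linking consecutive points on a line."""
    w = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        w[i][i + 1] = 1.0
        w[i + 1][i] = 1.0
    return w


# Toy data (not from the study): a smooth gradient and an alternating pattern.
weights = chain_weights(5)
gradient = [1.0, 2.0, 3.0, 4.0, 5.0]
alternating = [1.0, 5.0, 1.0, 5.0, 1.0]

print("gradient:    I =", morans_i(gradient, weights), " C =", gearys_c(gradient, weights))
print("alternating: I =", morans_i(alternating, weights), " C =", gearys_c(alternating, weights))
```

The smooth gradient yields a positive I and a C below 1, while the alternating pattern yields a negative I and a C above 1, matching the interpretation given above.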

2.4.2 Feature selection

A correlation plot (matrix heatmap) was generated to explore the relationships among variables, with particular attention to highly correlated covariates that might not contribute to the modeling. Subsequently, random feature subset search (RFSS) was employed to select model predictors. Sixty subsets were evaluated to determine the respective predictor sets for pH and EC. As a fundamental aspect of the Random Forest methodology, RFSS was developed to constrain each ensemble model to a random selection of features, thereby improving performance through a bias-variance trade-off and enhancing diversity. The mlr3 ecosystem was used, specifically the mlr3fselect package (74), with root mean square error (RMSE) as the performance measure.
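The idea of a random subset search can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the study used the mlr3 ecosystem in R, and here `score_fn`, `toy_rmse`, and the feature names are hypothetical stand-ins for a real cross-validated RMSE and the real covariates.

```python
import random


def random_subset_search(features, score_fn, n_subsets=60, seed=42):
    """Evaluate n_subsets random feature subsets and keep the lowest-RMSE one.

    score_fn is a user-supplied callable (an assumption of this sketch) that
    returns an RMSE for a given tuple of feature names.
    """
    rng = random.Random(seed)
    best_subset, best_rmse = None, float("inf")
    for _ in range(n_subsets):
        k = rng.randint(1, len(features))  # random subset size
        subset = tuple(sorted(rng.sample(features, k)))
        rmse = score_fn(subset)
        if rmse < best_rmse:
            best_subset, best_rmse = subset, rmse
    return best_subset, best_rmse


# Hypothetical covariate names and a toy scoring rule, for demonstration only:
# subsets containing "elevation" score better, and larger subsets pay a small penalty.
features = ["elevation", "slope", "twi", "ndvi", "clay"]


def toy_rmse(subset):
    return 1.0 - 0.5 * ("elevation" in subset) + 0.01 * len(subset)


best_subset, best_rmse = random_subset_search(features, toy_rmse)
print(best_subset, best_rmse)
```

In practice `score_fn` would train a learner on the chosen covariates and return its resampled RMSE; the fixed seed only makes the search reproducible.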

2.5 Machine learning models

Four ML models were used: SVM, ANN, RF, and XGBoost. Each regression model was configured with specific hyperparameters and then used in the subsequent training, validation, and ensembling steps. To help interpret the models and understand how the predictors drove the results (75), we performed Feature Importance Analysis (FIA) for RF and XGBoost only. Importance scores were extracted directly from the trained models and tabulated for plotting.

2.5.1 Support vector machine

The theoretical basis of SVM is the work of Vapnik et al. (76), which discusses its use for regression estimation, multidimensional spline construction, and solving linear operator equations, showcasing its versatility in function approximation. In this study, an SVM regression model was implemented using the e1071 R package (77) with a radial basis function (RBF) kernel. With this kernel, the SVM implicitly maps data points into an infinite-dimensional feature space and fits a linear function there. The type parameter was set to eps-regression, i.e., standard epsilon-insensitive regression, which places a tube of width 2*epsilon around the regression function within which errors are disregarded. The cost hyperparameter (C) controls the penalty for training points that fall outside this tube. A high C value forces the model to fit the training examples closely and can lead to overfitting if the data are noisy, whereas a lower C value tolerates larger training errors but tends to generalize better. C was set to 10 to balance model complexity and training error. The gamma hyperparameter adjusts the influence of individual training examples; lower values yield a smoother fitted function, and a value of 0.1 was chosen. Lastly, epsilon was set to 0.1, so that predictions within ±0.1 units of the observed value were treated as accurate and incurred no penalty.
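The two ingredients described above, the epsilon-insensitive loss and the RBF kernel, can be written out directly. The following Python sketch is illustrative only and does not reproduce the e1071 implementation; the function names and the example values are assumptions.

```python
import math


def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Per-sample loss: zero inside the tube of half-width epsilon, linear outside."""
    return [max(0.0, abs(t - p) - epsilon) for t, p in zip(y_true, y_pred)]


def rbf_kernel(a, b, gamma=0.1):
    """RBF kernel exp(-gamma * ||a - b||^2) between two feature vectors."""
    squared_distance = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * squared_distance)


# Toy pH values (not from the study): the first prediction lies inside the tube.
observed = [6.0, 6.0, 6.0]
predicted = [6.05, 6.3, 5.5]
losses = epsilon_insensitive_loss(observed, predicted)
print(losses)

# Similarity decays with distance; a smaller gamma makes the decay slower.
print(rbf_kernel([0.0, 0.0], [1.0, 1.0], gamma=0.1))
print(rbf_kernel([0.0, 0.0], [1.0, 1.0], gamma=1.0))
```

With epsilon = 0.1, an error of 0.05 costs nothing, while errors of 0.3 and 0.5 are penalized only for the part exceeding the tube; the cost C then scales how strongly those penalties count during fitting.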

2.5.2 Artificial neural networks

Inspired by biology, ANNs originated with the Perceptron, the first trainable neural network (78), and later evolved into multilayer networks trained with backpropagation, leading to practical learning algorithms (79). An ANN regression model was defined using the nnet R package (80). Non-linear patterns were learned using a feedforward neural network with a single hidden layer and weighted connections. Three parameters were set during model definition: size, maxit, and decay. The size parameter sets the number of units (neurons) in the hidden layer and thus determines the network’s capacity to learn complex patterns; a size of 10 was selected. A maxit value of 500 set the maximum number of training iterations. To prevent overfitting, a decay value of 0.01 was used for weight decay regularization, which penalizes large weights.

2.5.3 Random forest

RF emerged from the work of Leo Breiman, building on ensemble methods like bagging, where multiple models are trained on different data samples (81), and incorporating feature randomness from the random subspace method (82). This model effectively handles missing data, identifies key variables, resists overfitting, and is user-friendly, providing easily interpretable results for diverse data types (83). The RF model was implemented with the ranger R package (84). This ensemble method builds many decision trees and averages their predictions. The number of trees was set to 500, and mtry, the number of randomly selected variables considered at each split, was set to the square root of the total number of features, which introduces randomness and reduces overfitting. The min.node.size parameter, set to 5, determines the minimum number of samples in a terminal node; it thereby limits tree depth and helps prevent overfitting. The importance parameter ranks the covariates by quantifying impurity reduction, showing which are most crucial for accurate predictions.

2.5.4 Extreme gradient boosting

XGBoost epitomizes the advancements in gradient boosting methodologies. By incorporating novel regularization strategies and computational optimization methods, XGBoost delivers a more resilient and scalable algorithm (85). This model boasts state-of-the-art performance, efficient handling of missing data, built-in regularization, and rapid training (86). The best results usually come from structured or heterogeneous tabular data (87). The XGBoost regression model was implemented using the xgboost R package (88). This gradient boosting method constructs trees sequentially, with each tree designed to correct errors produced by preceding trees. Model complexity is regulated through the critical parameter nrounds, which was configured to 100, thereby specifying the number of boosting rounds (trees). Additional key parameters were employed, including the learning rate (eta), which was established at 0.1. The maximum depth of each tree was configured to be 6, thereby controlling the complexity of individual trees. Randomness was introduced through the subsample parameter, set to 0.7, which prevents overfitting by determining the sample fraction utilized per tree. Similarly, the colsample_bytree parameter was also established at 0.7, providing additional randomization by specifying the fraction of features employed for each tree, thus enhancing model generalization.

2.6 Ensemble approach development

Ensemble machine learning is a powerful approach that combines multiple models to improve prediction accuracy and robustness (89), with success in various real-world applications and problem domains (90). The ML ensemble approach is summarized in a four-stage flowchart (Figure 2). We implemented two types of ensemble models: a simple average ensemble using bagging-inspired equal weighting across heterogeneous models, and a performance-weighted ensemble using stacking-inspired optimization based on spatial cross-validation scores. The first method takes the arithmetic mean of all four model predictions, following the “wisdom of crowds” philosophy, which assumes that each model contributes equally valuable information. This approach is simple and robust, and it reduces the influence of individual model bias. The second method is a weighted combination derived from spatial cross-validation performance, in which better-performing models receive greater influence so that their strengths can improve the overall result. Normalized weights were calculated in proportion to the spatial cross-validation R² values (higher R² values received higher weights), with a minimum weight applied to each model to avoid zero weights. Both ensemble approaches were evaluated with the same three metrics used for the individual models (RMSE, MAE, and R²; Section 2.8). A comprehensive table was constructed to compare the performance of ensemble and individual machine learning models.
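The two combination rules can be sketched as follows. This Python example is a minimal illustration, not the authors' implementation: the minimum weight of 0.05, the choice to apply the floor before normalizing, and all R² and prediction values are assumptions invented for the demonstration.

```python
def ensemble_weights(r2_scores, min_weight=0.05):
    """Weights proportional to R², floored at min_weight and normalized to sum to 1.

    The floor value and the order of operations are assumptions of this sketch.
    """
    floored = {model: max(r2, min_weight) for model, r2 in r2_scores.items()}
    total = sum(floored.values())
    return {model: value / total for model, value in floored.items()}


def simple_average(predictions):
    """Equal-weight ensemble: arithmetic mean of the model predictions."""
    return sum(predictions.values()) / len(predictions)


def weighted_average(predictions, weights):
    """Performance-weighted ensemble: weighted sum of the model predictions."""
    return sum(weights[model] * value for model, value in predictions.items())


# Hypothetical spatial cross-validation R² values and pH predictions for one pixel.
r2_scores = {"SVM": 0.40, "ANN": -0.10, "RF": 0.30, "XGBoost": 0.25}
predictions = {"SVM": 6.0, "ANN": 7.0, "RF": 6.2, "XGBoost": 6.4}

weights = ensemble_weights(r2_scores)
print(weights)
print("simple:", simple_average(predictions))
print("weighted:", weighted_average(predictions, weights))
```

In this toy case the poorly performing model keeps a small non-zero weight, so the weighted prediction is pulled toward the better models while the simple average treats all four equally.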

Figure 2

Flowchart depicting a machine learning process for soil data analysis. It begins with data preparation, loading soil data and covariates like topographic and vegetation indices. Features are scaled and split for cross-validation. In model training, machine learning models such as SVM, ANN, RF, and XGBoost use performance metrics to calculate weights. Predictions from these models are combined through ensemble methods, including simple averaging and a weighted ensemble. The final outputs include prediction maps, performance metrics, and uncertainty maps.

Figure 2. Procedure using the ensemble approach.

2.7 Raster predictions

Raster predictions of pH and EC were generated for the entire study area using the trained ML models to create continuous spatial maps. The main methodology, illustrated in the flowchart (Figure 3), involved pixel-by-pixel model application across the study area using the terra package’s predict function. Before prediction, the input data required preprocessing steps such as scaling, which is crucial for scale-sensitive models like SVM and ANN. Additionally, imputation was necessary to manage missing values (NA pixels); otherwise, a single missing pixel could have rendered the entire prediction invalid. Raster predictions were created for every ML model and for both ensemble methods that were evaluated.
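The preprocessing and pixel-wise prediction steps can be sketched as below. This is a simplified Python illustration rather than the terra-based workflow used in the study; mean imputation, standardization with the training mean and standard deviation, the nested-list raster layout, and the toy model are all assumptions of this example.

```python
import math
from statistics import mean, pstdev


def fit_scaler(training_rows):
    """Column means and standard deviations from the training covariates."""
    columns = list(zip(*training_rows))
    means = [mean(col) for col in columns]
    sds = [pstdev(col) or 1.0 for col in columns]  # guard against zero spread
    return means, sds


def prepare_pixel(values, means, sds):
    """Impute missing covariates with the training mean (assumption), then scale."""
    scaled = []
    for value, m, s in zip(values, means, sds):
        if value is None or (isinstance(value, float) and math.isnan(value)):
            value = m
        scaled.append((value - m) / s)
    return scaled


def predict_raster(raster, model, means, sds):
    """Apply a model pixel by pixel; raster is a grid of covariate vectors."""
    return [[model(prepare_pixel(pixel, means, sds)) for pixel in row] for row in raster]


# Toy training covariates (elevation in m, slope in degrees); not from the study.
training = [[3200, 5], [3400, 15], [3600, 25]]
means, sds = fit_scaler(training)


def toy_model(z):
    # Hypothetical fitted model working on scaled covariates.
    return 6.0 + 0.5 * z[0]


raster = [[[3400.0, 15.0], [3400.0, float("nan")]]]  # one row, two pixels
prediction = predict_raster(raster, toy_model, means, sds)
print(prediction)
```

Reusing the training means and standard deviations at prediction time keeps the raster covariates on the same scale the models saw during fitting.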

Figure 3

Flowchart depicting a process of using raster data with environmental covariates for machine learning predictions. The steps include selecting a pixel, extracting covariate values, creating a data frame, applying machine learning models (SVM, ANN, RF, XGBoost), obtaining prediction results, and assigning values to pixel locations. The process iterates over pixels until all are processed, completing the prediction raster.

Figure 3. Flowchart to generate raster prediction maps.

2.8 Model calibration and validation

A thorough spatial cross-validation (SP-CV) process was developed to evaluate each ML model. Data splitting used the spatial block cross-validation method, in which the study area was divided with a 3×3 grid and the resulting blocks were grouped into five folds. Training then proceeded iteratively: in each iteration, models were trained on four folds (training) and used to predict the remaining fold (testing). For evaluation purposes, Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and the coefficient of determination (R²) were calculated for each fold (Equations 3–5), and performance was averaged across all five folds. This analysis provided realistic assessments of how accurately each model predicted the response variable at unobserved spatial locations. The SP-CV method was considered crucial for accurate soil mapping, as it supports reliable model predictions in unsampled areas.
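A spatial block split of this kind can be sketched as follows. The Python example is a minimal illustration under assumptions: the text does not state how the grid blocks were assigned to the five folds, so the shuffled round-robin assignment, the function names, and the toy coordinates are choices made for this sketch.

```python
import random


def assign_blocks(points, n_rows=3, n_cols=3):
    """Assign each (x, y) point to a cell of an n_rows x n_cols grid over the extent."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, y_min = min(xs), min(ys)
    x_span = (max(xs) - x_min) or 1.0
    y_span = (max(ys) - y_min) or 1.0
    blocks = []
    for x, y in points:
        col = min(int((x - x_min) / x_span * n_cols), n_cols - 1)
        row = min(int((y - y_min) / y_span * n_rows), n_rows - 1)
        blocks.append(row * n_cols + col)
    return blocks


def spatial_block_folds(points, n_folds=5, n_rows=3, n_cols=3, seed=1):
    """Return (train_indices, test_indices) pairs; whole blocks go to one fold."""
    blocks = assign_blocks(points, n_rows, n_cols)
    block_ids = list(range(n_rows * n_cols))
    random.Random(seed).shuffle(block_ids)
    # Round-robin assignment of shuffled blocks to folds (assumption of this sketch).
    fold_of_block = {block: i % n_folds for i, block in enumerate(block_ids)}
    folds = []
    for k in range(n_folds):
        test = [i for i, b in enumerate(blocks) if fold_of_block[b] == k]
        train = [i for i, b in enumerate(blocks) if fold_of_block[b] != k]
        folds.append((train, test))
    return folds


# Toy sampling locations on a regular 6 x 6 lattice (not the study's coordinates).
points = [(float(x), float(y)) for x in range(6) for y in range(6)]
folds = spatial_block_folds(points)
for k, (train, test) in enumerate(folds):
    print(f"fold {k}: {len(train)} training points, {len(test)} test points")
```

Because whole blocks are held out together, test points are spatially separated from most training points, which is what makes the resulting error estimates less optimistic than a random split.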

$$ \mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} (y_{i} - \hat{y}_{i})^{2}} \tag{3} $$

$$ \mathrm{MAE} = \frac{1}{n} \sum_{i = 1}^{n} | y_{i} - \hat{y}_{i} | \tag{4} $$

$$ R^{2} = 1 - \frac{\sum_{i = 1}^{n} (y_{i} - \hat{y}_{i})^{2}}{\sum_{i = 1}^{n} (y_{i} - \bar{y})^{2}} \tag{5} $$

where $y_{i}$ are the observed values, $\hat{y}_{i}$ are the predicted values, $\bar{y}$ is the mean of the observed values, and $n$ is the number of samples.
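Equations 3–5 translate directly into code. The following Python sketch is illustrative; the function names and the toy observed and predicted values are assumptions, not data from the study.

```python
import math


def rmse(y, y_hat):
    """Root mean square error (Equation 3)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def mae(y, y_hat):
    """Mean absolute error (Equation 4)."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def r_squared(y, y_hat):
    """Coefficient of determination (Equation 5)."""
    y_bar = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - y_bar) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


# Toy pH values for one test fold.
observed = [5.0, 6.0, 7.0]
predicted = [5.5, 6.0, 6.5]
print(rmse(observed, predicted), mae(observed, predicted), r_squared(observed, predicted))
```

Note that R² defined this way can be negative when a model predicts worse than the mean of the observed values, which is relevant when fold-level scores are later turned into ensemble weights.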

Our evaluation encompassed an uncertainty analysis of the generated raster predictions, employing the coefficient of variation (CV) to quantify the relative variability of predictions from the highest-performing model. High CV values indicate greater disagreement among predictions and therefore lower certainty.
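A per-pixel coefficient of variation can be sketched as below. The text does not specify exactly which predictions enter the calculation, so this Python example assumes the CV is taken across the predictions of the individual models at each pixel and uses the population standard deviation; both choices, like the toy values, are assumptions.

```python
import math
from statistics import mean, pstdev


def coefficient_of_variation(predictions):
    """CV in percent: standard deviation of the predictions divided by their mean."""
    m = mean(predictions)
    if m == 0:
        return math.nan  # CV is undefined when the mean is zero
    return pstdev(predictions) / abs(m) * 100.0


# Toy per-pixel predictions from four models (not from the study).
pixel_agree = [6.0, 6.0, 6.0, 6.0]
pixel_disagree = [4.0, 6.0]

print(coefficient_of_variation(pixel_agree))
print(coefficient_of_variation(pixel_disagree))
```

Pixels where the models agree have a CV near zero, while pixels with divergent predictions have a high CV and can be flagged as priority areas for additional sampling.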

3 Results

3.1 Descriptive statistics of soil properties

Summary statistics for the soil properties (response variables), based on 204 data points, are shown in Table 2. Soil pH ranges from strongly acidic to moderately alkaline (pH 4.2–8.0), with moderate variability and a slightly negative skew. In contrast, EC spans a broad range of values, from 0.9 to 78.8 mS m⁻¹ (non-saline to moderately saline), with extremely high variability and a strongly positive skew. The variations of pH (Figure 4A) and EC (Figure 4B) with elevation are displayed to illustrate the spatial pattern. Both pH and EC increased from higher to lower elevations, with the highest values concentrated in the central valley.

Table 2

Table 2. Summary of descriptive statistics for predictors and response variables.

Figure 4

Map comparison of soil property variation by elevation, showing two panels, A and B. Panel A depicts soil pH levels with color-coded dots: blue for pH 4.2-5.2, purple for pH 5.2-6.4, red for pH 6.4-7.5, and dark red for pH 7.5-8. Panel B displays electrical conductivity (EC) with blue indicating EC 0.9-3.5 mS/m, purple for EC 3.5-6.7, red for EC 6.7-10.2, and dark red for EC 10.2-78.8. Both panels include a gradient for elevation ranging from 3,194 to 5,182 meters.

Figure 4. (A) pH value variations in relation to elevation. (B) Electrical conductivity variations in relation to elevation.

3.2 Identification of spatial autocorrelation

The analysis of spatial autocorrelation, using Moran’s I and Geary’s C, found notable positive autocorrelation in the soil properties. Table 3 reveals a high degree of spatial clustering in soil pH (I = 0.500, C = 0.506, p < 0.001), whereas EC presents only moderate spatial autocorrelation (I = 0.295, C = 0.755, p < 0.001). The stronger spatial structure of pH indicates that its distribution is more strongly influenced by landscape-level controls, while EC exhibits more localized variation. These results justify the use of spatial modeling approaches and make spatial cross-validation necessary to avoid overoptimistic performance estimates.

Table 3

Table 3. Spatial autocorrelation analysis.

A Moran scatterplot was used to visualize the relationship between each variable and its spatially lagged neighbors in the autocorrelation analysis. The pH plot displayed a strong positive correlation, with data points clustering densely in the high-high (HH, top-right) and low-low (LL, bottom-left) quadrants, demonstrating pronounced spatial autocorrelation (Figure 5A). In contrast, the EC plot exhibited a similar positive trend but with greater scatter and a notable presence of outliers in the high-low (HL, bottom-right) and low-high (LH, top-left) quadrants, indicating moderate positive spatial autocorrelation with more heterogeneous spatial patterns (Figure 5B).

Figure 5

Moran scatter plots for spatial autocorrelation of soil properties, labeled A and B with black dots showing data distribution. Plot A graphs spatially lagged pH against pH values, displaying a positive trend with a blue line and shaded confidence interval. Plot B graphs spatially lagged EC against EC values, also showing a positive trend with a blue line and shaded confidence interval.

Figure 5. (A) Moran scatterplot for soil pH. (B) Moran scatterplot for electrical conductivity.

A LISA cluster classification map was used to identify distinct spatial clustering patterns for both variables. For pH (Figure 6A), High-High (H-H) clusters revealed localized areas of extremely high values surrounded by similarly elevated neighbors, demonstrating strong positive spatial autocorrelation. Low-Low (L-L) clusters exhibited an inverse pattern, with below-average pH values spatially concentrated in coherent regions, also indicating positive autocorrelation. Spatial outliers, marked by High-Low (H-L) and Low-High (L-H) clusters, highlighted situations where high or low values were isolated and bordered by contrasting values. These clusters denote negative spatial autocorrelation and could indicate problematic data points or areas of transition. In contrast, EC displayed markedly different spatial clustering behavior (Figure 6B), with only sparse H-H and L-H clusters detected across the study area. Large portions of the EC dataset showed non-significant spatial autocorrelation, suggesting more random or weakly structured spatial patterns that warrant additional investigation to understand the underlying processes driving this distribution.
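
The cluster labels described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' workflow: it uses row-standardized neighbor lists, a simplified two-sided conditional permutation test, and hypothetical names (`lisa`, `quadrant`, `neighbors`); the toy values are not data from the study.

```python
import random


def standardize(values):
    """Convert values to z-scores using the population standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / sd for v in values]


def quadrant(z_i, lag_i):
    """Label a site from its own z-score and the mean z-score of its neighbors."""
    if z_i > 0:
        return "H-H" if lag_i > 0 else "H-L"
    return "L-H" if lag_i > 0 else "L-L"


def lisa(values, neighbors, n_perm=999, alpha=0.05, seed=0):
    """Return (label, local Moran's I, pseudo p-value) for every site."""
    rng = random.Random(seed)
    z = standardize(values)
    out = []
    for i, nbrs in enumerate(neighbors):
        k = len(nbrs)
        lag = sum(z[j] for j in nbrs) / k
        local_i = z[i] * lag
        # Conditional permutation: keep site i fixed, reshuffle its neighbors.
        others = z[:i] + z[i + 1:]
        hits = 0
        for _ in range(n_perm):
            perm_lag = sum(rng.sample(others, k)) / k
            if abs(z[i] * perm_lag) >= abs(local_i):
                hits += 1
        p = (hits + 1) / (n_perm + 1)
        label = quadrant(z[i], lag) if p <= alpha else "not significant"
        out.append((label, local_i, p))
    return out


# Toy transect: a low-pH stretch followed by a high-pH stretch.
values = [4.3, 4.5, 4.4, 4.6, 4.5, 7.6, 7.8, 7.7, 7.9, 7.5]
neighbors = [[j for j in (i - 1, i + 1) if 0 <= j < len(values)]
             for i in range(len(values))]
results = lisa(values, neighbors)
for site, (label, local_i, p) in enumerate(results):
    print(site, label, round(local_i, 3), round(p, 3))
```

Sites whose pseudo p-value exceeds `alpha` are reported as not significant, which corresponds to the grey class in Figure 6.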

Figure 6

Panel A shows a LISA cluster map highlighting areas with significant clusters. Red dots indicate high-high clusters, blue represents low-low, orange for high-low, light blue for low-high, and grey for not significant. Panel B shows a similar map with red for high-high and light blue for low-high clusters, with grey for not significant areas. Both maps feature a geographical area with clusters distributed throughout.

Figure 6. (A) Local Indicators of Spatial Autocorrelation (LISA) cluster map for soil pH. (B) Local Indicators of Spatial Autocorrelation (LISA) cluster map for electrical conductivity.

3.3 Correlation analysis between soil properties and environmental predictors

Strong correlations within groups of related variables were evident in the correlation matrix (Figure 7), suggesting that multicollinearity may be present in the dataset. High positive correlations, represented by larger circles, were detected among the moisture variables (VWC_FC, VWC_PMP, and VWC_SAT), indicating that these measures capture similar aspects of soil water content. The moderate to strong correlations among vegetation indices (GNDVI, MSAVI, NDVI, and SAVI) suggest potential redundancy in vegetation measurements. Similar correlation patterns were observed for topographic variables (elevation, slope, and TPI), indicating comparable spatial relationships. In contrast, variables such as clay, silt, sand, and the topographic indices TRI and TWI exhibited more diverse correlation patterns (smaller circles), suggesting that these parameters could contribute unique information to model development. An “X” symbol marks correlations that are not statistically significant at the 0.10 significance level (90% confidence).
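
A simple way to screen for the kind of redundancy described above is to list predictor pairs whose absolute Pearson correlation exceeds a threshold. The sketch below is illustrative only: the threshold of 0.8, the `collinear_pairs` helper, and the toy values are assumptions and do not come from the study.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def collinear_pairs(table, threshold=0.8):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    pairs = []
    for a, b in combinations(sorted(table), 2):
        r = pearson_r(table[a], table[b])
        if abs(r) >= threshold:
            pairs.append((a, b, round(r, 3)))
    return pairs


# Toy predictor table (hypothetical values).
toy = {
    "VWC_FC": [0.30, 0.34, 0.28, 0.40, 0.36, 0.25],
    "VWC_SAT": [0.45, 0.50, 0.43, 0.58, 0.53, 0.40],
    "TWI": [7.1, 5.2, 9.4, 6.0, 8.8, 5.5],
}
pairs = collinear_pairs(toy)
print(pairs)
```

In this toy table only the two moisture variables are flagged, so one of them could be dropped or the pair could be left to a model that tolerates correlated inputs.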

Figure 7

A correlation matrix heatmap displaying relationships between various environmental and soil parameters. Labels like EC, Aspect, and NDVI are shown along axes. Circular color gradients represent correlation values ranging from -1 to 1, indicated by a color bar from purple to yellow.

Figure 7. Correlation matrix heatmap among electrical conductivity (EC), aspect, bulk density (BD), clay, chlorophyll vegetation index (CVI), elevation, enhanced vegetation index (EVI), green normalized difference vegetation index (GNDVI), modified soil-adjusted vegetation index (MSAVI), normalized difference infrared index (NDII), normalized difference vegetation index (NDVI), sand, soil-adjusted vegetation index (SAVI), silt, slope, topographic position index (TPI), topographic ruggedness index (TRI), triangular vegetation index (TVI), topographic wetness index (TWI), volumetric water content at field capacity (VWC_FC), volumetric water content at permanent wilting point (VWC_PMP), and volumetric water content at saturation (VWC_SAT) variables.

3.4 Predictions with machine learning and ensemble approaches

The relative contributions of environmental and soil covariates to soil pH and EC prediction were assessed through feature importance analysis comparing the RF and XGBoost models. For pH prediction (Figure 8A), ten features were evaluated: Elevation, TRI, VWC_PWP (Volumetric Water Content at Permanent Wilting Point), Aspect, BD, Clay, TWI, CVI, EVI, and NDVI. Elevation emerged as the dominant predictor in both models, with the highest normalized importance (>90%), underscoring its critical role in the spatial distribution of pH. Notable differences between models were observed, with RF consistently assigning higher importance to most features than XGBoost, particularly for TRI, VWC_PWP, and BD. For EC prediction (Figure 8B), the feature set was expanded to include SAVI, GNDVI, NDII, TPI, Sand, Silt, and VWC_SAT (Volumetric Water Content at Saturation). Elevation and Aspect dominated the prediction framework, with normalized importance above 90% and 80%, respectively, in both models. Model-specific preferences were again evident, with RF assigning greater importance to TRI, EVI, and SAVI, while XGBoost emphasized Aspect and BD more heavily.
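
Because RF and XGBoost report importance on different native scales, comparing them requires a common scale. The text does not state how the importance values were normalized; the sketch below assumes rescaling so that the top feature in each model equals 100%, and all raw scores are hypothetical.

```python
def normalize_importance(raw):
    """Rescale raw importance scores so that the top feature equals 100%."""
    top = max(raw.values())
    return {name: 100.0 * (score / top) for name, score in raw.items()}


# Hypothetical raw scores on different native scales (not values from the study).
rf_raw = {"Elevation": 0.42, "TRI": 0.21, "Aspect": 0.14}
xgb_raw = {"Elevation": 310.0, "TRI": 62.0, "Aspect": 93.0}

rf_pct = normalize_importance(rf_raw)
xgb_pct = normalize_importance(xgb_raw)
for feature in rf_raw:
    print(f"{feature:10s} RF {rf_pct[feature]:5.1f}%  XGBoost {xgb_pct[feature]:5.1f}%")
```

With this scaling, a statement such as "RF assigns higher importance to TRI than XGBoost" refers to importance relative to each model's top feature, not to absolute scores.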

Figure 8

Two side-by-side bar charts labeled A and B compare feature importance using Random Forest (green) and XGBoost (orange). Chart A shows fewer features with elevation and TRI having high importance. Chart B includes more features, with elevation and aspect being prominent. Both charts use normalized importance percentages on the x-axis.

Figure 8. (A) Feature importance comparison of Random Forest and Extreme Gradient Boosting for pH. (B) Feature importance comparison of Random Forest and Extreme Gradient Boosting for electrical conductivity.

The raster maps in Figure 9 were generated by each ML model after training on the observed pH data, and two further maps were created using the ensemble techniques. Distinct spatial patterns in pH prediction were observed across the four ML models. The RF and XGBoost models demonstrated similar predictive patterns, identifying extensive areas of neutral pH conditions (6.0-7.5) predominantly in flat terrain, while mountainous regions were characterized by acidic conditions (pH < 6.0). By comparison, the SVM model predicted alkaline pH values (7.5-9.0) for the majority of the study area, with neutral conditions appearing broadly regardless of elevation or topographic position. The ANN model exhibited the most variable predictions, generating a heterogeneous spatial pattern with pH values fluctuating between acidic and alkaline conditions across the landscape without clear topographic associations.

Figure 9

Map images illustrating different pH predictions in a region using various machine learning models and ensemble methods. The left panel shows predictions by SVM, ANN, RF, and XGBoost models. The right panel shows average and weighted ensemble methods. Colors represent pH levels, from blue for less than or equal to 4.0 to red for greater than 9.0.

Figure 9. Spatial predictions of pH using all individual models and ensemble frameworks.

The same ML models were used to generate EC maps, which were then combined through ensemble methods to produce two additional maps (Figure 10). SVM produced moderate predictions, evenly distributed across the 4.0–15.0 mS·m⁻¹ range, and identified localized high-conductivity areas. In contrast, ANN produced the most extreme predictions, with extensive areas exceeding 20.0 mS·m⁻¹ concentrated primarily in the central and southern portions of the study area. The RF predictions were conservative, mostly within the 4.0–10.0 mS·m⁻¹ range, although some areas with highly elevated EC were included, suggesting a more stable prediction pattern. XGBoost produced the most restrained predictions, with minimal extreme values, making it the most conservative of the four algorithms tested. Both ensemble methods showed very similar patterns, balancing the individual model predictions and smoothing out the extreme values of the ANN model.

Figure 10

Comparison of machine learning models and ensemble methods using maps to display electrical conductivity (EC). Left panel shows models: SVM, ANN, RF, and XGBoost. Right panel shows ensemble methods: Average and Weighted. A color legend indicates EC values, ranging from green for less than or equal to 1.5 to red for greater than 20.0 mS/m.

Figure 10. Spatial predictions of electrical conductivity (EC) using all individual models and ensemble frameworks.

3.5 Uncertainty estimation and assessment of models

The coefficient of variation (CV) maps revealed striking differences in ensemble model uncertainty between predictions of soil pH and EC across the study area. For pH predictions, excellent model agreement was demonstrated across 70-80% of the study area (Figure 11A), with CV values ≤ 0.10 dominating the spatial pattern. Moderate uncertainty zones (CV = 0.10-0.25) appeared as scattered patches, likely representing topographic transition areas with complex pH gradients. High uncertainty areas (CV = 0.25-0.50) were relatively rare, occurring as isolated patches under unique environmental conditions. In contrast, EC predictions exhibited widespread model disagreement, with extensive regions showing CV values between 0.25 and 0.50 across large portions of the landscape (Figure 11B). Large contiguous areas with CV > 0.50 indicated high uncertainty clusters, potentially reflecting complex geochemical processes that were inadequately captured by available predictors. Consensus areas (CV ≤ 0.10) were limited and fragmented, appearing as scattered patches throughout the study area.
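
A per-pixel CV of this kind can be sketched as the standard deviation of the model predictions divided by their mean, followed by binning into the classes used in Figure 11. The sketch assumes an unweighted population standard deviation across the four models, which the text does not confirm, and the pixel values are hypothetical.

```python
from statistics import mean, pstdev


def coefficient_of_variation(predictions):
    """Standard deviation of the model predictions divided by their mean."""
    return pstdev(predictions) / mean(predictions)


def uncertainty_class(cv):
    """Bin a CV value into the class limits used in the uncertainty maps."""
    if cv <= 0.10:
        return "<=0.10"
    if cv <= 0.25:
        return "0.10-0.25"
    if cv <= 0.50:
        return "0.25-0.50"
    return ">0.50"


# Hypothetical predictions for one pixel from SVM, ANN, RF, and XGBoost.
ph_pixel = [6.8, 7.1, 6.5, 6.9]
ec_pixel = [5.0, 22.0, 6.0, 4.0]

for name, pixel in (("pH", ph_pixel), ("EC", ec_pixel)):
    cv = coefficient_of_variation(pixel)
    print(f"{name}: CV = {cv:.3f}, class {uncertainty_class(cv)}")
```

A single model with an extreme prediction, as in the toy EC pixel, is enough to push the CV into the highest class, which matches the pattern of widespread disagreement described for EC.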

Figure 11

Maps labeled A and B depict the study area with varying uncertainty levels shown by color-coded shading. Yellow indicates the lowest uncertainty (less than or equal to 0.10), progressing to orange, magenta, and gray, which shows the highest uncertainty (greater than 0.50), according to a coefficient of variation scale.

Figure 11. (A) Relative disagreement between ensemble weighted model predictions according to coefficient of variation maps for pH. (B) Relative disagreement between ensemble weighted model predictions according to coefficient of variation maps for electrical conductivity.

Performance metrics for the four machine learning models (SVM, ANN, RF, XGBoost) and their ensemble approaches (Average and Weighted) in predicting soil pH and EC are presented in Table 4. For soil pH prediction, substantial variation in model effectiveness was observed among individual algorithms. XGBoost demonstrated superior performance with the highest R² (0.993) and lowest MAE (0.849), while RF achieved the lowest RMSE (1.002). Conversely, ANN exhibited the poorest performance across all metrics, recording the highest RMSE (1.854) and MAE (1.432). Each ensemble approach consistently outperformed all individual models, with significant improvements in error reduction and prediction accuracy. The ensemble methods effectively leveraged the strengths of individual algorithms while mitigating their respective weaknesses, resulting in more robust and reliable soil property predictions. The Weighted Ensemble proved to be the superior model overall, achieving the lowest RMSE (0.282) and MAE (0.214), along with a very strong R² of 0.946. This represents a remarkable 72% reduction in RMSE compared to the best individual model (RF) and a 75% reduction in MAE compared to the best individual model (XGBoost), clearly demonstrating the power of the ensemble approach in generating more accurate and reliable pH predictions.
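
The percentage reductions quoted above follow directly from the reported metrics. The short check below recomputes them from the RMSE and MAE values given in the text; the function name `relative_reduction` is ours.

```python
def relative_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return 100.0 * (baseline - improved) / baseline


# pH: best individual RMSE (RF) and MAE (XGBoost) versus the Weighted Ensemble.
rmse_reduction = relative_reduction(1.002, 0.282)
mae_reduction = relative_reduction(0.849, 0.214)

print(f"RMSE reduction: {rmse_reduction:.1f}%")  # 71.9%
print(f"MAE reduction: {mae_reduction:.1f}%")    # 74.8%
```

Rounded to whole numbers, these give the 72% and 75% reductions stated for pH.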

Table 4

Table 4. Performance of machine learning models and ensemble approaches.

A similar trend was observed for the prediction of soil EC. Among the individual models, performance was inconsistent. While the SVM model produced the lowest RMSE (8.303) and MAE (5.36), its R² value was low (0.636), indicating a poor fit. Conversely, the XGBoost model achieved a near-perfect R² of 0.998, but with higher error values than SVM. The ANN model had the highest error metrics by a significant margin. Once again, the ensemble approach resolved these inconsistencies and delivered superior performance. The Weighted Ensemble model was the clear winner, achieving the lowest RMSE (2.340) and MAE (1.119), while maintaining a high R² of 0.939. This represents a 72% decrease in RMSE compared to the best individual model (SVM). These results show that ensemble methods leverage the strengths of their component models to produce a final model characterized by high accuracy (low error) and strong explanatory power (high R²).
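
A high R² alongside high error values is possible in principle, depending on how R² is defined. The text does not state which definition was used, so the sketch below is only an illustration with toy numbers: it contrasts the squared Pearson correlation, which ignores bias and scale errors, with the coefficient of determination, which penalizes them.

```python
from math import sqrt


def rmse(obs, pred):
    """Root mean squared error."""
    return sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def r2_correlation(obs, pred):
    """Squared Pearson correlation: insensitive to bias and scale errors."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_p = sum((p - mp) ** 2 for p in pred)
    return cov ** 2 / (var_o * var_p)


def r2_determination(obs, pred):
    """1 - SS_res / SS_tot: penalizes bias and scale errors."""
    mo = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mo) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


# Toy case: predictions follow the observations perfectly in rank and shape,
# but are offset and stretched (hypothetical values).
observed = [2.0, 4.0, 6.0, 8.0, 10.0]
predicted = [3.0 * o + 5.0 for o in observed]

print("RMSE:", round(rmse(observed, predicted), 3))
print("MAE:", mae(observed, predicted))
print("R2 (squared correlation):", round(r2_correlation(observed, predicted), 3))
print("R2 (coefficient of determination):", round(r2_determination(observed, predicted), 3))
```

For this reason, R² is best read together with RMSE and MAE, as is done in Table 4.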

4 Discussion

4.1 Measures of soil properties and their spatial behavior

The measured soil pH in our study area ranged from 4.2 to 8.0 (Table 1) and aligns with typical Andean soil gradients, which vary from acidic volcanic to neutral or alkaline valley soils. Research demonstrates that Peruvian Andean soils exhibit a distinct pH gradient, with topsoil displaying moderately acidic conditions (pH 5.5) while deeper layers remain slightly alkaline (pH 7.4) (914). At a depth of 30 cm, pH values range between 3.9 and 5.8 (92), confirming the acidic nature of surface and subsurface horizons. Our results also showed a broad range of EC (0.9–78.8 mS·m⁻¹). While some locations exhibited high salinity, the low average EC values suggest that most soils were non-saline and thus structurally unstable and prone to nutrient deficiencies (3).

Spatial autocorrelation analysis revealed non-random distributions for both pH and EC (Table 3). The pronounced clustering of pH values suggests the influence of extensive, contiguous zones characterized by dominant soil types or parent materials, factors that, according to Eger et al. (93), exert a stronger influence on pH than climate. Moreover, high-resolution topographic data have been shown to reliably predict variations in topsoil pH (94). EC also displayed spatial clustering, though less pronounced than pH, suggesting that the factors controlling pH distribution have a more substantial and consistent influence over larger areas. EC appears more susceptible to localized factors such as variations in moisture content, the distribution of agricultural inputs, or measurement inconsistencies, as the interpretation of apparent electrical conductivity readings varies by location and soil type (95).

The integration of Moran scatter plots (Figure 5) with LISA cluster classification maps (Figure 6) confirms these patterns and supports the use of ML approaches that incorporate spatial coordinates or covariates to explain ecosystem condition variation. This approach is consistent with previous research demonstrating that incorporating spatial information enhances accuracy by addressing spatial autocorrelation (96). Understanding these spatial patterns is crucial for designing efficient sampling strategies and targeting management interventions (97). Performance remains context-dependent: robust for pH but less effective for EC. Local Moran and G-statistics are useful for identifying clusters and, unlike heuristic methods, provide significance testing (98).

The implications of our results highlight the importance of spatial patterns in EC and pH within agricultural settings, given their strong correlation with soil nutrients. Research indicates that an increase of one unit in potassium results in an approximate rise of 0.24 units in EC, and a similar increment in magnesium leads to an increase of about 0.68 units in EC; in contrast, an increase of one unit in P was related to a decrease of 0.3 units in EC (99). Along these lines, other studies reveal that long-term annual manure applications in Andean agriculture have proven practical and beneficial, maintaining soil pH and increasing EC levels (100). The relationship between soil salinity and altitude has also been found to be inconsistent, varying across ecosystems. Some studies report a decrease in EC at higher elevations, a trend associated with greater leaching (101). Conversely, research in tropical rainforests indicates a significant increase in EC with altitude, suggesting that cooler high-altitude temperatures may inhibit nutrient removal processes, leading to ion accumulation (102). Another study found no significant difference in EC among sites at varying altitudes, although lower altitudes exhibited higher pH levels, suggesting that factors including parent material, temperature, and precipitation may be more influential than altitude (103).

4.2 Correlation analysis between predictors

Multicollinearity could be a problem, especially among the VWC variables, as shown by the correlation analysis in Figure 7. Models such as RF and XGBoost handle multicollinearity better than linear models (104, 105). Soil texture variables (clay, silt, and sand) showed high intercorrelation; using a single variable (e.g., clay) is preferable due to spectral similarities and the indistinctness of hillslopes (106). Similarly, one or two vegetation indices suffice to capture plant health while avoiding redundancy, although multi-index analyses remain valuable for ecosystem monitoring (107). When EC is the target variable, the analysis reveals moderately positive relationships with certain soil properties (clay content, some VWC measures), while relationships with terrain attributes are weak. This aligns with the typically strong correlation between EC and soil moisture influenced by soil texture, where higher EC usually suggests more clay and less sand (99). Previous studies have confirmed that EC exhibits a relatively strong spatial correlation with both clay percentage and pH (108), and the soil organic carbon to clay-sized particle ratio is strongly influenced by soil pH (109). Our research identified a negative correlation between altitude and pH, corroborating studies that show higher altitudes significantly reduce pH (p < 0.001) (110), although climate and mineralogy complicate quantification (111). As for the relationship between pH and EC, research suggests a weak, negative correlation (99).

4.3 Feature importance analysis for machine learning

Topographic variables, particularly Elevation, TWI, Aspect, and TPI, were the most powerful predictors in both RF and XGBoost models (Figure 8). Elevation ranks as the most important variable because it serves as a powerful proxy controlling multiple environmental processes at the landscape scale (112). Processes influenced by climate (temperature and rainfall) systematically change with elevation (113, 114), while TWI explicitly models the impact of downhill water flow on soil properties by identifying areas prone to water accumulation (115). Although topographic and soil conditions influence vegetation indices that measure plant health (116, 117), the connection between vegetation indices and our target variables is weakened by the overriding influence of long-term landscape processes. Researchers employ remote sensing algorithms to analyze vegetation indices across different land areas, enabling the identification of soil variations affecting plant growth (118).

4.4 Use of the ensemble approach for prediction purposes

The ensemble approach improves upon traditional bagging methods, such as RF, by integrating the strengths of different algorithm families to model complex spatial patterns more effectively. The averaged ensemble produces spatially more realistic results by balancing individual model predictions, smoothing the extreme values of individual models (particularly the ANN model for pH) while preserving the spatial patterns identified by the other models (Figure 9). This is understandable, because ensemble learning is an effective way to improve predictive performance, and in recent years it has become a significant research focus across diverse application areas (119). Since the weighted ensemble and the simple average produced nearly identical results, either the weights were evenly distributed across models or model performance was consistent during validation. The EC maps consistently show similar spatial patterns across all methods (Figure 10), with higher values concentrated in the lower-elevation central valley and lower values in the highlands. The ensemble approaches provide more spatially coherent predictions with smoother transitions between EC classes, reducing the pixelated or noisy patterns visible in some individual models. As an example of the advantages of this approach, a stacked ensemble method and a weighted average have been combined with a nested optimization algorithm that simultaneously tunes hyperparameters and determines the optimal weights for combining models, which effectively minimizes both variance and bias (120). Numerous other studies further confirm that ensemble models provide more reliable and versatile prediction accuracy, thereby enhancing decision-making across various applications (36, 121–124).
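
The difference between the two ensembles can be sketched as follows. The study weights models by their R² from spatial cross-validation, but the exact weighting formula is not given here; the sketch assumes weights proportional to R², and both the scores and the per-pixel predictions are hypothetical.

```python
def weights_from_scores(scores):
    """Turn per-model scores (e.g. cross-validated R2) into weights that sum to 1."""
    total = sum(scores.values())
    return {model: score / total for model, score in scores.items()}


def weighted_ensemble(predictions, weights):
    """Weighted average of the model predictions for one pixel."""
    return sum(weights[model] * value for model, value in predictions.items())


def simple_average(predictions):
    """Unweighted mean of the model predictions for one pixel."""
    return sum(predictions.values()) / len(predictions)


# Hypothetical cross-validated R2 scores and pH predictions for one pixel.
scores = {"SVM": 0.90, "ANN": 0.88, "RF": 0.95, "XGBoost": 0.97}
pixel = {"SVM": 7.8, "ANN": 5.1, "RF": 6.6, "XGBoost": 6.7}

weights = weights_from_scores(scores)
print("weights:", {m: round(w, 3) for m, w in weights.items()})
print("simple average:", round(simple_average(pixel), 3))
print("weighted ensemble:", round(weighted_ensemble(pixel, weights), 3))
```

When the scores are close to one another, the weights are nearly equal and the two ensembles almost coincide, which is one of the explanations offered above for the similarity of the Average and Weighted maps.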

4.5 Analysis of predictive uncertainty

The stark differences between the pH and EC uncertainty maps (Figure 11) demonstrate that pH is inherently more predictable than EC in Andean soils, likely because pH is more directly controlled by parent material and basic soil-forming processes, which are well captured by topographic and climatic predictors (94). The high EC uncertainty explains the dramatic differences observed between individual models, particularly the extreme predictions of the ANN model. These uncertainty maps identify high-uncertainty EC areas as priority locations for additional sampling to improve model accuracy and reduce prediction uncertainty in risk assessment. Alternatively, one study used auxiliary soil data and a general linear model to reduce EC uncertainty (125). By contrast, we can confidently use pH predictions to inform most management decisions. The significant differences in spatial uncertainty patterns between pH and EC predictions are explained and validated by the results in Table 3, which supports our ensemble approach. Some individual models (especially XGBoost and ANN) combined high R² values with high error metrics, suggesting that these models may be overfitting or producing unrealistic, extreme predictions, as is particularly evident in the spatial maps of the ANN model. XGBoost delivers strong performance in predicting soil properties, as supported by studies showing RMSEs of 1.03–1.09 for pH and 26.53 mg/kg for phosphorus, consistent with dataset variability and highlighting some limits in nutrient prediction (126). According to recent studies, XGBoost models outperform traditional methods in predicting soil freezing characteristic curves (127) and soil organic matter (128). On the other hand, ensemble methods effectively combine model strengths and reduce individual weaknesses, which is especially valuable for EC, where significant disagreement between models occurs. EC prediction remains more challenging than pH, as evidenced by higher MAE values even in ensemble methods, which explains the higher uncertainty patterns in our CV maps (Figure 11). Model hyperparameters were tuned through a fivefold cross-validation process. This method verifies that the chosen hyperparameters are not only effective on the training data but also generalize robustly to unseen data, thus reducing the risk of overfitting (36).
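
For spatially autocorrelated data, cross-validation folds are commonly built from spatial blocks so that neighboring samples do not end up on both sides of a split. The blocking scheme used in the study is not described in this section; the sketch below assumes square grid blocks assigned to five folds, and the coordinates, block size, and function name are hypothetical.

```python
import random
from collections import defaultdict


def spatial_block_folds(coords, block_size, k=5, seed=0):
    """Group sample indices into k folds so that each spatial block stays in one fold."""
    blocks = defaultdict(list)
    for idx, (x, y) in enumerate(coords):
        blocks[(int(x // block_size), int(y // block_size))].append(idx)
    block_ids = sorted(blocks)
    random.Random(seed).shuffle(block_ids)  # randomize which block goes to which fold
    folds = [[] for _ in range(k)]
    for position, block_id in enumerate(block_ids):
        folds[position % k].extend(blocks[block_id])
    return folds


# Hypothetical sampling grid: 100 points spaced 100 m apart, 250 m blocks.
coords = [(x * 100.0, y * 100.0) for x in range(10) for y in range(10)]
folds = spatial_block_folds(coords, block_size=250.0, k=5)

for fold_id, test_idx in enumerate(folds):
    train_idx = [i for other in folds if other is not test_idx for i in other]
    print(f"fold {fold_id}: {len(train_idx)} training points, {len(test_idx)} test points")
```

Each fold serves once as the held-out set while the remaining blocks are used for training and tuning, so performance is always estimated on locations that are spatially separated from the training samples.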

5 Conclusions

Andean soil pH and EC show substantial variability, where EC displays notably higher variability and positive skewness compared to pH. We identified distinct spatial patterns using spatial autocorrelation analysis (Moran’s I, Geary’s C, LISA), which shows strong, statistically significant spatial clustering for pH, indicating dominant landscape-level controls. In contrast, EC exhibits moderate spatial autocorrelation, suggesting its distribution is more influenced by localized factors.

Correlation analysis reveals significant multicollinearity among groups of related variables, such as soil moisture indices, VIs, and soil texture. Topographic variables, especially Elevation, are the most influential predictors for both pH and EC in machine learning models, like RF and XGBoost, serving as proxies for combined environmental processes. Soil texture variables and certain VIs provide unique information but must be selected carefully to prevent redundancy.

Ensemble models generated the most spatially coherent and realistic maps for both pH and EC, outperforming any individual model. Among the standalone models, XGBoost and RF performed best for pH, while SVM had the best error metrics for EC. The ANN models consistently performed the poorest, exhibiting unrealistic spatial predictions and potential overfitting. The success of the ensemble approach lies in its ability to balance these individual predictions, effectively smoothing extreme values from models like ANN to produce more reliable and accurate maps with smoother transitions between value classes. Ensemble approaches (Average and Weighted) consistently and significantly outperform all individual ML models (SVM, ANN, RF, XGBoost) for predicting both soil pH and EC. The Weighted Ensemble achieved the highest accuracy, demonstrating a substantial error reduction of approximately 72% in RMSE compared to the best individual models, and strong explanatory power (R² > 0.93).

The uncertainty assessment with CV maps reveals a marked difference between pH and EC predictions. The predictions for pH are quite certain (low CV) over most of the area, but the EC predictions are widely uncertain (high CV), suggesting that predicting EC accurately within this framework is inherently difficult. High-uncertainty EC zones are priority targets for future sampling. The strong spatial structure of pH and the demonstrated high accuracy of its ensemble predictions make it reliable for informing soil management decisions. The higher uncertainty associated with EC predictions necessitates caution in their application for risk assessment but identifies specific areas needing further investigation. Understanding these spatial patterns is crucial for efficient sampling and targeted interventions.

As a recommendation for ensemble modeling, implementing methods with optimized weighting schemes is highly advised for DSM of spatially autocorrelated properties, such as pH and EC. This approach effectively leverages the strengths of different models and reduces individual weaknesses, resulting in improved accuracy and robustness. Lightweight ensemble frameworks also show potential for real-time applications.

Data availability statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Author contributions

CC-L: Conceptualization, Investigation, Writing – original draft, Data curation, Software, Formal Analysis, Methodology. AB: Writing – original draft, Visualization, Data curation. SP: Writing – review & editing, Validation.

Funding

The author(s) declare financial support was received for the research and/or publication of this article. This research was funded by the INIA project CUI 2487112 “Mejoramiento de los servicios de investigación y transferencia tecnológica en el manejo y recuperación de suelos agrícolas degradados y aguas para riego en la pequeña y mediana agricultura en los departamentos de Lima, Áncash, San Martín, Cajamarca, Lambayeque, Junín, Ayacucho, Arequipa, Puno y Ucayali”.

Acknowledgments

To the personnel of the Soil, Water, and Foliars Laboratory (LABSAF) at the Santa Ana Agrarian Experimental Station (EEA).

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declare that no Generative AI was used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

1. Fausak LK, Bridson N, Diaz-Osorio F, Jassal RS, and Lavkulich LM. Soil health – a perspective. Front Soil Sci. (2024) 4:1462428. doi: 10.3389/fsoil.2024.1462428