
ORIGINAL RESEARCH article

Front. Earth Sci., 22 January 2026

Sec. Georeservoirs

Volume 13 - 2025 | https://doi.org/10.3389/feart.2025.1750129

This article is part of the Research Topic "Exploring Hydrocarbon Origins and Reservoir Dynamics in Complex Geological Settings".

Accurate intelligent modeling of mud loss while drilling wells via soft computing methods

Lulwah M. Alkwai1, Kusum Yadav1, Yasser Alharbi1, Debashis Dutta2,3, Hojjat Abbasi4*
  • 1College of Computer Science and Engineering, University of Ha’il, Ha’il, Saudi Arabia
  • 2Sharda School of Engineering and Sciences, Sharda University, Greater Noida, UP, India
  • 3Department of Computers Techniques Engineering, College of Technical Engineering, The Islamic University, Najaf, Iraq
  • 4Chemistry Department, Herat University, Herat, Afghanistan

Accurate prediction of mud loss volume in drilling operations is a critical challenge in industries such as petroleum engineering and geothermal well construction. Unforeseen mud loss leads to substantial economic losses, operational delays, and environmental concerns, underscoring the urgent need for robust predictive models. To address this, the current study investigates the application of advanced ensemble machine learning techniques. Specifically, it presents models utilizing Random Forest (RF), Adaptive Boosting (AdaBoost), and Decision Trees (DT), alongside a custom stacking-based Ensemble Learning framework that integrates all three, to model and predict mud loss volume. The analyses incorporated key influencing factors: mud viscosity, differential pressure, hole size, and solid content. Empirical data comprising 2,820 observations from a Middle Eastern oil field were utilized. Data integrity was ensured through the leverage technique for outlier detection, and model generalizability was enhanced using 5-fold cross-validation during the training phase (90% of the dataset), with 10% reserved for independent testing. Among the evaluated models, the AdaBoost approach demonstrated superior predictive performance, achieving a coefficient of determination (R2) of 0.828 on the testing dataset. Sensitivity analyses revealed that mud viscosity and solid content inversely affect mud loss, while hole size and differential pressure consistently increase it. These results confirm the efficacy of AdaBoost for highly accurate mud loss prediction. This work distinguishes itself by providing a comprehensive comparison of multiple advanced ensemble ML techniques on a large, real-world dataset from an active oil field. The findings offer a more reliable and robust tool for forecasting mud loss, thereby enhancing operational efficiency and risk mitigation in drilling operations. By providing data-driven, actionable insights, this contributes to optimizing drilling decisions beyond the capabilities of traditional analytical methods.

1 Introduction

The drilling industry occupies a critical position within the framework of modern industrial operations, underpinning a diverse array of sectors, including petroleum, geothermal energy production, hydrogen storage, and carbon dioxide sequestration (Mohamed et al., 2021; Song et al., 2023; Ugarte and Salehi, 2021; Kumar Singh and Priyadarshini Nayak, 2024). This industry fuels global energy demands and plays a significant role in the transition toward alternative energy sources. In petroleum extraction, drilling operations are essential for accessing and producing fossil fuels, while geothermal drilling offers renewable energy alternatives, tapping into the Earth’s heat for sustainable power generation. Furthermore, the emergence of hydrogen storage techniques and CO2 sequestration strategies highlights the versatility of drilling operations, where advanced approaches are used to enhance the safety of resource extraction and subsurface storage. As such, effective management of drilling operations has profound implications for economic viability and environmental stewardship (Okoro et al., 2018; Lavrov and Tronvoll, 2021; Pang et al., 2024).

However, mud loss is a significant challenge in drilling operations that can lead to severe financial and operational difficulties. Mud loss can result in significant monetary losses due to non-productive time, increased operational costs, and potential damage to drilling equipment (Taheri et al., 2024; Orun et al., 2023; Saihood and Samuel, 2022; Asadimehr, 2024). Moreover, mud loss can lead to complications such as pipe sticking, which can immobilize the drill string and necessitate expensive retrieval operations, or the onset of kicks, an uncontrolled influx of formation fluids into the wellbore that can escalate into a dangerous blowout situation. These challenges underscore the importance of effectively understanding and managing mud loss to safeguard personnel, equipment, and operational budgets (Brankovic et al., 2021; Zhu, 2022).

Mud loss can arise from various factors, including geological disturbances, the properties of the fluids, and the approaches utilized during the drilling operation. For instance, the permeability of the drilled formation can significantly impact the mud loss rate. Additionally, the composition and features of the drilling mud, including viscosity and density, play pivotal roles in either containing or exacerbating the mud loss phenomenon. Moreover, human factors, such as decision-making and operational procedures, further contribute to the complexity of mud loss incidents (Taheri et al., 2024; Shad et al., 2021; Agwu et al., 2021). Several preventative measures are employed within the drilling industry to mitigate the risks associated with mud loss. Techniques such as proper selection of fluid composition, continuous monitoring of drilling parameters, and various wellbore integrity strategies are pivotal in minimizing mud loss incidents. For example, utilizing non-damaging fluids or specialized mud additives can enhance the mud’s ability to seal porous formations effectively, thereby reducing the chances of loss. Furthermore, proactive monitoring systems that provide live data on well conditions can enable engineers to adjust drilling parameters dynamically, thereby increasing the chances of successful operations while minimizing mud loss (Zhang Z. et al., 2022; Mahdi and Alrazzaq, 2024; Keshavarz and Moreno, 2023).

Despite the various strategies in place, the phenomenon of mud loss is influenced by several interconnected factors, including hole size, differential pressure between the wellbore and the surrounding formations, and the rheological properties of drilling fluids. These parameters can interact in complex ways, leading to unpredictable loss volumes, necessitating a comprehensive understanding of their interdependency. For drilling engineers, predicting the occurrence and volume of mud loss poses a considerable challenge, as the multitude of influencing factors can result in sudden and unanticipated changes in drilling conditions, thus complicating preventative measures and operational strategies (Pang et al., 2022a; Magzoub et al., 2021; Pang et al., 2022b). The unpredictability of mud loss has led to exploring various methodologies to quantify this critical parameter. Traditional approaches, such as empirical models and analytical solutions, have often proven inadequate due to their reliance on simplified assumptions and linear correlations, which do not account for the non-linear complexities inherent in drilling operations. While some methods have attempted to provide rudimentary predictions based on historical data or geologic analogs, these approaches often lack the granularity and adaptability needed to address the dynamic nature of drilling environments (Gowida et al., 2022; Wood et al., 2022; Zhang et al., 2021; Ji et al., 2023).

Recent advances in machine learning have demonstrated its transformative potential across diverse domains of geotechnical and materials engineering (Bassir and Madani, 2019; Hasanzadeh and Madani, 2024; Madani and Alipour, 2022), further underscoring its applicability to drilling-related challenges. For instance, Thapa and Ghani (2025) introduced hybrid deep learning models integrated with environmental assessment frameworks to advance Sustainable Development Goals (SDG 9 and SDG 12), showcasing how ML can optimize soil stabilization strategies for urban resilience. Similarly, Benzaamia et al. (2025) highlighted the efficacy of ensemble tree-based algorithms in forecasting durability performance of concrete, thereby contributing to sustainable construction practices. In another notable study, Le et al. (2022) demonstrated the ability of ANN models to capture complex nonlinear relationships between rock properties and mechanical strength, offering predictive insights that surpass traditional empirical correlations.

Machine learning offers the potential to harness vast amounts of drilling data to identify relationships that may not be obvious through traditional statistical analyses. The successful use of machine learning in calculation analyses in different fields underscores its value in enhancing operational efficiency and decision-making (Noshi and Schubert, 2018; Lawal et al., 2024; Nabavi et al., 2025). By integrating machine learning into the prediction of mud loss, it becomes possible to develop adaptive models that respond dynamically to the many variables that influence drilling operations. This paradigm shift represents a significant opportunity to advance understanding of mud loss phenomena and improve drilling operations’ safety and efficiency.

While various studies have explored the application of machine learning in drilling operations, a comprehensive and comparative analysis utilizing advanced ensemble techniques on a large, real-world mud loss dataset from an active oil field remains underexplored. Previous efforts have often relied on smaller, synthetic, or generalized datasets, or focused on a limited range of less sophisticated models (Abdollahfard et al., 2025; Chen et al., 2025; Egbuna et al., 2025). The key novelty of this research lies in its systematic investigation and comparative evaluation of multiple state-of-the-art ensemble machine learning algorithms (Random Forest, Adaptive Boosting, Decision Trees, and Ensemble Learning) using an extensive empirical dataset of 2,820 observations directly from a Middle Eastern oil field. This study offers one of the most robust and data-driven assessments of mud loss prediction to date, providing practical insights into the complex interplay of drilling parameters and demonstrating a predictive accuracy that significantly surpasses conventional empirical or less sophisticated modeling approaches. This work aims to bridge the gap between theoretical ML applications and real-world operational challenges by delivering a highly reliable and actionable predictive tool for mud loss management (Jafarizadeh et al., 2023; Sabah et al., 2021).

The current study aligns with existing efforts to leverage machine learning for lost circulation management, particularly those utilizing field data from major oil fields. For instance, Al-Hameedi et al. (2019) employed statistical models for lost circulation volume prediction in the Dammam formation, demonstrating improved accuracy over older models and providing empirical equations for operational optimization. Similarly, Okai et al. (2024) explored deep learning algorithms (LSTM, GRU, CNN) for classifying lost circulation severity in large oilfield datasets. Building upon these foundations, this work distinguishes itself by primarily focusing on the precise quantification of mud loss volume through a rigorous comparative analysis of advanced ensemble machine learning techniques (Random Forest, AdaBoost, Decision Trees, and a custom Ensemble Learning framework).

The superior performance of the AdaBoost model (test R2 of 0.828) for this specific regression task, coupled with a detailed sensitivity analysis providing quantifiable operational insights into parameters like mud viscosity and solid content, offers a distinct and highly actionable contribution beyond general prediction or classification. This approach provides a robust, interpretable, and directly applicable tool for enhancing real-time drilling fluid management and significantly mitigating the economic and environmental impacts of lost circulation.

Conventional models for predicting mud loss are limited by simplified assumptions, linear correlations, and site-specific heuristics, which hinder their accuracy and adaptability in complex drilling environments. They often fail to generalize across diverse geological conditions and are further weakened by reliance on small or synthetic datasets. To address this gap, the present study develops advanced predictive models using ensemble machine learning techniques, including Random Forest, AdaBoost, Decision Trees, and a custom ensemble framework, applied to a large, real-world dataset from a Middle Eastern oil field.

Critical input parameters such as hole size, differential pressure, mud viscosity, and solid content are systematically analyzed, with outlier detection via the leverage method ensuring data integrity. Model robustness is reinforced through k-fold cross-validation, while sensitivity analyses and multiple performance metrics provide deeper insights into parameter significance and predictive reliability. By combining methodological rigor with practical field data, this research offers a more accurate and generalizable framework for mud loss prediction, thereby enhancing decision-making, operational efficiency, and risk mitigation in drilling practices.

The remainder of this manuscript is organized as follows. Section 2 provides the theoretical background of machine learning methods, including ensemble learning, AdaBoost, Decision Trees, and Random Forest. Section 3 describes the dataset, preprocessing procedures, and model development framework. Section 4 presents the results of model evaluation, sensitivity analyses, and interpretability assessments. Finally, Section 5 concludes the study by summarizing the key contributions and highlighting its practical relevance for drilling operations.

2 Background of machine learning

2.1 Ensemble learning

As the flowchart presented in Figure 1 shows, ensemble learning is a powerful paradigm that combines various approaches, known as base learners, to construct more powerful predictive models. The primary advantage of ensemble methods is their ability to enhance overall performance by leveraging the strengths of various algorithms, thereby improving accuracy, stability, and resilience against over-fitting. By integrating the outcomes of several approaches, ensemble learning capitalizes on the concept that the errors made by individual models are less likely to correlate, leading to more reliable high-level predictions. This approach has gained significant traction in various applications, including classification, regression, and anomaly detection (Zhang Y. et al., 2022; Yaghoubi et al., 2024; Mamun et al., 2022).

Figure 1
Flowchart depicting machine learning model ensemble workflow. Training data is used by Random Forest, AdaBoost, and Decision Tree algorithms, creating Predictions 1, 3, and 5. These predictions are combined using methods like Voting (majority/weighted), Averaging, and Stacking Meta-Learner, leading to the final prediction.

Figure 1. Flowchart of the ensemble learning machine learning method.

The primary benefits of ensemble learning are its capability to improve the accuracy and robustness of techniques, reduce overfitting, and enhance predictive performance in complex datasets. Ensembles can better generalize than individual models by aggregating predictions from multiple models. However, the challenges associated with ensemble methods include increased complexity in model interpretation, higher computational costs during training and prediction phases, and the necessity for careful selection and tuning of base learners to avoid overfitting in specific contexts. Despite these challenges, ensemble learning remains a prominent approach in machine learning, particularly in competitive settings where maximizing predictive power is paramount. Applications span various domains, including finance, healthcare, and artificial intelligence, where ensemble methods often yield superior results compared to single-model approaches (Yang et al., 2023; Mhawi et al., 2022; Tan et al., 2022).
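The stacking workflow of Figure 1 can be sketched in a few lines. The example below combines Random Forest, AdaBoost, and Decision Tree base learners under a linear meta-learner on synthetic stand-in data; the toy dataset and all hyperparameters are illustrative assumptions, not the study's tuned configuration.

```python
import numpy as np
from sklearn.ensemble import (RandomForestRegressor, AdaBoostRegressor,
                              StackingRegressor)
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
# Synthetic stand-in for the four drilling features (viscosity, pressure, ...)
X = rng.uniform(size=(300, 4))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.1, size=300)

stack = StackingRegressor(
    estimators=[
        ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
        ("ada", AdaBoostRegressor(n_estimators=50, random_state=0)),
        ("dt", DecisionTreeRegressor(max_depth=5, random_state=0)),
    ],
    final_estimator=LinearRegression(),  # meta-learner combines base outputs
    cv=5,  # out-of-fold base predictions guard the meta-learner against leakage
)
stack.fit(X, y)
r2_train = stack.score(X, y)
```

The `cv=5` argument is the key design choice: the meta-learner is fit on out-of-fold predictions rather than in-sample outputs of the base models.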

2.2 AdaBoost

AdaBoost, a shortened form of Adaptive Boosting, is a pioneering technique that enhances the classification performance of weak classifiers by combining their predictions into a robust model (see Figure 2). Introduced by Freund and Schapire (1996), AdaBoost transforms collections of weak learners, such as decision stumps (trees with a single split), into a strong predictive model. The fundamental idea behind AdaBoost is to focus on the mistakes made by previous classifiers by adjusting the weights of incorrectly classified instances during training. This iterative process allows the model to improve its accuracy progressively and is particularly effective at reducing bias and variance.

Figure 2

Figure 2. A schematic overview of AdaBoost.

The AdaBoost algorithm operates sequentially, adjusting the weights of training instances after each weak learner is trained. The approach starts by assigning equal weight to each instance in the training dataset. During each iteration t, a weak classifier h_t(x) is trained using the weighted dataset (Ren et al., 2022; Amirruddin et al., 2022). The weak classifier’s performance is measured by its weighted error rate, as shown in Equation 1.

$$\varepsilon_t = \frac{\sum_{i=1}^{N} \omega_i \,\mathbf{1}\left(y_i \neq h_t(x_i)\right)}{\sum_{i=1}^{N} \omega_i} \tag{1}$$

Where ω_i represents the weight of the i-th instance, y_i is the true label, and 1(·) is an indicator function that returns one when its condition is true and zero otherwise. After each weak classifier is trained on the weighted data, its weight α_t is calculated through Equation 2.

$$\alpha_t = \frac{1}{2}\ln\left(\frac{1-\varepsilon_t}{\varepsilon_t}\right) \tag{2}$$

Equation 2 expresses the importance of the weak learner; better-performing classifiers receive higher weights. Finally, the AdaBoost ensemble model’s predictions are made using a weighted vote of the weak classifiers. The final output H(x) of the AdaBoost model is given by Equation 3.

$$H(x) = \operatorname{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right) \tag{3}$$

In Equation 3, T is the number of weak classifiers, and sign(·) denotes the sign function, which outputs the class label. The framework can be adapted for regression tasks, in which case the predictions are averaged instead of voted on. The adaptive nature of AdaBoost, where the model emphasizes instances that previous learners struggled with, coupled with its capability to combine numerous weak classifiers into a strong ensemble, makes it a powerful method for improving predictive accuracy across various applications, such as text classification and bioinformatics. Nevertheless, the AdaBoost approach is sensitive to noisy data and outliers, which can adversely affect model performance. Implementations often incorporate safeguards against this sensitivity, such as modifying the learning rate or employing robust base learners (Mohebbanaaz et al., 2022; Tyralis and Papacharalampous, 2021). The schematic diagram of the AdaBoost algorithm is illustrated in Figure 2.
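Equations 1-3 can be made concrete with a minimal from-scratch sketch. The code below boosts one-feature threshold stumps on toy two-dimensional data; the data, the exhaustive stump search, and the number of rounds are illustrative assumptions, not the study's implementation.

```python
import numpy as np

def fit_adaboost(X, y, T=20):
    """Boost threshold stumps on labels y in {-1, +1}."""
    n = len(y)
    w = np.full(n, 1.0 / n)                  # start with equal instance weights
    stumps, alphas = [], []
    for _ in range(T):
        best = None
        for j in range(X.shape[1]):          # exhaustive stump search
            for thr in np.unique(X[:, j]):
                for s in (1, -1):
                    pred = s * np.where(X[:, j] < thr, 1, -1)
                    err = np.sum(w * (pred != y)) / np.sum(w)   # Eq. 1
                    if best is None or err < best[0]:
                        best = (err, j, thr, s)
        err, j, thr, s = best
        err = np.clip(err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)                   # Eq. 2
        pred = s * np.where(X[:, j] < thr, 1, -1)
        w *= np.exp(-alpha * y * pred)       # up-weight misclassified points
        w /= w.sum()
        stumps.append((j, thr, s))
        alphas.append(alpha)
    return stumps, alphas

def predict(stumps, alphas, X):
    agg = sum(a * s * np.where(X[:, j] < thr, 1, -1)
              for (j, thr, s), a in zip(stumps, alphas))
    return np.sign(agg)                                         # Eq. 3

rng = np.random.default_rng(1)
X = rng.normal(size=(120, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)   # diagonal boundary: no single
stumps, alphas = fit_adaboost(X, y)          # axis-aligned stump can fit it
```

Because the decision boundary is diagonal, no individual stump performs well, yet the weighted vote of many stumps approximates it closely, which is exactly the behavior Equations 1-3 describe.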

2.3 Decision tree

Decision Trees, shown in Figure 3, are a popular machine-learning approach used for both classification and regression. The primary objective of a decision tree is to split the dataset into subsets whose instances share similar values of the target variable. This hierarchical structure mimics human decision-making, making it easy to understand and interpret. At each internal tree node, a decision is made according to a particular feature value, leading to the creation of child nodes that further partition the dataset based on additional features. The process stops when a criterion is met, such as reaching a maximum depth or a minimum number of samples in a leaf node (Navada et al., 2011; Elhazmi et al., 2022).

Figure 3
Flowchart of a decision tree structure with green leaf nodes and blue decision nodes.

Figure 3. Decision tree algorithm schematic.

The tree-building process begins with the entire dataset at the root node, which is subsequently split based on the feature that results in the highest gain in purity (the reduction in impurity after the split). This is done by evaluating the chosen criterion (e.g., Gini impurity or entropy) across all possible splits for each feature. The feature that yields the highest information gain or reduction in impurity is selected for the split. Once a feature is chosen, the dataset is partitioned into subsets according to the value of that feature, and the process is recursively repeated for all subsets. The recursion continues until a stopping criterion is reached, such as a maximum tree depth or when further splits do not significantly reduce impurity.

Once a Decision Tree is built, predictions are made by traversing the tree from the root node to a leaf node, following the decision rules based on the feature values of each instance. For classification tasks, the class label assigned to the instance typically represents the majority class of the samples in the leaf node, while for regression tasks, the predicted value is the average of the outputs of the instances in that leaf. The interpretability of Decision Trees is one of their significant advantages; the hierarchical structure allows for straightforward visualization and understanding of how decisions are made. Decision Trees can easily overfit, especially when they are deep. To prevent this and improve performance, they can be pruned, limited in depth, or combined in ensemble methods such as Random Forest (Zulfiqar et al., 2021; Bansal et al., 2022).
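The overfitting behavior noted above is easy to demonstrate: an unpruned regression tree memorizes its training data, while a depth-limited tree cannot. The data and the depth cap below are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(400, 4))
y = np.sin(3 * X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.2, size=400)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Unpruned tree: grows until every training point sits in its own leaf.
deep = DecisionTreeRegressor(random_state=0).fit(X_tr, y_tr)
# Depth-limited tree: a simple pruning surrogate that trades training fit
# for generalization.
shallow = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X_tr, y_tr)
```

On noisy data the unpruned tree reaches a perfect training R2 of 1.0, which is a symptom of memorized noise rather than learned structure; the depth cap prevents this.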

2.4 Random forest

The Random Forest method, depicted in Figure 4, builds multiple decision trees and then aggregates their outcomes. To build each tree, it uses a bootstrapped sample of the training data (sampling with replacement) and considers only a random subset of features for each split. This inherent randomness is key; it makes the trees diverse and less prone to overfitting, ultimately leading to a more robust model that performs well on new data (Sarica et al., 2017; Biau and Scornet, 2016; Rigatti, 2017).

Figure 4
Diagram illustrating a random forest model. Input data is fed into multiple decision trees: first tree, second tree, and Nth tree. Each tree generates predictions. These predictions are averaged, and the process is repeated to produce the final result.

Figure 4. Random forest algorithm schematic.

For each decision tree within this method, the splitting criterion typically used is either the Gini impurity for classification or the mean squared error (MSE) for regression. The Gini impurity G(p) is defined in Equation 4.

$$G(p) = 1 - \sum_{i=1}^{C} P_i^2 \tag{4}$$

Where P_i denotes the proportion of instances belonging to class i in the node. Once all decision trees are built, the final estimate is obtained by aggregating the individual tree predictions. For classification, the mode is computed as in Equation 5.

$$\hat{y} = \operatorname{mode}\left(y_1, y_2, \ldots, y_T\right) \tag{5}$$

Equation 6 gives the final prediction for regression tasks in a Random Forest: the average of the individual tree predictions y_t, where T represents the total number of trees in the forest.

$$\hat{y} = \frac{1}{T}\sum_{t=1}^{T} y_t \tag{6}$$

This ensemble approach utilizes the “wisdom of crowds” principle, leading to higher accuracy and reduced variance compared to individual decision trees. Moreover, Random Forest is effective in its predictive abilities and provides insights into feature importance, allowing experts to discern which variables most influence the outcomes, thus enhancing interpretability across various applications (Mohapatra et al., 2020; Feng et al., 2020; Ao et al., 2019).
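The feature-importance capability mentioned above can be sketched as follows, using synthetic data in which one feature deliberately dominates the response (the data and hyperparameters are assumptions for illustration):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 4))
# Feature 0 drives most of the response; features 2 and 3 are pure noise.
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=500)

rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
# Impurity-based importances: one non-negative score per feature, summing to 1.
importances = rf.feature_importances_
```

The dominant feature receives the largest importance score, which is the mechanism an analyst would use to rank drilling parameters such as viscosity or differential pressure by influence.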

3 Data overview and model performance

3.1 Data overview

The data used to develop and evaluate these predictive models came from empirical mud loss volume data collected during drilling operations in a Middle Eastern field. Table 1 provides a comprehensive summary of the statistical characteristics of the input parameters, which encompass hole size, mud viscosity, differential pressure between the wellbore and surrounding formations, and the solid content of the drilling mud. Additionally, the output variable, defined as mud loss volume, is included in this analysis. The statistical metrics presented in the table consist of key descriptors.

Table 1

Table 1. Descriptive statistics of the input features and output response.

Notably, the dataset for developing the data-driven machine learning models comprises 2,820 observations. To ensure robust training and evaluation, 90% of the dataset was allocated for training and validation. This allocation was implemented using k-fold cross-validation, specifically with 5 folds, to enhance the models’ reliability and mitigate overfitting. The remaining 10% of the dataset was reserved as the testing set for assessing the efficacy and predictive power of the developed models, enabling an accurate evaluation of their performance in real-world scenarios. This methodological framework underscores the rigor and systematic approach employed, thereby contributing to the overall robustness and validity of the research findings. Figure 5 illustrates the overall flowchart of the methodology of the current research.

Figure 5
Flowchart depicting a machine learning process for predicting mud loss volume. It starts with data pre-processing of input parameters: DP, mud viscosity, hole size, and solid content. This leads to ensemble learning using adaptive boosting, random forest, and decision tree algorithms. The model selection results in the output parameter: mud loss volume.

Figure 5. Overall flowchart of the methodology of the current research.

To ensure machine learning algorithms are both effective and generalizable, K-fold cross-validation was used. This approach divides the dataset into K segments, or folds. Each fold is used as a validation set exactly once, with the other K−1 folds forming the training set. Running this training and validation loop K times, and then averaging the outcomes, helps minimize the inherent bias from arbitrary data splits, leading to a more dependable evaluation of the model’s performance (Wong and Yeh, 2019; Fushiki, 2011).

K-fold cross-validation is particularly useful for preventing overfitting, as it allows us to thoroughly evaluate a model’s predictive performance on different parts of the dataset. Figure 6 provides a visual overview of this robust process. For this study, a 5-fold cross-validation approach was applied to each algorithm in its training. This methodology selection ensures a more reliable assessment of model performance and promotes the design of more robust models.

Figure 6
Flowchart illustrating the partitioning of input data into training and testing sets. It features multiple folds, labeled as Fold 1 through Fold N, for hyperparameter tuning. Final evaluation is conducted using test data.

Figure 6. K-Fold algorithm diagram, highlighting the process of data partitioning, training, and validation across multiple folds.

To ensure that overfitting did not compromise the reliability of the developed models, several safeguards were implemented during the training and evaluation process. First, a 5-fold cross-validation strategy was applied to the training dataset, allowing each subset of data to serve as both training and validation in rotation, thereby reducing bias from arbitrary splits. Second, an independent test set comprising 10% of the data was reserved exclusively for final evaluation, ensuring that model performance was assessed on unseen data. Third, hyperparameter tuning was conducted with careful monitoring of validation metrics to avoid overly complex configurations that could memorize noise rather than capture generalizable patterns. Finally, sensitivity analyses and SHAP-based interpretability checks confirmed that the models were learning meaningful relationships rather than spurious correlations (Asteris et al., 2021; Armaghani and Asteris, 2021).
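The first two safeguards, a 10% independent holdout plus 5-fold rotation inside the remaining 90%, can be sketched as follows on synthetic stand-in data (the model and its settings are assumptions, not the tuned configuration):

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor
from sklearn.model_selection import KFold, cross_val_score, train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(size=(300, 4))
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=300)

# 90/10 split mirrors the paper; the test set never touches model selection.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.10, random_state=0)

# 5 folds rotate inside the 90% training portion.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(AdaBoostRegressor(random_state=0),
                         X_tr, y_tr, cv=cv, scoring="r2")
```

Averaging the five fold scores gives the validation estimate; only after hyperparameters are frozen is the model scored once on `X_te`, `y_te`.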

3.1.1 Dataset description and representativeness

The empirical foundation of this study is a comprehensive dataset comprising 2,820 individual data points compiled from historical drilling operations (2010–2023) in the Majnoon oil field, located in the southeastern region of Iraq. This dataset provides a rich and diverse source of real-world information, critical for developing robust predictive models. The following sections detail the context, characteristics, and pre-processing steps applied to ensure data quality and model reliability.

To properly evaluate the generalizability of the machine learning models, it is essential to understand the environment from which the data were sourced. The key contextual parameters are as follows.

1. Geological Setting: The wells captured in this dataset were drilled through predominantly sandstone formations. The depth of the logged intervals ranges from approximately 2,150 m to 4,210 m, covering various pressure and temperature regimes inherent to these depths.

2. Drilling Fluid System: All operations utilized a Water-Based Mud (WBM) system. The rheological properties of this fluid system were characterized using the Bingham Plastic model, a standard approach in the drilling industry for describing the relationship between shear stress and shear rate.

3. Operational Conditions: Drilling was conducted under overbalanced drilling (OBD) conditions, where the hydrostatic pressure of the drilling mud column intentionally exceeds the formation pore pressure. The dataset includes operations performed with both tricone (roller-cone) bits and Polycrystalline Diamond Compact (PDC) bits. The mud circulation rate, a key operational parameter, varied significantly across the dataset, with values ranging from 100 to 720 gallons per minute (gpm).

Prior to model development, the raw dataset underwent rigorous pre-processing and cleaning to resolve inconsistencies and noise, ensuring the fidelity of the data used for training. The leverage statistical method was applied to identify potential high-leverage points, which represent observations with extreme feature values that can influence model behavior. Although hat-values were computed, none of these high-leverage observations were removed. This choice preserved the full variability of the dataset, prevented unnecessary narrowing of the feature space, and maintained the model’s ability to generalize to real operational conditions where extreme but valid cases commonly occur. In addition, no data point containing missing values was used. Only complete and fully observed samples were retained to avoid bias introduced by imputation and to ensure that model training relied solely on reliable and directly measured information.
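The leverage screen described above rests on hat values, the diagonal of H = X(XᵀX)⁻¹Xᵀ. A minimal NumPy sketch is shown below; the 3(p+1)/n cutoff is a common rule of thumb assumed here for illustration, not a threshold stated in the text.

```python
import numpy as np

def hat_values(X):
    """Diagonal of the hat matrix for a design matrix X (intercept added)."""
    Xc = np.column_stack([np.ones(len(X)), X])   # prepend intercept column
    H = Xc @ np.linalg.pinv(Xc.T @ Xc) @ Xc.T
    return np.diag(H)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
X[0] = 8.0                                       # one deliberately extreme row
h = hat_values(X)

p = X.shape[1]
# Common rule of thumb: flag observations with h_i > 3(p+1)/n.
flagged = np.where(h > 3 * (p + 1) / len(X))[0]
```

The hat values sum to p+1 (the column rank of the design matrix), and the extreme row stands out clearly; consistent with the text, flagged points would be reviewed rather than automatically discarded.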

Furthermore, mud loss is not a monolithic phenomenon. For this study, a critical distinction was made between different types of loss. The models were specifically developed to predict seepage and partial losses, which typically occur through porous formations or small fractures and can be managed by adjusting drilling parameters and mud properties. Instances of catastrophic or total loss, characterized by a sudden and complete loss of circulation into large natural or induced fractures, were identified and excluded from the dataset. This exclusion is justified because catastrophic events represent a different physical mechanism that often requires immediate and drastic interventions, rather than the fine-tuning of operational parameters that this predictive model is designed to support. This focused approach ensures that the model is trained on a consistent problem domain, enhancing its practical utility for routine drilling operations.

3.2 Models’ evaluation

To determine the predictive capabilities of the constructed models, a range of performance metrics is systematically calculated for each model using Equations 7–10 (Madani et al., 2021; Madani et al., 2017; Asteris et al., 2025; Sarir et al., 2021):

$$\mathrm{RE\%} = \frac{o^{pred} - o^{exp}}{o^{exp}} \times 100 \qquad (7)$$

$$\mathrm{AARE\%} = \frac{100}{N} \sum_{i=1}^{N} \left| \frac{o_i^{pred} - o_i^{exp}}{o_i^{exp}} \right| \qquad (8)$$

$$\mathrm{MSE} = \frac{\sum_{i=1}^{N} \left( o_i^{pred} - o_i^{exp} \right)^2}{N} \qquad (9)$$

$$R^2 = 1 - \frac{\sum_{i=1}^{N} \left( o_i^{pred} - o_i^{exp} \right)^2}{\sum_{i=1}^{N} \left( o_i^{exp} - \bar{o} \right)^2} \qquad (10)$$

In this framework, four performance metrics are employed to evaluate model precision: RE%, AARE%, MSE, and R2.
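Equations 7–10 translate directly into vectorized NumPy code. The sketch below uses small illustrative arrays of observed (exp) and predicted (pred) values, not field measurements:

```python
import numpy as np

def re_percent(pred, exp):
    # Relative error, Eq. 7 (per sample, in percent)
    return (pred - exp) / exp * 100.0

def aare_percent(pred, exp):
    # Average absolute relative error, Eq. 8
    return 100.0 / len(exp) * np.sum(np.abs((pred - exp) / exp))

def mse(pred, exp):
    # Mean squared error, Eq. 9
    return np.mean((pred - exp) ** 2)

def r2(pred, exp):
    # Coefficient of determination, Eq. 10
    ss_res = np.sum((pred - exp) ** 2)
    ss_tot = np.sum((exp - np.mean(exp)) ** 2)
    return 1.0 - ss_res / ss_tot

# Illustrative values only
exp = np.array([10.0, 20.0, 30.0])
pred = np.array([11.0, 19.0, 33.0])
print(round(float(r2(pred, exp)), 3))
```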

$$n_{norm} = \frac{n - n_{min}}{n_{max} - n_{min}} \qquad (11)$$

In Equation 11, $n$ denotes the current data value, $n_{max}$ and $n_{min}$ are the highest and lowest values of that variable in the dataset, and $n_{norm}$ is the resulting normalized value, scaled to the range [0, 1].
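A minimal implementation of the min–max scaling in Equation 11, applied to illustrative values (chosen here to match the reported mud circulation rate range, purely as an example):

```python
import numpy as np

def min_max_normalize(x):
    # Eq. 11: scale each value to [0, 1] using the dataset minimum and maximum
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

# Example: circulation rates of 100, 410, and 720 gpm map to 0, 0.5, and 1
print(min_max_normalize([100, 410, 720]))
```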

3.3 Sensitivity analysis

This section presents a sensitivity analysis based on the Pearson correlation coefficient to evaluate how each input affects the mud loss volume during the well construction phase. An input variable's importance is indicated by the magnitude of its coefficient: the larger the absolute value, the more influential the input. A positive coefficient indicates a direct relationship, in which the input and target variables increase together; a negative coefficient indicates an inverse relationship, in which the target declines as the input rises. The coefficient for input $j$ is computed as (Madani et al., 2021; Bemani et al., 2023):

$$r_j = \frac{\sum_{i=1}^{n} \left( I_{j,i} - \bar{I}_j \right) \left( Z_i - \bar{Z} \right)}{\sqrt{\sum_{i=1}^{n} \left( I_{j,i} - \bar{I}_j \right)^2 \sum_{i=1}^{n} \left( Z_i - \bar{Z} \right)^2}} \qquad (12)$$

In Equation 12, $\bar{I}_j$ denotes the average of the input variable $I_j$, while $Z$ and $\bar{Z}$ represent the response variable and its average. Figure 7 depicts the relative importance of various factors on the mud loss volume, comprising hole size, mud viscosity, differential pressure between the wellbore and formation, and mud solid content. The results indicate that mud viscosity exerts the most pronounced effect on the mud loss volume, characterized by a correlation coefficient (R-value) of −0.24, which denotes an inverse relationship with the output parameter. In contrast, the impact of hole size is minimal, as evidenced by an R-value of 0.011. Furthermore, the analysis reveals that hole size and differential pressure positively influence mud loss volume, whereas mud viscosity and solid content have a negative impact on the magnitude of this output parameter.
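Equation 12 reduces to the standard sample Pearson correlation, which can be sketched as follows. The synthetic viscosity and mud loss arrays are illustrative stand-ins, not field data; they are constructed with an inverse trend to mimic the reported sign:

```python
import numpy as np

def pearson_r(feature, target):
    # Eq. 12: sample Pearson correlation between one input and the response
    f = np.asarray(feature, dtype=float)
    t = np.asarray(target, dtype=float)
    fc, tc = f - f.mean(), t - t.mean()
    return np.sum(fc * tc) / np.sqrt(np.sum(fc**2) * np.sum(tc**2))

rng = np.random.default_rng(42)
viscosity = rng.normal(45.0, 5.0, size=200)            # synthetic input
mud_loss = -0.8 * viscosity + rng.normal(0, 10.0, 200) # synthetic inverse response
r = pearson_r(viscosity, mud_loss)
print(round(float(r), 3))  # negative, reflecting the inverse relationship
```

For real data, this matches `np.corrcoef(feature, target)[0, 1]`.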

Figure 7
Bar chart showing four data points: MVIS at -0.240, Hole size at 0.011, Retort solid at -0.111, and Delta P at 0.066. Negative and positive values are represented with orange bars outlined in green.

Figure 7. Computed relevancy indices respective to mud viscosity, hole size, retort solid, and delta P.

In this study, the identified inverse relationships between mud viscosity/solid content and mud loss volume provide crucial insights for proactive drilling fluid management. Specifically, the negative correlation of mud viscosity (R-value of −0.24) and solid content with mud loss suggests that these parameters are key levers for mitigation. Higher mud viscosity enhances the formation of a robust filter cake, which can effectively seal permeable formations and micro-fractures, thereby reducing fluid invasion. Similarly, an optimized concentration of fine, inert solids within the drilling fluid contributes to a low-permeability filter cake that minimizes fluid loss into the surrounding rock. These findings underscore the importance of precise control over drilling fluid properties as a primary strategy to prevent and manage lost circulation.

Translating these insights into practical field applications, drilling engineers can leverage the model’s predictions and the sensitivity analysis findings to make informed, real-time adjustments. When indicators of potential mud loss emerge, a strategic increase in mud viscosity, achieved through the addition of appropriate viscosifiers, should be considered to reinforce wellbore stability and reduce fluid invasion. Concurrently, rigorous management of solids control equipment is essential to maintain the optimal type and distribution of solids that contribute to a strong filter cake, without compromising other mud properties. This proactive, data-driven approach, guided by the model, empowers operators to minimize the economic and operational impact of lost circulation, enhancing drilling efficiency and safety.

To enhance interpretability, the SHAP framework was utilized as a game-theory–based method that assigns each feature a measurable impact on predictions. By assessing its influence across all feature combinations, SHAP provides a consistent, mathematically sound explanation of model behavior, clarifying how individual variables shape the output.

Figure 8 highlights that hole size emerges as the dominant parameter governing mud loss volume, exerting a stronger influence than any other input variable considered in the analysis. This observation is further substantiated by the SHAP feature attribution plot in Figure 9, which provides a detailed breakdown of how individual features contribute to the model’s predictions. The visualization employs a color gradient to encode feature magnitude, where red indicates higher values and blue denotes lower values. Notably, the distribution of red points on the negative side of the hole size axis demonstrates that larger hole sizes are consistently associated with reduced mud loss predictions. This pattern underscores the inverse relationship between hole size and mud loss volume, offering a mechanistic interpretation of the model’s behavior. In contrast, features with less pronounced SHAP contributions exhibit weaker or more scattered distributions, reinforcing the central role of hole size in shaping the predictive outcome.

Figure 8
Bar chart showing SHAP feature importance for model output.

Figure 8. Feature importance order determined through SHAP.

Figure 9
SHAP summary plot showing feature contributions to a model's output. Six features are listed: Hole size, Retort solid percentage, Delta P (psi), MW (pcf), MFVIS, and Formation type. The horizontal axis represents SHAP values, indicating the impact of each feature. Dots are colored by feature value, with a gradient from blue (low) to red (high).

Figure 9. SHAP feature contributions graph.

3.4 Outlier identification

The leverage technique is an analytical approach implemented to identify anomalous data points by assessing the standard deviation of the residual values in conjunction with the hat matrix H. H is computed using Equation 13 below. This approach allows for the quantification of the influence that individual observations exert on the fitted model, facilitating the detection of outliers (Madani et al., 2021):

$$H = X \left( X^T X \right)^{-1} X^T \qquad (13)$$

To derive the hat values for the data, the diagonal entries of H are computed using Equation 13. H is constructed from the matrix X, which has dimensions m (number of samples in the dataset) by n (number of input parameters), together with its transpose $X^T$. The formula for the warning leverage threshold is provided in Equation 14. This calculation is crucial for identifying the influence of individual observations on the constructed model (Bemani et al., 2023):

$$H^* = \frac{3\left(n + 1\right)}{m} \qquad (14)$$

After calculating the warning leverage (H*), questionable data points are identified via the Williams plot. As depicted in Figure 10, the suspected-data and leverage thresholds delimit an acceptable region, indicated in green. Most data entries fall within this range, while fewer than 1% of the data points are marked in red. This research retained the whole initial dataset for developing robust predictive models, enhancing generalization.
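The hat diagonal of Equation 13 and the warning leverage of Equation 14 can be computed as below. The design matrix here is random illustrative data, and the leading column of ones stands in for an intercept term (an assumption about the regression setup):

```python
import numpy as np

def hat_diagonal(X):
    # Eq. 13: diagonal of H = X (X^T X)^{-1} X^T, i.e., each sample's leverage
    XtX_inv = np.linalg.inv(X.T @ X)
    # einsum computes diag(X @ XtX_inv @ X.T) without forming the full m-by-m H
    return np.einsum("ij,jk,ik->i", X, XtX_inv, X)

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=(50, 4))])  # intercept + 4 inputs

h = hat_diagonal(X)
n_features = 4           # input parameters (n)
m_samples = X.shape[0]   # dataset size (m)
h_star = 3 * (n_features + 1) / m_samples  # Eq. 14 warning leverage
print(int((h > h_star).sum()))  # count of high-leverage candidates
```

In a Williams plot, these hat values form the horizontal axis and the standardized residuals the vertical axis.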

Figure 10
Scatter plot showing standardized residual versus hat value. Yellow dots represent valid data; red circles indicate suspected data. Vertical lines denote suspected and leverage limits. Data clusters densely around lower hat values.

Figure 10. Hat matrix for outlier detection.

3.5 Computational considerations

While the implementation of robust techniques such as k-fold cross-validation, outlier detection, and ensemble learning methods significantly enhanced the predictive accuracy and reliability of the models, it is important to acknowledge their associated computational costs.

The most substantial contributors to computational overhead in this study were:

• K-Fold Cross-Validation: Performing k separate training and validation runs for each model, rather than a single train-test split, inherently increases processing time by a factor of k; the 5-fold cross-validation used here requires five times more training iterations.

• Ensemble Learning Algorithms: Both Random Forest and AdaBoost, by design, involve the training of multiple individual decision trees. While Random Forest benefits from parallelization, AdaBoost’s sequential nature means that the training of each subsequent weak learner depends on the previous one, which can be computationally intensive, especially with a large number of estimators.

• Hyperparameter Tuning: The process of systematically searching for optimal model hyperparameters (e.g., using Grid Search) involved evaluating numerous model configurations, each requiring a full training and cross-validation cycle. This step was particularly resource intensive.
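As an illustration of how these costs multiply, a grid search over the Decision Tree's max depth with 5-fold cross-validation retrains the model once per candidate per fold. The sketch below uses synthetic regression data, not the field dataset, and the candidate grid is an assumed example:

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the drilling data: 4 inputs, one continuous target
X, y = make_regression(n_samples=200, n_features=4, noise=10.0, random_state=0)

grid = {"max_depth": [4, 8, 12, 16, 18]}
search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    grid,
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
    scoring="neg_mean_squared_error",
)
search.fit(X, y)

print(search.best_params_)                      # tuned max_depth
print(len(search.cv_results_["params"]) * 5)    # CV fits: candidates x folds (plus one final refit)
```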

Despite these computational demands, the trade-off was deemed acceptable and necessary. The enhanced model robustness, reduced overfitting, and more reliable performance estimates obtained through these methods are critical for a high-stakes application like mud loss prediction in drilling operations, where inaccurate forecasts can lead to significant economic losses and operational inefficiencies. The computations were performed with an Intel Core i7 processor and 16 GB RAM and the total execution time for model training and evaluation was manageable within the scope of this research.

4 Results and discussion

4.1 Hyperparameter tuning

Hyperparameter tuning for the various algorithms was performed using the training and validation sets. Specifically, the Decision Tree model's max-depth parameter was optimized to 18, as illustrated in Figure 11. As shown in Figure 12, the optimized value of the number-of-estimators hyperparameter in the AdaBoost model is 20, while the tuned value of max depth in the Random Forest model, as demonstrated in Figure 13, is 20. Note that the Ensemble Learning model comprises the Decision Tree, Random Forest, and AdaBoost base estimators, each with its optimum hyperparameters.
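A stacking ensemble of this kind can be sketched with scikit-learn. The base-estimator hyperparameter values follow the tuned settings reported above, while the meta-learner choice (linear regression) and the synthetic data are assumptions for illustration, not details from the study:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor, RandomForestRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the drilling data
X, y = make_regression(n_samples=300, n_features=4, noise=5.0, random_state=0)

# Base learners use the tuned values from the text; the meta-learner is assumed.
stack = StackingRegressor(
    estimators=[
        ("dt", DecisionTreeRegressor(max_depth=18, random_state=0)),
        ("ada", AdaBoostRegressor(n_estimators=20, random_state=0)),
        ("rf", RandomForestRegressor(max_depth=20, random_state=0)),
    ],
    final_estimator=LinearRegression(),
    cv=5,  # out-of-fold predictions feed the meta-learner, limiting leakage
)
stack.fit(X, y)
print(round(stack.score(X, y), 3))  # training R^2 on the synthetic data
```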

Figure 11
Line graph showing Mean Square Error (MSE) versus maximum depth hyperparameter values. The blue line represents training error, decreasing sharply before plateauing. The orange line shows validation error, remaining relatively stable.

Figure 11. The max depth hyperparameter tuned amount in the decision tree model.

Figure 12
A line graph shows Mean Square Error (MSE) against the number of estimators for training and validation sets. The training error (blue) decreases sharply and stabilizes near zero with more estimators. The validation error (orange) decreases initially but stabilizes around 3,000.

Figure 12. Finding the tuned value of the estimators in the AdaBoost.

Figure 13
Line graph showing mean square error (MSE) against max depth hyperparameter value. The blue line with dots represents training data, decreasing rapidly initially and plateauing around 2000. The orange line with crosses represents validation data, decreasing less steeply and stabilizing around 3000. A legend indicates the lines for training and validation.

Figure 13. Determination of optimal amount of hyperparameter of random forest (max depth).

4.2 Evaluation of the established models

Table 2 presents the results of the evaluation metrics concerning the train, test, and total data points for the predictions of the established models. Three quantitative metrics, R2, MSE, and AARE%, were employed to ascertain the performance of the developed soft computing methodologies. Based on these metrics, the AdaBoost model emerged as the most effective estimator for predicting mud loss volume, achieving R2, MSE, and AARE% values of 0.987, 118.4, and 12.4, respectively. The evaluation metrics are visually represented in Figure 14 for the testing phase, facilitating a clear and comprehensive comparison among the various models developed in this study.

Table 2

Table 2. Evaluation metrics concerning three phases for all the models.

Figure 14
Three bar charts compare machine learning models: (A) R-squared values show Ensemble Learning highest and Decision Tree lowest. (B) Average Absolute Relative Error (AARE%) with Decision Tree highest and Adaboost lowest. (C) Mean Squared Error (MSE) where Decision Tree scores highest and Adaboost lowest.

Figure 14. MSE, R-squared, and AARE% in test phase for all the models: (A) R-squared. (B) AARE% and (C) MSE.

Two visualization techniques were employed to evaluate the efficacy of the developed algorithms: relative errors and crossplots. Figure 15 visually compares the observed and predicted mud loss volumes for each algorithm employed in this study. Notably, the AdaBoost model exhibits a tight clustering of points proximal to the y = x line, indicating a robust correlation between the actual and predicted values. The linear regression lines derived from these data points closely align with the ideal y = x line, suggesting that the AdaBoost model accurately predicts the mud loss volume. The scatter plots in Figure 15 further demonstrate the precision of the AdaBoost model, with the relative error distribution closely aligned with the x-axis. These visualization methods establish a strong correlation between the actual mud loss data and the results obtained from the AdaBoost model, underscoring its accuracy and reliability.

Figure 15
Scatterplots showcase actual versus estimated values and relative error for four models: AdaBoost, Decision Tree, Ensemble Learning, and Random Forest. Plots A-D depict estimated values against actual values with fitted lines for training and test data. Plots E-H illustrate percentage of relative error across actual values. Blue and green dots represent training and test data, respectively, highlighting model performance differences.

Figure 15. Comprehensive performance evaluation of the developed machine learning models comparing actual versus predicted mud loss volumes and relative error distribution for training and testing datasets. (A) AdaBoost. (B) Decision Tree. (C) Ensemble Learning. (D) Random Forest cross-plots and (E) Decision Tree. (F) AdaBoost; (G) Ensemble Learning; and (H) Random Forest relative error distributions.

4.3 Advantages

The proposed framework for mud loss prediction offers several key advantages:

• Higher Accuracy: The Adaptive Boosting (AdaBoost) model demonstrated significantly higher predictive accuracy, achieving an R2 of 0.828 on the test dataset. This performance notably surpasses that of the other evaluated models, providing a more reliable tool for real-world applications.

• Operational Insights: The sensitivity analysis provided crucial operational insights by quantitatively identifying the most influential parameters affecting mud loss. Specifically, mud viscosity was confirmed as the most impactful inversely correlated parameter (Pearson R = −0.24), offering direct guidance for optimizing drilling fluid properties.

• Robustness: The rigorous methodology, including the application of the leverage technique for outlier detection and robust 5-fold cross-validation, significantly enhances the model’s reliability and generalizability. These measures effectively mitigate the risks of data integrity issues and overfitting, ensuring the model’s applicability across varied operational scenarios.

4.4 Limitations

While the present study demonstrates the strong predictive capability of ensemble machine learning models for mud loss volume, several limitations must be acknowledged to contextualize the findings and guide future research. The dataset employed in this study was derived exclusively from a Middle Eastern oil field. Although the dataset is diverse within its local operational context, encompassing a wide range of drilling practices and fluid compositions, its geographic and geological specificity may constrain the generalizability of the models. Subsurface formations in other regions may exhibit distinct lithological, geomechanical, and fluid interaction characteristics that could influence mud loss dynamics differently.

To strengthen confidence in the broader applicability of the developed models, external validation using datasets from other oil fields and geological settings is essential. Such validation would confirm whether the predictive relationships identified here hold across diverse drilling environments and operational conditions. A promising avenue for extending the utility of this work lies in transfer learning. Pre-trained ensemble models developed on the current dataset could be fine-tuned with smaller, region-specific datasets from other drilling environments. This approach would reduce the data requirements for new sites while leveraging the predictive power of the existing models, thereby facilitating rapid adaptation to local geological contexts.

Finally, while the models provide actionable insights into mud loss prediction, their integration into real-time drilling operations requires further testing. Future work should explore coupling these predictive frameworks with live drilling data streams and decision-support systems to evaluate their performance under dynamic field conditions. In summary, although the present study offers a robust and data-driven framework for mud loss prediction, its geographic specificity necessitates cautious interpretation. Expanding validation efforts and exploring transfer learning strategies will be critical to ensuring that the models achieve practical utility across diverse drilling environments worldwide.

4.5 Recommendations and future work

Future research could explore the integration of real-time drilling parameters, evaluate additional advanced deep learning architectures, and validate the models across a wider range of geological settings and drilling conditions. Future work will explore the integration of additional geological parameters, such as formation permeability, rock mechanical properties, and more granular pore pressure data, pending their availability and consistent measurement across diverse datasets. This would allow for a more comprehensive understanding of the interplay between operational and geological factors influencing mud loss.

Furthermore, future work could significantly enhance the model’s predictive power by integrating a more profound understanding of the geomechanical environment and leveraging advanced sensing technologies from related domains. For instance, detailed characterization of pre-existing faults and fractures, which are primary conduits for mud loss, is essential for proactive risk mitigation in challenging reservoirs (Talib et al., 2023). This geological insight can be complemented by a deeper analysis of rock damage mechanics, including the fractal analysis of limestone damage under tool impact (Zou et al., 2025a) and modeling the transient conditions of rock breaking and associated cutter wear (Zou et al., 2024; Zou et al., 2025b). Moreover, obtaining accurate formation properties as model inputs is critical; this can be improved using advanced methods like Bayesian data fusion to determine the properties of materials like clay (Zheng et al., 2025) and by studying the dynamic shakedown behavior of granular materials within the formation (Wang et al., 2023). To enable real-time prediction, the model could be fed live data from advanced downhole sensing technologies such as electromagnetic tomography for multiphase flow monitoring (Ge et al., 2025). The validity of applying such data-driven and theory-guided neural network approaches is supported by their successful use in analogous geomechanical problems, including the intelligent diagnosis of pipeline blockages (Di et al., 2025) and predicting ground settlement from construction (Li et al., 2025). Ultimately, integrating these elements can contribute to a more holistic modeling framework that captures diverse physical phenomena across a well’s lifecycle, such as thermal-induced growth (Yu et al., 2024).

5 Conclusion

This study developed and evaluated advanced ensemble machine learning models (Random Forest, AdaBoost, Decision Trees, and a custom stacking-based ensemble) to predict mud loss volume in drilling operations.

• A large, real-world dataset comprising 2,820 observations from the Majnoon oil field was used, ensuring empirical relevance and robustness.

• AdaBoost emerged as the most accurate model, achieving a test R2 of 0.828, outperforming other ensemble techniques.

• Sensitivity analysis revealed that mud viscosity and solid content inversely affect mud loss, while hole size and differential pressure positively contribute to it.

• The use of k-fold cross-validation and leverage-based outlier detection ensured model generalizability and data integrity.

• The study demonstrated that ensemble ML models significantly outperform traditional empirical approaches in predicting mud loss, offering a reliable and interpretable tool for operational decision-making.

Data availability statement

Data is available on request from the corresponding author.

Author contributions

LA: Investigation, Project administration, Resources, Writing – original draft. KY: Project administration, Resources, Visualization, Writing – review and editing. YA: Formal Analysis, Project administration, Resources, Visualization, Writing – review and editing. DD: Conceptualization, Data curation, Methodology, Writing – review and editing. HA: Conceptualization, Investigation, Writing – original draft.

Funding

The author(s) declared that financial support was not received for this work and/or its publication.

Conflict of interest

The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declared that generative AI was not used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

Abdollahfard, Y., Mirabbasi, S. M., Ahmadi, M., Hemmati-Sarapardeh, A., and Ashoorian, S. (2025). Formation permeability estimation using mud loss data by deep learning. Sci. Rep. 15 (1), 15251. doi:10.1038/s41598-025-94617-7


Agwu, O. E., Akpabio, J. U., Ekpenyong, M. E., Inyang, U. G., Asuquo, D. E., Eyoh, I. J., et al. (2021). A critical review of drilling mud rheological models. J. Petroleum Sci. Eng. 203, 108659. doi:10.1016/j.petrol.2021.108659


Al-Hameedi, A. T. T., Alkinani, H. H., Dunn-Norman, S., Flori, R. E., Hilgedick, S. A., Amer, A. S., et al. (2019). Mud loss estimation using machine learning approach. J. Petroleum Explor. Prod. Technol. 9 (2), 1339–1354. doi:10.1007/s13202-018-0581-x


Amirruddin, A. D., Muharam, F. M., Ismail, M. H., Tan, N. P., and Ismail, M. F. (2022). Synthetic minority Over-sampling TEchnique (SMOTE) and logistic model tree (LMT)-adaptive boosting algorithms for classifying imbalanced datasets of nutrient and chlorophyll sufficiency levels of oil palm (elaeis guineensis) using spectroradiometers and unmanned aerial vehicles. Comput. Electron. Agric. 193, 106646. doi:10.1016/j.compag.2021.106646


Ao, Y., Li, H., Zhu, L., Ali, S., and Yang, Z. (2019). The linear random forest algorithm and its advantages in machine learning assisted logging regression modeling. J. Petroleum Sci. Eng. 174, 776–789. doi:10.1016/j.petrol.2018.11.067


Armaghani, D. J., and Asteris, P. G. (2021). A comparative study of ANN and ANFIS models for the prediction of cement-based mortar materials compressive strength. Neural Comput. Appl. 33 (9), 4501–4532. doi:10.1007/s00521-020-05244-4


Asadimehr, S. (2024). Investigating the use of drilling mud and the reasons for its use. Eurasian J. Chem. Med. Petroleum Res. 3 (2), 543–551.


Asteris, P. G., Skentou, A. D., Bardhan, A., Samui, P., and Pilakoutas, K. (2021). Predicting concrete compressive strength using hybrid ensembling of surrogate machine learning models. Cem. Concr. Res. 145, 106449. doi:10.1016/j.cemconres.2021.106449


Asteris, P. G., Sivenas, T., Gkantou, M., Formisano, A., and Le, T. T. (2025). Estimation of axial load-carrying capacity of elliptical concrete filled steel tubular columns using computational intelligence. J. Build. Eng. 112, 113738. doi:10.1016/j.jobe.2025.113738


Bansal, M., Goyal, A., and Choudhary, A. (2022). A comparative analysis of K-Nearest neighbor, genetic, support vector machine, decision tree, and long short term memory algorithms in machine learning. Decis. Anal. J. 3, 100071. doi:10.1016/j.dajour.2022.100071


Bassir, S. M., and Madani, M. (2019). A new model for predicting asphaltene precipitation of diluted crude oil by implementing LSSVM-CSA algorithm. Petroleum Sci. Technol. 37 (22), 2252–2259. doi:10.1080/10916466.2019.1632896


Bemani, A., Madani, M., and Kazemi, A. (2023). Machine learning-based estimation of nano-lubricants viscosity in different operating conditions. Fuel 352, 129102. doi:10.1016/j.fuel.2023.129102


Benzaamia, A., Benzaamia, A., Rbouh, R., Ghrici, A. A., and Asteris, P. G. (2025). Prediction of chloride resistance level in concrete using optimized tree-based machine learning models. Bull. Comput. Intell. 1 (1), 104–117. doi:10.53941/bci.2025.100007


Biau, G., and Scornet, E. (2016). A random forest guided tour. TEST 25 (2), 197–227. doi:10.1007/s11749-016-0481-7


Brankovic, A., Matteucci, M., Restelli, M., Ferrarini, L., Piroddi, L., Spelta, A., et al. (2021). Data-driven indicators for the detection and prediction of stuck-pipe events in oil&gas drilling operations. Upstream Oil Gas Technol. 7, 100043. doi:10.1016/j.upstre.2021.100043


Chen, Y., Sun, T., Yang, J., Chen, X., Ren, L., Wen, Z., et al. (2025). Prediction of mud weight window based on geological sequence matching and a physics-driven machine learning model for pre-drilling. Processes 13 (7), 2255. doi:10.3390/pr13072255


Di, D., Bai, Y., Fang, H., Sun, B., Wang, N., and Li, B. (2025). Intelligent siltation diagnosis for drainage pipelines using weak-form analysis and theory-guided neural networks in geo-infrastructure. Automation Constr. 176, 106246. doi:10.1016/j.autcon.2025.106246


Egbuna, I. K., Asere, J. B., Ado, U., Ali, A. I., Agboro, H., and Zereuwa, C. (2025). The role of artificial intelligence in minimizing drilling waste and formation damage in the oil and gas industry. IRE Journals 8 (11), 52 – 66.


Elhazmi, A., Al-Omari, A., Sallam, H., Mufti, H. N., Rabie, A. A., Alshahrani, M., et al. (2022). Machine learning decision tree algorithm role for predicting mortality in critically ill adult COVID-19 patients admitted to the ICU. J. Infect. Public Health 15 (7), 826–834. doi:10.1016/j.jiph.2022.06.008


Feng, W., Ma, C., Zhao, G., and Zhang, R. (2020). “FSRF: an improved random forest for classification,” in 2020 IEEE international conference on advances in electrical engineering and computer applications (AEECA). (IEEE), 173–178.


Freund, Y., and Schapire, R. E. (1996). “Experiments with a new boosting algorithm,” in Icml (Citeseer).


Fushiki, T. (2011). Estimation of prediction error by using K-fold cross-validation. Statistics Comput. 21, 137–146. doi:10.1007/s11222-009-9153-8


Ge, L., Liu, Z., Liu, S., Xiao, X., Yuan, Y., and Yin, Z. (2025). Electromagnetic tomography for multiphase flow in the downhole annulus. IEEE Trans. Instrum. Meas. 74, 1–13. doi:10.1109/tim.2025.3548206


Gowida, A., Ibrahim, A. F., and Elkatatny, S. (2022). A hybrid data-driven solution to facilitate safe mud window prediction. Sci. Rep. 12 (1), 15773. doi:10.1038/s41598-022-20195-7


Hasanzadeh, M., and Madani, M. (2024). Deterministic tools to predict gas assisted gravity drainage recovery factor. Energy Geosci. 5 (3), 100267. doi:10.1016/j.engeos.2023.100267


Jafarizadeh, F., Larki, B., Kazemi, B., Mehrad, M., Rashidi, S., Ghavidel Neycharan, J., et al. (2023). A new robust predictive model for lost circulation rate using convolutional neural network: a case study from marun oilfield. Petroleum 9 (3), 468–485. doi:10.1016/j.petlm.2022.04.002


Ji, H., Pu, D., Yan, W., Zhang, Q., Zuo, M., and Zhang, Y. (2023). Recent advances and application of machine learning in food flavor prediction and regulation. Trends Food Sci. and Technol. 138, 738–751. doi:10.1016/j.tifs.2023.07.012


Keshavarz, M., and Moreno, R. B. Z. L. (2023). Qualitative analysis of drilling fluid loss through naturally-fractured reservoirs. SPE Drill. and Complet. 38 (03), 502–518. doi:10.2118/215810-pa


Kumar Singh, R., and Priyadarshini Nayak, N. (2024). Complications in drilling operations in basalt for CO2 sequestration: an overview. Mater. Today Proc. 99, 22–29. doi:10.1016/j.matpr.2023.04.441


Lavrov, A., and Tronvoll, J. (2021). “Mud loss into a single fracture during drilling of petroleum wells: modelling approach,” in Development and application of discontinuous modelling for rock engineering (FL, United States: CRC Press), 189–198.


Lawal, A., Yang, Y., He, H., and Baisa, N. L. (2024). Machine learning in oil and gas exploration: a review. IEEE Access 12, 19035–19058. doi:10.1109/access.2023.3349216


Le, T.-T., Skentou, A. D., Mamou, A., and Asteris, P. G. (2022). Correlating the unconfined compressive strength of rock with the compressional wave velocity effective porosity and schmidt hammer rebound number using artificial neural networks. Rock Mech. Rock Eng. 55 (11), 6805–6840. doi:10.1007/s00603-022-02992-8


Li, Y., Weng, X., Hu, D., Tan, Z., and Liu, J. (2025). Data-driven method for predicting long-term underground pipeline settlement induced by rectangular pipe jacking tunnel construction. J. Pipeline Syst. Eng. Pract. 16 (3), 04025046. doi:10.1061/jpsea2.pseng-1855


Madani, M., and Alipour, M. (2022). Gas-oil gravity drainage mechanism in fractured oil reservoirs: surrogate model development and sensitivity analysis. Comput. Geosci. 26 (5), 1323–1343. doi:10.1007/s10596-022-10161-7

Madani, M., Abbasi, P., Baghban, A., Zargar, G., and Abbasi, P. (2017). Modeling of CO2-brine interfacial tension: application to enhanced oil recovery. Petroleum Sci. Technol. 35 (23), 2179–2186. doi:10.1080/10916466.2017.1391844

Madani, M., Moraveji, M. K., and Sharifi, M. (2021). Modeling apparent viscosity of waxy crude oils doped with polymeric wax inhibitors. J. Petroleum Sci. Eng. 196, 108076. doi:10.1016/j.petrol.2020.108076

Magzoub, M. I., Kiran, R., Salehi, S., Hussein, I. A., and Nasser, M. S. (2021). Assessing the relation between mud components and rheology for loss circulation prevention using polymeric gels: a machine learning approach. Energies 14 (5), 1377. doi:10.3390/en14051377

Mahdi, D. S., and Alrazzaq, A. (2024). ANN model for predicting mud loss rate from unconfined compressive strength and drilling data. Pet. Chem. 64 (7), 811–819. doi:10.1134/s0965544124050116

Mamun, M., Afia, F., Miraz, A. M., and Salim, A. (2022). “Lung cancer prediction model using ensemble learning techniques and a systematic review analysis,” in 2022 IEEE World AI IoT Congress (AIIoT). (IEEE), 187–193.

Mhawi, D. N., Aldallal, A., and Hassan, S. (2022). Advanced feature-selection-based hybrid ensemble learning algorithms for network intrusion detection systems. Symmetry 14 (7), 1461. doi:10.3390/sym14071461

Mohamed, A., Salehi, S., and Ahmed, R. (2021). Significance and complications of drilling fluid rheology in geothermal drilling: a review. Geothermics 93, 102066. doi:10.1016/j.geothermics.2021.102066

Mohapatra, N., Shreya, K., and Chinmay, A. (2020). Optimization of the random forest algorithm. Singapore: Springer Singapore.

Mohebbanaaz, Rajani Kumari, L. V., and Sai, Y. P. (2022). Classification of ECG beats using optimized decision tree and adaptive boosted optimized decision tree. Signal, Image Video Process. 16 (3), 695–703. doi:10.1007/s11760-021-02009-x

Nabavi, Z., Hosseini, S., Hamidi, J. K., Monjezi, M., Hasheminasab, F., Vardin, A. N., et al. (2025). Prediction of rotary drilling rate of penetration in surface mines using machine learning techniques. Rock Mech. Rock Eng., 1–31. doi:10.1007/s00603-025-05014-5

Navada, A., Ansari, A. N., Patil, S., and Sonkamble, B. A. (2011). “Overview of use of decision tree algorithms in machine learning,” in 2011 IEEE control and system graduate research colloquium (IEEE), 37–42.

Noshi, C. I., and Schubert, J. J. (2018). The role of machine learning in drilling operations: a review. SPE.

Okai, M. I., Ogolo, O., Nzerem, P., and Ibrahim, K. S. (2024). “Application of boosting machine learning for mud loss prediction during drilling operations,” in SPE Nigeria Annual International Conference and Exhibition (SPE), D032S028R003. doi:10.2118/221583-MS

Okoro, E. E., Dosunmu, A., and Iyuke, S. E. (2018). Data on cost analysis of drilling mud displacement during drilling operation. Data Brief 19, 535–541. doi:10.1016/j.dib.2018.05.075

Orun, C. B., Akpabio, J. U., and Agwu, O. E. (2023). Drilling fluid design for depleted zone drilling: an integrated review of laboratory, field, modelling and cost studies. Geoenergy Sci. Eng. 226, 211706. doi:10.1016/j.geoen.2023.211706

Pang, H., Wang, H., Jin, Y., Lu, Y., and Fan, Y. (2022a). “Prediction of mud loss type based on seismic data using machine learning,” in ARMA US Rock Mechanics/Geomechanics Symposium (ARMA), ARMA-2022.

Pang, H., Meng, H., Wang, H., Fan, Y., Nie, Z., and Jin, Y. (2022b). Lost circulation prediction based on machine learning. J. Petroleum Sci. Eng. 208, 109364. doi:10.1016/j.petrol.2021.109364

Pang, H.-W., Wang, H. Q., Xiao, Y. T., Jin, Y., Lu, Y. H., Fan, Y. D., et al. (2024). Machine learning for carbonate formation drilling: mud loss prediction using seismic attributes and mud loss records. Petroleum Sci. 21 (2), 1241–1256. doi:10.1016/j.petsci.2023.10.024

Ren, J., Zhao, H., Zhang, L., Zhao, Z., Xu, Y., Cheng, Y., et al. (2022). Design optimization of cement grouting material based on adaptive boosting algorithm and simplicial homology global optimization. J. Build. Eng. 49, 104049. doi:10.1016/j.jobe.2022.104049

Rigatti, S. J. (2017). Random forest. J. Insur. Med. 47 (1), 31–39. doi:10.17849/insm-47-01-31-39.1

Sabah, M., Mehrad, M., Ashrafi, S. B., Wood, D. A., and Fathi, S. (2021). Hybrid machine learning algorithms to enhance lost-circulation prediction and management in the marun oil field. J. Petroleum Sci. Eng. 198, 108125. doi:10.1016/j.petrol.2020.108125

Saihood, T., and Samuel, R. (2022). “Mud loss prediction in realtime through hydromechanical efficiency,” in ADIPEC.

Sarica, A., Cerasa, A., and Quattrone, A. (2017). Random forest algorithm for the classification of neuroimaging data in Alzheimer's disease: a systematic review. Front. Aging Neurosci. 9, 329. doi:10.3389/fnagi.2017.00329

Sarir, P., Chen, J., Asteris, P. G., Armaghani, D. J., and Tahir, M. M. (2021). Developing GEP tree-based, neuro-swarm, and whale optimization models for evaluation of bearing capacity of concrete-filled steel tube columns. Eng. Comput. 37 (1), 1–19. doi:10.1007/s00366-019-00808-y

Shad, S., Salmanpour, S., Zamani, H., and Zivar, D. (2021). Dynamic analysis of mud loss during overbalanced drilling operation: an experimental study. J. Petroleum Sci. Eng. 196, 107984. doi:10.1016/j.petrol.2020.107984

Song, X., Li, G., Huang, Z., Shi, Y., Wang, G., Song, G., et al. (2023). Review of high-temperature geothermal drilling and exploitation technologies. Gondwana Res. 122, 315–330. doi:10.1016/j.gr.2022.10.013

Taheri, K., Zeinijahromi, A., Tavakoli, V., and Alizadeh, H. (2024). Formation damage management through enhanced drilling efficiency: mud weight and loss analysis in asmari formation, Iran. J. Afr. Earth Sci. 217, 105348. doi:10.1016/j.jafrearsci.2024.105348

Talib, M., Afzal Durrani, M. Z., Subhani, G., Sarosh, B., and Atif Rahman, S. (2023). Faults/fractures characterization to improve well planning and reduce drilling risks – a case study from a tight carbonate reservoir in Pakistan. J. Seismic Explor. 32 (1), 21–38.

Tan, T. H., Wu, J. Y., Liu, S. H., and Gochoo, M. (2022). Human activity recognition using an ensemble learning algorithm with smartphone sensor data. Electronics 11 (3), 322. doi:10.3390/electronics11030322

Thapa, I., and Ghani, S. (2025). AI-enabled sustainable soil stabilization for resilient urban infrastructure: advancing SDG 9 and SDG 12 through hybrid deep learning and environmental assessment. Bull. Comput. Intell. 1, 3–30. doi:10.53941/bci.2025.100002

Tyralis, H., and Papacharalampous, G. (2021). Boosting algorithms in energy research: a systematic review. Neural Comput. Appl. 33 (21), 14101–14117. doi:10.1007/s00521-021-05995-8

Ugarte, E. R., and Salehi, S. (2021). A review on well integrity issues for underground hydrogen storage. J. Energy Resour. Technol. 144 (4), 042001. doi:10.1115/1.4052626

Wang, K., Chen, Z., Wang, Z., Chen, Q., and Ma, D. (2023). Critical dynamic stress and cumulative plastic deformation of calcareous sand filler based on shakedown theory. J. Mar. Sci. Eng. 11 (1), 195. doi:10.3390/jmse11010195

Wong, T.-T., and Yeh, P.-Y. (2019). Reliable accuracy estimates from k-fold cross validation. IEEE Trans. Knowl. Data Eng. 32 (8), 1586–1594. doi:10.1109/tkde.2019.2912815

Wood, D. A., Mardanirad, S., and Zakeri, H. (2022). Effective prediction of lost circulation from multiple drilling variables: a class imbalance problem for machine and deep learning algorithms. J. Petroleum Explor. Prod. Technol. 12 (1), 83–98. doi:10.1007/s13202-021-01411-y

Yaghoubi, E., Yaghoubi, E., Khamees, A., Razmi, D., and Lu, T. (2024). A systematic review and meta-analysis of machine learning, deep learning, and ensemble learning approaches in predicting EV charging behavior. Eng. Appl. Artif. Intell. 135, 108789. doi:10.1016/j.engappai.2024.108789

Yang, Y., Lv, H., and Chen, N. (2023). A survey on ensemble learning under the era of deep learning. Artif. Intell. Rev. 56 (6), 5545–5589. doi:10.1007/s10462-022-10283-5

Yu, H., Zhao, Z., Dahi Taleghani, A., Lian, Z., and Zhang, Q. (2024). Modeling thermal-induced wellhead growth through the lifecycle of a well. Geoenergy Sci. Eng. 241, 213098. doi:10.1016/j.geoen.2024.213098

Zhang, L., Wen, J., Li, Y., Chen, J., Ye, Y., Fu, Y., et al. (2021). A review of machine learning in building load prediction. Appl. Energy 285, 116452. doi:10.1016/j.apenergy.2021.116452

Zhang, Z., Wei, Y., Xiong, Y., Peng, G., Wang, G., Lu, J., et al. (2022). Influence of the location of drilling fluid loss on wellbore temperature distribution during drilling. Energy 244, 123031. doi:10.1016/j.energy.2021.123031

Zhang, Y., Liu, J., and Shen, W. (2022). A review of ensemble learning algorithms used in remote sensing applications. Appl. Sci. 12 (17), 8654. doi:10.3390/app12178654

Zheng, Y., Li, J., and Li, J. (2025). Determination of initial void ratio of remolded clay using bayesian data fusion approach. Geotechnical Test. J. 48 (3), 346–363. doi:10.1520/gtj20240111

Zhu, Q. (2022). “Treatment and prevention of stuck pipe based on artificial neural networks analysis,” in Offshore technology conference Asia.

Zou, B., Yin, J., Liu, Z., and Long, X. (2024). Transient rock breaking characteristics by successive impact of shield disc cutters under confining pressure conditions. Tunn. Undergr. Space Technol. 150, 105861. doi:10.1016/j.tust.2024.105861

Zou, B., Yin, J., Zhang, W., and Long, X. (2025a). Fractal analysis of limestone damage under successive impact by shield disc cutters. Eng. Fract. Mech. 322, 111163. doi:10.1016/j.engfracmech.2025.111163

Zou, B., Chen, Y., Bao, Y., Liu, Z., Hu, B., Ma, J., et al. (2025b). Impact of tunneling parameters on disc cutter wear during rock breaking in transient conditions. Wear 560, 205620. doi:10.1016/j.wear.2024.205620

Zulfiqar, H., Yuan, S. S., Huang, Q. L., Sun, Z. J., Dao, F. Y., Yu, X. L., et al. (2021). Identification of cyclin protein using gradient boost decision tree algorithm. Comput. Struct. Biotechnol. J. 19, 4123–4131. doi:10.1016/j.csbj.2021.07.013

Nomenclature

ML Machine learning

RF Random Forest

DT Decision Tree

AdaBoost Adaptive Boosting

EL Ensemble learning (stacking-based in this study)

CNN Convolutional Neural Network

LSTM Long Short-Term Memory network

GRU Gated Recurrent Unit

SHAP SHapley Additive exPlanations

SDG Sustainable Development Goal

y Target variable (mud loss volume)

y_T Target at final time step or total dataset label

y_i Target value for sample i

X Input feature vector

X_i Feature vector for sample i

t Time step or sequence index

a_t Activation or attention at time step t

h_i Hidden-state/learner output for sample i

w Model weight, feature weight, or ensemble weight

e_t Error/residual at time step t

P_i Predicted mud loss for sample i

G_p Gain/impurity reduction function of parameter p

D_h Hole diameter (inches or mm)

ΔP Differential pressure between wellbore and formation (MPa or psi)

μ Mud viscosity (cP)

S_c Solid content of drilling fluid (%)

V_loss Mud loss volume (L, m³, or bbl)

f_RF Random Forest base-learner prediction

f_Ada AdaBoost base-learner prediction

f_DT Decision Tree base-learner prediction

α_RF, α_Ada, α_DT Ensemble weights for RF, AdaBoost, and DT

ŷ Final ensemble prediction (ŷ = α_RF f_RF + α_Ada f_Ada + α_DT f_DT)

R² Coefficient of determination

MAE Mean absolute error

RMSE Root mean squared error

MAPE Mean absolute percentage error (%)

k Number of folds in k-fold cross-validation

D_train, D_test Training and independent test sets
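For illustration, the weighted-combination rule ŷ = α_RF f_RF + α_Ada f_Ada + α_DT f_DT defined in the nomenclature can be sketched in a few lines of Python. This is a minimal sketch only, not the authors' implementation: the base-learner outputs and the equal default weights below are hypothetical placeholders, whereas in the study's stacking framework the weights are learned from the training data.

```python
import numpy as np

def ensemble_predict(f_rf, f_ada, f_dt, w=(1.0, 1.0, 1.0)):
    """Weighted combination of three base-learner predictions.

    w is normalized so the ensemble weights sum to 1 before mixing.
    """
    w = np.asarray(w, dtype=float)
    w = w / w.sum()                      # α_RF + α_Ada + α_DT = 1
    preds = np.vstack([f_rf, f_ada, f_dt])
    return w @ preds                     # y_hat per sample

# Hypothetical base-learner mud-loss predictions for three samples
f_rf  = np.array([10.0, 12.0, 8.0])
f_ada = np.array([11.0, 11.5, 7.5])
f_dt  = np.array([ 9.0, 12.5, 8.5])
y_hat = ensemble_predict(f_rf, f_ada, f_dt)   # equal weights → simple mean
```

With unequal weights, e.g. `w=(1, 1, 2)`, the last learner contributes half of each prediction, mirroring how a stacking meta-learner can favor the strongest base model.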

Keywords: drilling, geo-resources, machine learning, mud loss, outlier detection

Citation: Alkwai LM, Yadav K, Alharbi Y, Dutta D and Abbasi H (2026) Accurate intelligent modeling of mud loss while drilling wells via soft computing methods. Front. Earth Sci. 13:1750129. doi: 10.3389/feart.2025.1750129

Received: 19 November 2025; Accepted: 25 December 2025;
Published: 22 January 2026.

Edited by:

Moataz Barakat, Tanta University, Egypt

Reviewed by:

Panagiotis G. Asteris, School of Pedagogical and Technological Education, Greece
Adel Salem, Suez University, Egypt

Copyright © 2026 Alkwai, Yadav, Alharbi, Dutta and Abbasi. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Hojjat Abbasi, hojjatabbasimeybodi@gmail.com

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.