
ORIGINAL RESEARCH article

Front. Clim., 15 January 2026

Sec. Carbon Dioxide Removal

Volume 7 - 2025 | https://doi.org/10.3389/fclim.2025.1710187

This article is part of the Research Topic: Advancements in the Roles of Geomechanics for Energy Extraction and Geological Storage Applications

A robust deep learning framework for predicting carbon dioxide-water alternating gas injection performance and optimization

  • 1State Key Laboratory of Petroleum Resources and Engineering, China University of Petroleum (Beijing), Beijing, China
  • 2Department of Oil-Gas Field Development Engineering, College of Petroleum Engineering, China University of Petroleum (Beijing), Beijing, China
  • 3School of Earth and Environment, Anhui University of Science and Technology, Huainan, China
  • 4The Faculty of Biosciences, Fisheries and Economics, UiT The Arctic University of Norway, Tromsø, Norway
  • 5OSEAN—Outermost Regions Sustainable Ecosystem for Entrepreneurship and Innovation, University of Madeira Colégio dos Jesuítas, Funchal, Portugal
  • 6Department of Earth and Environmental Sciences, Bahria University, Islamabad, Pakistan
  • 7Department of Petroleum and Natural Gas Engineering, Faculty of Engineering, University of Khartoum, Khartoum, Sudan

Carbon dioxide (CO2) emissions pose a major environmental concern, and various methods are used for CO2 sequestration. CO2 water-alternating-gas (CO2-WAG) injection is a technique used to increase oil production and sequester CO2 in subsurface formations. However, the performance of a CO2-WAG project depends on various parameters, such as injection rates, cycle size, and WAG ratio, whose evaluation traditionally requires numerous computationally expensive simulations. This study introduces a robust machine learning workflow for CO2-WAG performance prediction and optimization, built on a simulation model calibrated with Bell Creek formation properties. The machine learning models are based on extreme gradient boosting (XGBoost), linear regression (LR), random forest (RF), k-nearest neighbor (KNN), support vector regression (SVR), artificial neural network (ANN), convolutional neural network (CNN), and hybrid models in which ANN and CNN are coupled with XGBoost (ANN-XGBoost and CNN-XGBoost). A dataset of 2,400 samples was generated using the CMG-GEM numerical simulator, incorporating seven input parameters (e.g., injection rate, CO2-WAG cycle size, and WAG ratio) and three output parameters, with 80% of the dataset allocated for training and 20% for validation and testing. Among the proposed models, the hybrid ANN-XGBoost model performed best, accurately predicting total oil production, CO2 storage, and storage efficiency, with R2 scores of 0.99159, 0.97515, and 0.98706 and correspondingly low RMSE values of 2.8 × 10^−2, 1.5 × 10^−1, and 2.4 × 10^−2. Coupling the proxy with particle swarm optimization (PSO) yielded a 12.8% increase in cumulative oil production and an 11% increase in CO2 storage. Furthermore, the proposed workflow completes prediction and optimization within minutes, whereas a traditional numerical simulator requires 4–5 min per scenario. These findings validate the robustness and computational efficiency of the proposed machine learning workflow for predicting CO2-WAG performance and optimization.

1 Introduction

Fossil fuels, particularly oil, remain the primary source of energy, with demand continuing to rise due to ongoing development and urbanization. However, fossil fuels are major contributors to CO2 emissions, as their combustion releases CO2 (Hsu et al., 2012; You et al., 2020; Gao et al., 2023). Although extensive research is under way to identify alternatives, no energy source capable of replacing fossil fuels has yet emerged, and fossil fuels are expected to remain the major energy source in the coming decades (Ud Din et al., 2022; Ud Din et al., 2023, 2025; Naghizadeh et al., 2024). To manage these growing challenges, carbon capture, utilization, and storage (CCUS) is considered a vital technology for mitigating climate change (Czernichowski-Lauriol et al., 2018; Kolawole et al., 2021; Dziejarski et al., 2023) by sequestering CO2 in geological formations (Lackner, 2003; Ren et al., 2023), such as coal beds (Bachu, 2000), saline formations (Wang et al., 2021), and depleted oil and gas reservoirs (Dai et al., 2016, 2018). CO2 injection into reservoirs is an important technology that has been used in the oil and gas industry for decades to increase oil recovery (Lake et al., 2014).

CO2-WAG injection is a well-established enhanced oil recovery (EOR) technique that offers high sweep efficiency and reduces the risk of gas channeling, viscous fingering (Yao et al., 2023), and gravity segregation. More specifically, the technique increases oil production while sequestering CO2 in underground formations (Spiteri et al., 2005; Rasmusson et al., 2016; Al-Khdheeawi et al., 2018; Zhong et al., 2019). Figure 1 illustrates a schematic diagram of the CO2-WAG technique and its role in CCUS. The technique was first employed in 1957 by Mobil Corporation in a Canadian sandstone reservoir, where it effectively mitigated early CO2 breakthrough by increasing the resistance to gas flow. In addition, CO2-WAG improves the mobility ratio and reduces the resistance to water flow. Survey data indicate that in the United States, approximately 80% of the oilfields have adopted WAG, with consistently positive outcomes. This widespread application underscores its effectiveness in boosting oilfield development and production rates (Sanchez, 1999). Further, Christensen et al. (2001) reviewed 59 fields in which WAG was applied and reported an average 10% increase in oil recovery. Several prior studies have investigated the role of core-scale heterogeneity in determining the efficiency of oil production through CO2-WAG injection. The results revealed that CO2-WAG injection exhibited superior performance in homogeneous, layered, and composite samples (Al-Bayati et al., 2018). Han and Gu (2014) evaluated optimized miscible injection through nine core-flooding experiments and demonstrated that it outperformed other methods; notably, a smaller WAG slug (1:1 ratio) yielded higher recovery factors. Sun et al. (2021) investigated CO2 phase migration through porous media during WAG injection and observed a 46% increase in the recovery factor.
The results also showed that the gas-to-water injection ratio played a pivotal role in the efficacy of WAG injection, as highlighted in prior studies. Khather et al. (2022) explored the influence of CO2 and carbonate interactions on oil and gas recovery. That study conducted a series of experiments on three distinct carbonate core samples with heterogeneous properties, such as different oil saturations and low to moderate permeability values. The results indicated that CO2-WAG injection following water flooding led to a remarkable surge in the recovery factor, exceeding 30% for all the core samples. Ren et al. (2023) examined CO2-EOR and subsequent CO2 sequestration in fields located in the Ordos Basin, China, by employing continuous CO2 and WAG injection techniques. Their study showed that simultaneous injection of equal amounts of CO2 and WAG substantially improved crude oil production in the examined oilfields. Alrassas et al. (2022) studied CO2-WAG injection in a Yemeni interbedded reservoir and compared it with continuous CO2 injection. The study demonstrated that CO2-WAG provided superior reservoir performance, enhancing both oil recovery and CO2 storage, largely due to its higher Kv/Kh ratios. Moreover, the 2:1 WAG ratio outperformed the 1:1 ratio, underscoring the importance of optimizing injection strategies in interbedded reservoirs to maximize oil recovery and CO2 storage efficiency.

Figure 1
Illustration showing the Carbon Capture, Utilization, and Storage (CCUS) process. CO2 is captured from sources like cement, coal, and natural gas production. It is transported to a refinery, then injected underground. The underground section shows CO2 injection into a well, interacting with oil, water, and grains, enhancing oil recovery. Connected pipelines and tanks represent transportation and storage.

Figure 1. Schematic diagram of CO2-WAG technique and CCUS aims.

Currently, the optimization of CO2-EOR technology remains a central focus across numerous petroleum engineering fields. Dai et al. (2017) applied Monte Carlo simulations to quantify uncertainty in CO2 storage potential within an active EOR project located at the Morrow Reservoir, Farnsworth Unit, Texas. Vo Thanh et al. (2020) applied robust optimization to the WAG process to improve CO2 storage potential in a Vietnamese field. Their findings showed that WAG injection improved CO2 sequestration compared with continuous gas injection. Specifically, the nominal and optimal optimization scenarios improved CO2 sequestration by 13% and 15%, respectively, compared to the baseline WAG case. Rodrigues et al. (2019) conducted an optimization study using the Computer Modelling Group (CMG) simulator in a Brazilian offshore field and proposed a strategy for designing CO2-WAG operations in carbonate reservoirs, emphasizing economic feasibility, CO2 recycling efficiency, and assessment of project risks.

Conventional approaches to parameter optimization are often cumbersome and labor-intensive, and they neglect complex nonlinear interactions and the fundamental variables that drive them. These methods typically rely on fixed models and algorithms, which restricts their adaptability to varying oilfield conditions. Consequently, more sophisticated optimization methods, such as ML and metaheuristic algorithms, offer a superior way to handle the intricacies and uncertainties associated with oil fields while improving optimization efficiency and precision (He et al., 2021; Li et al., 2022; Sen et al., 2022; Xue et al., 2023a, 2023b, 2024). Notable progress has been made in petroleum exploration and production owing to rapid advancements in intelligent algorithms, particularly in ML. Although ML frameworks have become increasingly popular for CO2-EOR and CO2-WAG, current research still has limitations. First, several investigations concentrate on either oil recovery or CO2 storage without capturing the two processes simultaneously, which limits their applicability to WAG decision-making (Gao et al., 2023; Li et al., 2022). Second, many studies rely on operational or geological inputs that are too limited to model the main WAG behaviors, namely cycle size, cycle ratio, injection pressures, and three-phase hysteresis effects (Naghizadeh et al., 2024; You et al., 2020). Third, prior ML studies often use only a small number of simulation cases, which narrows the training domain and limits the ability of the model to learn complex nonlinear relationships. Liu et al. (2024) predicted CO2 storage under different subsurface environments with RF and XGBoost using only 184 data samples; Ahmadi et al. (2018) created an LSSVM-based proxy using a Box–Behnken experimental design, which constrained the input combinations. A recently developed deep-learning model based on CNN and MLNN was used to forecast CO2 solubility trapping and oil recovery from 814 test cases (AlRassas et al., 2025), but that work was limited to prediction and did not incorporate an optimization element. More broadly, current ML and DL research lacks explicit consideration of WAG-specific physics such as the dynamics of CO2 trapping (Sen et al., 2022; Song et al., 2020) and does not consider operational optimization, which constrains its use in real-time design and decision-making (Vaziri and Sedaee, 2023).

Recent studies have shown that hybrid architectures combining neural networks and boosted trees can significantly outperform individual algorithms in reservoir engineering because they represent nonlinear interactions better without overfitting (Jiao et al., 2024; Otmane et al., 2025; Vaziri and Sedaee, 2023). Building on these contributions, the current study constructs a hybrid ANN-XGBoost surrogate model trained on a large, systematically generated synthetic dataset that covers a wide variety of WAG operating conditions. In contrast to previous literature that focuses only on prediction, the proposed framework combines prediction with multi-objective optimization by integrating the surrogate model with particle swarm optimization (PSO). This makes it possible to identify operational strategies that simultaneously maximize oil recovery and CO2 storage, overcoming the shortcomings of prior ML research and providing a complete, efficient workflow for designing and optimizing CO2-WAG operations.

This study thoroughly analyzed the effects of CO2-WAG parameters, including injection rates, cycle sizes, and ratios, on cumulative oil production and CO2 stored using a model based on Bell Creek formation characteristics. Furthermore, the study proposed a robust ML workflow for CO2-WAG performance prediction by employing various ML algorithms: extreme gradient boosting (XGBoost), linear regression (LR), random forest (RF), k-nearest neighbor (KNN), support vector regression (SVR), artificial neural network (ANN), and convolutional neural network (CNN). Additionally, the study compared these algorithms with hybrid algorithms, namely ANN coupled with XGBoost (ANN-XGBoost) and CNN coupled with XGBoost (CNN-XGBoost). The best prediction model was then coupled with the particle swarm optimization (PSO) algorithm to perform the optimization study. The results provide insights into the factors affecting CO2-WAG performance and give engineers and decision makers a robust framework for CO2-WAG performance prediction, optimization, and decision-making.

2 Methodology

2.1 Base model

The study used CMG-GEM (CMG, 2022) to construct a base model reflecting the Bell Creek formation characteristics. The developed model was designed with a five-spot well pattern comprising a central production well and four injection wells located at the corners (Jin et al., 2018). The model consists of 59 × 59 × 6 = 20,886 grid blocks. Furthermore, to mimic the original reservoir conditions, the model was designed with a dip angle of 1°, consistent with the Bell Creek formation, as shown in Figure 2.

Figure 2
Diagram showing a 3D grid block model with layers colored blue on top and red on the bottom. Four corners have labeled injector points, and the center is labeled as a producer.

Figure 2. Schematic diagram of the simulation model.

Table 1 lists all the parameters used in developing the base model. The reservoir is located at a depth of 1,342 m, with an initial reservoir pressure of 2,400 psi at 41.7 °C. These conditions place the model above the minimum miscibility pressure (MMP) of 1,411 psi. This matters in a CO2-WAG project because keeping the injection pressure above the MMP makes the CO2 miscible with the oil at reservoir conditions. The reservoir has dual porosity and permeability; the upper part consists of siltstone and the lower part of sandstone (Kurz et al., 2013), and the vertical permeability kv is 0.1 times the horizontal permeability. Furthermore, the model accounts for all the trapping mechanisms involved in CO2 storage: structural trapping, solubility trapping, residual trapping, and mineralization. To model residual trapping, the linear phase-permeability hysteresis model built into CMG-GEM was used ($S_{gr}^{\max} = 0.30$) (Land, 1968). However, the mineralization effect is negligible because it requires long periods to take place. Table 2 lists all the components of the fluid model used in the base model. The Peng–Robinson equation of state (EOS) was employed for fluid model tuning, and all components were lumped into seven pseudo-components using WinProp.

Table 1

Table 1. Model reservoir properties.

Table 2

Table 2. Fluid composition.

2.2 Database generation

An ML-based study requires a large amount of high-quality time-series data. In the absence of real field data, numerical simulations were used to produce the dataset. The dataset includes seven CO2-WAG operational parameters as inputs: injector bottom-hole pressure (water and gas), producer bottom-hole pressure, WAG cycle size, WAG cycle ratio, gas injection rate, and water injection rate. The three output parameters are cumulative oil production, cumulative CO2 storage, and CO2 storage efficiency. The minimum and maximum values of the input parameters are listed in Table 3. A Python script written by our research group was then used to randomly generate 2,400 CMG-GEM input data files within the ranges given in Table 3, keeping all other parameters the same.

Table 3

Table 3. CO2-WAG parameters range.

The script then automatically runs the numerical simulations one by one in CMG-GEM using the generated data files. Upon completion of the simulations, cumulative oil production and total CO2 stored (used for the CO2 storage efficiency calculations) were extracted from the output files and used as output parameters to complete the dataset for the ML study. The output parameters and the proposed objective functions are calculated as follows.
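The random sampling step described above can be sketched as follows. This is a minimal illustration rather than the authors' script: the parameter names and the numeric ranges are hypothetical placeholders (the actual limits are those of Table 3), uniform sampling is assumed, and writing the CMG-GEM input files is omitted.

```python
import random

# Hypothetical placeholder ranges (min, max); the real limits are given in Table 3.
PARAM_RANGES = {
    "bhp_water_injector_psi": (2500.0, 3500.0),
    "bhp_gas_injector_psi": (2500.0, 3500.0),
    "bhp_producer_psi": (1500.0, 2200.0),
    "wag_cycle_size_months": (1.0, 18.0),
    "wag_cycle_ratio": (0.33, 3.0),
    "gas_injection_rate_m3_per_day": (1.0e4, 8.0e4),
    "water_injection_rate_m3_per_day": (250.0, 3000.0),
}


def sample_cases(n_cases, ranges, seed=42):
    """Draw n_cases parameter sets uniformly at random within the given ranges."""
    rng = random.Random(seed)  # fixed seed so the design can be regenerated
    return [
        {name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}
        for _ in range(n_cases)
    ]


# One dictionary per simulation case; each would be written to its own input file.
cases = sample_cases(2400, PARAM_RANGES)
```

Fixing the seed keeps the sampled design reproducible, which helps when individual simulation runs fail and have to be repeated.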

2.2.1 Cumulative oil production

The dimensionless cumulative oil production can be calculated as given in Equation 1:

$$\mathrm{COP}(x) = \sum_{n=1}^{T} Q_n^{o}(x) \tag{1}$$
where $x$ is the variable vector for optimization, $Q_n^{o}(x)$ is the oil production at timestep $n$, $n$ is the timestep index, and $T$ is the total number of timesteps.

2.2.2 Total CO2 stored

The second objective function in the dataset is total CO2 stored, which can be calculated by Equation 2:

$$\mathrm{CO_2SC}(x) = \sum_{n=1}^{T} \mathrm{CO_2SC}_n(x) = \sum_{n=1}^{T} \left( M_n^{\mathrm{ICO_2}}(x) - M_n^{\mathrm{PCO_2}}(x) \right) \tag{2}$$
where $\mathrm{CO_2SC}(x)$ is the total CO2 stored, $x$ is the variable vector for optimization, $M_n^{\mathrm{ICO_2}}(x)$ is the CO2 injected at timestep $n$, $M_n^{\mathrm{PCO_2}}(x)$ is the CO2 produced at timestep $n$, $n$ is the timestep index, and $T$ is the total number of timesteps.

2.2.3 Storage efficiency

CO2 storage efficiency is considered as the third objective function; however, it is not included in the optimization stage and is only considered during prediction, and is calculated by Equation 3:

$$\mathrm{CO_2\ storage\ efficiency} = \frac{\text{Amount of CO}_2\text{ stored}}{\text{Amount of CO}_2\text{ injected}} \tag{3}$$

2.2.4 Coupled optimization

For coupled optimization we consider cumulative oil production and CO2 stored as these are the two main objectives to optimize and can be mathematically expressed by Equation 4:

$$F = n_1 \times \mathrm{COP}(x) + n_2 \times \mathrm{CO_2SC}(x) \tag{4}$$
where $F$ is the coupled objective function for optimization, and $n_1$ and $n_2$ are the weights. Equal weights are assigned so that both objectives are treated equally.
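As a worked illustration of Equations 1–4, the sketch below evaluates the objectives on a short made-up time series. The function names, the toy values, and the choice of 0.5 for each weight are assumptions made for this example; the text only states that the two weights are equal.

```python
def cumulative_oil_production(oil_per_step):
    """Equation 1: sum of the oil produced at each timestep."""
    return sum(oil_per_step)


def total_co2_stored(co2_injected, co2_produced):
    """Equation 2: injected minus produced CO2, summed over all timesteps."""
    return sum(inj - prod for inj, prod in zip(co2_injected, co2_produced))


def storage_efficiency(co2_injected, co2_produced):
    """Equation 3: fraction of the injected CO2 that remains stored."""
    return total_co2_stored(co2_injected, co2_produced) / sum(co2_injected)


def coupled_objective(cop, co2sc, n1=0.5, n2=0.5):
    """Equation 4 with equal weights (0.5 each is an assumed value)."""
    return n1 * cop + n2 * co2sc


# Made-up, already dimensionless values for three timesteps.
oil = [0.2, 0.3, 0.1]
injected = [1.0, 1.0, 1.0]
produced = [0.0, 0.25, 0.25]

cop = cumulative_oil_production(oil)
stored = total_co2_stored(injected, produced)
efficiency = storage_efficiency(injected, produced)
F = coupled_objective(cop, stored)
```

A weighted sum like Equation 4 is only meaningful when both terms are on comparable scales, which is presumably why the objectives are handled in dimensionless form.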

2.3 Proxy model development

Advanced computational technologies have broadened the scope of reservoir modeling. However, limited computational resources still pose challenges in terms of uncertainty quantification and optimization workflows. To address this challenge, computationally efficient proxy models can be used. Proxy models, also known as surrogate models, are mathematical or statistical models that mimic complex simulation models by using predetermined input parameters (Zubarev, 2009; Xue et al., 2023a, 2025). These models are used as a potential substitute for complex models, ensuring computational efficiency and significantly reducing the computation cost and time. Various algorithms have been employed for proxy model development in the oil and gas sector. This study employed various algorithms, such as XGBoost, RF, ANN, CNN, KNN, SVR, LR, ANN-XGBoost, and CNN-XGBoost, for proxy model development to accomplish the prediction study, as shown in Figure 3.

Figure 3
Flowchart illustrating a data processing framework for sensitivity analysis and machine learning in oil and gas production. It begins with data gathering and progresses through stages including base model development, model validation, and large dataset creation. Input parameters like water injection rate and cycle ratio are used to generate output parameters such as cumulative oil production and CO2 sequestration. Processes include data preprocessing, normalization, model evaluation, and predictions. Algorithms like linear regression and neural networks are utilized. Final steps involve predictions and optimization techniques like Particle Swarm Optimization, leading to the end.

Figure 3. Flowchart displays the detailed workflow of the study. Machine learning algorithms that include linear regression (LR), support vector regression (SVR), random forest (RF), k-nearest neighbors (KNN), extreme gradient boosting (XGBoost), artificial neural networks (ANN), convolution neural networks (CNN), and hybrid models (ANN-XGBoost and CNN-XGBoost) were used for predicting cumulative oil production, total CO2 stored, and storage efficiency.

Three proxy models were developed, one each for cumulative oil production, total CO2 stored, and storage efficiency, over a period of 20 years. They were trained on a dataset of 2,400 samples comprising seven input parameters (including injection rate, CO2-WAG cycle size, and ratio) and three output parameters (cumulative oil production, total CO2 stored, and storage efficiency). The dataset was thoroughly analyzed for outliers before training and then split into training (80%), testing (10%), and validation (10%) subsets. The data were normalized to the range 0–1 by feature scaling to ensure consistency. After these steps, the dataset was ready for training the models to develop the respective proxy models. Figure 4 shows the specific training process employed.

Figure 4
Flowchart illustrating a data processing pipeline for oil production and CO2 storage. Steps include simulation, data extraction, preparing datasets with inputs and outputs, splitting into training, validation, and testing sets, training machine learning models, and making predictions. Outcomes focus on cumulative oil production, total CO2 stored, and CO2 storage efficiency, followed by coupled optimization.

Figure 4. Specific steps involved in the training process.
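The 80/10/10 split and the 0–1 feature scaling can be sketched in plain Python as below. The helper names and the synthetic two-column data are hypothetical, and fitting the scaling limits on the training subset only is an assumption of this sketch (a common way to avoid leaking test information), not a detail stated in the text.

```python
import random


def split_dataset(rows, train_frac=0.8, test_frac=0.1, seed=0):
    """Shuffle the rows and split them into training, testing, and validation parts."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_train = round(len(rows) * train_frac)
    n_test = round(len(rows) * test_frac)
    return rows[:n_train], rows[n_train:n_train + n_test], rows[n_train + n_test:]


def fit_min_max(rows):
    """Return the per-column minimum and maximum of the given rows."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def apply_min_max(rows, mins, maxs):
    """Scale each column to 0-1 using previously fitted limits."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, mins, maxs)]
        for row in rows
    ]


# Synthetic stand-in for the 2,400 simulation samples (two features only).
rng = random.Random(1)
data = [[rng.uniform(0.0, 100.0), rng.uniform(1.0, 18.0)] for _ in range(2400)]

train, test, val = split_dataset(data)
mins, maxs = fit_min_max(train)  # limits come from the training rows only
train_scaled = apply_min_max(train, mins, maxs)
test_scaled = apply_min_max(test, mins, maxs)
```

With limits fitted on the training rows, a few test or validation values may fall slightly outside 0–1; that is expected and harmless for most models.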

The performance of the ML models was evaluated using cross plots of actual versus predicted values together with the coefficient of determination (R2) and the root mean square error (RMSE). Furthermore, relative errors and residual plots were used to validate the models. These metrics are calculated by Equations 5–7:

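$$R^2 = 1 - \frac{\sum_{i=1}^{n} \left( y_{i,\mathrm{Act}} - y_{i,\mathrm{Pred}} \right)^2}{\sum_{i=1}^{n} \left( y_{i,\mathrm{Act}} - \bar{y}_{\mathrm{Act}} \right)^2} \tag{5}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_{i,\mathrm{Act}} - y_{i,\mathrm{Pred}} \right)^2} \tag{6}$$
$$\mathrm{ARE} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{y_{i,\mathrm{Act}} - y_{i,\mathrm{Pred}}}{y_{i,\mathrm{Act}}} \right| \tag{7}$$
where $y_{i,\mathrm{Act}}$ is the actual value obtained from the simulator, $y_{i,\mathrm{Pred}}$ is the value predicted by a given model, $\bar{y}_{\mathrm{Act}}$ is the mean of the actual values, and $n$ is the number of samples.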
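The three metrics can be computed directly from their definitions, as in the short sketch below. The function names and the four sample values are made up for illustration, and the relative error is taken in absolute value, which is an assumption about the intended definition.

```python
import math


def r2_score(actual, predicted):
    """Coefficient of determination (Equation 5)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def rmse(actual, predicted):
    """Root mean square error (Equation 6)."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def average_relative_error(actual, predicted):
    """Mean absolute relative error (Equation 7); actual values must be non-zero."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Made-up values standing in for simulator outputs and proxy predictions.
y_act = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]

r2 = r2_score(y_act, y_pred)
err = rmse(y_act, y_pred)
are = average_relative_error(y_act, y_pred)
```

For these values the residual sum of squares is 0.10 against a total sum of squares of 5.0, so R2 is 0.98.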

2.4 Machine learning algorithms

The study applied various ML models based on algorithms such as XGBoost, random forest, SVR, KNN, LR, ANN, and CNN. The following sections explain the ANN and XGBoost algorithms in detail; the other algorithms employed in this study (RF, KNN, SVR, LR, and CNN) are described in the Supplementary material.

2.4.1 Artificial neural network

An ANN is a computational model that mimics the human neurological system. It comprises neurons, also called nodes, which are organized into layers and characterized by weights, biases, and activation functions, as shown in Figure 5. The basic structure consists of several layers, each containing several nodes. The input layer receives the input dataset, the hidden layers (single or multiple) process the information, and the output layer produces the results (Hill et al., 1994). The connections between neurons carry weights that are adjusted during training and determine the influence of neurons on one another. Training used forward and back propagation, in which the weights are adjusted to minimize the error between the actual and predicted values (Song et al., 2020). Activation functions, such as ReLU, enable the model to learn the complex non-linear behavior of reservoir models (as listed in Table 4) (Bergen et al., 2019). An advantage of ANN models is that, unlike CNN models, they do not require large datasets for training. Further, ANN models have been successfully employed in various fields, including oil and gas.

Figure 5
Diagram of a neural network model showing inputs \(X_1\) to \(X_n\) each multiplied by weights \(\omega_1\) to \(\omega_n\). A bias \(b\) is added to the weighted sum \(\Sigma\), which is then processed by an activation function \(f\) to produce the output \(y\).

Figure 5. Schematic artificial neural network (ANN) diagram.
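The neuron in Figure 5 computes a weighted sum of its inputs, adds a bias, and passes the result through an activation function. A minimal forward-pass sketch is given below; the weights, biases, layer sizes, and input values are arbitrary numbers chosen for illustration and are not the hyperparameters of Table 4.

```python
def relu(z):
    """Rectified linear unit activation."""
    return max(0.0, z)


def neuron(inputs, weights, bias, activation=relu):
    """Weighted sum of the inputs plus bias, passed through the activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


def dense_layer(inputs, weight_rows, biases, activation=relu):
    """A layer is a set of neurons that all read the same inputs."""
    return [neuron(inputs, w, b, activation) for w, b in zip(weight_rows, biases)]


# Three scaled inputs -> two hidden neurons (ReLU) -> one linear output neuron.
x = [0.5, 0.2, 0.9]
hidden = dense_layer(x, [[0.4, -0.6, 0.1], [-0.3, 0.8, -0.5]], [0.1, 0.0])
y = neuron(hidden, [1.0, 1.0], 0.0, activation=lambda z: z)
```

Here the second hidden neuron has a negative weighted sum, so ReLU sets its output to zero and only the first neuron contributes to the output. Training would adjust the weights and biases by back propagation, which is not shown.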

Table 4

Table 4. Hyperparameters used for the ANN and XGBoost models.

2.4.2 XGBoost algorithm

XGBoost is a form of the gradient boosting decision tree (GBDT) algorithm (Xu et al., 2019), implemented as a scalable and efficient tree-boosting system introduced by Chen and Guestrin (2016). It is widely used for classification and regression problems because it is computationally efficient and highly accurate. Building on the principles of GBDT, XGBoost adds several optimizations to improve performance. Notably, it uses a second-order Taylor approximation of the loss function, which gives higher optimization precision than first-order gradient-based methods. Moreover, the algorithm includes L1 and L2 regularization terms in the objective function to avoid overfitting (as listed in Table 4) (Krizhevsky et al., 2017) and to promote generalization and sparsity of the model. In addition, XGBoost uses block-based storage to optimize sequential memory access and enable parallel computing, which considerably increases the training speed. Together, these improvements make XGBoost a powerful and scalable tool for large-scale ML applications. Figure 6 shows a schematic diagram of the XGBoost algorithm.

Figure 6
Diagram illustrating a gradient boosting process. A dataset feeds into multiple sequential trees, labeled as Tree₁ to Treeₖ. Each tree processes the residuals from the previous step. The outputs from the functions \(f_k(X, \phi_1)\), \(f_k(X, \phi_2)\), and so on, are aggregated into a final output represented by the sum \(\Sigma f_k(X, \theta_k)\), combining all tree outputs.

Figure 6. Schematic diagram of a typical XGBoost algorithm.
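For reference, the second-order approximation and the regularization term mentioned above can be written out as follows, in the form given by Chen and Guestrin (2016). The symbol $J$ for the number of leaves is chosen here to avoid a clash with the timestep count $T$ used earlier, and the L1 term with coefficient $\alpha$ corresponds to the L1 regularization option of the XGBoost library rather than to the original derivation.

```latex
% Objective minimized when adding the t-th tree f_t (constant terms dropped)
\mathcal{L}^{(t)} \simeq \sum_{i=1}^{n} \left[ g_i f_t(x_i) + \tfrac{1}{2} h_i f_t^{2}(x_i) \right] + \Omega(f_t),
\qquad
g_i = \frac{\partial\, l\!\left(y_i, \hat{y}_i^{(t-1)}\right)}{\partial\, \hat{y}_i^{(t-1)}},
\quad
h_i = \frac{\partial^{2} l\!\left(y_i, \hat{y}_i^{(t-1)}\right)}{\partial \left(\hat{y}_i^{(t-1)}\right)^{2}}

% Regularization over the J leaf weights w_j of the tree
\Omega(f_t) = \gamma J + \tfrac{1}{2} \lambda \sum_{j=1}^{J} w_j^{2} + \alpha \sum_{j=1}^{J} \lvert w_j \rvert

% Optimal weight of leaf j for \alpha = 0, where G_j and H_j are the sums of g_i and h_i in that leaf
w_j^{*} = -\frac{G_j}{H_j + \lambda}
```

For a squared-error loss $l = \tfrac{1}{2}(y_i - \hat{y}_i)^2$, the gradient is $g_i = \hat{y}_i^{(t-1)} - y_i$ and the Hessian is $h_i = 1$, so each new tree is fitted to the residuals of the previous trees, as depicted in Figure 6.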

2.5 Optimization algorithm

2.5.1 Particle swarm optimization

PSO is a population-based stochastic optimization algorithm first described by Kennedy and Eberhart (1995) and inspired by the social behavior of bird flocks and fish schools. In PSO, particles are a set of candidate solutions that explore the search space by changing their position and velocity over time based on both personal and swarm experience. Every particle adjusts its course according to two elements: the best position it has reached itself (personal best) and the best position reached by any particle in the swarm (global best). This collaboration draws the swarm toward optimal or near-optimal solutions in complex, multidimensional spaces. The velocity of each particle is updated by Equation 8:

$$v_i(t+1) = \omega v_i(t) + c_1 r_1 \left( P_{\mathrm{best},i}(t) - x_i(t) \right) + c_2 r_2 \left( g_{\mathrm{best},i}(t) - x_i(t) \right) \tag{8}$$
where $v_i$ and $x_i$ are the velocity and position of particle $i$ at iteration $t$, $\omega$ is the inertia weight, $c_1$ and $c_2$ are the acceleration coefficients, $r_1$ and $r_2$ are randomly generated numbers between 0 and 1, and $P_{\mathrm{best},i}$ and $g_{\mathrm{best},i}$ denote the personal and global best positions found so far.

PSO is well suited to this optimization problem because of its conceptual simplicity, computational efficiency, and minimal parameter tuning. This study used a simple PSO variant in which all particles share information globally through a fully connected topology. We set the inertia weight ω = 0.6 and the acceleration coefficients c1 = c2 = 1.5. The swarm size was set to 100 particles, which is relatively cheap to compute and provides sufficient exploration.
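A global-best PSO with these settings can be sketched in a few lines of Python. This is an illustrative implementation, not the authors' code: the toy objective stands in for the ANN-XGBoost proxy of Equation 4, the bounds and iteration count are hypothetical, the position update x(t+1) = x(t) + v(t+1) is the standard rule that accompanies Equation 8, and positions are simply clipped to the bounds.

```python
import random


def pso_maximize(objective, bounds, n_particles=100, n_iters=50,
                 w=0.6, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO that maximizes objective(x) within box bounds."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Equation 8: inertia + pull to personal best + pull to global best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Standard position update, clipped to the search bounds
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val > pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val > gbest_val:  # global best is refreshed as soon as it improves
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_objective(x):
    """Hypothetical stand-in for the proxy-based objective F; peak at (0.3, 0.7)."""
    return -((x[0] - 0.3) ** 2 + (x[1] - 0.7) ** 2)


best_x, best_f = pso_maximize(toy_objective, [(0.0, 1.0), (0.0, 1.0)])
```

In the workflow described here, `toy_objective` would be replaced by a function that feeds the scaled operational parameters to the trained proxy and returns the weighted objective, so each evaluation costs a model prediction rather than a full simulation run.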

3 Results and discussion

3.1 Base case analysis

Figure 7 shows the cumulative oil production and cumulative CO2 storage for the base WAG case. The base case uses a gas injection rate of 4.53 × 10^4 m3/day and a water injection rate of 1.9 × 10^3 m3/day; the other operational parameters are listed in Table 5. It yielded a cumulative oil production of 1.1 × 10^6 bbl and CO2 storage of 4.53 × 10^5 tons. The following sections consider how the water injection rate, gas injection rate, WAG cycle size, and WAG cycle ratio affect cumulative oil production, total CO2 stored, and storage efficiency.

Figure 7
Two graphs display trends over 20 years with dashed lines labeled

Figure 7. Base WAG cumulative oil production and CO2 storage plots: (a) cumulative oil production and (b) cumulative CO2 storage.

Table 5

Table 5. Base and optimized CO2-WAG parameters and results.

3.1.1 Effect of WAG cycle size and ratio

Figures 8a–c depict the effect of WAG cycle size on cumulative oil production, total CO2 stored, and storage efficiency. This study used WAG cycle sizes of 1, 3, 6, 9, 12, and 18 months. Cumulative oil production was quite stable across all WAG cycle sizes, with a slight increase at 6 months, possibly indicating an optimum balance between the CO2 and water slugs for mobility control. The amount of CO2 stored also remained comparatively stable, although a slight increase was observed as the WAG cycles became longer, perhaps because of the longer retention time. The storage efficiency is not highly dependent on the WAG cycle size, but its variability is lower for longer WAG cycles, implying that trapping mechanisms are more predictable with longer cycle durations.

Figure 8
Box plots showing relationships among WAG (water alternating gas) cycle size and ratio with cumulative oil production, CO2 storage, and storage efficiency. Panels (a), (b), and (c) compare WAG cycle sizes in months for each parameter. Panels (d), (e), and (f) assess WAG cycle ratios. Variability and medians are illustrated.

Figure 8. Effect of WAG-cycle size and ratio on the objective functions. Effect of WAG cycle size on objective functions (a) cumulative oil production, (b) total CO2 stored, and (c) storage efficiency. Effect of WAG cycle ratio on objective functions (d) cumulative oil production, (e) total CO2 stored, and (f) storage efficiency.

Furthermore, the impact of the WAG cycle ratio on cumulative oil production, total CO2 stored, and storage efficiency was also investigated, as shown in Figures 8d–f. For a thorough analysis, the study used WAG cycle ratios of 1:1, 1:2, 1:3, 2:1, and 3:1. The results show that cumulative oil production is highest at a ratio of 1:2, where the injection of CO2 takes precedence. This indicates that CO2 mobilizes oil better than water. More CO2 is stored as the CO2 share of the cycle ratio increases, reflecting a direct correlation between the composition of the injected fluids and the storage outcome. Interestingly, the storage efficiency is optimized at moderate CO2-to-water ratios (i.e., 1:2 and 1:1), meaning that there is sufficient CO2 injection and adequate trapping enhanced by the water slugs. At higher WAG cycle ratios (2:1 and 3:1), the CO2 storage efficiency decreased, probably because of CO2 breakthrough and little or no solubility trapping.

3.1.2 Effect of water and gas injection rate

Figures 9a–c show how the water injection rate influences cumulative oil production, total CO2 stored, and storage efficiency. Oil recovery improves as the water injection rate increases, reflecting stronger displacement of oil in the reservoir. Beyond a certain level, however, further increases have little or no effect on oil production; in this case, water injection rates above 2,000 m3/day add little and can lead to early water breakthrough. Regarding the effect of water on CO2 storage, Figure 9b indicates that increasing water injection above 250 m3 negatively affects CO2 storage, and further increases reduce it significantly. Otherwise, the CO2 retained remains relatively constant across higher and lower water injection rates, which suggests that water injection has little effect on the quantity of CO2 retained; this is perhaps not surprising given the limited miscibility or interaction expected between water and the CO2 plume. Storage efficiency, on the other hand, decreases significantly as the water injection rate increases. This tendency can be attributed to a larger share of the pore space being filled with water, which leaves less space to trap CO2. Less CO2 can be held by structural, residual, or solubility trapping when the pore network contains more water.

Figure 9
Six box plots labeled (a) to (f) represent cumulative oil production, CO₂ storage, and storage efficiency. Plots (a) to (c) analyze water injection rates; (d) to (f) analyze CO₂ injection rates. Varying shades indicate different parameter levels.

Figure 9. Effect of injection rates on objective functions. Effect of water injection rate on objective functions (a) cumulative oil production, (b) total CO2 stored, and (c) CO2 retention efficiency. Effect of gas injection rate on objective functions, (d) cumulative oil production, (e) total CO2 stored, and (f) storage efficiency.

Figures 9d–f display the effect of the CO2 injection rate on cumulative oil production, total CO2 stored, and storage efficiency, respectively. Total oil production increased as the CO2 injection rate increased from 10,000 to 90,000 m3/day. A higher CO2 injection rate improves oil mobility, probably because of better miscibility and pressure support in the reservoir. Similarly, the total CO2 stored increased with increasing injection rate. A further increase in the injection rate from 90,000 to 100,000 m3/day produced no significant increase in cumulative CO2 storage, which may be due to premature breakthrough or reduced trapping efficiency caused by over-injection. Interestingly, storage efficiency was negatively related to the injection rate. Although a greater amount of CO2 is trapped at higher rates, the fraction of injected CO2 that is effectively trapped decreases, which relates to poor mobility ratios and increased CO2 migration rates.

3.2 Machine learning model’s performance

The ML model results were compared with those of CMG-GEM using a dataset of 2,400 simulations. The input parameters included the injection rates (water and CO2), WAG cycle size, and WAG cycle ratio, and the output parameters were cumulative oil production, total CO2 stored, and storage efficiency. The dataset was split into 80% training, 10% validation, and 10% testing. The simulation results were treated as the actual values and compared with the predictions of the ML models. Figures 10–12 depict the actual versus predicted plots for the three output variables. Among the single models, XGBoost performed best; however, when it was coupled with an ANN, the resulting hybrid model (ANN-XGBoost) outperformed all single models (XGBoost, RF, ANN, CNN, SVR, KNN, and LR) as well as the other hybrid model (CNN-XGBoost). This is evident from the R2 plots shown in Figure 13b; the values for the optimal model are listed in Table 6, and the values for all other models are listed in Supplementary Tables S1–S3. The predicted values align closely with the actual values and cluster around the actual-versus-predicted line. Furthermore, Figure 13a shows the average absolute relative error (AARE) of all the models, where ANN-XGBoost has the lowest AARE values, confirming its superiority over the other models.
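
The data split and the goodness-of-fit metrics mentioned above can be sketched as follows. This is a minimal illustration and not the authors' code: the helper names (`split_indices`, `r2_score`, `rmse`, `aare`) are hypothetical, the shuffling seed is arbitrary, and AARE is assumed here to be the mean of |actual − predicted| / |actual| expressed in percent, since the exact definition is not given in this section.

```python
import math
import random

def split_indices(n_samples, train=0.8, val=0.1, seed=0):
    """Shuffle sample indices and split them into train/validation/test parts."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_train = int(round(train * n_samples))
    n_val = int(round(val * n_samples))
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

def rmse(actual, predicted):
    """Root mean square error."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def aare(actual, predicted):
    """Average absolute relative error in percent (assumed definition);
    actual values must be non-zero."""
    n = len(actual)
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n

# 2,400 simulated cases split 80/10/10, as described in the text.
train_idx, val_idx, test_idx = split_indices(2400)
print(len(train_idx), len(val_idx), len(test_idx))  # 1920 240 240
```

In practice, each metric would be evaluated separately on the training, validation, and testing subsets for each of the three outputs, with the simulator results as `actual`.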

Figure 10
Scatter plots compare actual versus predicted cumulative oil production, with training, validation, and testing data indicated by different colors. Each plot from (a) to (i) shows a linear trend with varied data distributions.

Figure 10. Cumulative oil production, actual vs. predicted values plots. (a) ANN-XGBoost, (b) XGBoost, (c) CNN-XGBoost, (d) RF, (e) ANN, (f) CNN, (g) KNN, (h) SVR, and (i) LR.

Figure 11
Nine scatter plots showing predicted versus actual cumulative CO₂ storage for various datasets. Each plot, labeled from (a) to (i), includes blue squares for training data, cyan circles for validation, and orange triangles for testing. A line of actual versus predicted is shown in red for comparison. Plots depict clustered data points along the line, except for (f), (h), and (i), which show more scatter, indicating less accurate predictions in these cases.

Figure 11. Total CO2 stored, actual vs. predicted values plots. (a) ANN-XGBoost, (b) CNN-XGBoost, (c) XGBoost, (d) RF, (e) ANN, (f) CNN, (g) KNN, (h) SVR, and (i) LR.

Figure 12
Nine scatter plots labeled (a) to (i) compare predicted versus actual CO2 storage efficiency. Each plot includes training, validation, and testing data, with a trend line representing actual versus predicted values. The plots depict varying levels of data dispersion and correlation patterns, with closer alignment to the line indicating more accurate predictions. The vertical and horizontal axes range from zero to one, representing predicted and actual CO2 storage efficiency, respectively.

Figure 12. CO2 storage efficiency actual vs. predicted values plots. (a) ANN-XGBoost, (b) XGBoost, (c) CNN-XGBoost, (d) RF, (e) ANN, (f) CNN, (g) KNN, (h) SVR, and (i) LR.

Figure 13
Two bar charts comparing various algorithms. Chart (a) shows absolute relative error, and chart (b) shows R squared score. Algorithms include LR, RF, KNN, SVR, XGBoost, ANN, CNN, CNN_XGBoost, and ANN_XGBoost. Bars represent cumulative oil production, cumulative CO2 storage, and CO2 storage efficiency with colors blue, orange, and cyan, respectively.

Figure 13. All models comparison plots. (a) Absolute relative error plot and (b) R2 plot.

Table 6. ANN-XGBoost model statistical analysis for cumulative oil production, total CO2 stored, and storage efficiency.

Furthermore, to check the performance of the respective models, this study also considered relative deviation plots. These plots show how far the predicted values deviate from the actual values: the higher the deviation, the less accurate the model, and the lower the deviation, the more accurate the model. Figures 14–16 show the relative deviation plots of all models for the three output variables. These plots were used to determine whether a model was under-predicting or over-predicting. Among all the models, ANN-XGBoost shows the least deviation from the actual values. For cumulative oil production, most of the values are clustered around the zero line, with a slight deviation of 0.045 ± 0.155. Similarly, for total CO2 stored, most of the values are clustered around the zero line, with a deviation of −0.05 ± 2.55. The same trend was observed for CO2 storage efficiency, with a slight deviation of −0.39 ± 0.78. Overall, the results show that the proposed model yields minimal deviation error and provides balanced performance.
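
A relative deviation summary of this kind could be computed along the following lines. This is only a sketch under stated assumptions: relative deviation is taken as (predicted − actual) / actual, so that positive values mean over-prediction, and the "center ± spread" summary is taken as mean ± population standard deviation. The article does not state which convention its reported values follow, and the numbers below are purely illustrative.

```python
import statistics

def relative_deviation(actual, predicted):
    """Signed relative deviation per sample; positive means over-prediction.
    The sign convention is an assumption made for this sketch."""
    return [(p - a) / a for a, p in zip(actual, predicted)]

def summarize(values):
    """Return (mean, population standard deviation) of the deviations."""
    return statistics.mean(values), statistics.pstdev(values)

# Illustrative values only, not data from the study.
actual = [10.0, 20.0, 40.0]
predicted = [11.0, 19.0, 40.0]

rd = relative_deviation(actual, predicted)
center, spread = summarize(rd)
bias = "over" if center > 0 else "under" if center < 0 else "none"
print(f"{center:.4f} +/- {spread:.4f} ({bias}-prediction on average)")
```

A center close to zero with a small spread corresponds to points clustered tightly around the zero line in plots such as Figures 14–16.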

Figure 14
Graphs showing relative deviation versus actual values for different models: (a) ANN-XGBoost, (b) CNN-XGBoost, (c) XGBoost, (d) RF, (e) ANN, (f) CNN, (g) KNN, (h) SVR, (i) LR. Each plot features blue dots and horizontal reference lines.

Figure 14. Cumulative oil production, relative deviation plots. (a) ANN-XGBoost, (b) CNN-XGBoost, (c) XGBoost, (d) RF, (e) ANN, (f) CNN, (g) KNN, (h) SVR, and (i) LR.

Figure 15
Nine scatter plots show the relative deviation versus actual values for different models: (a) ANN-XGBoost, (b) CNN-XGBoost, (c) XGBoost, (d) RF, (e) ANN, (f) CNN, (g) KNN, (h) SVR, and (i) LR. Each graph displays data points clustered around zero, with the y-axis ranging from varying negative to positive values, indicating model prediction deviations.

Figure 15. Total CO2 stored, relative deviation plots. (a) ANN-XGBoost, (b) CNN-XGBoost, (c) XGBoost, (d) RF, (e) ANN, (f) CNN, (g) KNN, (h) SVR, and (i) LR.

Figure 16
Nine scatter plots show relative deviation versus actual values for different models: (a) ANN-XGBoost, (b) CNN-XGBoost, (c) XGBoost, (d) RF, (e) ANN, (f) CNN, (g) KNN, (h) SVR, and (i) LR. Each graph displays data points clustering around zero deviation, indicating varied prediction accuracy across models.

Figure 16. CO2 storage efficiency, relative deviation plots. (a) ANN-XGBoost, (b) CNN-XGBoost, (c) XGBoost, (d) RF, (e) ANN, (f) CNN, (g) KNN, (h) SVR, (i) LR.

Figures 17–19 present the residual plots of all models for the three output variables. Residual plots were used to analyze model bias and patterns, that is, to reveal whether a model shows systematic bias or trends. Among the models, ANN-XGBoost yields the lowest residual values: 0.015 ± 0.095, 0.025 ± 0.525, and ± 0.08 for cumulative oil production, total CO2 stored, and CO2 storage efficiency, respectively. Most of the values are spread around the zero line without any visible trend, which further validates the performance of the ANN-XGBoost model and supports its use for prediction. Based on the above, the top three models ranked by overall performance were ANN-XGBoost > CNN-XGBoost > XGBoost.
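
The visual check for bias and trend in a residual plot can also be expressed numerically. The sketch below is an illustration under assumptions, not the procedure used in the study: residuals are defined as actual − predicted, bias is summarized by the mean residual, and a trend is flagged by the Pearson correlation between residuals and actual values. The two toy prediction sets are invented for demonstration.

```python
import math

def residuals(actual, predicted):
    """Residual per sample, defined here as actual minus predicted."""
    return [a - p for a, p in zip(actual, predicted)]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

actual = [1.0, 2.0, 3.0, 4.0, 5.0]

# Toy model A: errors scatter around zero with no trend.
pred_a = [0.9, 2.1, 3.0, 4.1, 4.9]
# Toy model B: under-prediction grows with the actual value (a rising trend).
pred_b = [1.0, 1.9, 2.8, 3.7, 4.6]

res_a = residuals(actual, pred_a)
res_b = residuals(actual, pred_b)

trend_a = pearson(actual, res_a)   # close to 0: no visible trend
trend_b = pearson(actual, res_b)   # close to 1: residuals rise with actual value
mean_a = sum(res_a) / len(res_a)   # close to 0: no systematic bias
mean_b = sum(res_b) / len(res_b)   # positive: systematic under-prediction
```

A model whose residuals behave like toy model A matches the description of values spread around the zero line without a visible trend.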

Figure 17
Nine scatter plots showing residuals against actual values for different models. (a) ANN-XGBoost shows a dense cluster around zero with minor spread. (b) CNN-XGBoost displays a similar pattern. (c) XGBoost has slightly wider spread. (d) RF is more centralized. (e) ANN shows uniform spread around zero. (f) CNN has a rising trend. (g) KNN is tightly clustered. (h) SVR spreads widely but centralizes. (i) LR is evenly spread around zero. Each plot includes dotted lines marking the residual boundaries.

Figure 17. Cumulative oil production residual plots. (a) ANN-XGBoost, (b) CNN-XGBoost, (c) XGBoost, (d) RF, (e) ANN, (f) CNN, (g) KNN, (h) SVR, and (i) LR.

Figure 18
Nine scatter plots labeled (a) to (i) show different machine learning model residuals versus actual values. Models include ANN-XGBoost, CNN-XGBoost, XGBoost, RF, ANN, CNN, KNN, SVR, and LR. Each plot features residuals on the y-axis and actual values on the x-axis, with data points scattered and a reference line at zero residuals.

Figure 18. Total CO2 stored residual plots. (a) ANN-XGBoost, (b) CNN-XGBoost, (c) XGBoost, (d) RF, (e) ANN, (f) CNN, (g) KNN, (h) SVR, and (i) LR.

Figure 19
Nine scatter plots display residuals against actual values for different models. Each graph shows data points clustered around a horizontal line at zero residual, with some deviation. Models include ANN-XGBoost (a), CNN-XGBoost (b), XGBoost (c), RF (d), ANN (e), CNN (f), KNN (g), SVR (h), and LR (i). Residuals vary slightly across models.

Figure 19. CO2 storage efficiency residual plots. (a) ANN-XGBoost, (b) CNN-XGBoost, (c) XGBoost, (d) RF, (e) ANN, (f) CNN, (g) KNN, (h) SVR, and (i) LR.

Additionally, this study compares the robustness and speed of the ML models with those of a conventional simulator. CMG-GEM requires at least 4–5 min to run a single case and hours to complete all scenarios, whereas the proposed ML model requires 10–15 s to predict the results of all scenarios, showing a strong computational advantage over conventional simulators (Table 7). The PC used in this study has a Core i7-8550U processor (2.00 GHz) and 16 GB of RAM. Figure 20 shows the ANN-XGBoost model predictions of the objective functions. Because CO2-WAG project engineers must run a large number of cases to screen scenarios and choose the optimal ones, employing such models will ease these studies and decrease the time they require.
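
The scale of this runtime difference can be illustrated with simple arithmetic. The sketch below assumes that "all scenarios" refers to the 2,400 simulated cases and that they are run sequentially; both are assumptions, since Table 7 is not reproduced here. It also ignores the time needed to generate the training data and to train the proxy.

```python
# Back-of-the-envelope runtime comparison (assumptions flagged in comments).
N_SCENARIOS = 2400              # assumption: "all scenarios" = full dataset
SIM_MIN_PER_CASE = (4.0, 5.0)   # minutes per CMG-GEM case, as reported
PROXY_TOTAL_S = (10.0, 15.0)    # seconds for all proxy predictions, as reported

# Total simulator time in hours for the low and high per-case estimates.
sim_total_h = tuple(N_SCENARIOS * m / 60.0 for m in SIM_MIN_PER_CASE)

# Most conservative and most favorable speed-up factors.
speedup_low = sim_total_h[0] * 3600.0 / PROXY_TOTAL_S[1]
speedup_high = sim_total_h[1] * 3600.0 / PROXY_TOTAL_S[0]

print(sim_total_h)                 # (160.0, 200.0)
print(speedup_low, speedup_high)   # 38400.0 72000.0
```

Under these assumptions, the prediction step alone is roughly four orders of magnitude faster than sequential simulation.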

Table 7. Runtime comparison between simulator and the proxy model.

Figure 20
Three scatter plots comparing simulator actual values (blue) and ANN-XGBoost predicted values (red) for different metrics. (a) Cumulative oil production, (b) cumulative carbon dioxide storage, and (c) carbon dioxide storage efficiency versus sample number. Each plot shows a spread of data points across the sample range.

Figure 20. ANN-XGBoost model prediction plots. (a) cumulative oil production, (b) total CO2 stored, and (c) storage efficiency.

3.3 Coupled optimization

CO2-WAG performance depends strongly on operational parameters that directly affect cumulative oil production and CO2 storage. To address this, the proxy model (ANN-XGBoost) was integrated with the PSO algorithm to optimize these parameters so as to maximize both oil production and CO2 storage. The parameters considered during the optimization were the injection rates (water and gas), cycle size, cycle ratio, and BHP (injection and production). Table 5 lists the optimized parameter ranges, and Figures 21a,b illustrate the resulting cumulative oil production and CO2 storage, respectively.
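
The coupling of a proxy model with PSO can be sketched as a basic global-best particle swarm that queries a scoring function instead of the simulator. Everything specific below is an assumption for illustration: `toy_proxy_score` is a hypothetical stand-in for the trained ANN-XGBoost proxy (which would return predicted oil production and CO2 storage combined into one score), the two decision variables are normalized to [0, 1], and the inertia and acceleration coefficients are common textbook values rather than the settings used in the study.

```python
import random

def pso_maximize(objective, bounds, n_particles=20, n_iters=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic global-best PSO that maximizes `objective` within box bounds."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                    # personal best positions
    pbest_val = [objective(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]   # global best so far
    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move the particle and clamp it to the allowed range.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val > pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val > gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

def toy_proxy_score(x):
    """Hypothetical stand-in for the proxy: a smooth score that peaks at
    normalized gas rate 0.8 and normalized water rate 0.3."""
    gas, water = x
    return -((gas - 0.8) ** 2 + (water - 0.3) ** 2)

best_x, best_score = pso_maximize(toy_proxy_score, [(0.0, 1.0), (0.0, 1.0)])
print([round(v, 3) for v in best_x])
```

Because each objective evaluation is a fast proxy prediction rather than a simulator run, a swarm of this size can evaluate thousands of candidate designs in a short time.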

Figure 21
Two graphs compare the effectiveness of optimized and base Water-Alternating-Gas (WAG) techniques over twenty years. Graph (a) shows cumulative oil production, with optimized WAG producing more oil than the base. Graph (b) shows cumulative CO2 storage, with optimized WAG storing more CO2 than the base. Dashed lines represent base WAG, and dotted lines represent optimized WAG.

Figure 21. Optimized WAG cumulative oil production and CO2 storage plots. (a) cumulative oil production and (b) cumulative CO2 storage.

The results show that the CO2-WAG operational parameters have a significant effect on both oil production and CO2 storage: cumulative oil production rose from 1.11 × 10^6 bbl to 1.3 × 10^6 bbl, and CO2 storage rose from 4.53 × 10^5 tons to 5.56 × 10^5 tons. In the optimized case, the production bottom-hole pressure rose to 1.31 × 10^3 psi, the gas injection rate rose to 8.53 × 10^4 m3/day, the water injection rate rose to 4.56 × 10^2 m3/day, and the cycle length changed from 3 months to 6 months. Together, these changes in operational parameters result in higher oil production and CO2 storage.

The results demonstrate how important it is to design operational parameters carefully. They also show that coupling the ANN-XGBoost proxy model with PSO not only improves oil production and CO2 storage but also provides a systematic and robust framework for optimizing operational parameters. This enables engineers to design operational parameters quickly and helps decision makers to make effective decisions.

3.4 Research implications

The methodology of this work has implications well beyond CO2-WAG optimization and is broadly applicable to subsurface engineering, environmental modeling, and energy-transition technologies. The hybrid ANN-XGBoost surrogate model is an effective substitute for standard full-physics simulators, showing that complicated nonlinear multiphase processes can be modeled accurately at a fraction of the computational expense. Hybrid frameworks of this type are effective in enhanced oil recovery operations, including polymer, surfactant, and miscible-gas flooding, where quick scenario screening is necessary (Jiao et al., 2024; Otmane et al., 2025). The capacity to make thousands of predictions in seconds makes this method an effective instrument for large-scale sensitivity analysis, design, and uncertainty assessment. Recently, hybrid deep-learning architectures have been shown to improve real-time, safe operational decision-making through continuous model calibration and fast assessment of changing reservoir conditions (Otmane et al., 2025; Xue et al., 2023). The framework introduced in this paper provides the underlying elements required for such high-level workflows.

Beyond the oil and gas sector, the workflow is largely applicable to geothermal reservoir engineering, where coupled thermal-hydraulic modeling is computationally expensive. Surrogate modeling has become one of the enabling technologies for geothermal production prediction and enhanced geothermal system design (Li et al., 2025; Xue et al., 2023). The same benefits apply to groundwater remediation and environmental impact studies, where surrogate models speed up the simulation of contaminant transport, remediation planning, and plume monitoring (Luo et al., 2023; Wu et al., 2022). Notably, the hybrid ANN-XGBoost model is also well suited to predicting petrophysical properties such as porosity, permeability, and saturation. These properties tend to have complicated nonlinear relationships that depend on lithology, diagenesis, and depositional environment. ML techniques, especially hybrid methods, have been shown to be superior to empirical correlations and other conventional inversion techniques in predicting formation properties from logs, core data, and seismic attributes (Kalule et al., 2023; Talebkeikhah et al., 2021). The proposed framework can be used to characterize reservoirs, identify facies, and update models in fields with limited core data or incomplete logging programs. It provides an opportunity to predict static reservoir properties quickly and with high accuracy.

The versatility of the model further supports emerging energy-transition technologies such as subsurface hydrogen storage and CO2 capture, where risk evaluation and operational optimization rely on repeated simulations of injection-withdrawal cycles and plume migration (Mao et al., 2024). In a broader sense, the multi-objective optimization framework employed in this paper is applicable to a very diverse category of engineering systems, including renewable-energy integration, optimization of chemical processes, and emissions-reduction technologies, in which the rapid search of high-dimensional parameter spaces can improve operational decision-making. In general, although the current work is devoted to CO2-WAG processes, the surrogate modeling and optimization framework is flexible and transferable to a wide range of subsurface, environmental, and energy-system applications.

3.5 Limitations and future work

Although the proposed framework demonstrates strong predictive and computational capabilities, several limitations should be recognized to provide context for the presented results. The dataset used for training the surrogate model was created entirely from synthetic numerical simulations based on a representative reservoir model. Although this approach guarantees controlled variability in the operational parameters, it is not sufficient to capture the heterogeneity, small-scale stratification, and dynamic reservoir behavior observed in actual fields. As a result, generalizing the trained model to other formations, depositional environments, or fluid systems may require re-training using site-specific simulations or field data. The numerical model was also constructed based on certain assumptions: a moderate grid size was used to keep a total of 2,400 simulation runs computationally feasible, so finer-scale heterogeneity and complex facies transitions may not be completely represented. Additionally, the reservoir physics incorporated into the CMG-GEM simulator, such as the relative permeability curves, capillary-pressure behavior, and Peng–Robinson EOS, introduces model dependency that may affect the predicted CO2 trapping efficiency and fluid displacement outcomes.

From an ML perspective, the hybrid ANN-XGBoost model is inherently sensitive to hyperparameter choice, architecture design, and training domain coverage. Although the hyperparameters were optimized to achieve stable and accurate predictions, challenges remain, including overfitting, sensitivity to training data imbalance, and poor extrapolation outside the training samples. More sophisticated tuning strategies, such as Bayesian optimization, genetic algorithms, or cross-validated grid search, could enhance model robustness, though at the cost of increased computational time. The framework also does not explicitly account for the measurement noise, operational uncertainty, or variability that one would expect in field data. The optimization results therefore reflect an idealized case obtained from synthetic inputs rather than the noisy or incomplete datasets that are common in actual reservoir settings. Additionally, the current optimization focuses only on operational parameters and does not integrate techno-economic parameters such as capital and operating costs, net present value (NPV), or incentives for emissions credits. Future work should couple economic and environmental uncertainty analysis with multi-objective optimization to create more comprehensive decision-support tools for this field. Despite these limitations, the study builds a solid foundation for future development, especially by incorporating field data, widening the training domain, adding uncertainty quantification, and improving physics-based constraints in the surrogate modeling process.
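
Of the tuning strategies named above, cross-validated grid search is the simplest to illustrate. The sketch below is deliberately small and makes several assumptions: a one-dimensional k-nearest-neighbor regressor stands in for the much larger ANN-XGBoost model, the data are synthetic (a noisy sine curve), and the function names (`knn_predict`, `cv_rmse`) and the grid of candidate values are hypothetical.

```python
import math
import random

def knn_predict(train_x, train_y, x, k):
    """Predict y at x as the mean target of the k nearest training points."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k

def cv_rmse(xs, ys, k_neighbors, n_folds=5):
    """Cross-validated RMSE: each sample is predicted by a model that was
    fitted without the fold containing it."""
    n = len(xs)
    sq_err = 0.0
    for fold in range(n_folds):
        test = [i for i in range(n) if i % n_folds == fold]
        train = [i for i in range(n) if i % n_folds != fold]
        tx = [xs[i] for i in train]
        ty = [ys[i] for i in train]
        for i in test:
            sq_err += (knn_predict(tx, ty, xs[i], k_neighbors) - ys[i]) ** 2
    return math.sqrt(sq_err / n)

# Synthetic data: noisy sine curve (illustrative only).
rng = random.Random(0)
xs = [rng.uniform(0.0, 1.0) for _ in range(100)]
ys = [math.sin(2 * math.pi * x) + rng.gauss(0.0, 0.1) for x in xs]

# Grid search: evaluate every candidate and keep the lowest CV error.
grid = [1, 3, 5, 10, 30]
scores = {k: cv_rmse(xs, ys, k) for k in grid}
best_k = min(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```

The same pattern extends to several hyperparameters at once by iterating over their Cartesian product, which is where the additional computational cost noted above comes from.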

4 Conclusion

This work establishes a robust, data-driven framework for predicting and optimizing CO2-WAG performance by linking high-fidelity numerical simulation with advanced ML and metaheuristic optimization. XGBoost, RF, KNN, SVR, LR, ANN, CNN, ANN-XGBoost, and CNN-XGBoost were tested to predict CO2-WAG performance. The best-performing prediction model (proxy model) was then coupled with the PSO algorithm for integrated optimization. The dataset comprised 2,400 samples with seven input parameters and three output parameters. The following conclusions were drawn from this study:

1. Both the CO2-WAG cycle size and ratio significantly influenced oil production and CO2 storage. A cycle size of 6 months and a WAG cycle ratio of 1:1 produced higher oil production and led to higher CO2 storage.

2. The injection rates (both water and gas) had a strong effect on oil production and CO2 storage. However, careful design was required when increasing the injection rates, because arbitrary increases led to early water or gas breakthrough.

3. Among the nine tested ML algorithms, the hybrid ANN model coupled with XGBoost (ANN-XGBoost) yielded the best prediction results, with high R2 scores (0.99159, 0.97515, and 0.98706) and low RMSE values (2.8 × 10^−2, 1.5 × 10^−1, and 2.4 × 10^−2). The proxy model reproduced the full-physics simulator's outputs with negligible bias.

4. The proxy model (ANN-XGBoost) coupled with the PSO optimization algorithm yielded 12.8% higher cumulative oil production and 11% greater CO2 storage compared to the base WAG case.

5. The proposed optimization framework required only minutes to complete optimization. This offers engineers a robust alternative to optimization workflows based on conventional simulators.

These results confirm that the proposed ANN-XGBoost + PSO framework provides a practical, rapid, and reliable decision-support tool for CO₂-WAG operations. Its applications extend beyond CO2-WAG to other areas of the oil and gas sector, such as chemical injection processes. Furthermore, this workflow can also be used in other fields facing similar optimization challenges.

Data availability statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Author contributions

SU: Conceptualization, Formal analysis, Investigation, Methodology, Software, Visualization, Writing – original draft, Writing – review & editing. LX: Conceptualization, Supervision, Validation, Writing – review & editing. GD: Writing – review & editing. TA-A: Funding acquisition, Writing – review & editing. ME: Writing – review & editing. AA: Writing – review & editing.

Funding

The author(s) declared that financial support was received for this work and/or its publication. This study had the support of national funds through Fundação para a Ciência e Tecnologia, I. P. (FCT), under the projects UIDB. This study is also supported by the National Natural Science Foundation of China (Grant No. 52274048) and Beijing Natural Science Foundation (Grant No. 3222037). Open access funding provided by UiT The Arctic University of Norway.

Conflict of interest

The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declared that Generative AI was not used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fclim.2025.1710187/full#supplementary-material

References

Ahmadi, M. A., Zendehboudi, S., and James, L. A. (2018). Developing a robust proxy model of CO2 injection: coupling Box–Behnken design and a connectionist method. Fuel 215, 904–914. doi: 10.1016/j.fuel.2017.11.030

Al-Bayati, D., Saeedi, A., Myers, M., White, C., Xie, Q., and Clennell, B. (2018). Insight investigation of miscible SCCO2 water alternating gas (WAG) injection performance in heterogeneous sandstone reservoirs. J. CO2 Util. 28, 255–263. doi: 10.1016/j.jcou.2018.10.010

Al-Khdheeawi, E. A., Vialle, S., Barifcani, A., Sarmadivaleh, M., and Iglauer, S. (2018). Effect of wettability heterogeneity and reservoir temperature on CO2 storage efficiency in deep saline aquifers. Int. J. Greenhouse Gas Control 68, 216–229. doi: 10.1016/j.ijggc.2017.11.016

AlRassas, A. M., Al-Alimi, D., Zosseder, K., and Al-qaness, M. A. A. (2025). AI-driven predictive framework for CO2 sequestration and enhanced oil recovery: insights from a depleted oil reservoir. J. Clean. Prod. 519:146054. doi: 10.1016/j.jclepro.2025.146054

Alrassas, A. M., Vo Thanh, H., Ren, S., Sun, R., Al-Areeq, N. M., Kolawole, O., et al. (2022). CO2 sequestration and enhanced oil recovery via the water alternating gas scheme in a mixed transgressive sandstone-carbonate reservoir: case study of a large Middle East oilfield. Energy Fuel 36, 10299–10314. doi: 10.1021/acs.energyfuels.2c02185

Bachu, S. (2000). Sequestration of CO2 in geological media: criteria and approach for site selection in response to climate change. Energy Convers. Manag. 41, 953–970. doi: 10.1016/S0196-8904(99)00149-1

Bergen, K. J., Johnson, P. A., de Hoop, M. V., and Beroza, G. C. (2019). Machine learning for data-driven discovery in solid Earth geoscience. Science 363:eaau0323. doi: 10.1126/science.aau0323

Chen, T., and Guestrin, C. (2016). XGBoost: a scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 785–794

Christensen, J. R., Stenby, E. H., and Skauge, A. (2001). Review of WAG field experience. SPE Reserv. Eval. Eng. 4, 97–106. doi: 10.2118/71203-pa

CMG (2022). CMG GEM user’s guide. Calgary, AB: Computer Modelling Group Ltd.

Czernichowski-Lauriol, I., Berenblyum, R., Bigi, S., Car, M., Gastine, M., Persoglia, S., et al. (2018). CO2GeoNet actions in Europe for advancing CCUS through global cooperation. Energy Procedia 154, 73–79. doi: 10.1016/j.egypro.2018.11.013

Dai, Z., Viswanathan, H., Middleton, R., Pan, F., Ampomah, W., Yang, C., et al. (2016). CO2 accounting and risk analysis for CO2 sequestration at enhanced oil recovery sites. Environ. Sci. Technol. 50, 7546–7554. doi: 10.1021/acs.est.6b01744

Dai, Z., Viswanathan, H., Xiao, T., Middleton, R., Pan, F., Ampomah, W., et al. (2017). CO2 sequestration and enhanced oil recovery at depleted oil/gas reservoirs. Energy Procedia 114, 6957–6967. doi: 10.1016/j.egypro.2017.08.034

Dai, Z., Zhang, Y., Bielicki, J., Amooie, M. A., Zhang, M., Yang, C., et al. (2018). Heterogeneity-assisted carbon dioxide storage in marine sediments. Appl. Energy 225, 876–883. doi: 10.1016/j.apenergy.2018.05.038

Dziejarski, B., Krzyżyńska, R., and Andersson, K. (2023). Current status of carbon capture, utilization, and storage technologies in the global economy: a survey of technical assessment. Fuel 342:127776. doi: 10.1016/j.fuel.2023.127776

Gao, M., Liu, Z., Qian, S., Liu, W., Li, W., Yin, H., et al. (2023). Machine-learning-based approach to optimize CO2-WAG flooding in low permeability oil reservoirs. Energies 16:6149. doi: 10.3390/en16176149

Han, L., and Gu, Y. (2014). Optimization of miscible CO2 water-alternating-gas injection in the Bakken formation. Energy Fuel 28, 6811–6819. doi: 10.1021/ef501547x

He, R., Ma, W., Ma, X., and Liu, Y. (2021). Modeling and optimizing for operation of CO2-EOR project based on machine learning methods and greedy algorithm. Energy Rep. 7, 3664–3677. doi: 10.1016/j.egyr.2021.05.067

Hill, T., Marquez, L., O’Connor, M., and Remus, W. (1994). Artificial neural network models for forecasting and decision making. Int. J. Forecast. 10, 5–15. doi: 10.1016/0169-2070(94)90045-0

Hsu, C.-W., Chen, L.-T., Hu, A. H., and Chang, Y.-M. (2012). Site selection for carbon dioxide geological storage using analytic network process. Sep. Purif. Technol. 94, 146–153. doi: 10.1016/j.seppur.2011.08.019

Jiao, S., Li, W., Li, Z., Gai, J., Zou, L., and Su, Y. (2024). Hybrid physics-machine learning models for predicting rate of penetration in the Halahatang oil field, Tarim Basin. Sci. Rep. 14:5957. doi: 10.1038/s41598-024-56640-y

Jin, L., Pekot, L. J., Smith, S. A., Salako, O., Peterson, K. J., Bosshart, N. W., et al. (2018). Effects of gas relative permeability hysteresis and solubility on associated CO2 storage performance. Int. J. Greenhouse Gas Control 75, 140–150. doi: 10.1016/j.ijggc.2018.06.002

Kalule, R., Abderrahmane, H. A., Alameri, W., and Sassi, M. (2023). Stacked ensemble machine learning for porosity and absolute permeability prediction of carbonate rock plugs. Sci. Rep. 13:9855. doi: 10.1038/s41598-023-36096-2

Kennedy, J., and Eberhart, R. (1995). Particle swarm optimization. Proceedings of ICNN’95—International Conference on Neural Networks. 1942–1948

Khather, M., Yekeen, N., Al-Yaseri, A., Al-Mukainah, H., Giwelli, A., and Saeedi, A. (2022). The impact of wormhole generation in carbonate reservoirs on CO2-WAG oil recovery. J. Petroleum Sci. Eng. 212:110354. doi: 10.1016/j.petrol.2022.110354

Kolawole, O., Ispas, I., Kumar, M., Weber, J., Zhao, B., and Zanoni, G. (2021). How can biogeomechanical alterations in shales impact caprock integrity and CO2 storage? Fuel 291:120149. doi: 10.1016/j.fuel.2021.120149

Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2017). ImageNet classification with deep convolutional neural networks. Commun. ACM 60, 84–90. doi: 10.1145/3065386

Kurz, B. A., Heebink, L. V., Eylands, K. E., Smith, S. A., Hamling, J. A., Klapperich, R. J., et al. (2013). “Bell Creek test site – preinjection geochemical report” in Plains CO2 Reduction (PCOR) Partnership Phase III Task 4 – Deliverable D33, Prepared for National Energy Technology Laboratory U.S. Department of Energy Cooperative Agreement No. DE-FC26-767 (Grand Forks, ND: Energy & Environmental Research Center).

Lackner, K. S. (2003). A guide to CO2 sequestration. Science 300, 1677–1678. doi: 10.1126/science.1079033

Lake, L. W., Johns, R., Rossen, B., and Pope, G. (2014). Fundamentals of enhanced oil recovery. Richardson, TX: Society of Petroleum Engineers (SPE).

Land, C. S. (1968). Calculation of imbibition relative permeability for two- and three-phase flow from rock properties. Soc. Pet. Eng. J. 8, 149–156. doi: 10.2118/1942-PA

Li, H., Gong, C., Liu, S., Xu, J., and Imani, G. (2022). Machine learning-assisted prediction of oil production and CO2 storage effect in CO2-water-alternating-gas injection (CO2-WAG). Appl. Sci. 12:10958. doi: 10.3390/app122110958

Li, F., Guo, X., Qi, X., Feng, B., Liu, J., Xie, Y., et al. (2025). A surrogate model-based optimization approach for geothermal well-doublet placement using a regularized LSTM-CNN model and grey wolf optimizer. Sustainability 17:266. doi: 10.3390/su17010266

Liu, M., Li, Z., Qi, J., Meng, Y., Zhou, J., Ni, M., et al. (2024). Prediction of CO2 storage in different geological conditions based on machine learning. Energy Fuel 38, 22340–22350. doi: 10.1021/acs.energyfuels.4c04274

Luo, J., Ma, X., Ji, Y., Li, X., Song, Z., and Lu, W. (2023). Review of machine learning-based surrogate models of groundwater contaminant modeling. Environ. Res. 238:117268. doi: 10.1016/j.envres.2023.117268

Mao, S., Chen, B., Malki, M., Chen, F., Morales, M., Ma, Z., et al. (2024). Efficient prediction of hydrogen storage performance in depleted gas reservoirs using machine learning. Appl. Energy 361:122914. doi: 10.1016/j.apenergy.2024.122914

Naghizadeh, A., Jafari, S., Norouzi-Apourvari, S., Schaffie, M., and Hemmati-Sarapardeh, A. (2024). Multi-objective optimization of water-alternating flue gas process using machine learning and nature-inspired algorithms in a real geological field. Energy 293:130413. doi: 10.1016/j.energy.2024.130413

Otmane, M., Imtiaz, S., Jaluta, A. M., and Aborig, A. (2025). Boosting reservoir prediction accuracy: a hybrid methodology combining traditional reservoir simulation and modern machine learning approaches. Energies 18:657. doi: 10.3390/en18030657

Rasmusson, K., Rasmusson, M., Tsang, Y., and Niemi, A. (2016). A simulation study of the effect of trapping model, geological heterogeneity and injection strategies on CO2 trapping. Int. J. Greenhouse Gas Control 52, 52–72. doi: 10.1016/j.ijggc.2016.06.020

Ren, D., Wang, X., Kou, Z., Wang, S., Wang, H., Wang, X., et al. (2023). Feasibility evaluation of CO2 EOR and storage in tight oil reservoirs: a demonstration project in the Ordos Basin. Fuel 331:125652. doi: 10.1016/j.fuel.2022.125652

Rodrigues, H., Mackay, E., Arnold, D., and Silva, D. (2019). Optimization of CO2-WAG and calcite scale management in pre-salt carbonate reservoirs. Offshore Technology Conference Brasil, (OTC)

Sanchez, N. L. (1999). Management of water alternating gas (WAG) injection projects. Latin American and Caribbean Petroleum Engineering Conference, (SPE)

Sen, D., Chen, H., and Datta-Gupta, A. (2022). Inter-well connectivity detection in CO2 WAG projects using statistical recurrent unit models. Fuel 311:122600. doi: 10.1016/j.fuel.2021.122600

Song, Y., Sung, W., Jang, Y., and Jung, W. (2020). Application of an artificial neural network in predicting the effectiveness of trapping mechanisms on CO2 sequestration in saline aquifers. Int. J. Greenhouse Gas Control 98:103042. doi: 10.1016/j.ijggc.2020.103042

Spiteri, E. J., Juanes, R., Blunt, M. J., and Orr, F. M. (2005). Relative permeability hysteresis: trapping models and application to geological CO2 sequestration. SPE Annual Technical Conference and Exhibition, (SPE)

Sun, X., Liu, J., Dai, X., Wang, X., Yapanto, L. M., and Zekiy, A. O. (2021). On the application of surfactant and water alternating gas (SAG/WAG) injection to improve oil recovery in tight reservoirs. Energy Rep. 7, 2452–2459. doi: 10.1016/j.egyr.2021.04.034

Talebkeikhah, M., Sadeghtabaghi, Z., and Shabani, M. (2021). A comparison of machine learning approaches for prediction of permeability using well log data in the hydrocarbon reservoirs. J. Hum. Earth Future 2, 82–99. doi: 10.28991/HEF-2021-02-02-01

Ud Din, S., Guo, D., Ning, F., and Xue, L. (2022). Natural gas hydrate production methods: a review. J. Appl. Emerg. Sci. doi: 10.36785/jaes.121529

Ud Din, S., Liang, X., Dongdong, G., and Fulong, N. (2025). Inhibition effect on gas hydrate formation in various kinetic hydrate inhibitor systems. J. Porous Media. doi: 10.1615/JPorMedia.2025057554

Ud Din, S., Wimalasiri, R., Ehsan, M., Liang, X., Ning, F., Guo, D., et al. (2023). Assessing public perception and willingness to pay for renewable energy in Pakistan through the theory of planned behavior. Front. Energy Res. 11:1088297. doi: 10.3389/fenrg.2023.1088297

Vaziri, P., and Sedaee, B. (2023). A machine learning-based approach to the multiobjective optimization of CO2 injection and water production during CCS in a saline aquifer based on field data. Energy Sci. Eng. 11, 1671–1687. doi: 10.1002/ese3.1412

Vo Thanh, H., Sugai, Y., Nguele, R., and Sasaki, K. (2020). Robust optimization of CO2 sequestration through a water alternating gas process under geological uncertainties in Cuu Long Basin, Vietnam. J. Nat. Gas Sci. Eng. 76:103208. doi: 10.1016/j.jngse.2020.103208

Wang, F., Ping, S., Yuan, Y., Sun, Z., Tian, H., and Yang, Z. (2021). Effects of the mechanical response of low-permeability sandstone reservoirs on CO2 geological storage based on laboratory experiments and numerical simulations. Sci. Total Environ. 796:149066. doi: 10.1016/j.scitotenv.2021.149066

Wu, M., Xu, J., Hu, P., Lu, Q., Xu, P., Chen, H., et al. (2022). An adaptive surrogate-assisted simulation-optimization method for identifying release history of groundwater contaminant sources. Water 14:1659. doi: 10.3390/w14101659

Xu, Y., Zhao, X., Chen, Y., and Yang, Z. (2019). Research on a mixed gas classification algorithm based on extreme random tree. Appl. Sci. 9:1728. doi: 10.3390/app9091728

Xue, L., Li, D., and Dou, H. (2023a). Artificial intelligence methods for oil and gas reservoir development: current progresses and perspectives. Adv. Geo-Energy Res. 10, 65–70. doi: 10.46690/ager.2023.10.07

Xue, L., Wang, J., Han, J., Yang, M., Mwasmwasa, M., and Nanguka, F. (2023b). Gas well performance prediction using deep learning jointly driven by decline curve analysis model and production data. Adv. Geo-Energy Res. 8, 159–169. doi: 10.46690/ager.2023.06.03

Xue, L., Xu, S., Nie, J., Qin, J., Han, J. X., Liu, Y. T., et al. (2024). An efficient data-driven global sensitivity analysis method of shale gas production through convolutional neural network. Pet. Sci. 21, 2475–2484. doi: 10.1016/j.petsci.2024.02.010

Xue, Z., Zhang, K., Zhang, C., Ma, H., and Chen, Z. (2023). Comparative data-driven enhanced geothermal systems forecasting models: a case study of Qiabuqia field in China. Energy 280:128255. doi: 10.1016/j.energy.2023.128255

Xue, L., Zhu, Y., Ren, J., Liao, H., Dai, Q., and Tu, B. (2025). Coupled optimization method for CO2-EOR and storage based on machine learning. J. Porous Media 28, 37–53. doi: 10.1615/JPorMedia.2024052865

Yao, P., Yu, Z., Zhang, Y., and Xu, T. (2023). Application of machine learning in carbon capture and storage: an in-depth insight from the perspective of geoscience. Fuel 333:126296. doi: 10.1016/j.fuel.2022.126296

You, J., Ampomah, W., and Sun, Q. (2020). Co-optimizing water-alternating-carbon dioxide injection projects using a machine learning assisted computational framework. Appl. Energy 279:115695. doi: 10.1016/j.apenergy.2020.115695

Zhong, Z., Liu, S., Carr, T. R., Takbiri-Borujeni, A., Kazemi, M., and Fu, Q. (2019). Numerical simulation of water-alternating-gas process for optimizing EOR and carbon storage. Energy Procedia 158, 6079–6086. doi: 10.1016/j.egypro.2019.01.507

Zubarev, D. I. (2009). Pros and cons of applying proxy-models as a substitute for full reservoir simulations. SPE Annual Technical Conference and Exhibition, (SPE).

Keywords: ANN, CCUS, CO2-WAG, CO2-WAG parameters, machine learning, XGBoost

Citation: Ud Din S, Xue L, Dongdong G, Abu-Alam T, Ehsan M and Ahmed AA (2026) A robust deep learning framework for predicting carbon dioxide-water alternating gas injection performance and optimization. Front. Clim. 7:1710187. doi: 10.3389/fclim.2025.1710187

Received: 29 September 2025; Revised: 12 December 2025; Accepted: 30 December 2025;
Published: 15 January 2026.

Edited by:

Yanhui Han, Houston Research Center, United States

Reviewed by:

Eslam Gomaa Al-Sakkari, Polytechnique Montréal, Canada
Mostafa Saghafi, Bruno Kessler Foundation (FBK), Italy

Copyright © 2026 Ud Din, Xue, Dongdong, Abu-Alam, Ehsan and Ahmed. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Shahab Ud Din, engr.shahab.pg@gmail.com; Liang Xue, xueliang@cup.edu.cn; Tamer Abu-Alam, tamer.abu-alam@uit.no

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.