- 1State Key Laboratory of Petroleum Resources and Engineering, China University of Petroleum (Beijing), Beijing, China
- 2Department of Oil-Gas Field Development Engineering, College of Petroleum Engineering, China University of Petroleum (Beijing), Beijing, China
- 3School of Earth and Environment, Anhui University of Science and Technology, Huainan, China
- 4The Faculty of Biosciences, Fisheries and Economics, UiT The Arctic University of Norway, Tromsø, Norway
- 5OSEAN—Outermost Regions Sustainable Ecosystem for Entrepreneurship and Innovation, University of Madeira Colégio dos Jesuítas, Funchal, Portugal
- 6Department of Earth and Environmental Sciences, Bahria University, Islamabad, Pakistan
- 7Department of Petroleum and Natural Gas Engineering, Faculty of Engineering, University of Khartoum, Khartoum, Sudan
Carbon dioxide (CO2) emissions pose a major environmental concern, and various methods are used for CO2 sequestration. CO2 water-alternating-gas (CO2-WAG) injection is a technique used to increase oil production and sequester CO2 in subsurface formations. However, the performance of a CO2-WAG project depends on various parameters, such as injection rates, cycle size, and cycle ratio, whose evaluation traditionally requires numerous computationally expensive simulations. This study introduces a robust machine learning workflow for CO2-WAG performance prediction and optimization, using a model calibrated with Bell Creek formation properties. The machine learning models are based on extreme gradient boosting (XGBoost), linear regression (LR), random forest (RF), k-nearest neighbor (KNN), support vector regression (SVR), artificial neural network (ANN), and convolutional neural network (CNN) algorithms, together with hybrid models that couple ANN and CNN with XGBoost (ANN-XGBoost and CNN-XGBoost). A dataset of 2,400 samples was generated using the CMG-GEM numerical simulator, incorporating seven input parameters (e.g., injection rate, CO2-WAG cycle size, and WAG ratio) and three output parameters, with 80% of the dataset allocated for training and 20% for validation and testing. Among the proposed models, the hybrid ANN-XGBoost model demonstrated superior performance, accurately predicting total oil production, CO2 storage, and storage efficiency, with R2 scores of 0.99159, 0.97515, and 0.98706 and corresponding RMSE values of 2.8 × 10⁻², 1.5 × 10⁻¹, and 2.4 × 10⁻², respectively. Coupling the proxy with particle swarm optimization (PSO) yielded a 12.8% increase in cumulative oil production and an 11% increase in CO2 storage. Furthermore, in terms of speed, the proposed workflow completes prediction and optimization within minutes, whereas a traditional numerical simulator requires 4–5 min per scenario. These findings validate the robustness and computational efficiency of the proposed machine learning workflow for CO2-WAG performance prediction and optimization.
1 Introduction
Fossil fuels, particularly oil, remain the primary source of energy, with demand continuing to rise due to continuous development and urbanization. However, fossil fuels are major contributors to CO2 emissions, as their combustion releases CO2 (Hsu et al., 2012; You et al., 2020; Gao et al., 2023). Although extensive research is ongoing to identify alternative energy sources, none has yet emerged to replace fossil fuels, which are expected to remain the major energy source in the forthcoming decades (Ud Din et al., 2022; Ud Din et al., 2023, 2025; Naghizadeh et al., 2024). To manage these growing challenges, carbon capture, utilization, and storage (CCUS) is considered a vital technology for mitigating climate change (Czernichowski-Lauriol et al., 2018; Kolawole et al., 2021; Dziejarski et al., 2023) by sequestering CO2 in geological formations (Lackner, 2003; Ren et al., 2023), such as coal beds (Bachu, 2000), saline formations (Wang et al., 2021), and depleted oil and gas reservoirs (Dai et al., 2016, 2018). CO2 injection into reservoirs is an important technology that has been used in the oil and gas industry for decades to increase oil recovery (Lake et al., 2014).
CO2-WAG injection is a well-established enhanced oil recovery (EOR) technique that offers high sweep efficiency and reduces the risk of gas channeling, viscous fingering (Yao et al., 2023), and gravity segregation. More specifically, this technique increases oil production and sequesters CO2 in underground formations (Spiteri et al., 2005; Rasmusson et al., 2016; Al-Khdheeawi et al., 2018; Zhong et al., 2019). Figure 1 illustrates a schematic diagram of the CO2-WAG technique and its role in CCUS. The technique was first employed in 1957 by Mobil Corporation in a Canadian sandstone reservoir. This method effectively addressed early CO2 breakthrough and increased the resistance to gas flow. In addition, CO2-WAG improves the mobility ratio and reduces the hindrance to water flow. Survey data indicate that approximately 80% of the oilfields in the United States have adopted WAG, with consistently positive outcomes. This widespread application underscores its effectiveness in boosting oilfield development and production rates (Sanchez, 1999). Further, Christensen et al. (2001) studied 59 fields in which WAG was employed and reported an average 10% increase in oil recovery. Several prior studies have further investigated the role of core-scale heterogeneity in determining the efficiency of oil production through CO2-WAG injection. The results revealed that CO2-WAG injection exhibited superior performance in cases involving homogeneous, layered, and composite samples (Al-Bayati et al., 2018). Han and Gu (2014) evaluated optimized miscible injection through nine core-flooding experiments and demonstrated that optimized miscible injection outperformed other methods; notably, a smaller WAG slug (1:1 ratio) yielded higher recovery factors. Sun et al. (2021) investigated CO2 phase migration through porous media during WAG injection and observed a 46% increase in recovery factor (RF).
The results also showed that the gas-to-water injection ratio played a pivotal role in the efficacy of WAG injection, as highlighted in prior studies. Khather et al. (2022) explored the influence of CO2 and carbonate interactions on oil and gas recovery. Their study comprised a series of experiments on three distinct carbonate rock core samples with heterogeneous properties, such as different oil saturations and low to moderate permeability values. The results indicated that CO2-WAG injection following water flooding led to a remarkable surge in the RF, exceeding 30% for all the core samples. Ren et al. (2023) examined the use of CO2-EOR and its subsequent CO2 sequestration in fields located in the Ordos Basin, China, by employing continuous CO2 and WAG injection techniques. This study showed that simultaneous injection of equal amounts of CO2 and WAG substantially improved crude oil production in the examined oilfields. Alrassas et al. (2022) studied CO2-WAG injection in a Yemeni interbedded reservoir and compared it with continuous CO2 injection. The study demonstrated that CO2-WAG provided superior reservoir performance, enhancing both oil recovery and CO2 storage, largely due to its higher Kv/Kh ratios. Moreover, the 2:1 WAG ratio outperformed the 1:1 ratio, underscoring the importance of optimizing injection strategies in interbedded reservoirs to maximize oil recovery and CO2 storage efficiency.
Currently, the optimization of CO2-EOR technology remains a central focus across numerous petroleum engineering fields. Dai et al. (2017) applied Monte Carlo simulations to quantify uncertainty in CO2 storage potential within an active EOR project located at the Morrow Reservoir, Farnsworth Unit, Texas. Vo Thanh et al. (2020) employed the WAG process in a robust optimization of CO2 storage potential in a Vietnamese field. Their findings showed that WAG injection effectively improved CO2 sequestration compared with continuous gas injection. Specifically, the nominal and optimal optimization scenarios improved CO2 sequestration by 13% and 15%, respectively, compared to the baseline WAG case. Rodrigues et al. (2019) conducted an optimization study using the Computer Modelling Group (CMG) simulator for a Brazilian offshore field and proposed a strategy for designing CO2-WAG operations in carbonate reservoirs, emphasizing economic feasibility, CO2 recycling efficiency, and the assessment of project risks.
Conventional approaches to parameter optimization are often cumbersome and labor-intensive, and they neglect complex nonlinear interactions and the fundamental variables that drive them. These methods typically rely on distinct models and algorithms, which restricts their flexibility and their adaptability to varying oilfield conditions. Consequently, incorporating more sophisticated optimization methods, such as ML and metaheuristic algorithms, offers a superior solution for handling the intricacies and uncertainties associated with oil fields and improves optimization efficiency and precision (He et al., 2021; Li et al., 2022; Sen et al., 2022; Xue et al., 2023a, 2023b, 2024). Notable progress has been made in petroleum exploration and production owing to rapid advancements in intelligent algorithms, particularly in ML. Although ML frameworks have become increasingly popular in the context of CO2-EOR and CO2-WAG, current research still has certain limitations. Several investigations concentrate on either oil recovery or CO2 storage without capturing the two processes simultaneously, which limits their applicability to WAG decision-making (Gao et al., 2023; Li et al., 2022). Numerous studies rely on operational or geological inputs that are too limited to model the main WAG behaviors, namely cycle size, cycle ratio, injection pressures, and the impact of three-phase hysteresis (Naghizadeh et al., 2024; You et al., 2020). Moreover, prior ML studies often use only a small number of simulation cases, which narrows the training domain and limits the ability of the model to learn complex nonlinear relationships. Liu et al. (2024) predicted CO2 storage under different subsurface environments with RF and XGBoost using a dataset of only 184 samples; Ahmadi et al. (2018) created an LSSVM-based proxy using a Box–Behnken experimental design, which constrained input combinations. Another recently developed deep-learning model utilizing CNN and MLNN was used to forecast CO2 solubility trapping and oil recovery based on 814 test cases (AlRassas et al., 2025); this research was limited to prediction and did not incorporate an optimization element. In a broader sense, current ML and DL research lacks explicit consideration of WAG-specific physics, such as the dynamics of CO2 trapping (Sen et al., 2022; Song et al., 2020), and does not consider operational optimization, which constrains its use in real-time design and decision-making (Vaziri and Sedaee, 2023).
Recent studies have shown that hybrid architectures combining neural networks and boosted trees can significantly outperform individual algorithms in reservoir engineering because they are better able to represent nonlinear interactions without overfitting (Jiao et al., 2024; Otmane et al., 2025; Vaziri and Sedaee, 2023). Building on these contributions, the current study constructs a hybrid ANN-XGBoost surrogate model trained on a large, systematically generated synthetic dataset that covers a wide variety of WAG operating conditions. In contrast to the previous literature, which focuses only on prediction, the proposed framework combines prediction with multi-objective optimization by integrating the surrogate model with particle swarm optimization (PSO). This allows the identification of operational strategies that simultaneously maximize oil recovery and CO2 storage, overcoming the shortcomings of prior ML research and providing a complete, effective workflow for designing and optimizing CO2-WAG operations.
This study thoroughly analyzed the effects of CO2-WAG parameters, including injection rates, cycle sizes, and cycle ratios, on cumulative oil production and CO2 stored using a model based on Bell Creek formation characteristics. Furthermore, the study proposed a robust ML workflow for CO2-WAG performance prediction by employing various ML algorithms, namely extreme gradient boosting (XGBoost), linear regression (LR), random forest (RF), k-nearest neighbor (KNN), support vector regression (SVR), artificial neural network (ANN), and convolutional neural network (CNN). Additionally, the study compared these algorithms with hybrid algorithms, namely ANN coupled with XGBoost (ANN-XGBoost) and CNN coupled with XGBoost (CNN-XGBoost). The best prediction model was then coupled with the particle swarm optimization (PSO) algorithm to perform the optimization study. The results provide insights into the factors affecting CO2-WAG performance and offer engineers and decision makers a robust framework for CO2-WAG performance prediction, optimization, and decision-making.
2 Methodology
2.1 Base model
The study used CMG-GEM (CMG, 2022) to construct a base model reflecting the Bell Creek formation characteristics. The developed model was designed with a five-spot well pattern comprising a central production well and four injection wells located at the corners (Jin et al., 2018). The model consists of 59 × 59 × 6 = 20,886 grid blocks. Furthermore, to mimic the original reservoir conditions, the model was designed with a dip angle of 1°, consistent with the Bell Creek formation, as shown in Figure 2.
Table 1 lists all the parameters used in developing the base model. The reservoir is located at a depth of 1,342 m with an initial reservoir pressure of 2,400 psi at 41.7 °C. The initial reservoir pressure is therefore above the minimum miscibility pressure (MMP) of 1,411 psi. Keeping the injection pressure above the MMP is essential in a CO2-WAG project because it makes the CO2 miscible with the oil at reservoir conditions. The reservoir has dual porosity and permeability; the upper part is made of siltstone, while the lower part is sandstone (Kurz et al., 2013). The vertical permeability kv is 0.1 times the horizontal permeability. Furthermore, the model accounts for all the trapping mechanisms involved in a CO2 storage process, namely structural trapping, solubility trapping, residual trapping, and mineralization. To represent residual trapping, the linear phase permeability hysteresis model built into CMG-GEM was used (Land, 1968). However, the mineralization effect is negligible because it requires long periods to take place. Table 2 lists all the components of the fluid model used in the base model. The Peng–Robinson EOS was employed for fluid model tuning, and all components were lumped into seven pseudo-components using WinProp.
2.2 Database generation
Conducting an ML-based study requires a large amount of high-quality time-series data. In the absence of real field data, numerical simulations were used to produce a dataset for the ML study. The dataset includes seven CO2-WAG operational parameters as inputs (injector bottom-hole pressure for water and for gas, producer bottom-hole pressure, WAG cycle size, WAG cycle ratio, gas injection rate, and water injection rate) and three output parameters (cumulative oil production, cumulative CO2 storage, and CO2 storage efficiency). The minimum and maximum values of the input parameters are listed in Table 3. A specialized Python script (written by our research group to generate input data files for CMG-GEM) was then used to randomly generate a set of 2,400 input data files within the ranges given in Table 3, while keeping all other parameters the same.
The script then automatically runs the numerical simulations one by one in CMG-GEM using the created data files. Upon completion of the simulations, the cumulative oil production and total CO2 stored (used for the CO2 storage efficiency calculations) were extracted from the output files and used as output parameters to complete the dataset for the ML study. The output parameters and the proposed objective functions are calculated as follows.
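The random generation of input cases described above can be sketched as follows. This is a minimal illustration, not the authors' script: the parameter names and the numerical ranges are hypothetical placeholders (the actual ranges are those of Table 3), and uniform sampling is an assumption, since the text only states that the cases were generated randomly.

```python
import random

# Hypothetical placeholder ranges; the real bounds are those listed in Table 3.
PARAM_RANGES = {
    "bhp_injector_water_psi": (2500.0, 3500.0),
    "bhp_injector_gas_psi": (2500.0, 3500.0),
    "bhp_producer_psi": (1500.0, 2200.0),
    "wag_cycle_size_months": (1.0, 18.0),
    "wag_cycle_ratio": (0.33, 3.0),
    "gas_injection_rate_m3_day": (10_000.0, 100_000.0),
    "water_injection_rate_m3_day": (250.0, 3_000.0),
}


def sample_cases(n_cases, ranges, seed=0):
    """Draw n_cases parameter sets uniformly at random within the given ranges."""
    rng = random.Random(seed)  # fixed seed so the case list is reproducible
    return [
        {name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}
        for _ in range(n_cases)
    ]


# One dictionary per simulation case; each would be written to a simulator input file.
cases = sample_cases(2400, PARAM_RANGES)
```

Each dictionary in `cases` corresponds to one simulation run; writing the simulator input file and launching the run are left out because they depend on the simulator's own file format.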
2.2.1 Cumulative oil production
The dimensionless cumulative oil production can be calculated as given in Equation 1:
2.2.2 Total CO2 stored
The second objective function in the dataset is total CO2 stored, which can be calculated by Equation 2:
2.2.3 Storage efficiency
CO2 storage efficiency is considered as the third objective function; however, it is not included in the optimization stage and is only considered during prediction, and is calculated by Equation 3:
2.2.4 Coupled optimization
For coupled optimization we consider cumulative oil production and CO2 stored as these are the two main objectives to optimize and can be mathematically expressed by Equation 4:
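One common way to combine two objectives into a single scalar is a weighted sum of the normalized terms. The sketch below shows that form only as an assumption for illustration; the weights, the function name, and the weighted-sum structure itself are hypothetical and are not taken from Equation 4.

```python
def coupled_objective(oil_norm, co2_norm, w_oil=0.5, w_co2=0.5):
    """Weighted sum of two normalized objectives (assumed form, not Equation 4).

    oil_norm: dimensionless cumulative oil production
    co2_norm: dimensionless total CO2 stored
    """
    if abs(w_oil + w_co2 - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return w_oil * oil_norm + w_co2 * co2_norm


# Example: equal weighting of the two normalized objectives.
score = coupled_objective(0.8, 0.6)
```

Normalizing both terms before combining them keeps either objective from dominating purely because of its units.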
2.3 Proxy model development
Advanced computational technologies have broadened the scope of reservoir modeling. However, limited computational resources still pose challenges in terms of uncertainty quantification and optimization workflows. To address this challenge, computationally efficient proxy models can be used. Proxy models, also known as surrogate models, are mathematical or statistical models that mimic complex simulation models by using predetermined input parameters (Zubarev, 2009; Xue et al., 2023a, 2025). These models are used as a potential substitute for complex models, ensuring computational efficiency and significantly reducing the computation cost and time. Various algorithms have been employed for proxy model development in the oil and gas sector. This study employed various algorithms, such as XGBoost, RF, ANN, CNN, KNN, SVR, LR, ANN-XGBoost, and CNN-XGBoost, for proxy model development to accomplish the prediction study, as shown in Figure 3.
Figure 3. Flowchart displays the detailed workflow of the study. Machine learning algorithms that include linear regression (LR), support vector regression (SVR), random forest (RF), k-nearest neighbors (KNN), extreme gradient boosting (XGBoost), artificial neural networks (ANN), convolution neural networks (CNN), and hybrid models (ANN-XGBoost and CNN-XGBoost) were used for predicting cumulative oil production, total CO2 stored, and storage efficiency.
Three proxy models were developed for cumulative oil production, total CO2 stored, and storage efficiency over a period of 20 years, using a dataset of 2,400 samples that included seven input parameters (including injection rates, CO2-WAG cycle size, and cycle ratio) and three output parameters (cumulative oil production, total CO2 stored, and storage efficiency). The dataset was thoroughly analyzed for outliers before training. It was then split into training (80%), testing (10%), and validation (10%) subsets, which were normalized to the range 0–1 by feature scaling to ensure consistency. After these steps, the dataset was ready for training the models to develop the respective proxy models, and Figure 4 shows the specific training process employed.
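The split and scaling steps can be sketched in plain Python as below. The function names are hypothetical, and fitting the scaling bounds on the training subset only is a design choice assumed here (it avoids leaking test information), not something the text specifies.

```python
import random


def split_indices(n_samples, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle sample indices and split them into train/validation/test lists."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_train = int(round(train_frac * n_samples))
    n_val = int(round(val_frac * n_samples))
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def fit_min_max(rows):
    """Return per-feature minima and maxima of a list of feature rows."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]


def apply_min_max(rows, mins, maxs):
    """Scale each feature to [0, 1] using previously fitted minima and maxima."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, mins, maxs)]
        for row in rows
    ]


# 80/10/10 split of the 2,400 samples.
train_idx, val_idx, test_idx = split_indices(2400)
```

In use, `fit_min_max` would be called on the training rows and the resulting bounds passed to `apply_min_max` for all three subsets.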
The performance of the ML models was evaluated using cross plots of the actual versus predicted values, together with the R2 and RMSE metrics. Furthermore, relative errors and residual plots were used to validate the models. These parameters can be calculated by Equations 5−7:
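For reference, the metrics named above can be computed as in the following sketch. It assumes the standard textbook definitions of the coefficient of determination, the root mean square error, and the average absolute relative error; the function names are illustrative and the exact forms used in the study are those of its own equations.

```python
import math


def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def rmse(actual, predicted):
    """Root mean square error."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def aare_percent(actual, predicted):
    """Average absolute relative error in percent (actual values must be nonzero)."""
    n = len(actual)
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n
```

A perfect prediction gives an R2 of 1 and zero RMSE and AARE, which is the reference against which the reported scores are read.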
2.4 Machine learning algorithms
The study applied various ML models based on algorithms such as XGBoost, RF, SVR, KNN, LR, ANN, and CNN. The following section explains the ANN and XGBoost algorithms in detail, whereas the other algorithms employed in this study (RF, KNN, SVR, LR, and CNN) are described in the Supplementary material.
2.4.1 Artificial neural network
ANN is a computational model that mimics the human neurological system. An ANN comprises neurons, also called nodes, which are organized into layers, together with weights, biases, and activation functions, as shown in Figure 5. The basic structure consists of several layers, each containing several nodes. The input layer receives the input dataset, the hidden layers (single or multiple) process the information, and the output layer produces the results (Hill et al., 1994). The connections between neurons carry weights that are adjusted during training and determine the influence of neurons on one another. In the training process, forward and back propagation were used, in which the weights are adjusted to minimize the error between the actual and predicted values (Song et al., 2020). Activation functions, such as ReLU, enable the model to learn from non-linear complex reservoir models (as listed in Table 4) (Bergen et al., 2019). The advantage of ANN models is that, unlike CNN models, they do not require large datasets for training. Further, ANN models have been successfully employed in various fields, including oil and gas.
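The forward pass through such a network can be illustrated with a minimal sketch, assuming a single hidden layer with ReLU activation and a linear output layer. The layer sizes and weights below are arbitrary illustrative values, not the architecture or trained parameters of the study.

```python
def relu(x):
    """Rectified linear unit activation."""
    return x if x > 0.0 else 0.0


def dense(inputs, weights, biases, activation=None):
    """One fully connected layer: out_j = act(sum_i w[j][i] * x_i + b_j)."""
    outputs = []
    for w_row, b in zip(weights, biases):
        z = sum(w * x for w, x in zip(w_row, inputs)) + b
        outputs.append(activation(z) if activation else z)
    return outputs


def forward(inputs, hidden_w, hidden_b, out_w, out_b):
    """Forward pass: input -> hidden layer (ReLU) -> linear output layer."""
    hidden = dense(inputs, hidden_w, hidden_b, activation=relu)
    return dense(hidden, out_w, out_b)


# Two inputs, two hidden neurons, one output (illustrative weights only).
prediction = forward(
    [1.0, 2.0],
    hidden_w=[[1.0, -1.0], [0.5, 0.5]], hidden_b=[0.0, 0.0],
    out_w=[[2.0, 1.0]], out_b=[0.5],
)
```

Training would adjust the weights and biases by back propagation so that `prediction` approaches the simulator output; only the forward computation is shown here.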
2.4.2 XGBoost algorithm
The XGBoost algorithm is a form of the gradient boosting decision tree (GBDT) algorithm (Xu et al., 2019), a scalable and efficient tree-boosting system introduced by Chen and Guestrin (2016). It has found immense popularity in classification and regression problems because it is computationally efficient and highly accurate. Building on the principles of GBDT, XGBoost introduces several optimizations to improve performance. Notably, it uses a second-order Taylor approximation of the loss function, which yields more precise optimization than first-order gradient-based methods. Moreover, the algorithm includes L1 and L2 regularization terms in the objective function to avoid overfitting (as listed in Table 4) (Krizhevsky et al., 2017) and to foster the generalization and sparsity of the model. In terms of computational efficiency, XGBoost uses block-based storage to optimize sequential memory access and to enable parallel computing, which considerably increases the training speed. Together, these improvements make XGBoost a powerful and scalable tool capable of handling large-scale ML applications. Figure 6 shows a schematic diagram of the XGBoost algorithm.
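The role of the second-order approximation can be made concrete with a small sketch. It assumes a squared-error loss and L2 regularization only (the L1 term is omitted for brevity), and uses the leaf-weight and split-gain expressions from the tree-boosting formulation of Chen and Guestrin (2016). The function names are illustrative and this is not the library's internal code.

```python
def grad_hess_squared_error(y_true, y_pred):
    """First and second derivatives of 0.5 * (y_pred - y_true)**2 w.r.t. y_pred."""
    grads = [p - t for t, p in zip(y_true, y_pred)]
    hess = [1.0] * len(y_true)
    return grads, hess


def leaf_weight(grads, hess, reg_lambda=1.0):
    """Optimal leaf value under the second-order approximation: -G / (H + lambda)."""
    return -sum(grads) / (sum(hess) + reg_lambda)


def split_gain(g_left, h_left, g_right, h_right, reg_lambda=1.0, gamma=0.0):
    """Reduction in the regularized objective obtained by splitting a leaf in two."""
    def score(g, h):
        return sum(g) ** 2 / (sum(h) + reg_lambda)

    parent = score(g_left + g_right, h_left + h_right)
    return 0.5 * (score(g_left, h_left) + score(g_right, h_right) - parent) - gamma


# Four samples, initial prediction of zero for all of them.
g, h = grad_hess_squared_error([1.0, 1.0, 5.0, 5.0], [0.0, 0.0, 0.0, 0.0])
gain = split_gain(g[:2], h[:2], g[2:], h[2:])
```

A positive gain means that separating the low-target samples from the high-target samples lowers the regularized loss; increasing `reg_lambda` or `gamma` makes splits harder to justify, which is how the regularization limits overfitting.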
2.5 Optimization algorithm
2.5.1 Particle swarm optimization
PSO is a population-based stochastic optimization algorithm first described by Kennedy and Eberhart (1995) and inspired by the social behavior of bird flocks and fish schools. In PSO, particles are a set of candidate solutions that explore the search space by changing their position and velocity over time based on both personal and swarm experience. Every particle modifies its course depending on two primary elements: the best position it has reached itself (personal best) and the best position reached by any particle in the swarm (global best). This collaboration draws the swarm toward optimal or near-optimal solutions in complex, multidimensional spaces. The velocity of each particle is updated by Equation 8:
PSO is well suited to optimization owing to its conceptual simplicity, computational efficiency, and minimal parameter tuning. This study used a simple PSO variant in which all particles are informed globally through a fully connected topology. We set the inertia weight and acceleration coefficients . The swarm size was set to 100 particles, which is relatively cheap to compute and provides sufficient exploration.
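A global-best PSO of the kind described above can be sketched as follows, assuming the standard velocity update with an inertia term, a cognitive term, and a social term. The inertia weight `w` and the acceleration coefficients `c1` and `c2` below are commonly used illustrative values and are assumptions, not the settings of this study; the sphere function merely stands in for the proxy-based objective.

```python
import random


def pso_minimize(objective, bounds, n_particles=100, n_iters=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO; w, c1, c2 are illustrative defaults, not the paper's values."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Move the particle and clamp it to the search bounds.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def sphere(x):
    """Simple test function with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


best_x, best_val = pso_minimize(sphere, [(-5.0, 5.0), (-5.0, 5.0)])
```

To maximize an objective such as oil production or CO2 storage, the same routine can be applied to the negative of the proxy model's prediction.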
3 Results and discussion
3.1 Base case analysis
Figure 7 shows the cumulative oil production and cumulative CO2 storage for the base WAG case. The base-case gas injection rate is 4.53 × 10⁴ m3/day and the water injection rate is 1.9 × 10³ m3/day; the other operational parameters are listed in Table 5. The base case yielded a cumulative oil production of 1.1 × 10⁶ bbl and CO2 storage of 4.53 × 10⁵ tons. The following sections consider how the water injection rate, gas injection rate, WAG cycle size, and WAG cycle ratio affect cumulative oil production, total CO2 stored, and storage efficiency.
Figure 7. Base WAG cumulative oil production and CO2 storage plots: (a) cumulative oil production and (b) cumulative CO2 storage.
3.1.1 Effect of WAG cycle size and ratio
Figures 8a–c depict the effect of WAG cycle size on cumulative oil production, total CO2 stored, and storage efficiency. This study used WAG cycle sizes of 1, 3, 6, 9, 12, and 18 months. The cumulative oil production was quite stable across all WAG cycle sizes, with a slight increase at 6 months, possibly indicating an optimum balance between the CO2 and water slugs for mobility control. The amount of CO2 stored also remained comparatively stable, although a slight increase was observed as the WAG cycles became longer, perhaps because of the longer retention time. The storage efficiency is not highly dependent on the WAG cycle size, but its variability is lower for longer cycle durations, implying that trapping mechanisms are more predictable with longer cycles.
Figure 8. Effect of WAG-cycle size and ratio on the objective functions. Effect of WAG cycle size on objective functions (a) cumulative oil production, (b) total CO2 stored, and (c) storage efficiency. Effect of WAG cycle ratio on objective functions (d) cumulative oil production, (e) total CO2 stored, and (f) storage efficiency.
Furthermore, the impact of the WAG cycle ratio on cumulative oil production, total CO2 stored, and storage efficiency was also investigated, as shown in Figures 8d–f. For a thorough analysis, the study used WAG cycle ratios of 1:1, 1:2, 1:3, 2:1, and 3:1. The results show that cumulative oil production is highest at a ratio of 1:2, where the injection of CO2 takes precedence. This indicates the ability of CO2 to mobilize oil better than water. More CO2 is stored as the CO2 share of the cycle ratio increases, reflecting a direct correlation between the composition of the injected fluid and storage. Interestingly, the storage efficiency is optimized at moderate CO2-to-water ratios (i.e., 1:2 and 1:1), meaning that there is sufficient CO2 injection and adequate trapping enhanced by water slugs. At higher WAG cycle ratios (2:1 and 3:1), the CO2 storage efficiency decreased, probably because of CO2 breakthrough and little or no solubility trapping.
3.1.2 Effect of water and gas injection rate
Figures 9a–c show how the water injection rate influences cumulative oil production, total CO2 stored, and storage efficiency. Oil recovery improves as the water injection rate increases, reflecting stronger displacement of oil in the reservoir. However, beyond a certain level (here, water injection rates above 2,000 m3/day), further increases have little or no effect on oil production and can lead to early water breakthrough. Regarding the effect of water on CO2 storage, Figure 9b indicates that increasing the water injection rate above 250 m3/day negatively affects CO2 storage, and further increases reduce it significantly. The CO2 retained remains relatively constant at higher and lower water injection rates, which means that water injection has little effect on the quantity of CO2 retained; this is perhaps not surprising given the low degree of miscibility or interaction between the water and CO2 plumes. On the other hand, storage efficiency decreases significantly as the water injection rate increases. This tendency can be attributed to the fact that more of the pore space becomes filled with water, leaving less space to trap CO2. Less CO2 can be trapped by structural, residual, or solubility trapping when the pore network is filled with more water.
Figure 9. Effect of injection rates on objective functions. Effect of water injection rate on objective functions (a) cumulative oil production, (b) total CO2 stored, and (c) CO2 retention efficiency. Effect of gas injection rate on objective functions, (d) cumulative oil production, (e) total CO2 stored, and (f) storage efficiency.
Figures 9d–f display the effect of the CO2 injection rate on cumulative oil production, CO2 stored, and storage efficiency, respectively. The total oil production increased as the CO2 injection rate increased from 10,000 to 90,000 m3/day. The higher the CO2 injection rate, the better the oil mobility, probably because of better miscibility and pressure support of the reservoir. Similarly, the total CO2 stored increased with increasing injection rate. A further increase in the injection rate from 90,000 to 100,000 m3/day did not produce any significant increase in cumulative CO2 storage; this may be due to premature breakthrough or reduced trapping efficiency caused by over-injection. Interestingly, storage efficiency was negatively related to the injection rate. Although a greater amount of CO2 is trapped at higher rates, the fraction of injected CO2 that is effectively trapped decreases, which is related to poor mobility ratios and increased CO2 migration rates.
3.2 Machine learning model’s performance
The results of the ML models were compared with those of CMG-GEM using the dataset of 2,400 simulations. The input parameters of the dataset included injection rates (water and CO2), WAG cycle size, and cycle ratio, and the output parameters included cumulative oil production, total CO2 stored, and storage efficiency. The dataset was split into 80% training, 10% validation, and 10% testing. The simulation results were used as the actual values and compared with the predictions of the ML models. Figures 10–12 depict the actual versus predicted plots for the three output variables. Among the single models, XGBoost performed best; however, when it was coupled with an ANN model, the resulting hybrid ANN-XGBoost model outperformed all single models (XGBoost, RF, ANN, CNN, SVR, KNN, and LR) as well as the other hybrid model (CNN-XGBoost). This is evident from the R2 plots shown in Figure 13b; the relevant values of the optimal model are listed in Table 6, and the values for all other models are listed in Supplementary Tables S1–S3. The results show that the predicted values are closely aligned with the actual values and cluster around the actual-versus-predicted line. Furthermore, Figure 13a illustrates the average absolute relative error (AARE) of all the models, which confirms the superiority of the ANN-XGBoost model, with the lowest AARE values.
Figure 10. Cumulative oil production, actual vs. predicted values plots. (a) ANN-XGBoost, (b) XGBoost, (c) CNN-XGBoost, (d) RF, (e) ANN, (f) CNN, (g) KNN, (h) SVR, and (i) LR.
Figure 11. Total CO2 stored, actual vs. predicted values plots. (a) ANN-XGBoost, (b) CNN-XGBoost, (c) XGBoost, (d) RF, (e) ANN, (f) CNN, (g) KNN, (h) SVR, and (i) LR.
Figure 12. CO2 storage efficiency actual vs. predicted values plots. (a) ANN-XGBoost, (b) XGBoost, (c) CNN-XGBoost, (d) RF, (e) ANN, (f) CNN, (g) KNN, (h) SVR, and (i) LR.
Table 6. ANN-XGBoost model statistical analysis for cumulative oil production, total CO2 stored, and storage efficiency.
Furthermore, relative deviation plots were used to check the performance of the respective models. These plots show how far the predicted values deviate from the actual values: the higher the deviation, the less accurate the model, and the lower the deviation, the more accurate. Figures 14–16 show the relative deviation plots of all models for the three output variables. These plots were also used to determine whether a model was under-predicting or over-predicting. Among all models, ANN-XGBoost shows the least deviation from the actual values. For cumulative oil production, the majority of the values are clustered around the zero line with a slight deviation of 0.045 ± 0.155. Similarly, for total CO2 stored, the majority of the values are clustered around the zero line with a deviation of −0.05 ± 2.55. The same trend was observed for CO2 storage efficiency, with a slight deviation of −0.39 ± 0.78. These results show that the proposed model yields minimal deviation error and balanced performance.
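The relative deviation check can be sketched as follows. The sign convention (positive means over-prediction), the percent scaling, and the mean and standard deviation summary are assumptions made for illustration; how the ± values quoted above were computed is not stated here. The sample values are made up.

```python
import statistics


def relative_deviation(actual, predicted):
    """Signed relative deviation in percent; positive means over-prediction."""
    return [100.0 * (p - a) / a for a, p in zip(actual, predicted)]


def summarize(deviations):
    """Centre, spread, and over/under-prediction counts of the deviations."""
    return {
        "mean": statistics.fmean(deviations),
        "std": statistics.pstdev(deviations),
        "over": sum(1 for d in deviations if d > 0),
        "under": sum(1 for d in deviations if d < 0),
    }


# Made-up values, used only to show the calculation.
actual = [100.0, 200.0, 400.0, 500.0]
predicted = [101.0, 198.0, 404.0, 495.0]

rd = relative_deviation(actual, predicted)
summary = summarize(rd)
print(rd)       # [1.0, -1.0, 1.0, -1.0]
print(summary)
```

A model with balanced performance would show a mean close to zero and roughly equal over- and under-prediction counts.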
Figure 14. Cumulative oil production, relative deviation plots. (a) ANN-XGBoost, (b) CNN-XGBoost, (c) XGBoost, (d) RF, (e) ANN, (f) CNN, (g) KNN, (h) SVR, and (i) LR.
Figure 15. Total CO2 stored, relative deviation plots. (a) ANN-XGBoost, (b) CNN-XGBoost, (c) XGBoost, (d) RF, (e) ANN, (f) CNN, (g) KNN, (h) SVR, and (i) LR.
Figure 16. CO2 storage efficiency, relative deviation plots. (a) ANN-XGBoost, (b) CNN-XGBoost, (c) XGBoost, (d) RF, (e) ANN, (f) CNN, (g) KNN, (h) SVR, and (i) LR.
Figures 17–19 present the residual plots of all models used in the study for the three output variables. Residual plots were used to analyze model bias and patterns, that is, to show whether a model exhibits systematic bias or a trend in its errors. Among the models, ANN-XGBoost gives the lowest residual values: 0.015 ± 0.095, 0.025 ± 0.525, and ± 0.08 for cumulative oil production, total CO2 stored, and CO2 storage efficiency, respectively. The majority of the values are spread around the zero line without any visible trend, which further validates the performance of the ANN-XGBoost model and supports its use for prediction. Based on the above, the top three models ranked by overall performance were ANN-XGBoost > CNN-XGBoost > XGBoost.
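One simple way to quantify what the residual plots show visually is to compute the mean residual (bias) and the least-squares slope of the residuals against the predicted values (trend). This is a hedged sketch with made-up numbers; the residual definition (actual minus predicted) and the helper names are assumptions, not the authors' procedure.

```python
def residuals(actual, predicted):
    """Residual = actual - predicted (sign convention assumed here)."""
    return [a - p for a, p in zip(actual, predicted)]


def trend_slope(x, y):
    """Ordinary least-squares slope of y against x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


# Made-up example: errors scattered around zero.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.1, 3.9]

res = residuals(actual, predicted)
bias = sum(res) / len(res)           # close to zero: no systematic bias
slope = trend_slope(predicted, res)  # small: no strong trend with magnitude
print(round(bias, 6), round(slope, 4))
```

A bias far from zero indicates systematic over- or under-prediction, while a clearly nonzero slope indicates that the error grows or shrinks with the predicted magnitude.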
Figure 17. Cumulative oil production residual plots. (a) ANN-XGBoost, (b) CNN-XGBoost, (c) XGBoost, (d) RF, (e) ANN, (f) CNN, (g) KNN, (h) SVR, and (i) LR.
Figure 18. Total CO2 stored residual plots. (a) ANN-XGBoost, (b) CNN-XGBoost, (c) XGBoost, (d) RF, (e) ANN, (f) CNN, (g) KNN, (h) SVR, and (i) LR.
Figure 19. CO2 storage efficiency residual plots. (a) ANN-XGBoost, (b) CNN-XGBoost, (c) XGBoost, (d) RF, (e) ANN, (f) CNN, (g) KNN, (h) SVR, and (i) LR.
Additionally, this study compares the robustness and speed of the ML models with those of the conventional simulator. CMG-GEM requires at least 4–5 min to run a single case and hours to complete all scenarios, whereas the proposed ML model requires 10–15 s to predict the results of all scenarios, a substantial computational advantage over the conventional simulator (Table 7). The PC used in this study has an Intel Core i7-8550U processor (2.00 GHz) and 16 GB RAM. Figure 20 shows the ANN-XGBoost model predictions of the set objective functions. Because CO2-WAG project engineers must evaluate a large number of cases to select the optimal scenario, employing such models will ease this task and decrease the time required for such studies of CO2-WAG projects.
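For a rough sense of scale, the per-case timings above can be extrapolated with simple arithmetic. The sketch below assumes that "all scenarios" means the 2,400 cases of the dataset and that they are run one after another; both are assumptions, and the figures reported in Table 7 take precedence. Proxy training time and the cost of generating the training simulations are not included.

```python
n_cases = 2400                # assumption: all dataset cases, run sequentially
sim_min_per_case = (4, 5)     # CMG-GEM minutes per case, as quoted in the text
ml_total_s = (10, 15)         # ML prediction time for all scenarios, in seconds

# Total simulator time in hours for the low and high per-case estimates.
sim_total_h = tuple(n_cases * m / 60 for m in sim_min_per_case)

# Conservative and optimistic speed-up factors.
speedup_low = sim_total_h[0] * 3600 / ml_total_s[1]
speedup_high = sim_total_h[1] * 3600 / ml_total_s[0]

print(sim_total_h)                 # (160.0, 200.0)
print(speedup_low, speedup_high)   # 38400.0 72000.0
```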
Figure 20. ANN-XGBoost model prediction plots. (a) cumulative oil production, (b) total CO2 stored, and (c) storage efficiency.
3.3 Coupled optimization
CO2-WAG exploitation of the reservoir depends strongly on various operational parameters that directly affect cumulative oil production and CO2 storage. To address this, the proxy model (ANN-XGBoost) was integrated with the PSO algorithm to optimize these parameters so as to maximize both oil production and CO2 storage. The parameters considered during the optimization were the injection rates (water and gas), cycle size, cycle ratio, and BHP (injection and production). Table 5 lists the optimized parameter ranges, and Figures 21a,b illustrate the resulting cumulative oil production and CO2 storage, respectively.
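The coupling of a proxy with PSO can be illustrated with a minimal, self-contained sketch. Everything specific here is an assumption for illustration: `toy_proxy` is a hypothetical stand-in for the trained ANN-XGBoost model, the two normalized decision variables and their bounds are made up, the two objectives are combined by an equal-weight sum, and the PSO settings (inertia 0.7, acceleration coefficients 1.5) are common textbook values rather than the ones used in the study.

```python
import random


def pso_maximize(objective, bounds, n_particles=30, n_iters=100,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic global-best particle swarm optimization (maximization)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]                 # personal best positions
    best_val = [objective(p) for p in pos]         # personal best values
    g = max(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]     # global best
    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                # Move the particle and clamp it to the allowed range.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val > best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val > g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val


def toy_proxy(x):
    """Hypothetical stand-in for the trained proxy: returns (oil, co2) scores."""
    rate, cycle = x
    oil = -(rate - 0.6) ** 2 - (cycle - 0.5) ** 2
    co2 = -(rate - 0.8) ** 2 - (cycle - 0.5) ** 2
    return oil, co2


def objective(x, w_oil=0.5, w_co2=0.5):
    """Equal-weight sum of the two objectives (an assumed formulation)."""
    oil, co2 = toy_proxy(x)
    return w_oil * oil + w_co2 * co2


best_x, best_f = pso_maximize(objective, bounds=[(0.0, 1.0), (0.0, 1.0)])
print([round(v, 3) for v in best_x], round(best_f, 4))
```

Because each objective evaluation is only a proxy call, the swarm can afford thousands of evaluations; in this toy setup the optimum lies at a compromise between the two individual maxima.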
Figure 21. Optimized WAG cumulative oil production and CO2 storage plots. (a) cumulative oil production and (b) cumulative CO2 storage.
The results show that the CO2-WAG operational parameters have a significant effect on both oil production and CO2 storage: cumulative oil production rose from 1.11 × 10⁶ bbl to 1.3 × 10⁶ bbl, and CO2 storage rose from 4.53 × 10⁵ tons to 5.56 × 10⁵ tons. In the optimized case, the production bottom-hole pressure rose to 1.31 × 10³ psi, the gas injection rate rose to 8.53 × 10⁴ m³/day, the water injection rate rose to 4.56 × 10² m³/day, and the cycle length changed from 3 months to 6 months. Together, these changes in operational parameters result in higher oil production and CO2 storage.
The results demonstrate how important it is to design the operational parameters carefully. They also show that the ANN-XGBoost proxy model coupled with PSO not only improves oil production and CO2 storage but also provides a systematic and robust framework for optimizing operational parameters. This will enable engineers to design operational parameters quickly and decision makers to make effective decisions.
3.4 Research implications
The methodology of this work has implications that reach well beyond CO2-WAG optimization, with wider applicability in subsurface engineering, environmental modeling, and energy-transition technologies. The hybrid ANN-XGBoost surrogate model is an effective substitute for standard full-physics simulators, showing that complicated nonlinear multiphase processes can be modeled accurately at a fraction of the computational expense. Hybrid frameworks of this type are effective in enhanced oil recovery operations, including polymer, surfactant, and miscible-gas flooding, where quick scenario screening is necessary (Jiao et al., 2024; Otmane et al., 2025). The capacity to make thousands of predictions in seconds makes this method an effective instrument for large-scale sensitivity analysis, design, and uncertainty assessment. Recently, hybrid deep-learning architectures have been shown to improve real-time, safe operational decision-making through continuous model calibration and fast assessment of changing reservoir conditions (Otmane et al., 2025; Xue et al., 2023). The framework introduced in this paper provides the underlying elements required for such advanced workflows.
Beyond the oil and gas sector, the workflow is broadly applicable to geothermal reservoir engineering, where coupled thermal-hydraulic modeling is computationally expensive. Surrogate modeling has become an enabling technology for geothermal production prediction and improved geothermal system design (Li et al., 2025; Xue et al., 2023). The same benefits apply to groundwater remediation and environmental impact studies, where surrogate models speed up the simulation of contaminant transport, remediation planning, and plume monitoring (Luo et al., 2023; Wu et al., 2022). Notably, the hybrid ANN-XGBoost model is also well suited to predicting petrophysical properties such as porosity, permeability, and saturation. These properties tend to have complicated nonlinear relationships that depend on lithology, diagenesis, and depositional environment. ML techniques, especially hybrid methods, have been shown to outperform empirical correlations and other conventional inversion techniques in predicting formation properties from logs, core data, and seismic attributes (Kalule et al., 2023; Talebkeikhah et al., 2021). The proposed framework can be applied to reservoir characterization, facies identification, and model updating in fields with limited core data or incomplete logging programs, offering a way to predict static reservoir properties quickly and with high accuracy.
The versatility of the model further supports emerging energy-transition technologies such as subsurface hydrogen storage and CO2 capture, where risk evaluation and operational optimization rely on repeated simulations of injection-withdrawal cycles and plume migration (Mao et al., 2024). More broadly, the multi-objective optimization framework employed in this paper is applicable to a diverse range of engineering systems, including renewable-energy integration, chemical process optimization, and emissions-reduction technologies, in which rapid search of a high-dimensional parameter space can improve operational decision-making. Overall, although the current work focuses on CO2-WAG processes, the surrogate modeling and optimization framework is flexible and transferable to a wide range of subsurface, environmental, and energy-system applications.
3.5 Limitations and future work
Although the proposed framework demonstrates strong predictive and computational capabilities, a number of limitations should be recognized to provide context for the presented results. The dataset used for training the surrogate model was created entirely from synthetic numerical simulations based on a representative reservoir model. Although this approach guarantees controlled variability in the operational parameters, it does not fully capture the heterogeneity, small-scale stratification, and dynamic reservoir behavior observed in actual fields. As a result, generalizing the trained model to other formations, depositional environments, or fluid systems may require re-training with site-specific simulations or field data. The numerical model was also constructed under certain assumptions: a moderate grid size was used to keep the 2,400 simulation runs computationally feasible, so finer-scale heterogeneity and complex facies transitions may not be completely represented. Additionally, the reservoir physics incorporated into the CMG-GEM simulator, such as the relative permeability curves, capillary-pressure behavior, and Peng–Robinson EOS, introduces model dependency that may affect the predicted CO2 trapping efficiency and fluid displacement outcomes.
From an ML perspective, the hybrid ANN-XGBoost model is inherently sensitive to hyperparameter choice, architecture design, and training domain coverage. Although hyperparameters were optimized to achieve stable and accurate predictions, challenges remain, including overfitting, sensitivity to training data imbalance, and poor extrapolation outside the training samples. More sophisticated tuning strategies, such as Bayesian optimization, genetic algorithms, or cross-validated grid search, could enhance model robustness, though at the cost of increased computational time. The framework also does not explicitly account for the measurement noise, operational uncertainty, or variability expected in field data. The optimization results therefore reflect an idealized case obtained from synthetic inputs rather than the noisy or incomplete datasets common in actual reservoir settings. Additionally, the current optimization focuses only on operational parameters and does not integrate techno-economic parameters such as capital and operating costs, net present value (NPV), or emissions-credit incentives. Future work should incorporate economic and environmental uncertainty analysis coupled with multi-objective optimization to create more comprehensive decision-support tools. Despite these limitations, the study builds a solid foundation for future development, especially by incorporating field data, widening the training domain, adding uncertainty quantification, and improving physics-based constraints in the surrogate modeling process.
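To make the idea of cross-validated grid search concrete, the sketch below tunes a single hyperparameter of a toy one-dimensional nearest-neighbour regressor with 5-fold cross-validation. The data, the candidate grid, and the helper names are all made up for illustration; tuning the actual hybrid model would involve more hyperparameters and a correspondingly larger computational cost.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f_i, fold in enumerate(folds) if f_i != i for j in fold]
        yield train, folds[i]


def knn_predict(train_x, train_y, x, n_neighbors):
    """Average the targets of the n_neighbors training points closest to x."""
    order = sorted(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
    nearest = order[:n_neighbors]
    return sum(train_y[j] for j in nearest) / n_neighbors


def cv_mse(xs, ys, n_neighbors, k=5):
    """Mean squared error over all held-out samples of a k-fold split."""
    errors = []
    for train, test in k_fold_indices(len(xs), k):
        tx = [xs[j] for j in train]
        ty = [ys[j] for j in train]
        for j in test:
            pred = knn_predict(tx, ty, xs[j], n_neighbors)
            errors.append((ys[j] - pred) ** 2)
    return sum(errors) / len(errors)


# Synthetic noisy data (made up): y = x^2 plus Gaussian noise.
rng = random.Random(1)
xs = [i / 99 for i in range(100)]
ys = [x ** 2 + rng.gauss(0.0, 0.05) for x in xs]

grid = [1, 3, 5, 9, 15]  # candidate values of the hyperparameter
scores = {n: cv_mse(xs, ys, n) for n in grid}
best = min(scores, key=scores.get)
print(best, scores[best])
```

Every candidate is scored on the same folds, so the comparison reflects the hyperparameter rather than the particular split.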
4 Conclusion
This work establishes a robust, data-driven framework for predicting and optimizing CO2-WAG performance by linking high-fidelity numerical simulation with advanced ML and metaheuristic optimization. Algorithms such as XGBoost, RF, KNN, SVR, LR, ANN, CNN, ANN-XGBoost, and CNN-XGBoost were tested to predict CO2-WAG performance. The best-performing prediction model (proxy model) was then coupled with the PSO algorithm for integrated optimization. The dataset comprised 2,400 samples with seven input parameters and three output parameters. The following conclusions were drawn from this study:
1. Both the CO2-WAG cycle size and ratio significantly influenced oil production and CO2 storage. A cycle size of 6 months and a WAG cycle ratio of 1:1 produced higher oil production and led to higher CO2 storage.
2. The injection rates (both water and gas) had a strong effect on oil production and CO2 storage. However, careful design was required when increasing the injection rates, because any arbitrary increase led to early water or gas breakthrough.
3. Among the nine tested ML algorithms, the hybrid ANN model coupled with XGBoost (ANN-XGBoost) yielded the best prediction results, with high R² scores (0.99159, 0.97515, and 0.98706) and low RMSE values (2.8 × 10⁻², 1.5 × 10⁻¹, and 2.4 × 10⁻²). The proxy model reproduced the full-physics simulator's outputs with negligible bias.
4. The proxy model (ANN-XGBoost) coupled with the PSO optimization algorithm yielded 12.8% higher cumulative oil production and 11% greater CO2 storage compared to the base WAG case.
5. The proposed optimization framework required only minutes to complete the optimization. This offers engineers a robust alternative to optimization workflows based on conventional simulators.
These results confirm that the proposed ANN-XGBoost + PSO framework provides a practical, rapid, and reliable decision-support tool for CO2-WAG operations. Its applications extend beyond CO2-WAG to other areas of the oil and gas sector, such as chemical injection processes. Furthermore, this workflow can also be used in other fields facing similar optimization challenges.
Data availability statement
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.
Author contributions
SU: Conceptualization, Formal analysis, Investigation, Methodology, Software, Visualization, Writing – original draft, Writing – review & editing. LX: Conceptualization, Supervision, Validation, Writing – review & editing. GD: Writing – review & editing. TA-A: Funding acquisition, Writing – review & editing. ME: Writing – review & editing. AA: Writing – review & editing.
Funding
The author(s) declared that financial support was received for this work and/or its publication. This study had the support of national funds through Fundação para a Ciência e Tecnologia, I. P. (FCT), under the projects UIDB. This study is also supported by the National Natural Science Foundation of China (Grant No. 52274048) and Beijing Natural Science Foundation (Grant No. 3222037). Open access funding provided by UiT-The Arctic University Norway.
Conflict of interest
The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The author(s) declared that Generative AI was not used in the creation of this manuscript.
Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Supplementary material
The Supplementary material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fclim.2025.1710187/full#supplementary-material
References
Ahmadi, M. A., Zendehboudi, S., and James, L. A. (2018). Developing a robust proxy model of CO2 injection: coupling Box–Behnken design and a connectionist method. Fuel 215, 904–914. doi: 10.1016/j.fuel.2017.11.030
Al-Bayati, D., Saeedi, A., Myers, M., White, C., Xie, Q., and Clennell, B. (2018). Insight investigation of miscible SCCO2 water alternating gas (WAG) injection performance in heterogeneous sandstone reservoirs. J. CO2 Util. 28, 255–263. doi: 10.1016/j.jcou.2018.10.010
Al-Khdheeawi, E. A., Vialle, S., Barifcani, A., Sarmadivaleh, M., and Iglauer, S. (2018). Effect of wettability heterogeneity and reservoir temperature on CO2 storage efficiency in deep saline aquifers. Int. J. Greenhouse Gas Control 68, 216–229. doi: 10.1016/j.ijggc.2017.11.016
AlRassas, A. M., Al-Alimi, D., Zosseder, K., and Al-qaness, M. A. A. (2025). AI-driven predictive framework for CO2 sequestration and enhanced oil recovery: insights from a depleted oil reservoir. J. Clean. Prod. 519:146054. doi: 10.1016/j.jclepro.2025.146054
Alrassas, A. M., Vo Thanh, H., Ren, S., Sun, R., Al-Areeq, N. M., Kolawole, O., et al. (2022). CO2 sequestration and enhanced oil recovery via the water alternating gas scheme in a mixed transgressive sandstone-carbonate reservoir: case study of a large Middle East oilfield. Energy Fuel 36, 10299–10314. doi: 10.1021/acs.energyfuels.2c02185
Bachu, S. (2000). Sequestration of CO2 in geological media: criteria and approach for site selection in response to climate change. Energy Convers. Manag. 41, 953–970. doi: 10.1016/S0196-8904(99)00149-1
Bergen, K. J., Johnson, P. A., de Hoop, M. V., and Beroza, G. C. (2019). Machine learning for data-driven discovery in solid Earth geoscience. Science 363:eaau0323. doi: 10.1126/science.aau0323
Chen, T., and Guestrin, C. (2016). XGBoost. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 785–794
Christensen, J. R., Stenby, E. H., and Skauge, A. (2001). Review of WAG field experience. SPE Reserv. Eval. Eng. 4, 97–106. doi: 10.2118/71203-pa
Czernichowski-Lauriol, I., Berenblyum, R., Bigi, S., Car, M., Gastine, M., Persoglia, S., et al. (2018). CO2GeoNet actions in Europe for advancing CCUS through global cooperation. Energy Procedia 154, 73–79. doi: 10.1016/j.egypro.2018.11.013
Dai, Z., Viswanathan, H., Middleton, R., Pan, F., Ampomah, W., Yang, C., et al. (2016). CO2 accounting and risk analysis for CO2 sequestration at enhanced oil recovery sites. Environ. Sci. Technol. 50, 7546–7554. doi: 10.1021/acs.est.6b01744
Dai, Z., Viswanathan, H., Xiao, T., Middleton, R., Pan, F., Ampomah, W., et al. (2017). CO2 sequestration and enhanced oil recovery at depleted oil/gas reservoirs. Energy Procedia 114, 6957–6967. doi: 10.1016/j.egypro.2017.08.034
Dai, Z., Zhang, Y., Bielicki, J., Amooie, M. A., Zhang, M., Yang, C., et al. (2018). Heterogeneity-assisted carbon dioxide storage in marine sediments. Appl. Energy 225, 876–883. doi: 10.1016/j.apenergy.2018.05.038
Dziejarski, B., Krzyżyńska, R., and Andersson, K. (2023). Current status of carbon capture, utilization, and storage technologies in the global economy: a survey of technical assessment. Fuel 342:127776. doi: 10.1016/j.fuel.2023.127776
Gao, M., Liu, Z., Qian, S., Liu, W., Li, W., Yin, H., et al. (2023). Machine-learning-based approach to optimize CO2-WAG flooding in low permeability oil reservoirs. Energies 16:6149. doi: 10.3390/en16176149
Han, L., and Gu, Y. (2014). Optimization of miscible CO2 water-alternating-gas injection in the bakken formation. Energy Fuel 28, 6811–6819. doi: 10.1021/ef501547x
He, R., Ma, W., Ma, X., and Liu, Y. (2021). Modeling and optimizing for operation of CO2-EOR project based on machine learning methods and greedy algorithm. Energy Rep. 7, 3664–3677. doi: 10.1016/j.egyr.2021.05.067
Hill, T., Marquez, L., O’Connor, M., and Remus, W. (1994). Artificial neural network models for forecasting and decision making. Int. J. Forecast. 10, 5–15. doi: 10.1016/0169-2070(94)90045-0
Hsu, C.-W., Chen, L.-T., Hu, A. H., and Chang, Y.-M. (2012). Site selection for carbon dioxide geological storage using analytic network process. Sep. Purif. Technol. 94, 146–153. doi: 10.1016/j.seppur.2011.08.019
Jiao, S., Li, W., Li, Z., Gai, J., Zou, L., and Su, Y. (2024). Hybrid physics-machine learning models for predicting rate of penetration in the Halahatang oil field, Tarim Basin. Sci. Rep. 14:5957. doi: 10.1038/s41598-024-56640-y
Jin, L., Pekot, L. J., Smith, S. A., Salako, O., Peterson, K. J., Bosshart, N. W., et al. (2018). Effects of gas relative permeability hysteresis and solubility on associated CO2 storage performance. Int. J. Greenhouse Gas Control 75, 140–150. doi: 10.1016/j.ijggc.2018.06.002
Kalule, R., Abderrahmane, H. A., Alameri, W., and Sassi, M. (2023). Stacked ensemble machine learning for porosity and absolute permeability prediction of carbonate rock plugs. Sci. Rep. 13:9855. doi: 10.1038/s41598-023-36096-2
Kennedy, J., and Eberhart, R. (1995). Particle swarm optimization. Proceedings of ICNN’95—International Conference on Neural Networks. 1942–1948
Khather, M., Yekeen, N., Al-Yaseri, A., Al-Mukainah, H., Giwelli, A., and Saeedi, A. (2022). The impact of wormhole generation in carbonate reservoirs on CO2-WAG oil recovery. J. Petroleum Sci. Eng. 212:110354. doi: 10.1016/j.petrol.2022.110354
Kolawole, O., Ispas, I., Kumar, M., Weber, J., Zhao, B., and Zanoni, G. (2021). How can biogeomechanical alterations in shales impact caprock integrity and CO2 storage? Fuel 291:120149. doi: 10.1016/j.fuel.2021.120149
Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2017). ImageNet classification with deep convolutional neural networks. Commun. ACM 60, 84–90. doi: 10.1145/3065386
Kurz, B. A., Heebink, L. V., Eylands, K. E., Smith, S. A., Hamling, J. A., Klapperich, R. J., et al. (2013). “Bell Creek test site—preinjection geochemical report” in Plains CO2 Reduction (PCOR) Partnership Phase III Task 4—Deliverable D33, Prepared for National Energy Technology Laboratory U.S. Department of Energy Cooperative Agreement No. DE-FC26- (Grand Forks, ND: Energy & Environmental Research Center).
Lackner, K. S. (2003). A guide to CO2 sequestration. Science 300, 1677–1678. doi: 10.1126/science.1079033
Lake, L. W., Johns, R., Rossen, B., and Pope, G. (2014). Fundamentals of enhanced oil recovery. Richardson, TX: Society of Petroleum Engineers (SPE).
Land, C. S. (1968). Calculation of imbibition relative permeability for two- and three-phase flow from rock properties. Soc. Pet. Eng. J. 8, 149–156. doi: 10.2118/1942-PA
Li, H., Gong, C., Liu, S., Xu, J., and Imani, G. (2022). Machine learning-assisted prediction of oil production and CO2 storage effect in CO2-water-alternating-gas injection (CO2-WAG). Appl. Sci. 12:10958. doi: 10.3390/app122110958
Li, F., Guo, X., Qi, X., Feng, B., Liu, J., Xie, Y., et al. (2025). A surrogate model-based optimization approach for geothermal well-doublet placement using a regularized LSTM-CNN model and grey wolf optimizer. Sustainability 17:266. doi: 10.3390/su17010266
Liu, M., Li, Z., Qi, J., Meng, Y., Zhou, J., Ni, M., et al. (2024). Prediction of CO2 storage in different geological conditions based on machine learning. Energy Fuel 38, 22340–22350. doi: 10.1021/acs.energyfuels.4c04274
Luo, J., Ma, X., Ji, Y., Li, X., Song, Z., and Lu, W. (2023). Review of machine learning-based surrogate models of groundwater contaminant modeling. Environ. Res. 238:117268. doi: 10.1016/j.envres.2023.117268
Mao, S., Chen, B., Malki, M., Chen, F., Morales, M., Ma, Z., et al. (2024). Efficient prediction of hydrogen storage performance in depleted gas reservoirs using machine learning. Appl. Energy 361:122914. doi: 10.1016/j.apenergy.2024.122914
Naghizadeh, A., Jafari, S., Norouzi-Apourvari, S., Schaffie, M., and Hemmati-Sarapardeh, A. (2024). Multi-objective optimization of water-alternating flue gas process using machine learning and nature-inspired algorithms in a real geological field. Energy 293:130413. doi: 10.1016/j.energy.2024.130413
Otmane, M., Imtiaz, S., Jaluta, A. M., and Aborig, A. (2025). Boosting reservoir prediction accuracy: a hybrid methodology combining traditional reservoir simulation and modern machine learning approaches. Energies 18:657. doi: 10.3390/en18030657
Rasmusson, K., Rasmusson, M., Tsang, Y., and Niemi, A. (2016). A simulation study of the effect of trapping model, geological heterogeneity and injection strategies on CO2 trapping. Int. J. Greenhouse Gas Control 52, 52–72. doi: 10.1016/j.ijggc.2016.06.020
Ren, D., Wang, X., Kou, Z., Wang, S., Wang, H., Wang, X., et al. (2023). Feasibility evaluation of CO2 EOR and storage in tight oil reservoirs: a demonstration project in the Ordos Basin. Fuel 331:125652. doi: 10.1016/j.fuel.2022.125652
Rodrigues, H., Mackay, E., Arnold, D., and Silva, D. (2019). Optimization of CO2-WAG and calcite scale management in pre-salt carbonate reservoirs. Offshore Technology Conference Brasil, (OTC)
Sanchez, N. L. (1999). Management of water alternating gas (WAG) injection projects. Latin American and Caribbean Petroleum Engineering Conference, (SPE)
Sen, D., Chen, H., and Datta-Gupta, A. (2022). Inter-well connectivity detection in CO2 WAG projects using statistical recurrent unit models. Fuel 311:122600. doi: 10.1016/j.fuel.2021.122600
Song, Y., Sung, W., Jang, Y., and Jung, W. (2020). Application of an artificial neural network in predicting the effectiveness of trapping mechanisms on CO2 sequestration in saline aquifers. Int. J. Greenhouse Gas Control 98:103042. doi: 10.1016/j.ijggc.2020.103042
Spiteri, E. J., Juanes, R., Blunt, M. J., and Orr, F. M. (2005). Relative permeability hysteresis: trapping models and application to geological CO2 sequestration. SPE Annual Technical Conference and Exhibition, (SPE)
Sun, X., Liu, J., Dai, X., Wang, X., Yapanto, L. M., and Zekiy, A. O. (2021). On the application of surfactant and water alternating gas (SAG/WAG) injection to improve oil recovery in tight reservoirs. Energy Rep. 7, 2452–2459. doi: 10.1016/j.egyr.2021.04.034
Talebkeikhah, M., Sadeghtabaghi, Z., and Shabani, M. (2021). A comparison of machine learning approaches for prediction of permeability using well log data in the hydrocarbon reservoirs. J. Hum. Earth Future 2, 82–99. doi: 10.28991/HEF-2021-02-02-01
Ud Din, S., Guo, D., Ning, F., and Xue, L. (2022). Natural gas hydrate production methods: a review. J. Appl. Emerg. Sci. doi: 10.36785/jaes.121529
Ud Din, S., Liang, X., Dongdong, G., and Fulong, N. (2025). Inhibition effect on gas hydrate formation in various kinetic hydrate inhibitor systems. J. Porous Media. doi: 10.1615/JPorMedia.2025057554
Ud Din, S., Wimalasiri, R., Ehsan, M., Liang, X., Ning, F., Guo, D., et al. (2023). Assessing public perception and willingness to pay for renewable energy in Pakistan through the theory of planned behavior. Front. Energy Res. 11:1088297. doi: 10.3389/fenrg.2023.1088297
Vaziri, P., and Sedaee, B. (2023). A machine learning-based approach to the multiobjective optimization of CO2 injection and water production during CCS in a saline aquifer based on field data. Energy Sci. Eng. 11, 1671–1687. doi: 10.1002/ese3.1412
Vo Thanh, H., Sugai, Y., Nguele, R., and Sasaki, K. (2020). Robust optimization of CO2 sequestration through a water alternating gas process under geological uncertainties in Cuu Long Basin, Vietnam. J. Nat. Gas Sci. Eng. 76:103208. doi: 10.1016/j.jngse.2020.103208
Wang, F., Ping, S., Yuan, Y., Sun, Z., Tian, H., and Yang, Z. (2021). Effects of the mechanical response of low-permeability sandstone reservoirs on CO2 geological storage based on laboratory experiments and numerical simulations. Sci. Total Environ. 796:149066. doi: 10.1016/j.scitotenv.2021.149066
Wu, M., Xu, J., Hu, P., Lu, Q., Xu, P., Chen, H., et al. (2022). An adaptive surrogate-assisted simulation-optimization method for identifying release history of groundwater contaminant sources. Water 14:1659. doi: 10.3390/w14101659
Xu, Y., Zhao, X., Chen, Y., and Yang, Z. (2019). Research on a mixed gas classification algorithm based on extreme random tree. Appl. Sci. 9:1728. doi: 10.3390/app9091728
Xue, L., Li, D., and Dou, H. (2023a). Artificial intelligence methods for oil and gas reservoir development: current progresses and perspectives. Adv. Geo-Energy Res. 10, 65–70. doi: 10.46690/ager.2023.10.07
Xue, L., Wang, J., Han, J., Yang, M., Mwasmwasa, M., and Nanguka, F. (2023b). Gas well performance prediction using deep learning jointly driven by decline curve analysis model and production data. Adv. Geo-Energy Res. 8, 159–169. doi: 10.46690/ager.2023.06.03
Xue, L., Xu, S., Nie, J., Qin, J., Han, J. X., Liu, Y. T., et al. (2024). An efficient data-driven global sensitivity analysis method of shale gas production through convolutional neural network. Pet. Sci. 21, 2475–2484. doi: 10.1016/j.petsci.2024.02.010
Xue, Z., Zhang, K., Zhang, C., Ma, H., and Chen, Z. (2023). Comparative data-driven enhanced geothermal systems forecasting models: a case study of Qiabuqia field in China. Energy 280:128255. doi: 10.1016/j.energy.2023.128255
Xue, L., Zhu, Y., Ren, J., Liao, H., Dai, Q., and Tu, B. (2025). Coupled optimization method for CO2-EOR and storage based on machine learning. J. Porous Media 28, 37–53. doi: 10.1615/JPorMedia.2024052865
Yao, P., Yu, Z., Zhang, Y., and Xu, T. (2023). Application of machine learning in carbon capture and storage: an in-depth insight from the perspective of geoscience. Fuel 333:126296. doi: 10.1016/j.fuel.2022.126296
You, J., Ampomah, W., and Sun, Q. (2020). Co-optimizing water-alternating-carbon dioxide injection projects using a machine learning assisted computational framework. Appl. Energy 279:115695. doi: 10.1016/j.apenergy.2020.115695
Zhong, Z., Liu, S., Carr, T. R., Takbiri-Borujeni, A., Kazemi, M., and Fu, Q. (2019). Numerical simulation of water-alternating-gas process for optimizing EOR and carbon storage. Energy Procedia 158, 6079–6086. doi: 10.1016/j.egypro.2019.01.507
Keywords: ANN, CCUS, CO2-WAG, CO2-WAG parameters, machine learning, XGBoost
Citation: Ud Din S, Xue L, Dongdong G, Abu-Alam T, Ehsan M and Ahmed AA (2026) A robust deep learning framework for predicting carbon dioxide-water alternating gas injection performance and optimization. Front. Clim. 7:1710187. doi: 10.3389/fclim.2025.1710187
Edited by: Yanhui Han, Houston Research Center, United States
Reviewed by: Eslam Gomaa Al-Sakkari, Polytechnique Montréal, Canada; Mostafa Saghafi, Bruno Kessler Foundation (FBK), Italy
Copyright © 2026 Ud Din, Xue, Dongdong, Abu-Alam, Ehsan and Ahmed. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Shahab Ud Din; Liang Xue; Tamer Abu-Alam