A robust deep learning framework for predicting carbon dioxide-water alternating gas injection performance and optimization

Ud Din, Shahab; Xue, Liang; Dongdong, Guo; Abu-Alam, Tamer; Ehsan, Muhsan; Ahmed, Anas A.

doi:10.3389/fclim.2025.1710187

ORIGINAL RESEARCH article

Front. Clim., 15 January 2026

Sec. Carbon Dioxide Removal

Volume 7 - 2025 | https://doi.org/10.3389/fclim.2025.1710187

1. State Key Laboratory of Petroleum Resources and Engineering, China University of Petroleum (Beijing), Beijing, China
2. Department of Oil-Gas Field Development Engineering, College of Petroleum Engineering, China University of Petroleum (Beijing), Beijing, China
3. School of Earth and Environment, Anhui University of Science and Technology, Huainan, China
4. The Faculty of Biosciences, Fisheries and Economics, UiT The Arctic University of Norway, Tromsø, Norway
5. OSEAN—Outermost Regions Sustainable Ecosystem for Entrepreneurship and Innovation, University of Madeira Colégio dos Jesuítas, Funchal, Portugal
6. Department of Earth and Environmental Sciences, Bahria University, Islamabad, Pakistan
7. Department of Petroleum and Natural Gas Engineering, Faculty of Engineering, University of Khartoum, Khartoum, Sudan

Abstract

Carbon dioxide (CO₂) emissions pose a major environmental concern, and various methods are used for CO₂ sequestration. CO₂-water alternating gas (CO₂-WAG) injection is a technique used to increase oil production and sequester CO₂ in subsurface formations. However, the performance of a CO₂-WAG project depends on various parameters, such as injection rates, cycle size, and cycle ratio, whose evaluation traditionally requires numerous computationally expensive simulations. This study introduces a robust machine learning workflow for CO₂-WAG performance prediction and optimization, using a model calibrated with Bell Creek formation properties. The machine learning models are based on extreme gradient boosting (XGBoost), linear regression (LR), random forest (RF), k-nearest neighbor (KNN), support vector regression (SVR), artificial neural network (ANN), and convolutional neural network (CNN) algorithms, together with hybrid models that couple ANN and CNN with XGBoost (ANN-XGBoost and CNN-XGBoost). A dataset of 2,400 samples was generated using the CMG-GEM numerical simulator, incorporating seven input parameters (e.g., injection rate, CO₂-WAG cycle size, and WAG ratio) and three output parameters, with 80% of the dataset allocated for training and 20% for validation and testing. Among the proposed models, the hybrid ANN-XGBoost model performed best, predicting total oil production, CO₂ storage, and storage efficiency with R² scores of 0.99159, 0.97515, and 0.98706 and correspondingly low RMSE values of 2.8 × 10⁻², 1.5 × 10⁻¹, and 2.4 × 10⁻². Coupling the proxy with particle swarm optimization (PSO) yielded a 12.8% increase in cumulative oil production and an 11% increase in CO₂ storage. Furthermore, in terms of speed, the proposed workflow requires less time to complete predictions and optimization, whereas traditional numerical simulators require 4–5 min per scenario. These findings validate the robustness and computational efficiency of the proposed machine learning workflow for predicting CO₂-WAG performance and optimization.

1 Introduction

Fossil fuels, particularly oil, remain the primary source of energy, with demand continuing to rise due to continuous development and urbanization. However, fossil fuels are considered major contributors to CO₂ emissions, as their combustion releases CO₂ (Hsu et al., 2012; You et al., 2020; Gao et al., 2023). Although extensive research is ongoing to identify alternative energy sources, none has yet emerged to replace fossil fuels, which are expected to remain the major energy source in the forthcoming decades (Ud Din et al., 2022; Ud Din et al., 2023, 2025; Naghizadeh et al., 2024). To manage these growing challenges, carbon capture, utilization, and storage (CCUS) is considered a vital technology for mitigating climate change (Czernichowski-Lauriol et al., 2018; Kolawole et al., 2021; Dziejarski et al., 2023) by sequestering CO₂ in geological formations (Lackner, 2003; Ren et al., 2023), such as coal beds (Bachu, 2000), saline formations (Wang et al., 2021), and depleted oil and gas reservoirs (Dai et al., 2016, 2018). CO₂ injection into reservoirs is an important technology that has been used in the oil and gas industry for decades to increase oil recovery (Lake et al., 2014).

CO₂-WAG injection is a well-established enhanced oil recovery (EOR) technique that shows high sweep efficiency and reduces the risk of gas channeling, viscous fingering (Yao et al., 2023), and gravity segregation. More specifically, this technique increases oil production and sequesters CO₂ in underground formations (Spiteri et al., 2005; Rasmusson et al., 2016; Al-Khdheeawi et al., 2018; Zhong et al., 2019). Figure 1 illustrates a schematic diagram of the CO₂-WAG technique and its role in CCUS. The technique was first employed in 1957 by Mobil Corporation in a Canadian sandstone reservoir. This method effectively addressed early CO₂ breakthrough and increased the resistance to gas flow. In addition, CO₂-WAG improves the mobility ratio and decreases the resistance to water flow. Survey data indicate that in the United States, approximately 80% of the oilfields have adopted WAG, with consistently positive outcomes. This widespread application underscores its effectiveness in boosting oilfield development and production rates (Sanchez, 1999). Further, Christensen et al. (2001) reviewed 59 fields in which WAG was employed and reported an average 10% increase in oil recovery. This evidence underscores the positive influence of WAG technology on oilfield production. Several prior studies have further investigated the role of core-scale heterogeneity in determining the efficiency of oil production through CO₂-WAG injection. The results revealed that CO₂-WAG injection exhibited superior performance in cases involving homogeneous, layered, and composite samples (Al-Bayati et al., 2018). Han and Gu (2014) evaluated optimized miscible injection through nine core-flooding experiments and demonstrated that optimized miscible injection outperformed other methods; notably, a smaller WAG slug (1:1 ratio) yielded higher recovery factors. Sun et al. (2021) investigated CO₂ phase migration through porous media during WAG injection and observed a 46% increase in the recovery factor. The results also showed that the gas-to-water injection ratio played a pivotal role in the efficacy of WAG injection, as highlighted in prior studies. Khather et al. (2022) explored the influence of CO₂ and carbonate interactions on oil and gas recovery. That study conducted a series of experiments on three distinct carbonate rock core samples with heterogeneous properties, such as different oil saturations and both low and moderate permeability values. The results indicated that CO₂-WAG injection following water flooding led to a remarkable increase in the recovery factor, exceeding 30% for all the core samples. Ren et al. (2023) examined the use of CO₂-EOR and its subsequent CO₂ sequestration in fields located in the Ordos Basin, China, by employing continuous CO₂ and WAG injection techniques. This study showed that simultaneous injection of equal amounts of CO₂ and WAG substantially improved crude oil production in the examined oilfields. Alrassas et al. (2022) studied CO₂-WAG injection in a Yemeni interbedded reservoir and compared it with continuous CO₂ injection. The study demonstrated that CO₂-WAG provided superior reservoir performance, enhancing both oil recovery and CO₂ storage, largely due to its higher Kv/Kh ratios. Moreover, the 2:1 WAG ratio outperformed the 1:1 ratio, underscoring the importance of optimizing injection strategies in interbedded reservoirs to maximize oil recovery and CO₂ storage efficiency.

Figure 1

Currently, the optimization of CO₂-EOR technology remains a central focus across numerous petroleum engineering fields. Dai et al. (2017) applied Monte Carlo simulations to quantify uncertainty in CO₂ storage potential within an active EOR project located at the Morrow Reservoir, Farnsworth Unit, Texas. Vo Thanh et al. (2020) applied robust optimization to the WAG process to improve CO₂ storage potential in a Vietnamese field. Their findings showed that WAG injection effectively improved CO₂ sequestration compared with continuous gas injection. Specifically, the nominal and optimal optimization scenarios improved CO₂ sequestration by 13 and 15%, respectively, compared to the baseline WAG case. Rodrigues et al. (2019) conducted an optimization study of a Brazilian offshore field using the Computer Modelling Group (CMG) simulator and proposed a strategy for designing CO₂-WAG operations in carbonate reservoirs, emphasizing economic feasibility, CO₂ recycling efficiency, and the assessment of project risks.

Conventional approaches to parameter optimization are often cumbersome and labor-intensive, and they neglect complex nonlinear interactions and the fundamental variables that drive them. These methods typically rely on distinct models and algorithms, which restricts their adaptability to varying oilfield conditions and fluctuations. Consequently, incorporating more sophisticated optimization methods, such as machine learning (ML) and metaheuristic algorithms, offers a superior way of handling the intricacies and uncertainties associated with oil fields and improves optimization efficiency and precision (He et al., 2021; Li et al., 2022; Sen et al., 2022; Xue et al., 2023a, 2023b, 2024). Notable progress has been made in petroleum exploration and production owing to rapid advancements in intelligent algorithms, particularly in ML. Although ML frameworks have become increasingly popular in the context of CO₂-EOR and CO₂-WAG, current research still has certain limitations. First, several investigations concentrate either on oil recovery or on CO₂ storage without capturing the two processes simultaneously, which limits their applicability to WAG decision-making (Gao et al., 2023; Li et al., 2022). Second, many studies rely on operational or geological inputs that are too limited to model the main WAG behaviors, namely cycle size, cycle ratio, injection pressures, and the impact of three-phase hysteresis (Naghizadeh et al., 2024; You et al., 2020). Moreover, prior ML studies often use only a few simulation cases, which narrows the breadth of the training domain and limits the ability of the model to learn complex nonlinear relationships. Liu et al. (2024) predicted CO₂ storage under different subsurface environments with RF and XGBoost using only 184 data samples; Ahmadi et al. (2018) created an LSSVM-based proxy using a Box–Behnken experimental design, which constrained the input combinations. Another recently developed deep-learning model utilizing CNN and MLNN was used to forecast CO₂ solubility trapping and oil recovery based on 814 test cases (AlRassas et al., 2025). That research was limited to prediction and did not incorporate an optimization element. In a broader sense, current ML and deep learning (DL) research lacks explicit consideration of WAG-specific physics, such as the dynamics of CO₂ trapping (Sen et al., 2022; Song et al., 2020), and does not consider operational optimization, which constrains its use in real-time design and decision-making (Vaziri and Sedaee, 2023).

Recent studies have shown that hybrid architectures combining neural networks and boosted trees can significantly outperform individual algorithms in reservoir engineering because they represent nonlinear interactions much better without overfitting (Jiao et al., 2024; Otmane et al., 2025; Vaziri and Sedaee, 2023). Building on these contributions, the current study constructs a hybrid ANN-XGBoost surrogate model trained on a large, systematically generated synthetic dataset that covers a wide variety of WAG operating conditions. In contrast to previous work that focuses only on prediction, the proposed framework combines prediction with multi-objective optimization by integrating the surrogate model with particle swarm optimization (PSO). This makes it possible to identify operational strategies that simultaneously maximize oil recovery and CO₂ storage, overcoming the shortcomings of prior ML research and providing a complete, effective workflow for designing and optimizing CO₂-WAG projects.

This study thoroughly analyzes the effects of CO₂-WAG parameters, including injection rates, cycle sizes, and cycle ratios, on cumulative oil production and CO₂ stored using a model based on Bell Creek formation characteristics. Furthermore, the study proposes a robust ML workflow for CO₂-WAG performance prediction by employing various ML algorithms, namely extreme gradient boosting (XGBoost), linear regression (LR), random forest (RF), k-nearest neighbor (KNN), support vector regression (SVR), artificial neural network (ANN), and convolutional neural network (CNN). Additionally, the study compares these algorithms with hybrid algorithms, namely ANN coupled with XGBoost (ANN-XGBoost) and CNN coupled with XGBoost (CNN-XGBoost). The best prediction model is then coupled with the particle swarm optimization (PSO) algorithm to perform the optimization study. The results provide insights into the factors affecting CO₂-WAG performance and give engineers and decision makers a robust framework for CO₂-WAG performance prediction, optimization, and decision-making.

2 Methodology

2.1 Base model

The study used CMG-GEM (CMG, 2022) to construct a base model reflecting the Bell Creek formation characteristics. The developed model was designed with a five-spot well pattern comprising a central production well and four injection wells located at the corners (Jin et al., 2018). The model consists of 59 × 59 × 6 = 20,886 grid blocks. Furthermore, to mimic the original reservoir conditions, the model was designed with a dip angle of 1°, consistent with the Bell Creek formation, as shown in Figure 2.

Figure 2

Table 1 lists all the parameters used in developing the base model. The reservoir is located at a depth of 1,342 m, with an initial reservoir pressure of 2,400 psi at 41.7 °C. These conditions place the model above the minimum miscibility pressure (MMP) of 1,411 psi. Keeping the injection pressure above the MMP is essential in a CO₂-WAG project because it makes the CO₂ miscible with the oil at reservoir conditions. The reservoir has two porosity and permeability values (Table 1): the upper part is siltstone, while the lower part is sandstone (Kurz et al., 2013). The vertical permeability kv is 0.1 times the horizontal permeability. Furthermore, the model accounts for all the trapping mechanisms involved in a CO₂ storage process: structural trapping, solubility trapping, residual trapping, and mineralization. To capture residual trapping, the linear phase permeability hysteresis model built into CMG-GEM was used (Land, 1968). The mineralization effect, however, is negligible because it requires long periods to take place. Table 2 lists all the components of the fluid model used in the base model. The Peng–Robinson equation of state (EOS) was employed for fluid model tuning, and all components were lumped into seven pseudo-components using WinProp.

Table 1

Properties	Values
Depth	1,342 m
Reservoir thickness	18 m
Dip angle	1° (northwest)
Porosity	0.15, 0.25
Permeability	50, 700 md
Reservoir pressure	2,400 psi
Temperature	41.7 °C
BHP_production	1,200 psi
API gravity	37
Oil viscosity	2.2 cP
Saturation pressure	925 psi
MMP	1,411 psi

Model reservoir properties.

Table 2

Components	Mole fraction	Injection
CO₂	0.0042	1
N₂–C₂H₆	0.1963	—
C₃H₈–NC₄	0.0428	—
IC₅–C₇	0.1526	—
C₈–C₁₃	0.2860	—
C₁₄–C₂₄	0.1997	—
C₂₅–C₃₆₊	0.1184	—

Fluid composition.

2.2 Database generation

Conducting an ML-based study requires a large amount of high-quality time series data. In the absence of real field data, numerical simulations were used to produce a dataset for the ML study. The dataset includes seven CO₂-WAG operational parameters as inputs (bottom hole pressure of the water injector, bottom hole pressure of the gas injector, bottom hole pressure of the producer, WAG cycle size, WAG cycle ratio, gas injection rate, and water injection rate) and three output parameters (cumulative oil production, cumulative CO₂ storage, and CO₂ storage efficiency). The minimum and maximum values of the input parameters are listed in Table 3. A specialized Python script, written by our research group to generate input data files for CMG-GEM, was then used to randomly generate a set of 2,400 input data files within the ranges given in Table 3 while keeping all other parameters the same.

Table 3

Parameters	Units	Min	Max
Water injection rate	×10³ m³/day	0.05	3
Gas injection rate	×10³ m³/day	10	100
Cycle size	Months	1	18
Cycle ratio	—	1.1	3.1
BHP_gas injector	×10³ psi	1.8	3.3
BHP_water injector	×10³ psi	1.8	3.3
BHP_production	×10³ psi	0.8	1.4

CO₂-WAG parameters range.
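The random case generation described above can be sketched as follows. This is a minimal illustration, not the authors' script: it assumes independent uniform sampling within the Table 3 ranges, and it treats the cycle ratio as a choice among discrete levels. The parameter names, the list of ratio levels, and the function `sample_cases` are all hypothetical.

```python
import random

# Continuous ranges copied from Table 3 (units as given there).
# The dictionary keys are illustrative names, not CMG-GEM keywords.
PARAM_RANGES = {
    "water_injection_rate": (0.05, 3.0),   # x10^3 m3/day
    "gas_injection_rate": (10.0, 100.0),   # x10^3 m3/day
    "cycle_size_months": (1.0, 18.0),
    "bhp_gas_injector": (1.8, 3.3),        # x10^3 psi
    "bhp_water_injector": (1.8, 3.3),      # x10^3 psi
    "bhp_production": (0.8, 1.4),          # x10^3 psi
}

# Assumption: the cycle ratio takes discrete levels; this list is hypothetical.
CYCLE_RATIOS = ["1:1", "1:2", "1:3", "2:1", "3:1"]


def sample_cases(n_cases, seed=0):
    """Draw n_cases random operating scenarios within the Table 3 ranges."""
    rng = random.Random(seed)  # fixed seed so the design is reproducible
    cases = []
    for _ in range(n_cases):
        case = {name: rng.uniform(lo, hi)
                for name, (lo, hi) in PARAM_RANGES.items()}
        case["cycle_ratio"] = rng.choice(CYCLE_RATIOS)
        cases.append(case)
    return cases


cases = sample_cases(2400)
print(len(cases), "scenarios; first case:", cases[0])
```

Each dictionary would then be written into a simulator input file by whatever templating step the workflow uses; that step depends on the deck format and is not shown here.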

The script then automatically runs the numerical simulations one by one in CMG-GEM using the generated data files. Upon completion of the simulations, the cumulative oil production and total CO₂ stored (used for the CO₂ storage efficiency calculations) were extracted from the output files and used as output parameters to complete the dataset for the ML study. The output parameters and the proposed objective functions are calculated as follows.

2.2.1 Cumulative oil production

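The dimensionless cumulative oil production can be calculated as given in Equation 1:

$$J_{\mathrm{oil}}(\mathbf{x}) = \sum_{t=1}^{T} Q_{o}^{t}(\mathbf{x}) \tag{1}$$

where $\mathbf{x}$ is the variable vector for optimization, $Q_{o}^{t}$ is the oil produced at timestep $t$ (so that the sum gives the cumulative oil production), $t$ is the timestep, and $T$ is the total number of timesteps.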

2.2.2 Total CO₂ stored

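The second objective function in the dataset is total CO₂ stored, which can be calculated by Equation 2:

$$J_{\mathrm{CO_2}}(\mathbf{x}) = \sum_{t=1}^{T} \left( m_{\mathrm{inj}}^{t} - m_{\mathrm{prod}}^{t} \right) \tag{2}$$

where $J_{\mathrm{CO_2}}$ is the total CO₂ stored, $\mathbf{x}$ represents the variable vector for optimization, $m_{\mathrm{inj}}^{t}$ is the CO₂ injected at timestep $t$, $m_{\mathrm{prod}}^{t}$ is the CO₂ produced at timestep $t$, $t$ is the timestep, and $T$ is the total number of timesteps.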

2.2.3 Storage efficiency

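CO₂ storage efficiency is considered the third objective function; however, it is not included in the optimization stage and is only considered during prediction. It is calculated by Equation 3:

$$\eta_{\mathrm{CO_2}} = \frac{M_{\mathrm{stored}}}{M_{\mathrm{injected}}} \tag{3}$$

where $M_{\mathrm{stored}}$ is the total CO₂ stored and $M_{\mathrm{injected}}$ is the total CO₂ injected.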

2.2.4 Coupled optimization

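For coupled optimization, we consider cumulative oil production and CO₂ stored, as these are the two main objectives to optimize. The coupled objective can be expressed mathematically by Equation 4:

$$J(\mathbf{x}) = w_{1}\, J_{\mathrm{oil}}(\mathbf{x}) + w_{2}\, J_{\mathrm{CO_2}}(\mathbf{x}) \tag{4}$$

where $J$ is the coupled objective function for optimization, $J_{\mathrm{oil}}$ and $J_{\mathrm{CO_2}}$ are the dimensionless cumulative oil production and total CO₂ stored, and $w_{1}$ and $w_{2}$ are the weights. Equal weights are assigned so that both objectives are treated equally.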
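The objective functions of this section can be illustrated with a short sketch. It assumes that total CO₂ stored is the sum of injected minus produced CO₂ over all timesteps, that storage efficiency is the stored fraction of the injected CO₂, and that "equal weights" means 0.5 each; the function names and the numerical values are hypothetical.

```python
def total_co2_stored(injected, produced):
    """Sum of (CO2 injected - CO2 produced) over all timesteps."""
    return sum(i - p for i, p in zip(injected, produced))


def storage_efficiency(injected, produced):
    """Fraction of the injected CO2 that remains stored (assumed definition)."""
    total_injected = sum(injected)
    if total_injected == 0:
        return 0.0  # nothing injected, so nothing can be stored
    return total_co2_stored(injected, produced) / total_injected


def coupled_objective(oil_norm, co2_norm, w_oil=0.5, w_co2=0.5):
    """Weighted sum of the two dimensionless objectives.

    The default weights of 0.5 are an assumption standing in for the
    equal weights mentioned in the text.
    """
    return w_oil * oil_norm + w_co2 * co2_norm


# Hypothetical per-timestep CO2 masses (arbitrary units).
injected = [100.0, 120.0, 110.0]
produced = [0.0, 20.0, 30.0]

stored = total_co2_stored(injected, produced)
efficiency = storage_efficiency(injected, produced)
print(stored, round(efficiency, 4), coupled_objective(1.0, 2.0))
```

Because the two objectives have very different magnitudes in physical units, a weighted sum like this only makes sense after both terms are made dimensionless, which is why the text refers to dimensionless quantities.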

2.3 Proxy model development

Advanced computational technologies have broadened the scope of reservoir modeling. However, limited computational resources still pose challenges in terms of uncertainty quantification and optimization workflows. To address this challenge, computationally efficient proxy models can be used. Proxy models, also known as surrogate models, are mathematical or statistical models that mimic complex simulation models by using predetermined input parameters (Zubarev, 2009; Xue et al., 2023a, 2025). These models are used as a potential substitute for complex models, ensuring computational efficiency and significantly reducing the computation cost and time. Various algorithms have been employed for proxy model development in the oil and gas sector. This study employed various algorithms, such as XGBoost, RF, ANN, CNN, KNN, SVR, LR, ANN-XGBoost, and CNN-XGBoost, for proxy model development to accomplish the prediction study, as shown in Figure 3.

Figure 3

Three proxy models were developed for cumulative oil production, total CO₂ stored, and storage efficiency over a period of 20 years, using a dataset of 2,400 samples that included seven input parameters (including injection rate, CO₂-WAG cycle size, and ratio) and three output parameters (cumulative oil production, total CO₂ stored, and storage efficiency). The dataset was thoroughly analyzed for outliers before training. It was then split into training (80%), testing (10%), and validation (10%) datasets. The datasets were normalized to the range 0–1 by feature scaling to ensure consistency. After these steps, the dataset was ready for training the models to develop the respective proxy models. Figure 4 shows the specific training process employed.
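The split and scaling steps can be sketched as below, using only the standard library. The sketch assumes a random shuffle before the 80/10/10 split and min-max scaling to 0–1; fitting the scaling bounds on the training subset only is a common precaution against information leakage and is an assumption here, not something the text specifies. All function names and the synthetic data are illustrative.

```python
import random


def split_80_10_10(samples, seed=0):
    """Shuffle and split samples into training, validation, and testing sets."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    n_train = len(idx) * 8 // 10   # integer arithmetic avoids float rounding
    n_val = len(idx) // 10
    train = [samples[i] for i in idx[:n_train]]
    val = [samples[i] for i in idx[n_train:n_train + n_val]]
    test = [samples[i] for i in idx[n_train + n_val:]]
    return train, val, test


def fit_min_max(rows):
    """Return per-column minima and maxima."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]


def apply_min_max(rows, lows, highs):
    """Scale each column to 0-1 using previously fitted bounds."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


# Synthetic two-feature data standing in for the simulation dataset.
rng = random.Random(1)
samples = [[rng.uniform(0.05, 3.0), rng.uniform(10.0, 100.0)]
           for _ in range(2400)]

train, val, test = split_80_10_10(samples)
lows, highs = fit_min_max(train)
train_scaled = apply_min_max(train, lows, highs)
print(len(train), len(val), len(test))
```

With bounds fitted on the training subset, validation and testing values can fall slightly outside 0–1; that is expected and usually harmless for tree-based and neural models.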

Figure 4

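The performance of the employed ML models can be evaluated with cross plots of the actual versus predicted values, together with the coefficient of determination (R²) and the root mean square error (RMSE). Furthermore, relative errors and residual plots were used to validate the models. These parameters can be calculated by Equations 5–7:

$$R^{2} = 1 - \frac{\sum_{i=1}^{N} \left( y_{i} - \hat{y}_{i} \right)^{2}}{\sum_{i=1}^{N} \left( y_{i} - \bar{y} \right)^{2}} \tag{5}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( y_{i} - \hat{y}_{i} \right)^{2}} \tag{6}$$

$$\mathrm{AARE} = \frac{1}{N} \sum_{i=1}^{N} \left| \frac{y_{i} - \hat{y}_{i}}{y_{i}} \right| \tag{7}$$

where $y_{i}$ is the actual value obtained from the simulator, $\hat{y}_{i}$ is the value predicted by the model, $\bar{y}$ is the mean of the actual values, and $N$ is the number of samples.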
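The evaluation metrics can be computed with a few lines of plain Python. This sketch uses the standard textbook definitions of R², MSE, RMSE, and the average absolute relative error (AARE); whether the study reports AARE as a fraction or a percentage is not stated, so the fraction form is an assumption, and the sample values are made up.

```python
import math


def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error, the square root of the MSE."""
    return math.sqrt(mse(actual, predicted))


def aare(actual, predicted):
    """Average absolute relative error, as a fraction (assumed form)."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical simulator outputs and proxy predictions.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

print(round(r2_score(actual, predicted), 4),
      round(rmse(actual, predicted), 4),
      round(aare(actual, predicted), 4))
```

Since RMSE is by definition the square root of MSE, the two columns of a results table can be cross-checked against each other with `math.sqrt`.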

2.4 Machine learning algorithms

The study applied various ML models based on algorithms such as XGBoost, RF, SVR, KNN, LR, ANN, and CNN. The following sections explain the ANN and XGBoost algorithms in detail, while the other algorithms employed in this study (RF, KNN, SVR, LR, and CNN) are described in the Supplementary material.

2.4.1 Artificial neural network

An ANN is a computational model that mimics the human neurological system. It comprises neurons, also called nodes, which are organized into layers, together with weights, biases, and activation functions, as shown in Figure 5. The basic structure consists of several layers, each containing several nodes. The input layer receives the input dataset, the hidden layers (single or multiple) process the information, and the output layer produces the results (Hill et al., 1994). The connections between neurons carry weights that are adjusted during training and determine the influence of neurons on one another. Training uses forward and back propagation, in which the weights are adjusted to minimize the error between the actual and predicted values (Song et al., 2020). Activation functions, such as ReLU (as listed in Table 4), enable the model to learn the nonlinear behavior of complex reservoir models (Bergen et al., 2019). An advantage of ANN models is that, unlike CNN models, they do not require large datasets for training. Further, ANN models have been successfully employed in various fields, including oil and gas.

Figure 5

Table 4

Model	Hyperparameter	Value
ANN	Learning rate	0.001
	Hidden layers	3
	Neurons per layer	100, 150, 200
	Activation	ReLU
	Batch size	32
XGBoost	n_estimator	100
	Learning rate (η)	0.05
	Estimator	300
	Subsample	0.8
	L1/L2 regularization	Enabled

Hyperparameters used for the ANN and XGBoost models.

2.4.2 XGBoost algorithm

The XGBoost algorithm is a form of the gradient boosting decision tree (GBDT) algorithm (Xu et al., 2019); it is a scalable and efficient tree-boosting system introduced by Chen and Guestrin (2016). It has become very popular for classification and regression problems because it is computationally efficient and highly accurate. Building on the principles of GBDT, XGBoost provides several key optimizations to improve performance. Notably, it uses a second-order Taylor approximation of the loss function, which gives higher optimization precision than first-order gradient-based methods. Moreover, the algorithm includes L1 and L2 regularization terms in the objective function (as listed in Table 4) to avoid overfitting (Krizhevsky et al., 2017) and to foster the generalization and sparsity of the model. In addition, XGBoost uses block-based storage to optimize sequential memory access and to enable parallel computing, which considerably increases the training speed. Together, these improvements make XGBoost a powerful and scalable tool for large-scale ML applications. Figure 6 shows a schematic diagram of the XGBoost algorithm.

Figure 6
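For reference, the second-order approximation and the regularization term mentioned above can be written out as follows. The notation follows the general form given by Chen and Guestrin (2016) rather than anything specific to this study; here $K$ denotes the number of leaves of a tree and $w$ its vector of leaf weights, and the L1 term with coefficient $\alpha$ is the optional penalty available in the XGBoost library.

```latex
% Objective at boosting round t, expanded to second order around the
% prediction of the previous round
\mathcal{L}^{(t)} \simeq \sum_{i=1}^{n} \left[ l\!\left(y_i, \hat{y}_i^{(t-1)}\right)
  + g_i\, f_t(x_i) + \tfrac{1}{2}\, h_i\, f_t^{2}(x_i) \right] + \Omega(f_t)

% First- and second-order derivatives of the loss
g_i = \frac{\partial\, l\!\left(y_i, \hat{y}_i^{(t-1)}\right)}{\partial\, \hat{y}_i^{(t-1)}},
\qquad
h_i = \frac{\partial^{2}\, l\!\left(y_i, \hat{y}_i^{(t-1)}\right)}{\partial \left(\hat{y}_i^{(t-1)}\right)^{2}}

% Regularization on tree complexity (L2 term, plus the optional L1 term)
\Omega(f) = \gamma K + \tfrac{1}{2}\, \lambda \lVert w \rVert_2^{2} + \alpha \lVert w \rVert_1
```

The curvature information $h_i$ is what distinguishes this update from plain gradient boosting, which uses only $g_i$.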

2.5 Optimization algorithm

2.5.1 Particle swarm optimization

PSO is a population-based stochastic optimization algorithm first described by Kennedy and Eberhart (1995), inspired by the social behavior of bird flocks and fish schools. In PSO, particles are a set of candidate solutions that explore the search space by changing their position and velocity over time, using both personal and swarm experience. Every particle adjusts its course according to two primary elements: the best position it has reached itself (personal best) and the best position reached by any particle in the swarm (global best). This collaboration draws the swarm toward optimal or near-optimal solutions in complex, multidimensional spaces. The velocity of a particle is updated by Equation 8:

$$v_{i}^{k+1} = w\, v_{i}^{k} + c_{1} r_{1} \left( p_{i} - x_{i}^{k} \right) + c_{2} r_{2} \left( g - x_{i}^{k} \right) \tag{8}$$

where $v_{i}^{k}$ and $x_{i}^{k}$ are the velocity and position of particle $i$ at iteration $k$, $w$ is the inertia weight, $c_{1}$ and $c_{2}$ are acceleration coefficients, $r_{1}$ and $r_{2}$ are randomly generated numbers between 0 and 1, and $p_{i}$ and $g$ indicate the personal and global best positions observed to date.

PSO is well suited to optimization because of its conceptual simplicity, computational efficiency, and minimal parameter tuning. This study used the simple global-best variant of PSO, in which all particles are informed through a fully connected topology. The inertia weight and acceleration coefficients were set to fixed values. The swarm size was set to 100 particles, which is relatively cheap to compute and provides sufficient exploration.
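A global-best PSO of the kind described above can be sketched in plain Python. This is a minimal illustration under stated assumptions: the inertia weight `w=0.7` and acceleration coefficients `c1=c2=1.5` are common textbook choices and are not the values used in the study, positions are simply clipped to the bounds, and `toy_objective` is a hypothetical stand-in for the trained proxy model.

```python
import random


def pso_maximize(objective, bounds, n_particles=100, n_iters=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO that maximizes objective within box bounds."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                   # personal best positions
    pbest_val = [objective(p) for p in pos]
    g_idx = max(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g_idx][:], pbest_val[g_idx]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + cognitive pull + social pull.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Move the particle and clip it to the allowed range.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val > pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val > gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_objective(x):
    """Hypothetical smooth objective with its maximum (0) at (1, -2)."""
    return -((x[0] - 1.0) ** 2) - (x[1] + 2.0) ** 2


best_x, best_val = pso_maximize(toy_objective, [(-5.0, 5.0), (-5.0, 5.0)])
print(best_x, best_val)
```

In a proxy-based workflow, `objective` would evaluate the surrogate prediction of the coupled objective for a candidate set of operating parameters, and `bounds` would be the parameter ranges of Table 3. Each evaluation is then a fast model call rather than a full simulation, which is what makes a swarm of 100 particles affordable.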

3 Results and discussion

3.1 Base case analysis

Figure 7 shows the cumulative oil production and cumulative CO₂ storage for the base WAG case. The base case uses a gas injection rate of 4.53 × 10⁴ m³/day and a water injection rate of 1.9 × 10³ m³/day; the other operational parameters are listed in Table 5. It yielded a cumulative oil production of 1.1 × 10⁶ bbl and a CO₂ storage of 4.53 × 10⁵ tons. The following sections consider how the water injection rate, gas injection rate, WAG cycle size, and WAG cycle ratio affect cumulative oil production, total CO₂ stored, and storage efficiency.

Figure 7

Table 5

Parameters	Units	Base WAG	Optimized WAG
Water injection rate	×10² m³/day	1.9	2.7
Gas injection rate	×10⁴ m³/day	4.56	8.53
Cycle size	Month	3	6
Cycle ratio	—	1.1	1.1
BHP_gas injector	×10³ psi	3.0	2.9
BHP_water injector	×10³ psi	2.8	3.1
BHP_production	×10³ psi	1.20	1.31
Cumulative oil	×10⁶ bbl	1.1	1.3
Cumulative CO₂	×10⁵ tons	4.53	5.56
Objective function		1.065	1.22

Base and optimized CO₂-WAG parameters and results.

3.1.1 Effect of WAG cycle size and ratio

Figures 8a–c depict the effect of WAG cycle size on cumulative oil production, total CO₂ stored, and storage efficiency. This study used WAG cycle sizes of 1, 3, 6, 9, 12, and 18 months. The cumulative oil production was quite stable across all WAG cycle sizes, with a slight increase at 6 months, possibly indicating an optimal balance between the CO₂ and water slugs for mobility control. The amount of CO₂ stored also remained comparatively stable, although a slight increase was observed as the WAG cycles became longer, perhaps because of the longer retention time. The storage efficiency does not depend strongly on the WAG cycle size, but its variability is lower for longer cycles, implying that the trapping mechanisms are more predictable with longer cycle durations.

Figure 8

Furthermore, the impact of the WAG cycle ratio on cumulative oil production, total CO₂ stored, and storage efficiency was investigated, as shown in Figures 8d–f. For a thorough analysis, the study used WAG cycle ratios of 1:1, 1:2, 1:3, 2:1, and 3:1. The results show that cumulative oil production is highest at a ratio of 1:2, where the injection of CO₂ takes precedence. This indicates that CO₂ mobilizes oil better than water does. More CO₂ is stored as the CO₂ share of the cycle increases, reflecting a direct link between the composition of the injected fluid and storage. Interestingly, the storage efficiency is highest at moderate CO₂-to-water ratios (i.e., 1:2 and 1:1), where there is sufficient CO₂ injection and adequate trapping enhanced by the water slugs. At higher WAG cycle ratios (2:1 and 3:1), the CO₂ storage efficiency decreased, probably because of CO₂ breakthrough and little or no solubility trapping.

3.1.2 Effect of water and gas injection rate

Figures 9a–c show how the water injection rate influences cumulative oil production, total CO₂ stored, and storage efficiency. Oil recovery improves as the water injection rate increases, reflecting stronger displacement of oil in the reservoir. Beyond a certain level, however, further increases have little or no effect on oil production; in this case, the threshold is about 2,000 m³/day, and higher water injection rates can lead to early water breakthrough. Regarding the effect of water on CO₂ storage, Figure 9b indicates that raising the water injection rate above 250 m³/day negatively affects CO₂ storage, and further increases reduce it significantly. Otherwise, the CO₂ retained remains relatively constant at higher and lower water injection rates, which means that water injection has little effect on the quantity of CO₂ retained. This is perhaps not surprising given the low miscibility and limited interaction between the water and the CO₂ plume. Storage efficiency, on the other hand, decreases significantly as the water injection rate increases. This tendency can be attributed to more of the pore space becoming filled with water, leaving less space to trap CO₂. Less CO₂ can be trapped by structural, residual, or solubility trapping when the pore network contains more water.

Figure 9

Figures 9d–f display the effect of the CO₂ injection rate on cumulative oil production, CO₂ stored, and storage efficiency, respectively. The total oil production increased as the CO₂ injection rate increased from 10,000 to 90,000 m³/day. The higher the CO₂ injection rate, the better the oil mobility, probably because of better miscibility and pressure support of the reservoir. Similarly, the total CO₂ stored increased with increasing injection rate. A further increase in the injection rate from 90,000 to 100,000 m³/day did not produce any significant increase in the cumulative CO₂ storage; this may be due to premature breakthrough or reduced trapping efficiency caused by over-injection. Interestingly, storage efficiency was negatively related to the injection rate. Although a greater amount of CO₂ is trapped at higher rates, the fraction of injected CO₂ that is effectively trapped decreases, which relates to the poor mobility ratios and increased CO₂ migration rates.

3.2 Machine learning model’s performance

The results of the ML models were compared with those of CMG-GEM using the dataset of 2,400 simulations. The dataset used for the ML study included input parameters such as the injection rates (water and CO₂), WAG cycle size, and WAG cycle ratio, and the output parameters were cumulative oil production, total CO₂ stored, and storage efficiency. The dataset was split into 80% training, 10% validation, and 10% testing. The simulation results were used as the actual values and compared with the predictions of the ML models. Figures 10–12 depict the actual versus predicted plots for the three output variables. Among the single models, XGBoost performed best; however, when it was coupled with an ANN model, the resulting hybrid ANN-XGBoost model outperformed all single models (XGBoost, RF, ANN, CNN, SVR, KNN, and LR) as well as the other hybrid model (CNN-XGBoost). This is evident from the R² plots shown in Figure 13b. The values for the optimal model are listed in Table 6, and the values for all other models are listed in Supplementary Tables S1–S3. The results show that the predicted values are closely aligned with the actual values and cluster around the actual-versus-predicted line. Furthermore, Figure 13a illustrates the average absolute relative error (AARE) of all the models, which confirms the superiority of the ANN-XGBoost model, as it has the lowest AARE values.

Figure 10

Figure 11

Figure 12

Figure 13

Table 6

Dataset	Target	R²	MSE	RMSE	AARE
Training	Oil	0.99972	2.9 × 10⁻⁵	5.3 × 10⁻³	0.00358
	CO₂	0.99969	8.4 × 10⁻⁴	2.9 × 10⁻²	0.03576
	Efficiency	0.99975	1.5 × 10⁻⁵	3.8 × 10⁻³	0.00850
Validation	Oil	0.99110	8.1 × 10⁻⁴	8.1 × 10⁻⁴	0.02010
	CO₂	0.99014	2.4 × 10⁻²	2.4 × 10⁻²	0.10648
	Efficiency	0.98767	6.0 × 10⁻⁴	6.0 × 10⁻⁴	0.04641
Testing	Oil	0.99159	8.1 × 10⁻⁴	2.8 × 10⁻²	0.01962
	CO₂	0.97515	6.5 × 10⁻²	1.5 × 10⁻¹	0.10359
	Efficiency	0.98706	7.5 × 10⁻⁴	2.4 × 10⁻²	0.05144

ANN-XGBoost model statistical analysis for cumulative oil production, total CO₂ stored, and storage efficiency.
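
The four statistics reported in Table 6 can be computed as in the sketch below. The definitions are assumptions based on common usage: R² as one minus the ratio of residual to total sum of squares, and AARE as the mean of |actual − predicted| / |actual| expressed as a fraction. The function name and sample values are hypothetical.

```python
import math

def regression_metrics(actual, predicted):
    """Return R2, MSE, RMSE, and AARE for paired actual/predicted values."""
    n = len(actual)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_actual = sum(actual) / n
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    mse = ss_res / n
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "MSE": mse,
        "RMSE": math.sqrt(mse),  # RMSE is the square root of MSE
        # Assumed AARE definition; requires non-zero actual values.
        "AARE": sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n,
    }

# Hypothetical values for illustration only.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.9])
for name, value in metrics.items():
    print(f"{name}: {value:.5f}")
```

Because RMSE is by construction the square root of MSE, the pair offers a quick consistency check on tabulated values.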

Furthermore, to check the performance of the respective models, this study also considered relative deviation plots. These plots show how far the predicted values deviate from the actual values: the higher the deviation, the less accurate the model. Figures 14–16 show the relative deviation plots of all models for the three output variables. These plots were also used to determine whether a model was under-predicting or over-predicting. Among all the models, ANN-XGBoost shows the least deviation from the actual values. For oil production, the majority of the values are clustered around the zero line, with a slight deviation of 0.045 ± 0.155. Similarly, for total CO₂ stored, the majority of the values are clustered around the zero line, with a deviation of −0.05 ± 2.55. The same trend was observed for CO₂ storage efficiency, with a slight deviation of −0.39 ± 0.78. Overall, the proposed model yields minimal deviation and provides balanced performance.
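
A relative deviation analysis of this kind can be sketched as follows. The sketch assumes that relative deviation is (predicted − actual) / actual in percent and that a band written as "x ± y" denotes the midpoint and half-range of the deviations; both are assumptions, and the function names and sample values are hypothetical.

```python
def relative_deviation(actual, predicted):
    """Percent deviation of each prediction from its actual value (assumed definition)."""
    return [100.0 * (p - a) / a for a, p in zip(actual, predicted)]

def summarize_deviation(deviations):
    """Summarize deviations as midpoint, half-range, and over/under counts."""
    lo, hi = min(deviations), max(deviations)
    return {
        "midpoint": (hi + lo) / 2.0,
        "half_range": (hi - lo) / 2.0,
        "over_predicted": sum(1 for d in deviations if d > 0),
        "under_predicted": sum(1 for d in deviations if d < 0),
    }

# Hypothetical values for illustration only.
dev = relative_deviation([100.0, 200.0, 400.0], [101.0, 198.0, 400.0])
summary = summarize_deviation(dev)
print(dev)
print(summary)
```

A midpoint near zero with balanced over- and under-prediction counts indicates that the model is not systematically biased in either direction.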

Figure 14

Figure 15

Figure 16

Figures 17–19 present the residual plots of all models used in the study for the three output variables. Residual plots were used to analyze model bias and patterns, that is, to show whether a model exhibits systematic bias or trends. Among the models, ANN-XGBoost yields the lowest residual values: 0.015 ± 0.095, 0.025 ± 0.525, and ± 0.08 for cumulative oil production, total CO₂ stored, and storage efficiency, respectively. The majority of the values are spread around the zero line without any visible trend, which further validates the performance of the ANN-XGBoost model and indicates that it can be used for prediction. Based on the above, the overall ranking of the top three models was ANN-XGBoost > CNN-XGBoost > XGBoost.
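
The visual check for bias and trend in a residual plot can be complemented by two numbers, as sketched below: the mean residual (bias) and the Pearson correlation between predicted values and residuals (trend). This is an illustrative diagnostic under the assumption that residuals are defined as actual minus predicted; it is not presented as the procedure used in the study, and all names and values are hypothetical.

```python
import math

def residuals(actual, predicted):
    """Residuals defined as actual minus predicted (assumed convention)."""
    return [a - p for a, p in zip(actual, predicted)]

def pearson(x, y):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    if sx == 0.0 or sy == 0.0:
        return 0.0
    return cov / (sx * sy)

def residual_report(actual, predicted):
    """Mean residual (bias) and residual-vs-prediction correlation (trend)."""
    res = residuals(actual, predicted)
    return {"bias": sum(res) / len(res), "trend": pearson(predicted, res)}

# Hypothetical case of systematic under-prediction: actual is twice the prediction.
biased = residual_report([2.0, 4.0, 6.0, 8.0], [1.0, 2.0, 3.0, 4.0])
print(biased)
```

A well-behaved model would show a bias close to zero and a trend close to zero, whereas the deliberately biased example above yields a positive bias and a trend of one.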

Figure 17

Figure 18

Figure 19

Additionally, this study compares the robustness and speed of the ML models with those of the conventional simulator. CMG-GEM requires at least 4–5 min to run a single case and therefore hours to complete all scenarios, whereas the proposed ML model requires 10–15 s to predict the results of all scenarios, a substantial computational advantage over the conventional simulator (Table 7). The PC used in this study has a Core i7-8550U processor (2.00 GHz) and 16 GB of RAM. Figure 20 shows the ANN-XGBoost model predictions of the set objective functions. As CO₂-WAG project engineers need to run a large number of cases to screen scenarios and choose the optimal one, employing such models will reduce the difficulty and the time required to conduct such studies of CO₂-WAG projects.

Table 7

Method	Runtime (per run)	Runtime (2,400 cases)
CMG-GEM	4–5 min	180 h
ANN-XGBoost	<1 s	10–15 s

Runtime comparison between simulator and the proxy model.
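
The totals in Table 7 follow from simple arithmetic, sketched below. The calculation assumes that all 2,400 cases are run sequentially and that the 180 h figure corresponds to the midpoint of the 4–5 min range; the variable and function names are illustrative.

```python
N_CASES = 2400
SIM_MIN_PER_CASE = (4.0, 5.0)  # reported per-run range for CMG-GEM
PROXY_TOTAL_S = (10.0, 15.0)   # reported total for all cases with the proxy

def total_sim_hours(n_cases, minutes_per_case):
    """Total sequential simulator runtime in hours."""
    return n_cases * minutes_per_case / 60.0

sim_hours = tuple(total_sim_hours(N_CASES, m) for m in SIM_MIN_PER_CASE)
mid_hours = total_sim_hours(N_CASES, 4.5)  # midpoint of the reported range

# Speed-up bounds: slowest proxy vs fastest simulator, and the reverse.
speedup_low = sim_hours[0] * 3600.0 / PROXY_TOTAL_S[1]
speedup_high = sim_hours[1] * 3600.0 / PROXY_TOTAL_S[0]

print(sim_hours)                  # (160.0, 200.0)
print(mid_hours)                  # 180.0
print(speedup_low, speedup_high)  # 38400.0 72000.0
```

Under these assumptions the simulator needs 160–200 h for the full dataset, and the proxy is faster by roughly four orders of magnitude.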

Figure 20

3.3 Coupled optimization

CO₂-WAG performance depends strongly on various operational parameters that directly impact cumulative oil production and CO₂ storage. To address this, the proxy model (ANN-XGBoost) was integrated with the PSO algorithm to optimize these parameters so as to maximize both oil production and CO₂ storage. The parameters considered during the optimization were injection rates (water and gas), cycle size, cycle ratio, and BHP (injection and production). Table 5 lists the optimized parameter ranges, and Figures 21a,b illustrate the resultant cumulative oil production and CO₂ storage, respectively.
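
The coupling of a proxy model with PSO can be sketched as below. This is a minimal illustration, not the implementation used in the study: `proxy_predict` is a hypothetical analytic stand-in for the trained ANN-XGBoost model, the two decision variables are assumed to be normalized to [0, 1], the two objectives are combined by an assumed equal-weight sum, and the inertia and acceleration coefficients are common default choices.

```python
import random

def proxy_predict(x):
    """Hypothetical stand-in for the trained proxy: returns (oil, co2) scores."""
    gas_rate, water_rate = x
    oil = -(gas_rate - 0.8) ** 2 - (water_rate - 0.4) ** 2
    co2 = -(gas_rate - 0.9) ** 2
    return oil, co2

def fitness(x, w_oil=0.5, w_co2=0.5):
    """Assumed weighted-sum scalarization of the two objectives."""
    oil, co2 = proxy_predict(x)
    return w_oil * oil + w_co2 * co2

def pso(objective, bounds, n_particles=20, n_iter=100,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Maximize `objective` within box `bounds` using inertia-weight PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Velocity: inertia + pull to personal best + pull to global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move and clamp to the allowed parameter range.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val > pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val > gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

best_x, best_val = pso(fitness, bounds=[(0.0, 1.0), (0.0, 1.0)])
print(best_x, best_val)
```

Because every fitness evaluation is a proxy call rather than a simulator run, the swarm can evaluate thousands of candidate parameter sets quickly. In practice, `proxy_predict` would be replaced by the trained model and the bounds by the physical parameter ranges.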

Figure 21

The results show that CO₂-WAG operational parameters have a significant effect on both oil production and CO₂ storage: the cumulative oil production and CO₂ storage rose to 1.3 × 10⁶ bbl and 5.56 × 10⁵ tons from 1.11 × 10⁶ bbl and 4.53 × 10⁵ tons, respectively. In the optimized case, the production bottom-hole pressure rose to 1.31 × 10³ psi, gas injection rose to 8.53 × 10⁴ m³/day, water injection rose to 4.56 × 10² m³/day, and the cycle length changed from 3 months to 6 months. Together, these changes in operational parameters resulted in higher oil production and CO₂ storage.

The results demonstrate the importance of designing operational parameters carefully. They further demonstrate that the ANN-XGBoost proxy model coupled with PSO not only improves oil production and CO₂ storage but also provides a systematic and robust framework for optimizing operational parameters. This will enable engineers to design operational parameters quickly and decision makers to make effective decisions.

3.4 Research implications

The implications of this methodology reach well beyond CO₂-WAG optimization, with wider applicability in subsurface engineering, environmental modeling, and energy-transition technologies. The hybrid ANN-XGBoost surrogate model is an effective substitute for standard full-physics simulators and shows that complicated nonlinear multiphase processes can be modeled accurately at a fraction of the computational expense. Hybrid frameworks of this type are effective in enhanced oil recovery operations, including polymer, surfactant, and miscible-gas flooding, where quick scenario screening is necessary (Jiao et al., 2024; Otmane et al., 2025). The capacity to make thousands of predictions in seconds makes this method an effective tool for large-scale sensitivity analysis, design, and uncertainty assessment. Recently, hybrid deep-learning architectures have been shown to improve real-time, safe operational decision-making through continuous model calibration and fast assessment of changing reservoir conditions (Otmane et al., 2025; Xue et al., 2023). The framework introduced in this paper provides the underlying elements required for such advanced workflows.

Beyond the oil and gas sector, the workflow is applicable to geothermal reservoir engineering, where coupled thermal-hydraulic modeling is computationally expensive. Surrogate modeling has become one of the enabling technologies for geothermal production prediction and improved design of geothermal systems (Li et al., 2025; Xue et al., 2023). Similar benefits apply to groundwater remediation and environmental impact studies, where surrogate models speed up the simulation of contaminant transport, remediation planning, and plume monitoring (Luo et al., 2023; Wu et al., 2022). Notably, the hybrid ANN-XGBoost model is also well suited to predicting petrophysical properties such as porosity, permeability, and saturation. These properties tend to have complicated nonlinear relationships that depend on lithology, diagenesis, and depositional environment. ML techniques, especially hybrid methods, have been shown to outperform empirical correlations and other conventional inversion techniques in predicting formation properties from logs, core data, and seismic attributes (Kalule et al., 2023; Talebkeikhah et al., 2021). The proposed framework could support reservoir characterization, facies identification, and model updating in fields with limited core data or incomplete logging programs, providing an opportunity to predict static reservoir properties quickly and accurately.

The versatility of the model further supports emerging energy-transition technologies such as subsurface hydrogen storage and CO₂ capture, where risk evaluation and operational optimization rely on repeated simulations of injection-withdrawal cycles and plume migration (Mao et al., 2024). In a broader sense, the multi-objective optimization framework employed in this paper is applicable to a diverse range of engineering systems, including renewable-energy integration, chemical process optimization, and emissions-reduction technologies, in which rapid search of high-dimensional parameter spaces can improve operational decision-making. In general, although the current work is devoted to CO₂-WAG processes, the surrogate modeling and optimization framework is flexible and transferable to a wide range of subsurface, environmental, and energy-system applications.

3.5 Limitations and future work

Although the proposed framework demonstrates strong predictive and computational capabilities, a number of limitations should be recognized to provide context for the presented results. The dataset used for training the surrogate model was created entirely from synthetic numerical simulations based on a representative reservoir model. Although this approach guarantees controlled variability in the operational parameters, it is not sufficient to capture the heterogeneity, small-scale stratification, and dynamic reservoir behavior observed in actual fields. As a result, generalizing the trained model to other formations, depositional environments, or fluid systems may require re-training using site-specific simulations or field data. The numerical model was also constructed based on certain assumptions: a moderate grid size was used to maintain computational feasibility for a total of 2,400 simulation runs, so finer-scale heterogeneity and complex facies transitions may not be completely represented. Additionally, the reservoir physics incorporated into the CMG-GEM simulator, such as the relative permeability curves, capillary-pressure behavior, and Peng–Robinson EOS, introduces model dependency that may affect the predicted CO₂ trapping efficiency and fluid displacement outcomes.

From an ML perspective, the hybrid ANN-XGBoost model is inherently sensitive to hyperparameter choice, architecture design, and training domain coverage. Although hyperparameters were optimized to achieve stable and accurate predictions, challenges remain, including overfitting, sensitivity to training data imbalance, and poor extrapolation outside the training samples. More sophisticated tuning strategies, such as Bayesian optimization, genetic algorithms, or cross-validated grid search, could enhance model robustness, though at the cost of increased computational time. The framework also does not explicitly account for the measurement noise, operational uncertainty, or variability that one would expect in field data. The optimization results, therefore, reflect an idealized case obtained from synthetic inputs rather than the noisy or incomplete datasets that are common in actual reservoir settings. Additionally, the current optimization focuses only on operational parameters, without integrating techno-economic parameters such as capital and operating costs, net present value (NPV), or incentives from emissions credits. Future work should incorporate economic and environmental uncertainty analysis coupled with multi-objective optimization in order to create more comprehensive decision-support tools for use in this field. Despite these limitations, the study builds a solid foundation for future development, especially by incorporating field data, widening the training domain, adding uncertainty quantification, and strengthening physics-based constraints in the surrogate modeling process.
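
One of the tuning strategies mentioned above, cross-validated grid search, can be sketched as follows. The sketch only illustrates the bookkeeping: `toy_score` is a hypothetical placeholder for fitting a model on the training indices and scoring it on the validation indices, and the grid values for `max_depth` and `learning_rate` are illustrative assumptions, not the hyperparameters used in this study.

```python
import itertools
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, validation) index lists for shuffled k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]

def cross_validated_grid_search(param_grid, score_fn, n_samples, k=5):
    """Return the parameter set with the highest mean validation score."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        scores = [score_fn(params, tr, va)
                  for tr, va in k_fold_indices(n_samples, k)]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score

def toy_score(params, train_idx, val_idx):
    """Hypothetical placeholder; a real version would train and validate a model."""
    return -abs(params["max_depth"] - 4) - abs(params["learning_rate"] - 0.1)

grid = {"max_depth": [2, 4, 6], "learning_rate": [0.05, 0.1, 0.3]}
best_params, best_score = cross_validated_grid_search(grid, toy_score, n_samples=2400)
print(best_params, best_score)
```

The cost grows with the number of grid points multiplied by the number of folds, which is the computational trade-off noted above.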

4 Conclusion

This work establishes a robust, data-driven framework for predicting and optimizing CO₂-WAG performance by linking high-fidelity numerical simulation with advanced ML and metaheuristic optimization. Algorithms such as XGBoost, RF, KNN, SVR, LR, ANN, CNN, ANN-XGBoost, and CNN-XGBoost were tested to predict CO₂-WAG performance. The best-performing prediction model (proxy model) was then coupled with the PSO algorithm for integrated optimization. The dataset comprised 2,400 samples, with seven input parameters and three output parameters. The following conclusions were drawn from this study:

Both the CO₂-WAG cycle size and ratio significantly influenced oil production and CO₂ storage. A cycle size of 6 months and a WAG ratio of 1:1 produced higher oil production and led to higher CO₂ storage.
The injection rates (both water and gas) had a strong effect on oil production and CO₂ storage. However, increases in the injection rates required careful design, because arbitrary increases led to early water or gas breakthrough.
Among the nine tested ML algorithms, the hybrid ANN model coupled with XGBoost (ANN-XGBoost) yielded the best prediction results, with high R² scores (0.99159, 0.97515, and 0.98706) and low RMSE values (2.8 × 10⁻², 1.5 × 10⁻¹, and 2.4 × 10⁻²) for oil production, CO₂ storage, and storage efficiency, respectively, on the test set. The proxy model reproduced the full-physics simulator's outputs with negligible bias.
The proxy model (ANN-XGBoost) coupled with the PSO algorithm yielded 12.8% higher cumulative oil production and 11% greater CO₂ storage compared with the base WAG case.
The proposed optimization framework required only minutes to complete optimization, offering engineers a robust alternative to optimization workflows built on conventional simulators.

These results confirm that the proposed ANN-XGBoost + PSO framework provides a practical, rapid, and reliable decision-support tool for CO₂-WAG operations. Its applications extend beyond CO₂-WAG to other areas of the oil and gas sector, such as chemical injection processes. Furthermore, this workflow can also be used in other fields facing similar optimization challenges.

Statements

Data availability statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Author contributions

SU: Conceptualization, Formal analysis, Investigation, Methodology, Software, Visualization, Writing – original draft, Writing – review & editing. LX: Conceptualization, Supervision, Validation, Writing – review & editing. GD: Writing – review & editing. TA-A: Funding acquisition, Writing – review & editing. ME: Writing – review & editing. AA: Writing – review & editing.

Funding

The author(s) declared that financial support was received for this work and/or its publication. This study had the support of national funds through Fundação para a Ciência e Tecnologia, I. P. (FCT), under the projects UIDB. This study is also supported by the National Natural Science Foundation of China (Grant No. 52274048) and Beijing Natural Science Foundation (Grant No. 3222037). Open access funding provided by UiT The Arctic University of Norway.

Conflict of interest

The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declared that Generative AI was not used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fclim.2025.1710187/full#supplementary-material

References

Ahmadi, M. A., Zendehboudi, S., and James, L. A. (2018). Developing a robust proxy model of CO₂ injection: coupling Box–Behnken design and a connectionist method. Fuel 215, 904–914. doi: 10.1016/j.fuel.2017.11.030
Al-Bayati, D., Saeedi, A., Myers, M., White, C., Xie, Q., and Clennell, B. (2018). Insight investigation of miscible SCCO₂ water alternating gas (WAG) injection performance in heterogeneous sandstone reservoirs. J. CO₂ Util. 28, 255–263. doi: 10.1016/j.jcou.2018.10.010
Al-Khdheeawi, E. A., Vialle, S., Barifcani, A., Sarmadivaleh, M., and Iglauer, S. (2018). Effect of wettability heterogeneity and reservoir temperature on CO₂ storage efficiency in deep saline aquifers. Int. J. Greenhouse Gas Control 68, 216–229. doi: 10.1016/j.ijggc.2017.11.016
AlRassas, A. M., Al-Alimi, D., Zosseder, K., and Al-qaness, M. A. A. (2025). AI-driven predictive framework for CO₂ sequestration and enhanced oil recovery: insights from a depleted oil reservoir. J. Clean. Prod. 519:146054. doi: 10.1016/j.jclepro.2025.146054
Alrassas, A. M., Vo Thanh, H., Ren, S., Sun, R., Al-Areeq, N. M., Kolawole, O., et al. (2022). CO₂ sequestration and enhanced oil recovery via the water alternating gas scheme in a mixed transgressive sandstone-carbonate reservoir: case study of a large Middle East oilfield. Energy Fuel 36, 10299–10314. doi: 10.1021/acs.energyfuels.2c02185
Bachu, S. (2000). Sequestration of CO₂ in geological media: criteria and approach for site selection in response to climate change. Energy Convers. Manag. 41, 953–970. doi: 10.1016/S0196-8904(99)00149-1
Bergen, K. J., Johnson, P. A., de Hoop, M. V., and Beroza, G. C. (2019). Machine learning for data-driven discovery in solid Earth geoscience. Science 363:eaau0323. doi: 10.1126/science.aau0323
Chen, T., and Guestrin, C. (2016). XGBoost. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 785–794
Christensen, J. R., Stenby, E. H., and Skauge, A. (2001). Review of WAG field experience. SPE Reserv. Eval. Eng. 4, 97–106. doi: 10.2118/71203-pa
CMG (2022). CMG GEM user’s guide. Calgary, AB: Computer Modelling Group Ltd.
Czernichowski-Lauriol, I., Berenblyum, R., Bigi, S., Car, M., Gastine, M., Persoglia, S., et al. (2018). CO2GeoNet actions in Europe for advancing CCUS through global cooperation. Energy Procedia 154, 73–79. doi: 10.1016/j.egypro.2018.11.013
Dai, Z., Viswanathan, H., Middleton, R., Pan, F., Ampomah, W., Yang, C., et al. (2016). CO₂ accounting and risk analysis for CO₂ sequestration at enhanced oil recovery sites. Environ. Sci. Technol. 50, 7546–7554. doi: 10.1021/acs.est.6b01744
Dai, Z., Viswanathan, H., Xiao, T., Middleton, R., Pan, F., Ampomah, W., et al. (2017). CO₂ sequestration and enhanced oil recovery at depleted oil/gas reservoirs. Energy Procedia 114, 6957–6967. doi: 10.1016/j.egypro.2017.08.034
Dai, Z., Zhang, Y., Bielicki, J., Amooie, M. A., Zhang, M., Yang, C., et al. (2018). Heterogeneity-assisted carbon dioxide storage in marine sediments. Appl. Energy 225, 876–883. doi: 10.1016/j.apenergy.2018.05.038
Dziejarski, B., Krzyżyńska, R., and Andersson, K. (2023). Current status of carbon capture, utilization, and storage technologies in the global economy: a survey of technical assessment. Fuel 342:127776. doi: 10.1016/j.fuel.2023.127776
Gao, M., Liu, Z., Qian, S., Liu, W., Li, W., Yin, H., et al. (2023). Machine-learning-based approach to optimize CO₂-WAG flooding in low permeability oil reservoirs. Energies 16:6149. doi: 10.3390/en16176149
Han, L., and Gu, Y. (2014). Optimization of miscible CO₂ water-alternating-gas injection in the Bakken formation. Energy Fuel 28, 6811–6819. doi: 10.1021/ef501547x
He, R., Ma, W., Ma, X., and Liu, Y. (2021). Modeling and optimizing for operation of CO₂-EOR project based on machine learning methods and greedy algorithm. Energy Rep. 7, 3664–3677. doi: 10.1016/j.egyr.2021.05.067
Hill, T., Marquez, L., O’Connor, M., and Remus, W. (1994). Artificial neural network models for forecasting and decision making. Int. J. Forecast. 10, 5–15. doi: 10.1016/0169-2070(94)90045-0
Hsu, C.-W., Chen, L.-T., Hu, A. H., and Chang, Y.-M. (2012). Site selection for carbon dioxide geological storage using analytic network process. Sep. Purif. Technol. 94, 146–153. doi: 10.1016/j.seppur.2011.08.019
Jiao, S., Li, W., Li, Z., Gai, J., Zou, L., and Su, Y. (2024). Hybrid physics-machine learning models for predicting rate of penetration in the Halahatang oil field, Tarim Basin. Sci. Rep. 14:5957. doi: 10.1038/s41598-024-56640-y
Jin, L., Pekot, L. J., Smith, S. A., Salako, O., Peterson, K. J., Bosshart, N. W., et al. (2018). Effects of gas relative permeability hysteresis and solubility on associated CO₂ storage performance. Int. J. Greenhouse Gas Control 75, 140–150. doi: 10.1016/j.ijggc.2018.06.002
Kalule, R., Abderrahmane, H. A., Alameri, W., and Sassi, M. (2023). Stacked ensemble machine learning for porosity and absolute permeability prediction of carbonate rock plugs. Sci. Rep. 13:9855. doi: 10.1038/s41598-023-36096-2
Kennedy, J., and Eberhart, R. (1995). Particle swarm optimization. Proceedings of ICNN’95—International Conference on Neural Networks. 1942–1948
Khather, M., Yekeen, N., Al-Yaseri, A., Al-Mukainah, H., Giwelli, A., and Saeedi, A. (2022). The impact of wormhole generation in carbonate reservoirs on CO₂-WAG oil recovery. J. Petroleum Sci. Eng. 212:110354. doi: 10.1016/j.petrol.2022.110354
Kolawole, O., Ispas, I., Kumar, M., Weber, J., Zhao, B., and Zanoni, G. (2021). How can biogeomechanical alterations in shales impact caprock integrity and CO₂ storage? Fuel 291:120149. doi: 10.1016/j.fuel.2021.120149
Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2017). ImageNet classification with deep convolutional neural networks. Commun. ACM 60, 84–90. doi: 10.1145/3065386
Kurz, B. A., Heebink, L. V., Eylands, K. E., Smith, S. A., Hamling, J. A., Klapperich, R. J., et al. (2013). “Bell Creek test site—preinjection geochemical report” in Plains CO₂ Reduction (PCOR) Partnership Phase III Task 4—Deliverable D33, Prepared for National Energy Technology Laboratory U.S. Department of Energy Cooperative Agreement No. DE-FC26-767 (Grand Forks, ND: Energy & Environmental Research Center).
Lackner, K. S. (2003). A guide to CO₂ sequestration. Science 300, 1677–1678. doi: 10.1126/science.1079033
Lake, L. W., Johns, R., Rossen, B., and Pope, G. (2014). Fundamentals of enhanced oil recovery. Richardson, TX: Society of Petroleum Engineers (SPE).
Land, C. S. (1968). Calculation of imbibition relative permeability for two- and three-phase flow from rock properties. Soc. Pet. Eng. J. 8, 149–156. doi: 10.2118/1942-PA
Li, H., Gong, C., Liu, S., Xu, J., and Imani, G. (2022). Machine learning-assisted prediction of oil production and CO₂ storage effect in CO₂-water-alternating-gas injection (CO₂-WAG). Appl. Sci. 12:10958. doi: 10.3390/app122110958
Li, F., Guo, X., Qi, X., Feng, B., Liu, J., Xie, Y., et al. (2025). A surrogate model-based optimization approach for geothermal well-doublet placement using a regularized LSTM-CNN model and grey wolf optimizer. Sustainability 17:266. doi: 10.3390/su17010266
Liu, M., Li, Z., Qi, J., Meng, Y., Zhou, J., Ni, M., et al. (2024). Prediction of CO₂ storage in different geological conditions based on machine learning. Energy Fuel 38, 22340–22350. doi: 10.1021/acs.energyfuels.4c04274
Luo, J., Ma, X., Ji, Y., Li, X., Song, Z., and Lu, W. (2023). Review of machine learning-based surrogate models of groundwater contaminant modeling. Environ. Res. 238:117268. doi: 10.1016/j.envres.2023.117268
Mao, S., Chen, B., Malki, M., Chen, F., Morales, M., Ma, Z., et al. (2024). Efficient prediction of hydrogen storage performance in depleted gas reservoirs using machine learning. Appl. Energy 361:122914. doi: 10.1016/j.apenergy.2024.122914
Naghizadeh, A., Jafari, S., Norouzi-Apourvari, S., Schaffie, M., and Hemmati-Sarapardeh, A. (2024). Multi-objective optimization of water-alternating flue gas process using machine learning and nature-inspired algorithms in a real geological field. Energy 293:130413. doi: 10.1016/j.energy.2024.130413
Otmane, M., Imtiaz, S., Jaluta, A. M., and Aborig, A. (2025). Boosting reservoir prediction accuracy: a hybrid methodology combining traditional reservoir simulation and modern machine learning approaches. Energies 18:657. doi: 10.3390/en18030657
Rasmusson, K., Rasmusson, M., Tsang, Y., and Niemi, A. (2016). A simulation study of the effect of trapping model, geological heterogeneity and injection strategies on CO₂ trapping. Int. J. Greenhouse Gas Control 52, 52–72. doi: 10.1016/j.ijggc.2016.06.020
Ren, D., Wang, X., Kou, Z., Wang, S., Wang, H., Wang, X., et al. (2023). Feasibility evaluation of CO₂ EOR and storage in tight oil reservoirs: a demonstration project in the Ordos Basin. Fuel 331:125652. doi: 10.1016/j.fuel.2022.125652
Rodrigues, H., Mackay, E., Arnold, D., and Silva, D. (2019). Optimization of CO₂-WAG and calcite scale management in pre-salt carbonate reservoirs. Offshore Technology Conference Brasil, (OTC)
Sanchez, N. L. (1999). Management of water alternating gas (WAG) injection projects. Latin American and Caribbean Petroleum Engineering Conference, (SPE)
Sen, D., Chen, H., and Datta-Gupta, A. (2022). Inter-well connectivity detection in CO₂ WAG projects using statistical recurrent unit models. Fuel 311:122600. doi: 10.1016/j.fuel.2021.122600
Song, Y., Sung, W., Jang, Y., and Jung, W. (2020). Application of an artificial neural network in predicting the effectiveness of trapping mechanisms on CO₂ sequestration in saline aquifers. Int. J. Greenhouse Gas Control 98:103042. doi: 10.1016/j.ijggc.2020.103042
Spiteri, E. J., Juanes, R., Blunt, M. J., and Orr, F. M. (2005). Relative permeability hysteresis: trapping models and application to geological CO₂ sequestration. SPE Annual Technical Conference and Exhibition, (SPE)
Sun, X., Liu, J., Dai, X., Wang, X., Yapanto, L. M., and Zekiy, A. O. (2021). On the application of surfactant and water alternating gas (SAG/WAG) injection to improve oil recovery in tight reservoirs. Energy Rep. 7, 2452–2459. doi: 10.1016/j.egyr.2021.04.034
Talebkeikhah, M., Sadeghtabaghi, Z., and Shabani, M. (2021). A comparison of machine learning approaches for prediction of permeability using well log data in the hydrocarbon reservoirs. J. Hum. Earth Future 2, 82–99. doi: 10.28991/HEF-2021-02-02-01
Ud Din, S., Guo, D., Ning, F., and Xue, L. (2022). Natural gas hydrate production methods: a review. J. Appl. Emerg. Sci. doi: 10.36785/jaes.121529
Ud Din, S., Liang, X., Dongdong, G., and Fulong, N. (2025). Inhibition effect on gas hydrate formation in various kinetic hydrate inhibitor systems. J. Porous Media. doi: 10.1615/JPorMedia.2025057554
Ud Din, S., Wimalasiri, R., Ehsan, M., Liang, X., Ning, F., Guo, D., et al. (2023). Assessing public perception and willingness to pay for renewable energy in Pakistan through the theory of planned behavior. Front. Energy Res. 11:1088297. doi: 10.3389/fenrg.2023.1088297
Vaziri, P., and Sedaee, B. (2023). A machine learning-based approach to the multiobjective optimization of CO₂ injection and water production during CCS in a saline aquifer based on field data. Energy Sci. Eng. 11, 1671–1687. doi: 10.1002/ese3.1412
Vo Thanh, H., Sugai, Y., Nguele, R., and Sasaki, K. (2020). Robust optimization of CO₂ sequestration through a water alternating gas process under geological uncertainties in Cuu Long Basin, Vietnam. J. Nat. Gas Sci. Eng. 76:103208. doi: 10.1016/j.jngse.2020.103208
Wang, F., Ping, S., Yuan, Y., Sun, Z., Tian, H., and Yang, Z. (2021). Effects of the mechanical response of low-permeability sandstone reservoirs on CO₂ geological storage based on laboratory experiments and numerical simulations. Sci. Total Environ. 796:149066. doi: 10.1016/j.scitotenv.2021.149066
Wu, M., Xu, J., Hu, P., Lu, Q., Xu, P., Chen, H., et al. (2022). An adaptive surrogate-assisted simulation-optimization method for identifying release history of groundwater contaminant sources. Water 14:1659. doi: 10.3390/w14101659
Xu, Y., Zhao, X., Chen, Y., and Yang, Z. (2019). Research on a mixed gas classification algorithm based on extreme random tree. Appl. Sci. 9:1728. doi: 10.3390/app9091728
Xue, L., Li, D., and Dou, H. (2023a). Artificial intelligence methods for oil and gas reservoir development: current progresses and perspectives. Adv. Geo-Energy Res. 10, 65–70. doi: 10.46690/ager.2023.10.07
Xue, L., Wang, J., Han, J., Yang, M., Mwasmwasa, M., and Nanguka, F. (2023b). Gas well performance prediction using deep learning jointly driven by decline curve analysis model and production data. Adv. Geo-Energy Res. 8, 159–169. doi: 10.46690/ager.2023.06.03
Xue, L., Xu, S., Nie, J., Qin, J., Han, J. X., Liu, Y. T., et al. (2024). An efficient data-driven global sensitivity analysis method of shale gas production through convolutional neural network. Pet. Sci. 21, 2475–2484. doi: 10.1016/j.petsci.2024.02.010
Xue, Z., Zhang, K., Zhang, C., Ma, H., and Chen, Z. (2023). Comparative data-driven enhanced geothermal systems forecasting models: a case study of Qiabuqia field in China. Energy 280:128255. doi: 10.1016/j.energy.2023.128255
Xue, L., Zhu, Y., Ren, J., Liao, H., Dai, Q., and Tu, B. (2025). Coupled optimization method for CO₂-EOR and storage based on machine learning. J. Porous Media 28, 37–53. doi: 10.1615/JPorMedia.2024052865
Yao, P., Yu, Z., Zhang, Y., and Xu, T. (2023). Application of machine learning in carbon capture and storage: an in-depth insight from the perspective of geoscience. Fuel 333:126296. doi: 10.1016/j.fuel.2022.126296
You, J., Ampomah, W., and Sun, Q. (2020). Co-optimizing water-alternating-carbon dioxide injection projects using a machine learning assisted computational framework. Appl. Energy 279:115695. doi: 10.1016/j.apenergy.2020.115695
Zhong, Z., Liu, S., Carr, T. R., Takbiri-Borujeni, A., Kazemi, M., and Fu, Q. (2019). Numerical simulation of water-alternating-gas process for optimizing EOR and carbon storage. Energy Procedia 158, 6079–6086. doi: 10.1016/j.egypro.2019.01.507
Zubarev, D. I. (2009). Pros and cons of applying proxy-models as a substitute for full reservoir simulations. SPE Annual Technical Conference and Exhibition, (SPE).

Summary

Keywords

ANN, CCUS, CO₂-WAG, CO₂-WAG parameters, machine learning, XGBoost

Citation

Ud Din S, Xue L, Dongdong G, Abu-Alam T, Ehsan M and Ahmed AA (2026) A robust deep learning framework for predicting carbon dioxide-water alternating gas injection performance and optimization. Front. Clim. 7:1710187. doi: 10.3389/fclim.2025.1710187

Received

29 September 2025

Revised

12 December 2025

Accepted

30 December 2025

Published

15 January 2026

Volume

7 - 2025

Edited by

Yanhui Han, Houston Research Center, United States

Reviewed by

Eslam Gomaa Al-Sakkari, Polytechnique Montréal, Canada

Mostafa Saghafi, Bruno Kessler Foundation (FBK), Italy

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Shahab Ud Din, engr.shahab.pg@gmail.com; Liang Xue, xueliang@cup.edu.cn; Tamer Abu-Alam, tamer.abu-alam@uit.no
