A Novel Shale Gas Production Prediction Model Based on Machine Learning and Its Application in Optimization of Multistage Fractured Horizontal Wells

Shale gas production prediction and horizontal well parameter optimization are significant for shale gas development. However, conventional reservoir numerical simulation requires extensive resources in terms of labor, time, and computations, and so the optimization problem still remains a challenge. Therefore, we propose, for the first time, a new gas production prediction methodology based on Gaussian Process Regression (GPR) and Convolution Neural Network (CNN) to complement the numerical simulation model and achieve rapid optimization. Specifically, through sensitivity analysis, porosity, permeability, fracture half-length, and horizontal well length were selected as influencing factors. Second, the n-factorial experimental design was applied to design the initial experiment and the dataset was constructed by combining the simulation results with the case parameters. Subsequently, the gas production model was built by GPR, CNN, and SVM based on the dataset. Finally, the optimal model was combined with the optimization algorithm to maximize the Net Present Value (NPV) and obtain the optimal fracture half-length and horizontal well length. Experimental results demonstrated the GPR model had prominent modeling capabilities compared with CNN and Support Vector Machine (SVM) and achieved the satisfactory prediction performance. The fracture half-length and well length optimized by the GPR model and reservoir numerical simulation model converged to almost the same values. Compared with the field reference case, the optimized NPV increased by US$ 7.43 million. Additionally, the time required to optimize the GPR model was 1/720 of that of numerical simulation. This work enriches the knowledge of shale gas development technology and lays the foundation for realizing the scale-benefit development for shale gas, so as to realize the integration of geological engineering.


INTRODUCTION
In recent years, the successful development of unconventional petroleum resources, especially shale gas, indicates that the oil and gas industry has made substantial breakthroughs and innovations in theory and technology, greatly expanding the field of petroleum exploration (He et al., 2020). At present, China has built hundreds of billion cubic meters of marine shale gas fields in the upper Yangtze region, such as Fuling, Changning, and Weiyuan (Jia et al., 2016). Shale gas exists in the micro nanopores of shale in the form of free adsorption, which requires horizontal wellbores and hydraulic fracture treatment to develop effectively (Chen et al., 2020;Hu et al., 2021;Yang et al., 2021). The economic benefit of many shale gas wells is often in the boundary benefit, so it is critical to balance the investment and profit. The fracture half-length and horizontal well length have a great influence on the entire investment and profit. Reservoir numerical simulation results show that revenue increases with the rise of fracture half-length, whereas the simulation results of hydraulic fracturing illustrate that the fracturing cost increases with the rise of the half-length ( Figure 1). When the NPV (income minus expenditure) reaches the maximum value, the half-length is optimal. There is a similar relationship between the horizontal well length and the NPV. Therefore, it is of great significance to determine the optimal fracture half-length and horizontal well length for oil and gas field development.
There are currently two main approaches for determining optimal fracture half-length and horizontal well length. The first one is to compare several design schemes and then determine optimal fracture half-length and horizontal well length empirically, which is often used in practical production. Although this method is simple and easy to implement, it relies heavily on human experience and is difficult to get the optimal solution due to many factors of reservoir geological uncertainty. The second one is the computation optimization method based on optimization theory and reservoir numerical simulation. The main idea is to set the objective function (e.g., the NPV) and then utilize an optimization algorithm to call the numerical simulation software to obtain the optimal solution, which has been studied extensively (Babaei and Pan, 2016;Jahandideh and Jafarpour, 2016). Xu et al. (2018) combined the embedded discrete fracture model (EDFM) and intelligent algorithm to maximize the NPV. Differential evolution (DE) has been employed on a shale gas reservoir simulation model to optimize fracture half-length and fracture spacing (Rammay and Awotunde, 2016). Wang and Chen (2019a) proposed a generalized differential evolution (GDE) algorithm to optimize the well spacing, fracture spacing, half-length, and conductivity of tight oil horizontal wells. Although it has high precision and is recognized as the standard decision-making technique, it requires significant resources in terms of labor, time, and computation. Because each single well optimization needs hundreds of iterations, and each iteration takes more than 10 min, the optimization of fracture parameters in the whole area needs tens of thousands of iterations, which means more than 2 months. The traditional computation optimization is completely dependent on numerical simulation, which is inefficient and difficult to converge quickly. Therefore, we need a shale gas production forecasting model with high efficiency and good performance to optimize fracture halflength and horizontal well length.
A practical alternative is to use a proxy model, which is well suited for repeated calculations. There are two main types of surrogate models, where one is the reduced physical model (Wilson and Durlofsky, 2013;Pouladi et al., 2017), and the other one is the data-driven model (Zhou et al., 2014;Kulga et al., 2017;Wang and Chen, 2019b;Wang et al., 2021;Xue et al., 2021). The data-driven model can quickly establish a mathematical model approaching the accuracy of the numerical simulation model by sampling the reservoir numerical simulator. It is reported that GA recursive sampling assisted a dynamically updated Artificial Neural Network (ANN) method to optimize production (Golzari et al., 2015). A datadriven forecasting technique is built based on ANN to complement the simulator-based model (Kulga et al., 2017). The oil production models are developed based on Random Forests (RFs), Support Vector Regression (SVR), and Gradient-Boosting Machine (GBM) (Schuetter et al., 2018). However, few researchers have optimized hydraulic fracture parameters assisting the data-driven model. Consequently, in this study, we propose adopting the Gaussian Process Regression (GPR) and Convolution Neural Network (CNN) in the machine learning field as the proxy models to predict production and utilize the GA and penalty function method to obtain the optimal solution based on these two proxy models.
GPR is a supervised learning model based on Bayesian theory, which has a strict statistical learning foundation and can solve the complex regression problems (Yu et al., 2017). Compared with other regression methods, GPR is characterized by high prediction accuracy, strong generality, and robust performance, which has been widely used in the machine learning field (Ganti and Khare, 2020;Zhang and Xu, 2020). As a new research direction in machine learning, deep learning has attracted growing attentions (Zheng et al., 2020). CNN is a kind of multilayer feedforward neural network with convolution operations. Unlike the BP, which only has a fully connected layer, the local connection, weight sharing, and pooling operation of CNN can enhance the representation ability and robustness of a network (Karpathy et al., 2014).
This research attempts to build a robust and rapid production prediction proxy model based on machine learning to complement the physical-driven model and obtain the optimal fracture parameters by applying the optimization algorithm on the proxy models. The article has been organized in the following way. Specifically, the sensitivity analysis of reservoir parameters and hydraulic fracturing parameters is carried out to study the contribution of each parameter to production. The n-factorial experimental design is used to design the initial experiments, and 256 cases are obtained. Then 256 cases are simulated by CMG, and the dataset is constructed by combining the simulation results with the case parameters. Subsequently, the gas production model is built by GPR, CNN, and SVM based on the dataset. Finally, the optimal model of the three models is combined with the optimization algorithm to maximize the NPV and obtain the optimal fracture half-length and horizontal well length. The performance of the proxy model is assessed in terms of time efficiency and prediction accuracy by comparing the results with the simulator-based approach.

DATA PREPARATION
Before establishing the data-driven model, it is necessary to collect the data of production and geological engineering parameters. It is better to collect real data, but if there is no high quality real data, numerical simulation data can be used. In this study, a numerical simulator is applied to obtain annual production data of shale gas under the given geological engineering parameters. At present, about 30 wells have been drilled in this area. The shale is mainly developed in the Lower Silurian Longmaxi Formation to the Upper Ordovician Wufeng Formation, with an effective thickness of 30-40 m. The buried depth of the gas reservoir is 2000-3000 m, the pressure coefficient is 1.1-1.3, and the formation temperature is about 80°C. The average matrix porosity, average matrix permeability, average matrix gas saturation, and the average adsorbed gas content are 4%, 0.00003 md, 65%, and 3 m 3 /t, respectively.
According to the reservoir geological parameters and horizontal well engineering parameters in the study area, a single-well numerical model of shale gas multistage fracturing horizontal well was established by CMG software ( Table 1).
The gas reservoir size was 4200 m × 500 m × 30 m, and the reservoir type was the dual-porosity and dual-permeability model. The established reservoir model was a rectangular grid system with horizontal wells in the center of the model. The horizontal well was placed in the middle layer in the reservoir model, and the production duration was set to 15 years. The hydraulic fracture propagated along the direction of the maximum principal stress. Figure 2 shows a top view of the hydraulic fracture model. The grid we set was 20 m. Local grid refinement (LGR) was applied to simulate fluid flow in hydraulic fractures accurately.
By analyzing the study area's geological engineering conditions, we selected four key influencing parameters, including matrix porosity, matrix permeability, fracture halflength, and horizontal well length. The n-factorial experimental design procedure was utilized to build a data set for building the proxy model. The range of four factors were divided into four equally-spaced levels ( Table 2), so 256 cases were obtained. Other experimental design method, such as response surface design, only needs 29 experiments, which cannot achieve the required accuracy. Therefore, an n-factorial experimental design was adopted. If the variables increase, we can use Latin hypercube design, response surface design, and so on. All cases were run using CMG-GEM numerical simulation software to obtain the corresponding performance (Supplementary Table S1).

Principle of GPR
In recent years, machine learning has been widely used in all walks of life. The purpose of machine learning is to use algorithms to build statistical models for the collected data and to solve practical problems by building good statistical models. There are many regression learning algorithms: nonlinear regression, Support Vector Machine regression (SVM), Gaussian Process Regression (GPR), Regression Tree (RT), Random Forest (RF), and so on. Considering the advantages of GPR model mentioned above, we use GPR model (Hamdi et al., 2017) to build production prediction model. GPR is a nonparametric probability model for regression analysis of data based on the Bayesian principle. The basic principle of GPR is as follows. Firstly, given the Gaussian Process (GP) prior distribution, the mean and covariance of the GP posterior distribution are calculated based on the prior and the hypothesis of training data (joint Gaussian distribution). When the input data (x i ) and the output data (y i ) of the training data are given, the trained GPR model can predict the new output data (y p ) according to input data (x p ).
Assuming that the noise between the observed value and the prediction function satisfies the normal distribution, the expressions are as follows : where x i is the input vector, f is the prediction function, andy i is the observation value polluted by noise. ε is noise, following the normal distribution (N) of mean value 0 and variance σ 2 . GP is a set of any finite number of random variables with a joint Gaussian distribution, whose properties are entirely determined by the mean function and the covariance function: where m(x) and k(x, x′) are the mean function and covariance function, respectively. The prior distributions of the observed values y are as follows: According to Bayes principle, the prior joint distribution of the observed value y and the predicted value f p (the output value corresponding to x p ) are where K(X, X) K n (k ij ) is the symmetric positive definite matrices of order n × n, the matrix element (k ij ) k(x i , x j ) is implemented to measure the correlation between x i and x j , and K(X, x p ) K(x p , X) T are the n × 1 covariance matrix between the test input values x p and the training input values. K(x p , x p ) is covariance matrix of test input values x p . I n is an n-dimensional identity matrix. According to the marginal distribution property of normal joint distribution, the posterior distribution of f p can be modeled as where f p and cov(f p ) are the mean and variance of the predicted value f p .

Covariance Functions Optimization
Since the covariance function in the GPR method is a symmetric function satisfying the Mercer condition, the covariance function is equivalent to a kernel function, which can be parameterized by vectors θ in kernel parameters. Therefore, the expression of a covariance function is k(x i , x j θ). Kernel functions can be divided into two types according to the characteristic length scale. If the length scale is the same, the kernel function encompasses squared exponential kernel, exponential kernel, Matern 3/2, Matern 5/2, and rational quadratic kernel. The kernel functions with a separate length scale realizing the automatic correlation determination include ARD squared exponential kernel, ARD exponential kernel, ARD Matern 3/2, ARD Matern 5/2, and ARD Rational Quadratic Kernel. The squared exponential kernel is the most commonly used kernel function, which is defined as where x i and x j are the training input data and σ f and σ l are the hyperparameters of the kernel function, which represent signal standard deviation and length scale. Kernel function directly affect the accuracy of the GPR algorithm. We need to optimize the kernel function. The commonly used effective method for the limited dataset is cross-validation, which comprises k-fold cross-validation and leave-one-out cross-validation. K-fold cross-validation randomly divides the sample data into k copies, each time selects k-1 as the training set and the remaining one as the test set. Leave-one-out cross-validation is a special case of k-fold cross validation, where k is equal to the number of samples. This method is suitable for the case of small data.
We applied MATLAB software to establish the GPR technique to forecast production. In order to ensure the reliability of the model, the scheme of 10-fold cross-validation was adopted. The performance was evaluated using the coefficient of determination (R 2 ) and root mean square error (RMSE).
where z i is the gas production predicted by the data-driven model, y i represents the gas production calculated by a numerical simulation model, z l represents the average gas production predicted by the data-driven model, y l represents the average gas production calculated by a numerical simulation model, and N is the number of samples.

Convolution Neural Network
Deep learning can achieve complex function approximation by learning a group of kernel parameters of a nonlinear network, which shows a strong ability to learn the essential characteristics from a small sample set. CNN is a multilayer feedforward neural network with a convolution structure and becomes one of the most concerned research hotspots Zheng et al., 2020). Unlike the traditional fully connected feedforward neural network, CNN has the characteristics of local connection and parameter sharing, which reduces the complexity of the network and improves computational efficiency. In image processing, 2D convolution is usually used to extract features layer by layer and then complete the corresponding visual tasks. The convolution kernel size is usually square N × N, and N represents the width or height of the convolution kernel. In addition, convolution neural network has 1D convolution operation, which can process sequence data. The convolution kernel used is N, and N represents the length of the convolution kernel. Similar to 2D convolution neural network, 1D convolution kernel also processes input data layer by layer to extract discriminative features, so as to complete classification, regression, and other tasks. As shown in Figure 3, the typical CNN mainly comprises the input layer, convolution layer, downsampling layer (pooling layer), full connection layer, and output layer. The convolution layer and pooling layer are the special network structure of CNN. The convolution layer is responsible for extracting the convolution features of the input data, and the pooling layer is used to down sample the convolution features to improve the robustness.
The role of the convolutional layer is to use convolution operations to extract features. The more the convolutional layers are, the stronger the expressive ability of features is. By designing multiple convolution kernels in the convolution layer, CNN can extract a variety of different types of local features and then extract higher-level features layer by layer. Finally, global features can be abstracted through deep learning of local features. A layer of CNN can contain several convolution filters (each containing a deviation coefficient). The filters in the upper left corner slide through each input image from left to right and from top to bottom, and convolution are performed in each iteration. Then, the sum of convolution result and deviation value is converted nonlinearly. Commonly used activation functions include ReLU, sigmoid, and tanh.
According to the local connection principle of CNN, each pooling layer is also composed of multiple sampling neurons, and each unit is also connected to the corresponding receptive field of the previous layer of network. But different from the convolutional layer, all the weights of the neurons in this layer and the local connection of the previous layer are fixed constants. Downsampling is performed by taking the local maximum value or the average value to generate all the local features in the same filter. The pooling operation not only further reduces the network scale and suppresses possible over-fitting, but also improves the robustness of the extracted features, while retaining the salient features and shift invariance.
The CNN proposed in this study mainly had five layers. The first layer was porosity, matrix permeability, fracture half-length, Frontiers in Earth Science | www.frontiersin.org August 2021 | Volume 9 | Article 726537 and horizontal well length, respectively. The second, third, and fourth layers were the convolutional layers, which had 5, 10, and 15 convolution kernels with a size of 3 × 1. The fifth layer was a fully connected layer with a total of 15 neurons. The output layer was the predicted gas production value. A stochastic gradient descent algorithm was used to train CNN, and the activation function was Sigmoid. The number of iterations was 5,000, and the weight values were updated repeatedly until the error meets the requirements.
1) First, input data were normalized to make all the data on the same magnitude.
where attr denotes porosity, matrix permeability, fracture halflength, and well length, x attr is the average attr value of all samples, x i,attr donates the value of the i-th sample, s attr is the variance of attr of all samples and X i,attr is the normalized value of the sample, and N is the number of samples.
2) Three convolution layers with a different number of kernels performed 1D convolution on the input data: where Y CNN 1 , Y CNN 2 , and Y CNN 3 are the outputs of the second, third, and fourth layers, and ReLU is the activation function. W 1 ∈ R 5×1×2 , W 2 ∈ R 10×5×2 , and W 3 ∈ R 15×10×2 are parameters of the convolution kernels of the second, third, and fourth layers, respectively. b 1, b 2, b 3 are the bias. The symbol ʘ denotes the convolution operation.
3) Next, the fully connected layer was utilized to predict the gas production.
where Q gas is cumulative gas production in 15 years. Averpool denotes the average pooling layer, W 4 ∈ R 15×1 denotes the parameters of the fully connected layer, b 4 is the bias. 4) Finally, the performance was evaluated using R 2 and RMSE.

Support Vector Machine Regression
SVM is based on the theory of small sample statistical learning proposed by Vapnik (2000), which focuses on the statistical learning rule under small sample data. Compared with neural network, SVM has strict theoretical and mathematical foundation and has strong generalization ability. It also has a weak dependence on the number of samples. The basic idea of support vector machine regression theory is to find a nonlinear mapping (ϕ) from input space to output space. Through this nonlinear mapping, the data (x) are mapped to a high-dimensional feature space (F). The nonlinear regression function of SVM is as follows: where a i and a p i are Lagrange multiplier and K(x i , x j ) is kernel function.
The kernel function of the SVM converts nonlinear separable samples into a linearly separable feature space. Different kernel functions produce different classification hyperplanes, so the kernel function has a direct impact on the performance of SVM. We applied MATLAB software to establish the SVM technique to forecast production. In order to ensure the Frontiers in Earth Science | www.frontiersin.org August 2021 | Volume 9 | Article 726537 reliability of the model, the scheme of 10-fold cross-validation was adopted. The performance of different kernel functions was evaluated using the R 2 and RMSE.

Economic Model
To improve the economic benefit of the study area, the economic NPV is considered the objective function to optimize the fracturing parameters of shale gas horizontal wells. NPV consists of two parts: cash inflow and cash outflow. According to the data of the study area, we adopt the following NPV mathematical model: where t is time, (year); b is discount rate, (%); P 0 is the gas price, (USD/m 3 ); Q gas is annual gas production, (m 3 /year); C operating is annual operating cost, (USD/year); C tax is tax, (USD/m 3 ); C ground is ground cost, (USD); C well is drilling cost, (USD); C fracture is fracturing cost, (USD); L is horizontal well length, (m); x f is fracture half-length, (m). L and x f are optimization variables. In the optimization process, the ranges of the two variables are 50-200 m, 1,000-4,000 m, respectively. Related parameters are shown in Table 3.
In the process of hydraulic fracturing, tens of thousands of cubic meters of fracturing fluid and thousands of tons of proppant are pumped into the ground to produce fractures and release natural gas. Therefore, fracturing cost mainly includes the cost of pumping fracturing fluid and proppant. A larger number of researches have been conducted on the relationship between the volume of fracturing fluid and fracture half-length (Rammay and Awotunde, 2016;Yang et al., 2017;Wang and Chen, 2019). Here, the relationship between them is obtained by referring to Rammay and Awotunde (2016) and Nordgren (1970).
where x f is fracture half-length, (m); Q is displacement, (m 3 /min); W is fracture width (m); S p is spurt loss volume, (m 3 /m 2 ); h is fracture height, (m); C is fluid loss coefficient, (m/ min √ ), n′ is rheological index, k′ is the consistency coefficient, (MPa·s); G is shear modulus, (MPa); v is Poisson's ratio; t is time, (min); erfc(x) is error compensation function of x; V is amount of fracturing fluid per stage (m 3 /stage). Table 4 shows the input parameters for calculating the fracture half-length according to the amount of fracturing fluid.
By analyzing the fracturing cost of horizontal wells in the study area, the relationship between fracturing fluid volume and fracturing cost is established, so as to finally obtain the relationship between fracture half-length and fracturing cost.
where C fracture is fracturing cost, (USD); T is construction time, (day); L is horizontal well length, (m); Q transport is transportation cost, (USD); Q tubing is tubing costs, (USD); P is the volume of proppant. The volume ratio of proppant to fracturing fluid is 0.035. The drilling cost is related to the length of the horizontal section. By analyzing the drilling cost data in the study area, the relationship between the length of the horizontal section and the drilling cost can be obtained.
The formula of operating cost is as follows: C operating 37, 000 + 0.05Q gas

Optimization Algorithm
Optimization algorithms are divided into traditional gradientbased optimization algorithms and evolutionary algorithms. Traditional gradient-based optimization algorithms, such as the penalty function method and the conjugate gradient method, have fast convergence speed, but they are easy to fall into the local optimum. Evolutionary algorithms such as particle swarm optimization (PSO) algorithm (Li et al., 2012) and genetic algorithm (GA) (Cen et al., 2016) have good global search capabilities, but their convergence speed is relatively slow. In practical application, we can employ these two optimization algorithms to benefit each other. In this article, we used MATLAB to build GA model and penalty function method.

Genetic Algorithm
As a classical global optimization algorithm, GA has been widely used in many fields, such as industrial design and economic management (Golzari et al., 2015;Zhang et al., 2019). GA starts the search process from a set of initial solutions, called population. Each individual in the population is a solution to the problem, called a chromosome. These chromosomes continue to evolve in subsequent iterations, known as heredity. GA is mainly realized by crossover, mutation, and selection. The next generation of chromosomes generated by crossover or mutation operations is called offspring. The quality of chromosomes is measured by fitness. According to the fitness, a certain number of individuals are selected from the previous generation and offspring as the next generation population and then continue to evolve. After several generations, the algorithm converges to the best chromosome, which is likely to be the optimal solution.

Penalty Function Method
The basic idea of the penalty function method is to transform the constraint into a kind of penalty function and add it to the objective function, thus transforming the constrained optimization problem into a series of unconstrained optimization problems (Hao et al., 2021). The penalty function method includes the interior point method, external penalty function method, and multiplier method. This research mainly adopts the interior point method, which has the advantages of simple structure and strong adaptability. Optimization problems with general constraints are where f (x) is the objective function, c i (x) is the constraint condition, and D 0 is the feasible region. The interior point method needs to construct the following augmented objective functions: where σ > 0 is penalty parameter and p(x) is obstacle function. It needs to satisfy the following properties: when x tends to the boundary at D 0 , at least one c i (x) tends to 0 and p(x) tends to infinity. There are two forms of obstacle function: In this way, when x is in D 0 , it is finite; when x is close to the boundary, p(x) → + ∞ the value of the augmented objective function tends to infinity, so it is severely punished.
Since the minimum point of constrained optimization problem is generally reached at the boundary of feasible region, the penalty factor in interior point method requires σ k → 0, so the solution problem is transformed into solving nonsequence constrained optimization subproblem.
The overall optimization framework is shown in Figure 4.

Sensitivity Analysis
Matrix porosity, matrix permeability, horizontal well length, and fracture half-length have an influence on gas production. In order to intuitively study the influence of various factors on the cumulative gas production, we made a correlation diagram. Matrix porosity and horizontal section length have linear relationships with gas production, while matrix permeability and fracture half-length are logarithmically related to the gas production ( Figure 5). The results of the sensitivity analysis show that these four factors have a significant influence on gas production.

Comparison of Gas Prediction Models
GPR model is parameterized by kernel function, which directly affects the prediction accuracy. In order to obtain a suitable kernel function, we experimented with the ten kernel functions. Through comparing different kernel functions (Table 5), we found that ARD squared exponential outperformed other kernel functions by providing higher R 2 and lower RMSE, so we chose it as the final kernel function. Different kernel functions will affect the prediction accuracy of SVM. As can be seen from Table 6, the polynomial kernel function provides higher R 2 and lower RMSE, so we chose it as the final kernel function. According to the optimal kernel function and kernel parameters of SVM, we can accurately establish the production prediction model.
In this section, the performances of the GPR model, CNN model, SVM model and model are compared. 256 experiments ( Table 2) were acquired from the initial experimental design for the shale gas estimation model. The cross-validation and kernel function  Frontiers in Earth Science | www.frontiersin.org August 2021 | Volume 9 | Article 726537 9 optimization were utilized to improve the prediction accuracy. R 2 and RMSE were used to compare the performance of three models on the validation set. As shown in Figure 6, R 2 of GPR, CNN, and SVM is 99.99, 98.07, 99.65, and 99.6%, respectively, and RMSE is 0.0127, 0.1690, and 0.0807, respectively. The GPR model reveals competitive modeling capabilities compared with CNN and SVM. This gives confidence in the model's predictive ability. Therefore, the GPR model is more reliable. The proxy model constructs an approximate model of production prediction by collecting a small number of samples. It is applied when calculating the production instead of directly calling the reservoir numerical simulation program, which effectively improves the optimization speed.
Additionally, to further study the performance of GPRR model to predict the gas production per year, we calculated the gas production of each year for 15 years under a given geological engineering parameter using surrogate model and compared it with the results obtained by a simulator-based approach. The fixed parameters are porosity, permeability, fracture half-length, and horizontal section length of 4%, 0.00003 md, 120 m, and 1,600 m respectively.
It is observed that the two production decline curves are relatively close, indicating that the GPR model is capable of estimating well performance (Figure 7). Therefore, the GPR model has high reliability. Besides, the GPR approach is faster than the numerical simulation technique. The numerical model takes about 10 min to complete each simulation on the machine with I7-6700 3.4 GHz CPU, while the datadriven model can calculate the output within a minute. Therefore, the GPR model is a good alternative to numerical models.

Comparison of Optimization Results
An important application of production prediction model is to optimize fracturing design and horizontal well parameters. a-exponential; b-squared exponential; c-Matern 32; d-Matern 52; e-rational quadratic; f-ARD exponential; g-ARD squared exponential; h-ARD Matern32; i-ARD Matern52; j-ARD rational quadratic. Therefore, in order to verify the effectiveness of the established production proxy model, we used GA and penalty function to optimize the fracture half-length and horizontal segment length based on GPR model and NPV model established in Methods. Because porosity and permeability were not involved in the optimization, the porosity and permeability were fixed at 4% and 0.00003 md, respectively. At the same time, the fracture half-length and horizontal section length are optimized based on the numerical simulation model. We used the optimization algorithm to iterate for 100 times and obtained the following results. The initial fracture half-length, horizontal well length, and NPV of field reference case are 100 m, 1,600 m, US$ -1.13 million, respectively. The fracture half-length, horizontal well length, and NPV optimized by GA based on GPR model and numerical simulation model converge to almost the same values ( Table 7). Compared with the field reference case, the optimized NPV increased by US$ 7.43 million. Therefore, according to the optimization results, the GPR model coupled with GA and penalty function method can meet the requirements of reservoir well layout optimization. We also noticed that the optimization time of GPR model is 1/720 of that of the numerical simulation model, which dramatically improves the optimization efficiency (Table 7). This proves the applicability and strength of the GPR model.
It is noted that the optimized horizontal well length reaches the maximum values (4000 m). Since drilling cost is a function of measuring depth and horizontal well length, there is an incentive to increase reservoir contact by extending the horizontal section to recover the cost of drilling vertical section. In addition, longer horizontal wells have created more room for hydraulic fracture treatments, and, thus, increasing shale gas production. This result is consistent with the industry trend toward a longer horizontal well (Wilson and Durlofsky, 2013). Meanwhile, it is worthwhile to point out that the optimized fracture half-length is not the maximum value in the range. This can be attributed to the fact that the rate of revenue increase is not as fast as the cost increase rate.
To prove the adaptability of the GPR model in the variable economic conditions, changes in natural gas prices are considered. The gas price ranges from 0.138 USD/m 3 to 0.261 USD/m 3 . Figure 8 displays the relationship between natural gas price and optimal fracture parameters. As the price of natural gas increases, the optimal fracture half-length gradually increases, but the optimal horizontal well length remains unchanged. The increase of optimal fracture half-length could be attributed to the increase in income exceeding the increase in expenditure. With a higher gas price, it is reasonable to use a longer horizontal well length and longer fracture half-length.