Sparse Constrained Least-Squares Reverse Time Migration Based on Kirchhoff Approximation

Least-squares reverse time migration (LSRTM) is powerful for imaging complex geological structures. Most researches are based on Born modeling operator with the assumption of small perturbation. However, studies have shown that LSRTM based on Kirchhoff approximation performs better; in particular, it generates a more explicit reflected subsurface and fits large offset data well. Moreover, minimizing the difference between predicted and observed data in a least-squares sense leads to an average solution with relatively low quality. This study applies L1-norm regularization to LSRTM (L1-LSRTM) based on Kirchhoff approximation to compensate for the shortcomings of conventional LSRTM, which obtains a better reflectivity image and gets the residual and resolution in balance. Several numerical examples demonstrate that our method can effectively mitigate the deficiencies of conventional LSRTM and provide a higher resolution image profile.


INTRODUCTION
Seismic migration is an inverse procedure of forward modeling, which can restore the interior of the earth medium with record data. Specifically, migration attempts to eliminate the effects caused by the process of physical propagation and obtain an image that clearly depicts the structural information of interest. Reverse time migration (RTM), a state-of-the-art seismic imaging method (Baysal et al., 1983;McMechan, 1983), identifies the aforementioned acausal procedure appropriately. Based on two-way wave equation, RTM is powerful for handling complex geological settings and velocity with dramatic variation in the lateral direction. Therefore, it can deal with steep dips and salt dome better than conventional migration (Zhu and Lines, 1998;Yoon et al., 2003;Liu et al., 2010). However, most migration methods, including RTM, use the adjoint operator to compute the image instead of the inverse operator (Tarantola, 1984). Practical data suffers from many factors, such as irregular acquisition geometry and limited aperture of the acquisition system. These deficiencies generate artifacts and degrade the resolution. To overcome these limitations, least-squares migration (LSM) was proposed to combine with RTM (Liu et al., 2011;Dai et al., 2012). Therefore, seismic imaging can be regarded as a linearized inverse problem. With a proper initial velocity model, seismic records can be inverted to a more accurate profile. LSRTM iteratively reduces the residual between predicted data and observed data in a least-squares framework; therefore, the adjoint operator can keep approaching the inverse operator. Many results have indicated that LSRTM has a better performance than conventional RTM and migration (Zhang et al., 2015;Dutta and Schuster, 2014;Liu et al., 2016).
The precondition of seismic inversion is forward modeling, which maps the parameter model to seismic data. There are two main approaches to build linear approximation between physical model and wavefield (Yang and Zhang, 2019). One is the most commonly used Born approximation based on small perturbation (Beylkin, 1985;Bleistein, 1987). This requires that high-order scattered wavefields are much weaker than primary field. The Born operator describes a linear relationship between model perturbation and primary reflected wave. It divides the wavefield into two parts: background wavefield and perturbation wavefield. LSRTM based on Born approximation can achieve model perturbation with these two fields. In addition, an alternative scheme for modeling is Kirchhoff approximation (Bleistein, 1987). Compared with Born modeling, the Kirchhoff operator delineates the connection between primary reflected wave and reflectivity. Different operators lead to distinctive results under these two physical contexts. However, neither Born nor Kirchhoff approximation can avoid the impact on seismic image in a least-squares sense. Because minimizing the L2 norm only provides an average solution (Wang, 2016;Wu et al., 2016). It is essential to seek a balance between the residual and resolution. According to geological recognition, the earth medium usually presents a layered spatial distribution. The reflection coefficient that mirrors strata attributes should be sparsity, that is, the part of model that does not generate reflected wave ought to be zero. Therefore, the inverted model needs a sparse limitation.
This study implements a Kirchhoff modeling formula for LSRTM promoted by sparsity. The reflectivity model should be regularized with L1 norm while minimizing the residual of wavefield in the form of the L2 norm. Referring to 'least absolute shrinkage and selection operator' (Lasso) problem (Tibshirani, 1996), this reformed LSRTM can be solved by the algorithm of spectral projected gradient for L1 minimization (SPGL1), which is designed to solve sparse least squares (van den Berg and Friedlander, 2011). Examples show that our method can effectively overcome the problems mentioned above.

METHOD
RTM has great advantages in imaging steep strctures such as salt dome. However, it suffers from low-frequency noise compared to conventional migration. Least-squares migration can get closer iteratively to the optimal solution and eventually obtain a relatively high signal-to-noise ratio, high resolution and amplitude equalized profile that eliminates the influence of the acquisition system. It contains three steps: constructing a linear modeling problem first, using the forward and backward propagation wavefields to image, and finally updating the physics model according to the residual.

Linear Modeling Operators
The linearization of nonlinear forward problem is essential to seismic inversion, making the physical progress more explicit; moreover, converting the medium parameter becomes easier. The choice of a linear operator will lead to different physical significance and images. It is a common way to use Born approximation to realize linearization. The real velocity model is divided into two parts: background velocity v 0 and velocity perturbation δv. Given a perturbation δv, it generates a corresponding wavefield perturbation δu. The Born operator describes the relationship between reflected wave and model perturbation. Specifically, the incident wave interacting with model perturbation becomes a new source, namely the Huygens principle, and then the new source generates wavefield perturbations. This can be expressed as follows in time domain: where u 0 represents the background field propagating in v 0 , f (t; x s ) is the source signature located at x s and excited at t, the model perturbation is denoted by m(x) 2δv(x)/v 0 (x), which describes velocity changes compared to background velocity.
x is a point in model. This study assumes that the density ρ is a constant (Eq. 1) and (Eq. 2) can be rewritten in form of an integral using Green's theorem: where G 0 (x, t; x s ) is the Green's function from x s to x, G 0 (x g , t; x) propagates from x to x g . Green's function is governed by: where the δ(t; x s ) is Dirac function. Born approximation represents scattered phenomenon caused by model perturbation, which could be a means of linearizing seismic inversion. However, this approximation is accurate when scattered field δu is much weaker than background field u 0 (Schuster, 2017), which is a disadvantage of Born approximation. It cannot describe kinematic and dynamic information of seismic waves well with strong reflector. And studies have shown that Born approximation has limited angle validity and it cannot appropriately predict the reflections generated with large incident angle (Yang and Zhang, 2019).
Compared to the Born operator, the Kirchhoff operator relates the reflectivity to wavefield perturbation. Therefore, it depicts the interaction between the incident field and reflectivity rather than velocity perturbation. There is a relationship between reflectivity and model perturbation when the perturbation and incident angle are small (Stolt and Weglein, 2012): where the r(x, α) is the reflection coefficient at point x with incident angle α between the incidence and the normal line ( Figure 1). This means that we can obtain the wavefield perturbation under the Kirchhoff approximation by substituting (Eq. 6) into (Eq. 4), and we have Here we turn Kirchhoff modeling equation into the same form as Born approximation. Then Eq. 7 can be rewritten as.
It should be noted that the term r(x, α) can be replaced by the generalized angle-dependent reflectivity model to get rid of the limitations of small perturbation and incident angle. Although there are some methods to solve the propagation direction of wave, such as Poynting vector and Plane Wave Decomposition (PWD), it is still tedious and time-consuming to obtain the angle term. Here we give an approximate scheme (Yang and Zhang, 2019).
Each shot can invert a reflectivity image, here we sum the images obtained by all shots. Then, we regard the summation as the final reflectivity model and use it to iterate. Approximately, we can get an averaged reflectivity model by multiple shots stacking. Therefore, we can get the predicted data by using this stacked reflectivity R(x) rather than the angle-dependent term r(x, α) cos(x, α). Note that R(x) is an averaged reflectivity over all illuminated angles.
With this approximate reflectivity R(x), we can express Eq. 9 as In sum, with the relationship of reflectivity and model perturbation, two linear approximations have a similar form, which expresses their common ground. The difference between two approximations is also evident. From Eq. 6, cos α approximately equals to one and can be ignored for a small incident angle. Therefore, reflectivity can be regarded as the spatial derivative of model perturbation. The inverted model after spatial derivation has a higher resolution, that is, the spectrum has been improved. More details are provided in the numerical tests.

Least-Squares With Sparse Optimization
In contrast to full waveform inversion (FWI) (Liu, et al., 2020), LSRTM first establishes a linear relationship between physical model and corresponding response (Tarantola, 1984), then it implements the inverse problem. The least-squares method (LSM) only requires the construction of a migration operator and inverse migration operator, which is conjugated to each other. It can reduce the residual between the observed and predicted data iteratively to approach the optimal solution of the inverse problem gradually. According to the linear approximation above, we can express them in the form of a matrix: d Lm (12) where the d is predicted data, such as background or perturbation fields. L represents modeling operator and m is the physics model. Usually, it is assumed that the background velocity has been obtained in advance, and then the data can be predicted. Hence, the misfit function can be expressed as: The model m, which makes zE(m) zm (the Jacobian matrix) equal to 0, is the optimal solution of Eq. 13. However, the computation of the Jacobian matrix is quite time consuming, particularly for seismic exploration. We adopt the adjoint-state method to calculate the adjoint operator L T of modeling operator L, Specifically, the gradient of E(m) can be obtained by back propagation of the wavefield residual and background field, here we give the gradient based on Kirchhoff approximation (Plessix, 2006;Wang et al., 2021): Where q(x, t; x s ) is the adjoint wavefield governed by: According to Eq. 13, we can obtain a least-squares solution m (L T L) −1 L T d. Note that LSM provides a smooth solution of the model, which is determined by the properties of the L2 norm. As a result, LSM has a limited ability to improve the quality of the image. Here we give a simple model to display the impact of LSM. In this example, we use the Ricker wavelet with a center frequency of 30 Hz and a time sampling interval of 1 ms. With convolution model theory, we can get seismic records via the convolution of Ricker wavelet and reflection coefficients, which can be obtained by d Lm. Conversely, reflection coefficients can be obtained by the deconvolution of seismic records and wavelet, that is, m mig L T d. Figure 2C is the result of deconvolution, and it is hard to identify the reflectors. Compared to deconvolution, Figure 2D shows that LSM improves the resolution obviously. However, many oscillations caused by (L T L) −1 near the real reflection coefficients should not exist. That's why we regard the least-squares solution as a smooth or average solution. The actual model indicates that the medium presents a layered spatial distribution, as shown in Figure 2A or Figure 2B, that is, the subsurfaces are sparse. In Figure 2E, the inversion result with sparse constrained LSM performs quite well, and these oscillations generated by LSM are suppressed; thus, the resolution and sparsity of the reflection coefficient series are improved effectively.
Due to the feasibility and sparse property of L1-norm, we modify the objective function with L1 norm to realize sparse reconstruction of the model in this study. Generally, Eq. 13 can be reformed with two new problems.
1 Basis Pursuit (BP) problem (BP) min m 1 , subject to Lm d Eq. 16 depicts a BP problem that comes from compressed sense theory, and it aims to seek a sparse solution that satisfies Lm d. However, practical seismic data inevitably contain noise, and Eq. 16 can be modified as a basis pursuit denoising (BPDN) problem: where the σ describes the noise level in the data, and Eq. 16 and Eq. 17 are equivalent to each other when σ 0.

Least Absolute Shrinkage and Selection Operator (LASSO) problem
where the τ ≥ 0 is an explicit limitation of sparsity on m. Problems (BP σ ) and (LS τ ) are different descriptions of the same question. They are equivalent in the sense that there exists a solution m * of (BP σ ) for a given σ, and there exists a corresponding τ that makes m * also be a solution of (LS τ ). Frontiers in Earth Science | www.frontiersin.org September 2021 | Volume 9 | Article 731697 Both problems mentioned above can be solved by the algorithm of spectral projected gradient for L1 minimization (SPGL1). Given a constraint τ, we can obtain the residual norm from Eq. 18: Eq. 20 recasts (LS τ ) as a problem of finding the root of a nonlinear equation and defines a continuous curve, the Pareto curve (Figure 3). For a given σ, SPGL1 uses the Newton method to approach the root, and as the τ updates iteratively, the optimal solution m τ σ of problems (BP σ ) and (LS τ ) can be obtained. Therefore, we balance the 2-norm of the residual against the 1-norm of the solution eventually (van den Berg and Friedlander, 2009). From Figure 3, the question is degraded to a simple Lasso problem (LS τ ) when the noise level factor σ is equal to 0. In this study, we set σ 0. Note that synthetic seismic records do not contain noise in general, so we set the noise level factor to be zero. Actually, the algorithm of SPGL1 can deal with noisy data, and we can add some random noise or set some traces to be zero in synthetic data. Besides, the determination of parameter τ is quite important. According to the theoretical model, we can calculate the perturbation model or reflectivity model and make a rough estimate of τ. In general, it is appropriate to set the value of tau to tens of times that of the calculated perturbation model or reflectivity model. Then, the parameter τ can be adjusted according to the inversion results.
Here we summarize the workflow of L1-regularized LSRTM as follows: 1) Obtain the predicted data d 0 with migration velocity v 0 , and get the d res with d res d obs − d 0 ; 2) Set the initial model m 0 and predict the data Lm 0 based on Born or Kirchhoff approximation, therefore we can get the residual r 0 Lm 0 − d res and gradient operator g 0 L T r 0 ; 3) Input the parameters of τ, σ and set k 0; 4) Solve the Lasso problem (LS τ ) with the algorithm of SPGL1, and update the m k , r k and g k until k k max ; 5) Output the result m k max .

NUMERICAL EXAMPLE
In this study, two theoretical models are used to test the validity of the proposed method, including a single diffraction point and complex fault model. Both are based on the two-way acoustic wave equation. Here we use the finite difference method on regular grid.

Single-Diffraction Point
To verify the effectiveness of this method, we first set a simple model with a diffraction point of 2000 m/s embedded in the background velocity of 1,000 m/s (Figure 4), and the entire model has been discretized into 201 × 201 grids in the horizontal and vertical directions, respectively, with the same interval of 5m. The geometry system is arranged as follows: a total of 21 shots are uniformly distributed on the surface of this model with an interval of 50m. Geophones are also placed on each grid point on the surface. We use the Ricker wavelet with a center frequency of 25 Hz for modeling, and the sampling interval is 0.5 ms. In this example, we set 1,000 m/s as the migration velocity.
As shown in Figure 5A, the image is obtained by LSRTM based on Born approximation. This is consistent with the actual situation to a certain degree. The single scatter point is blurred with a disturbing cross pattern (marked by a yellow arrow). However, it should be a dot on the image (Lecomte and Kjeller, 2008). This is because we use the adjoint operator to migrate rather than the inverse operator in Eq. 14. Specifically, Eq. 13 defines a normal equation with L T Lm L T d. The term L T L, Hessian matrix, is equivalent to a blur operator acting on the true image m. Furthermore, L T L includes the influence of irregular acquisition, limited acquisition aperture, band limited source, etc., which generate artifacts and degrade the resolution of the image (Jiang and Zhang, 2019). Compared to LSRTM, the same method in Figure 5C with the L1 constraint performs better; it mitigates the distortion caused by the blur operator. Therefore, with the promotion of sparsity, the resolution in the least-squares method has been improved significantly, and the image looks more like a scatter point.
The LSRTM based on the Kirchhoff operator inverts the reflectivity directly from the seismic records. Figure 5B displays the image produced by Kirchhoff approximation. Compared to Figure 5A, least-squares RTM based on Kirchhoff operator suffers from the same problems. Similarly, we implement the L1 norm on LSRTM, which is shown in Figure 5D. The cross pattern is eliminated clearly, and we  Frontiers in Earth Science | www.frontiersin.org September 2021 | Volume 9 | Article 731697 6 obtain an explicit dot rather than a blurred spot. Therefore, for a simple model, the sparsity-promoting LSRTM based on Kirchhoff approximation can effectively improve the resolution of the image. The results calculated by LSRTM in Figures 5A,B

Fault Model
We also test the other relative complex model. In this fault model ( Figure 6A), there are some classical geological structures, including folds, fault blocks, and depressions. Therefore, it appropriately shows the complex structure of near-surface media. The maximum and minimum velocities are 4,000 m/s and 1,500 m/s, respectively. Similarly, we discretize it into 265 × 367 grids with an interval of 5 m. Thus, a total of 25 shots are uniformly located at the surface of this model. The modeling seismic wavelet is same to last experiment.
As shown in Figures 7A,B, images inverted by LSRTM based on two approximations fit the fault model well, and the contact relationship between structures can be clearly depicted. To further improve the resolution of these images, we combined L1 norm regularization with LSRTM to reconstruct the model. From Furthermore, we enlarge the model framed by red rectangle in Figure 6A, which has step-like strata (marked by yellow arrows in Figure 8A). After inverting, the reflectivity image in Figure 8D produced by constrained LSRTM based on Kirchhoff operator agrees with the actual situation. Furthermore, images inverted by two different approximations have different phases. According to Eq. 6, reflectivity can be derived from model perturbation. With the assumption of a small incident angle, cos(x, α) is roughly equivalent to 1 and can be ignored. Then, Eq. 6 can be rewritten as r(x) ikm(x)/2, where the wavenumber k ω c . Therefore, reflectivity r(x) is the spatial derivative of model perturbation m(x). As a result, the image inverted by Kirchhoff performs sparser and sharper, and there is a phase shift of 90°b etween perturbation model and reflectivity model. Figure 9 shows the amplitude spectra of the images in Figures 7B-D, respectively. The spectra curves are the sum of each trace by the spatial Fourier transform along the depth. The Frontiers in Earth Science | www.frontiersin.org September 2021 | Volume 9 | Article 731697 red one is generated from the image inverted by L1-regularized LSRTM based on Born approximation. The blue and green spectrum curves are produced by unconstrained and constrained LSRTM with Kirchhoff approximation, respectively. Because of the spatial derivative and sparse constraint, the spectrum of the image inverted by L1-LSRTM with Kirchhoff approximation has more high-wavenumber components than that of Born approximation, which explains that Kirchhoff approximation improves the resolution of the image.

CONCLUSION
The LSRTM recasts classical seismic inversion as a linear inverse problem. By means of linear approximation, physical model is related to the corresponding wavefield. Thereafter, we can reduce the residual between predicted and observed data iteratively to directly invert the interest parameters. This study introduces two linearization methods. Born approximation obtains the relationship between the model and physical response based on perturbation theory. With the help of the Born operator, we derive another type of linear method, namely the Kirchhoff operator, which relates the reflectivity to wavefield explicitly. Moreover, these two methods have a relationship of a spatial derivative, and there is a phase shift  Frontiers in Earth Science | www.frontiersin.org September 2021 | Volume 9 | Article 731697 8 between perturbation model and reflectivity model. Although two operators are different physical quantities, the resolution can be improved by Kirchhoff approximation.
LSRTM can mitigate the shortcomings of other migration methods, while the solution is smooth and deviates from the true model. Specifically, there are redundant oscillatory axes in the strata that should be sparsely distributed. Therefore, we reform the question as a sparsity-promoting LSRTM. The SPGL1 algorithm can effectively solve this problem and invert a sparse image that matches the model well. Examples prove the validity of our study.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.