Least-Squares Reverse Time Migration Using the Gradient Preconditioning Based on Transmitted Wave Energy

A gradient preconditioning approach based on transmitted wave energy for least-squares reverse time migration (LSRTM) is proposed in this study. The gradient is preconditioned by using the energy of “approximate transmission wavefield,” which is calculated based on the non-reflecting acoustic equation. The proposed method can effectively avoid a huge amount of calculation and storage required by the Hessian matrix or approximated Hessian matrix and can overcome the influence of reflected waves, multiples, and other wavefields on the gradient in gradient preconditioning based on seismic wave energy (GPSWE). The numerical experiments, compared with that using GPSWE, show that LSRTM using the gradient preconditioning based on transmitted wave energy (GPTWE) can significantly improve the imaging accuracy of deep target and accelerate the convergence rate without trivial increased calculation.


INTRODUCTION
Compared with traditional migration techniques such as Kirchhoff integral migration, reverse time migration (RTM) based on the two-way wave equation is widely favored by researchers (Baysal et al., 1983;McMechan, 1983;Yoon and Marfurt, 2006;Symes, 2007;Fletcher et al., 2009;Liu et al., 2011;Sun et al., 2016) because of its obvious advantages in accurate imaging of complex media (especially high-steep structure and subsalt structure). However, RTM still belongs to the category of conventional migration, and its migration operator is the conjugate transposition of the forward-modeling operator, rather than the exact inverse operator (Claerbout, 1992). Therefore, conventional RTM produces blurring imaging of underground media under the influence of factors such as a complex structure, limited bandwidth, and under-sampled acquisition system, which is difficult to satisfy the current needs of oil and gas exploration and development (Nemeth et al., 1999). Dai et al. (2010) regarded the conventional RTM as an inversion problem under the framework of least squares, used the iterative method to obtain the reflection coefficient model, and developed a least-squares reverse time migration (LSRTM) method. Since the LSRTM can obtain the imaging results with high precision, high-amplitude preservation, and high resolution, it has become a research hotspot in the field of geophysics (Dai et al., 2012;Guo and Li, 2014;Huang et al., 2014;Yao and Jakubowicz, 2016;Ren et al., 2017;Rocha and Sava, 2018;Gong et al., 2019;Yang and Zhu, 2019;Li et al., 2020). Dai et al. (2012) proposed multisource LSRTM based on phase encoding, which improved the computational efficiency of the algorithm. Guo and Li (2014) implemented the true-amplitude imaging based on LSRTM and obtained the imaging with high resolution and true amplitude. Huang et al. (2014) achieved high-precision imaging of near surfaces based on LSRTM. Wong et al. (2015) proposed a joint LSRTM method by using primary and freesurface multiples and attenuated crosstalk artifacts in the image. Yao and Jakubowicz (2016) developed the LSRTM in a matrixbased formulation, which could obtain the high-precision section on the basis of effectively suppressing artifacts. Ren et al. (2017) developed elastic LSRTM, which provided more abundant and effective information for accurate imaging of complex media. Rocha and Sava (2018) proposed elastic LSRTM using the energy norm to improve imaging accuracy and speed up the convergence. Gong et al. (2019) applied a sparsitypromoting constraint to the LSRTM and obtained better imaging, especially for the metallogenetic geological model containing small-scale scatters.  used a high-order Born approximation algorithm to supplement the information of prismatic waves in conventional LSRTM and further enhanced the ability to finely characterize the steeply dipping structure. Yang and Zhu (2019) implemented a viscoacoustic LSRTM based on a time-domain complexvalued wave equation, which could improve imaging resolution and compensate attenuation effects effectively. Moreover, there were also some researchers focusing on computational efficiency (Dai and Schuster, 2013;Huang et al., 2015;Zhang et al., 2015;Hu et al., 2016;Liu et al., 2016;Li et al., 2018;Zhao and Sen, 2018;Gao et al., 2020) and extended applications (Wu et al., 2016;Zhang et al., 2016;Gu et al., 2017;Guo and McMechan, 2018;Fang et al., 2019;Liu and Liu, 2019;Qu et al., 2019;Yang et al., 2020).
As the gradient of traditional LSRTM is affected by geometric spreading and disproportioned illumination, the update of reflection coefficient model in the shallow depth has always been dominant, resulting in low imaging accuracy and slow convergence rate. At present, the gradient preconditioning algorithms such as the methods on the Hessian matrix (Hessian matrix, approximated Hessian matrix, and pseudo-Hessian) and gradient preconditioning based on seismic wave energy (GPSWE) are usually applied to improve the imaging accuracy of deep part. The algorithms based on the Hessian matrix (Pratt et al., 1998) usually require explicit calculation and storage of the Hessian matrix, which will inevitably bring huge computation and memory consumption. The algorithms based on the approximated Hessian matrix are also necessary to approximate the diagonal Hessian matrix to correct the energy of amplitude, which are still difficult to be applied to field data processing. The algorithms based on the pseudo-Hessian matrix (Shin et al., 2001;Choi et al., 2008) are popular preconditioning methods, and they are less computationally expensive, but these approaches only account for the geometrical spreading effect from the sources. GPSWE had been first proposed by Zhang et al. (2012) in full waveform inversion, which takes seismic wave energy as the correction factor and effectively eliminates the impact on gradient caused by geometric spreading and disproportioned illumination. Tan and Huang (2014), Zhang et al. (2016), and Gao et al. (2017) have applied this method to the LSRTM, which have significantly improved the imaging accuracy and convergence efficiency, especially for deep strata. However, GPSWE used in LSRTM still has the following problem. Seismic wavefield is divided into "transmitted wavefield" and "reflected wavefield." Theoretically, it is more accurate to characterize the geometric spreading and illumination effects of the gradient by the information of "transmitted wavefield." However, when adopting the operator of gradient preconditioning based on seismic wave energy (GPSWE), we discovered that the wavefield used to calculate the operator is simulated by the acoustic wave equation, which contains a lot of reflected waves besides transmitted waves. Therefore, the operator of GPSWE will be considerably influenced by the strong reflected wave energy and not accurate enough to estimate geometric spreading or illumination distribution (Song et al., 2019).
To solve the previous problem, we developed an LSRTM algorithm using the gradient preconditioning based on transmitted wave energy (GPTWE), which obtains the forwardand back-propagated "approximate transmission wavefield" based on the non-reflecting acoustic equation and applies the energy of "approximate transmission wavefield" to precondition the original gradient. This method requires neither the calculation nor storage of the Hessian matrix or the approximated Hessian matrix but can effectively improve the imaging accuracy without significantly increasing the amount of calculation.
In Principles of LSRTM, we introduce the principles and processing steps of LSRTM. In LSRTM Using the GPTWE, we expound the principles and procedures of LSRTM using GPTWE. In Marmousi Model Test and Pluto Data Example, we display the results of numerical simulation of the complex model. Finally, in Conclusion and Prospect, we present a summary of conclusions and the future research.

PRINCIPLES OF LSRTM
The two-dimensional scalar constant density acoustic wave equation is expressed as follows: where v represents the velocity model, P signifies the stress, S represents the source, t denotes the time, and ∇ 2 stands for the Laplacian operator. According to the perturbation theory and principle of Born approximation, we can obtain the following equation: Frontiers in Earth Science | www.frontiersin.org November 2021 | Volume 9 | Article 732425 2 where v 0 is a background velocity model, v s is the perturbation in the velocity model, P 0 represents the background wavefield, and P s represents the perturbed wavefield. Here, reflection coefficient model can be defined as follows (Dai and Schuster, 2013): where m is the reflection coefficient model. Therefore, the Born modeling of acoustic LSRTM can be expressed as follows: It can be seen from Eq. 4 that Born modeling can be calculated in two steps. At the background velocity, the background wavefield is calculated by using the seismic wavelet as the source first and then the perturbed wavefield is calculated by using the background wavefield and the reflection coefficient model as the perturbed term. Eq. 4 is also written as the matrix form as follows: where d refers to the matrix form of seismic record (P 0 +P s ) obtained by Born modeling, m denotes the matrix form of the reflection coefficient model, and L represents the Born modeling operator. Since the Born modeling operator L is independent of the reflection coefficient model m, the Born modeling can also be described as linearized modeling. The L2 norm is used to construct the objective function of LSRTM, which can be defined as follows: where d obs is the matrix form of observation seismic record. We usually apply the gradient algorithms to implement the iterative of the reflection coefficient model. Here, we used the adjoint state method (Plessix, 2006) to calculate the gradient and can obtain the following equation: where g represents the gradient, and λ represents the adjoint wavefield, which satisfies the adjoint equation as follows: where d refers to the difference between the simulated seismic records obtained by Born modeling and observed seismic records. Similarly, Eq. 7 can be described in the matrix form, which is simplified as follows: where g denotes the matrix form of the gradient and the superscript "T" represents the transpose of a matrix. The conjugate gradient algorithm based on gradient preconditioning is used to update the reflection coefficient model; the model update process can be expressed as follows: where k represents the number of iterations, β is the correction factor of conjugate gradient, y is the matrix form of conjugate gradient, α denotes the step length, and Q stands for the gradient preconditioning operator. And as we all know, an accurate and easy-to-calculate gradient preconditioning operator can significantly improve the imaging accuracy and accelerate the convergence rate of LSRTM.

LSRTM USING THE GPTWE
The Hessian matrix can accurately reflect the geometric spreading of wavefield and the degree of illumination (Pratt et al., 1998). Theoretically, applying the Hessian matrix to precondition, the original gradient is able to eliminate the impact caused by geometric spreading and disproportioned illumination on gradient. Therefore, the imaging accuracy and convergence rate of LSRTM are greatly improved. However, the storage and calculation required by the method of conventional gradient preconditioning based on the Hessian matrix are usually unbearable for the LSRTM of massive data. GPSWE (Zhang et al., 2012) can directly avoid the calculation and storage of the Hessian matrix or approximated Hessian matrix, which has received extensive attention from  scholars. W s (x) is the energy of forward-propagated wavefield and is represented as follows: where P s (x, t, x s ) is the forward-propagated wavefield value at x, which is obtained by the forward modeling based on the acoustic wave equation (as shown in Eq. 1) with the source disturbance at x s . x represents the one-dimensional space vector. Analogously, W r (x) is the energy of back-propagated wavefield and is defined as follows: where P r (x, t, x r ) stands for the back-propagated wavefield value at x, which is obtained by the reverse time extrapolation based on the acoustic wave equation (inverse process of Eq. 1) with the impulse disturbance at x r . Then we used the energy of forwardand back-propagated wavefields to precondition the original gradient and obtain the following equation:   Frontiers in Earth Science | www.frontiersin.org November 2021 | Volume 9 | Article 732425 5 where g p is the operator of GPSWE. In essence, this method is not a direct approximation of the Hessian matrix but uses the distribution of the energy of seismic wavefield to correct the geometric spreading and proportioned illumination, which avoids the storage and calculation of large matrix. The research of Song et al. (2019) in full waveform inversion shows that only the transmission wavefield contains accurate information of geometric spreading and illumination. However, the conventional seismic wavefield also contains a large number of reflected waves, multiples, and other wavefields in addition to the transmission wavefield, which makes the operator of GPSWE not accurate enough. For this reason, this study develops a highly efficient LSRTM algorithm using GPTWE. The implementation steps of this method are described in detail later, which are similar to the one of GPSWE.
First, the forward modeling based on the non-reflecting acoustic wave equation (Baysal et al., 1984) (as shown in Eq. 14) is used to obtain the forward-propagated "approximate transmission wavefield" with the seismic wavelet as the source.
where x and z denote the space coordinates, respectively, and U stands for the "approximate transmission wavefield." E s (x) is the energy of forward-propagated "approximate transmission wavefield" and is represented as follows: where U s (x, t, x s ) is the forward-propagated "approximate transmission wavefield" value at x, which is obtained by the forward modeling based on the non-reflecting acoustic wave equation (as shown in Eq. 14) with the source disturbance at x s . Similarly, E r (x) is the energy of back-propagated "approximate transmission wavefield" and is defined as follows: where U r (x, t, x r ) stands for the backward-propagated wavefield value at x, which is obtained by the reverse time extrapolation based on the non-reflecting acoustic wave equation (inverse process of Eq. 14) with the impulse disturbance at x r . Then we used the energy of forward-and back-propagated "approximate transmission wavefield" to precondition the original gradient and can obtain the following: where g t is the operator of GPTWE. The flowchart of the LSRTM using GPTWE is shown in Figure 1.
It should be noted that in the implementation process of the LSRTM using GPTWE, all the steps are the same as the LSRTM using GPSWE, except that the acoustic wave equation in calculating preconditioning operator is replaced with the non-reflecting acoustic wave equation, so the calculation of the LSRTM using GPTWE is essentially in agreement with the one using GPSWE. The LSRTM using GPSWE only needs to add one additional forward modeling and one additional reverse time continuation in the first iteration in comparison with the conventional LSRTM without gradient preconditioning. And the additional calculation is negligible compared with the LSRTM itself, which often needs hundreds of wavefield continuation. Therefore, theoretically, the computational efficiency of the three   methods including the conventional LSRTM without gradient preconditioning, the LSRTM using GPSWE, and the LSRTM using GPTWE are basically equivalent.
To test the suppressing effect of the non-reflecting acoustic wave equation on reflected waves, the forward modeling experiment was carried out based on the Marmousi model. The size of model is 4,600 m in length and 3,000 m in depth (as shown in Figure 2). The grid interval in the x and z directions is 8 m. A Ricker wavelet with a dominant frequency of 20 Hz is used as the source, which is excited at (2,300 m, 0 m). The accuracy in finite difference wavefield modeling is eighth order in space and second order in time. The time sampling step is 0.5 ms, and the maximum recording time is 3 s. The hybrid absorbing boundary condition (Xie et al., 2020) is used for boundary processing. Figure 3 illustrates the wavefront snapshots at 1.05 s simulated by the acoustic wave equation and the non-reflecting acoustic wave equation.
From Figure 3, we can observe that the reflected waves in the wavefield simulated by the non-reflecting acoustic wave equation have been suppressed effectively, and the simulated wavefield is closer to a pure transmitted wavefield than that simulated by the acoustic wave equation. Therefore, in theory, it is more accurate to precondition the gradient using "approximate transmission wavefield" information simulated by the non-reflecting acoustic wave equation.

MARMOUSI MODEL TEST
The Marmousi model in LSRTM Using the GPTWE is used to test the feasibility and accuracy of the algorithm in complex media conditions. The background velocity model used for LSRTM imaging is shown in Figure 4, which is the result of Gaussian smoothing of the original velocity model in Figure 2. A total of 116 shots are considered for imaging, and the shots are evenly distributed on the surface with the interval of 40 m. A total of 451 receivers are allotted for each shot, and the receivers are evenly distributed on both sides of each shot with 8-m interval. The observation data are generated by the full-waveform modeling (Eq. 1). The remaining experimental parameters are the same as those in LSRTM Using the GPTWE. Figure 5 shows the preconditioners using GPSWE and GPTWE. In Figures 5A,B, we can observe that the preconditioning operator of GPTWE is more related to the model, and the deep illumination compensation is stronger. Figure 6 illustrates the imaging result of LSRTM after 60 iterations. Through the compassion between Figures 6A-C (marked by the dashed red circle), it can be seen that the imaging results of LSRTM based on gradient preconditioning are better than those without gradient preconditioning. Specifically, after 60 iterations, the LSRTM using GPTWE has the best amplitude preservation, the highest spatial resolution, and the highest imaging accuracy of deep part, followed by the LSRTM using GPSWE, and the LSRTM without gradient preconditioning has the worst imaging result.
In order to compare the imaging effects of the previous three methods more clearly, we extract the imaging curves at x 1960 m from the sections in Figure 6 and compare them with theoretical reflection coefficient curve, which is calculated using Eq. 3; Figure 7 is the single-trace display of imaging results after 60 iterations, where blue, green, red, and black are the singletrace curves of LSRTM without gradient preconditioning, using GPSWE, using GPTWE, and theoretical reflection coefficient, respectively. As observed from Figure 7, the imaging result of LSRTM using GPTWE is closest to the theoretical reflection coefficient curve, while the imaging result of LSRTM without gradient preconditioning is the different from the theoretical reflection coefficient curve at different imaging positions. Therefore, the amplitude preservation of the LSRTM using GPTWE is the highest, followed by the LSRTM using GPSWE, and the lowest without gradient preconditioning. The convergence curves are shown in Figure 8 for this example, where blue, green, and red are the convergence curves of LSRTM without gradient preconditioning, using GPSWE, and using GPTWE, respectively. In Figure 8, it can be seen that the LSRTM using GPTWE has the fastest convergence rate and the smallest residual error, followed by the LSRTM using GPSWE, while the LSRTM without gradient preconditioning has the slowest convergence rate and converges to the largest value after 60 iterations.
Furthermore, the computational efficiency of the previous three methods is analyzed, as shown in Table 1. (The GPU used in this experiment is GeForce RTX 2080 Ti.) Table 1 shows that computational efficiency of those methods is basically the

PLUTO DATA EXAMPLE
Due to the poor illumination beneath the salt bodies, the subsalt imaging problem has always been a challenging issue. So, in the second example, we performed LSRTM on the Pluto model to check the ability of LSRTM using GPTWE in recovering the weak events in the deep part and accelerating the convergence rate. Figure 9A shows the Pluto model, which is 7.33 km in length and 2.5 km in depth with a 10-m grid interval in the horizontal and vertical directions. And Figure 9B shows the smoothed background velocity model. The line involves 147 shots, and a total of 734 receivers are allotted for each shot. The observation data are generated by the full-waveform modeling (Eq. 1). The interval between shots is 50 m, and the interval between receivers is 10 m. The depth of shots and receivers are both 0 m. A Ricker wavelet with a 20 Hz dominant frequency is used to generate the data. The time sampling step is 1 ms, and the maximum recording time is 6 s. The accuracy in finite difference wavefield modeling is tenth order in space and second order in time. The hybrid absorbing boundary condition (Xie et al., 2020) is used for boundary processing.
The inverted images after 60 iterations with three different methods for the Pluto model are shown in Figure 10. As shown in Figure 10 (marked by the dashed red circle), the LSRTM without gradient preconditioning is difficult to image the structure below the salt bodies because of the poor illumination; the LSRTM using GPSWE is helpful for imaging the subsalt structures, but the event is weak and the horizontal balance is poor; the LSRTM using GPTWE can effectively improve the imaging accuracy of deep target, and the subsalt structures are clearer and more continuous than those of other images.
Analogously, the imaging curves and theoretical reflection coefficient curve at x 3,500 m are displayed in Figure 11. As observed from Figure 11, the imaging result of LSRTM using GPTWE is closest to the theoretical reflection coefficient curve, especially in the deep part.
The convergence curves are plotted in Figure 12. It can be seen that after 60 iterations, the red curve for LSRTM using GPTWE has the fastest convergence rate and converges to the smallest value.

CONCLUSION AND PROSPECT
Based on the calculation characteristics of LSRTM, this study proposes a gradient preconditioning approach using transmitted wave energy for LSRTM. In comparison with conventional methods, the imaging results of theoretical model tests show that the LSRTM using GPTWE can improve the imaging accuracy of deep target and speed up the convergence rate without significantly increasing the amount of calculation. In addition, this study only implements the two-dimensional LSRTM using GPTWE, and further extending the algorithm to three-dimensional migration will be the focus of subsequent research.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.

AUTHOR CONTRIBUTIONS
CX contributed to writing-original draft. PS helped with conceptualization and project administration. XL assisted with formal analysis. JT helped with software. SW framed the methodology. BZ provided suggestions. Frontiers in Earth Science | www.frontiersin.org November 2021 | Volume 9 | Article 732425 9