Study on parameter choice methods for the RFMP with respect to downward continuation

Recently, the regularized functional matching pursuit (RFMP) was introduced as a greedy algorithm for linear ill-posed inverse problems. This algorithm incorporates the Tikhonov-Phillips regularization, which implies the necessity of a parameter choice. In this paper, some known parameter choice methods are evaluated with respect to their performance in the RFMP and its enhancement, the regularized orthogonal functional matching pursuit (ROFMP). As an example of a linear inverse problem, the downward continuation of gravitational field data from the satellite orbit to the Earth's surface is chosen, because it is exponentially ill-posed. For the test scenarios, different satellite heights are combined with several noise-to-signal ratios and kinds of noise. The performance of the parameter choice strategies in these scenarios is analyzed. For example, it is shown that a strongly scattered set of data points poses a substantially harder challenge for the regularization than a regular grid. The obtained results yield a first orientation regarding which parameter choice methods are feasible for the RFMP and the ROFMP.


Introduction
The gravitational field of the Earth is an important reference in the geosciences. It is an indicator for mass transports and mass redistribution on the Earth's surface. These displacements of masses can be caused by ocean currents, evaporation, changes of the groundwater level, the ablation of continental ice sheets, changes in the mean sea level or climate change (see e.g. [18,19]). However, it is difficult to model the gravitational field, because terrestrial measurements are not globally available. In addition, the points of measurement on the sea are more scattered than those on the continents. This has motivated the launch of satellite missions with a focus on the gravitational field (see e.g. [8,31,37,41]). Naturally, those data are given at a satellite orbit, not on the Earth's surface. Additionally, the measurements are only given pointwise and are afflicted with noise. The problem of getting the potential from the satellite orbit onto the Earth's surface is the so-called downward continuation problem, which is a severely unstable and, therefore, ill-posed inverse problem (see e.g. [13,40]). Traditionally, the gravitational potential of the Earth has been represented in terms of orthogonal spherical polynomials (i.e. spherical harmonics $Y_{n,j}$, see e.g. [12,23,30]) as in the case of the Earth Gravitational Model 2008 (EGM2008, see [33]). An advantage of this representation is that the upward continuation operator $\Psi$, which maps a potential $F$ from the Earth's surface (which we assume here to be the unit sphere $\Omega$) to the orbit $r\Omega$ with $r > 1$, has the singular value decomposition
$$\Psi F = \sum_{n=0}^{\infty} \sum_{j=-n}^{n} \langle F, Y_{n,j} \rangle_{L^2(\Omega)}\, \sigma_n\, Y^r_{n,j}, \qquad \sigma_n = r^{-n}, \qquad (1)$$
where $Y^r_{n,j}(x) := \frac{1}{r}\, Y_{n,j}\!\left(\frac{x}{r}\right)$, $x \in r\Omega$. Its inverse is, therefore, given by
$$\Psi^{+} G = \sum_{n=0}^{\infty} \sum_{j=-n}^{n} \langle G, Y^r_{n,j} \rangle_{L^2(r\Omega)}\, r^{n}\, Y_{n,j} = \sum_{n=0}^{\infty} \sum_{j=-n}^{n} \langle G, Y^r_{n,j} \rangle_{L^2(r\Omega)}\, \sigma_n^{-1}\, Y_{n,j} \qquad (2)$$
in the sense of $L^2(\Omega)$ and for all $G \in \Psi(L^2(\Omega)) \subset L^2(r\Omega)$. Note that the singular values of $\Psi^{+}$, which are given by $(\sigma_n^{-1})_n = (r^n)_n$, increase exponentially. For details, see [40,42].
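To illustrate the exponential ill-posedness, the following minimal sketch evaluates the amplification factors $\sigma_n^{-1} = r^n$ up to degree 100. The Earth radius, the satellite height and all variable names are illustrative choices of ours, not taken from the paper's computations.

```python
import numpy as np

# Singular values of the upward continuation operator are sigma_n = r^{-n};
# the inverse problem amplifies a degree-n coefficient by sigma_n^{-1} = r^n.
# r = (R + h) / R is the orbit radius in Earth radii (illustrative values).
R_earth = 6371.0  # km, approximate mean Earth radius
h = 500.0         # km, one of the satellite heights considered in the tests
r = (R_earth + h) / R_earth

degrees = np.arange(0, 101)
amplification = r ** degrees  # sigma_n^{-1}, grows exponentially with n

# Noise in a degree-100 coefficient is amplified by roughly three orders
# of magnitude, which is why a regularization is indispensable.
print(f"amplification at n=10:  {amplification[10]:.2e}")
print(f"amplification at n=100: {amplification[100]:.2e}")
```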
The outline of this paper is as follows. Section 2 deals with the RFMP and its enhancement, the ROFMP, which are used here for the regularization of Ψ + . For both algorithms, the essential theoretical results are recapitulated. In Section 3, the parameter choice methods under consideration for the RFMP and ROFMP are summarized and details of their implementation for the test cases are explained. In Section 4, the relevant details of the considered test scenarios are outlined. Section 5 analyzes and compares the results for the various parameter choice strategies.

RFMP
In this section, we briefly review the regularized functional matching pursuit (RFMP), which was introduced in [9,10,24,26], and an orthogonalized modification of it (see [27,42]). It is an algorithm for the regularization of linear inverse problems.
According to [25,26,42], we use an arbitrary Hilbert space $H \subset L^2(\Omega)$. Let an operator $F: H \to \mathbb{R}^l$ be given which is continuous and linear. Concerning the downward continuation, we have a vector $y \in \mathbb{R}^l$ of measurements at a satellite orbit, that means our data are given pointwise. The inverse problem consists of the determination of a function $F \in H$ such that $FF = y = ((\Psi F)(x_j))_{j=1,\dots,l}$, where $(x_j)_{j=1,\dots,l}$ is a set of points at satellite height. In the following, we use bold letters for vectors in $\mathbb{R}^l$. To find an approximation of the function $F$, we need a set of trial functions $D \subset H \setminus \{0\}$, which we call the dictionary. The unknown function $F$ is expanded in terms of dictionary elements, that means we can represent it as $F = \sum_{k=1}^{\infty} \alpha_k d_k$ with $\alpha_k \in \mathbb{R}$ and $d_k \in D$ for all $k \in \mathbb{N}$.

The algorithm
The idea of the RFMP is the iterative construction of a sequence of approximations (F n ) n . This means that we add a basis function d k of the dictionary to the approximation in each step. This basis function is furthermore equipped with a coefficient α k .
Since the considered inverse problem is ill-posed, we use the Tikhonov-Phillips regularization, that is, our task is to find a function $F$ which minimizes the functional
$$\|y - FF\|_{\mathbb{R}^l}^2 + \lambda \|F\|_{H}^2.$$
That means, if we have the approximation $F_n$ up to step $n$ and the residual $R^n := y - FF_n$, our greedy algorithm chooses $\alpha_{n+1} \in \mathbb{R}$ and $d_{n+1} \in D$ such that
$$\|R^n - \alpha Fd\|_{\mathbb{R}^l}^2 + \lambda \|F_n + \alpha d\|_{H}^2$$
is minimized. Here, $\lambda > 0$ is the regularization parameter. We can state the following algorithm for the RFMP.
Algorithm 2.1. Let a dictionary $D \subset H$, a data vector $y \in \mathbb{R}^l$ and an operator $F: H \to \mathbb{R}^l$ (linear and continuous) be given.
(1) Initialization Set $n := 0$, $F_0 := 0$ and $R^0 := y$.
(2) Iteration Choose $\alpha_{n+1} \in \mathbb{R}$ and $d_{n+1} \in D$ as minimizers of the Tikhonov-Phillips functional above and set $F_{n+1} := F_n + \alpha_{n+1} d_{n+1}$ as well as $R^{n+1} := R^n - \alpha_{n+1} Fd_{n+1}$.
(3) Stopping criterion F n+1 is the output, if the stopping criterion is fulfilled. Otherwise, increase n and go to step 2.
The maximization, which is necessary to obtain $d_{n+1}$, is implemented by evaluating the fraction
$$\frac{\left(\langle R^n, Fd \rangle_{\mathbb{R}^l} - \lambda \langle F_n, d \rangle_{H}\right)^2}{\|Fd\|_{\mathbb{R}^l}^2 + \lambda \|d\|_{H}^2}$$
for all $d \in D$ in each iteration and picking a maximizer. Since many involved terms can be calculated in a preprocessing, the numerical expenses can be kept low (see [24]). For a convergence proof of the RFMP, we refer to [25]. Briefly, under certain conditions, one can show that the sequence $(F_n)_n$ converges to the solution $F_\infty$ of the Tikhonov-regularized normal equation
$$(\lambda I + F^{*}F)\, F_\infty = F^{*} y, \qquad (8)$$
where $I$ is the identity operator and $F^{*}$ is the adjoint operator to $F$.
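The greedy selection above can be sketched in a finite-dimensional toy setting. The following is only an illustration of the selection fraction, not the authors' implementation: we assume the operator is a plain matrix `A`, the dictionary consists of canonical unit vectors, and the $H$-inner product is the Euclidean one; all names and sizes are our own choices.

```python
import numpy as np

# Toy finite-dimensional sketch of the RFMP greedy iteration.
rng = np.random.default_rng(0)
m, p = 20, 50                       # data dimension, dictionary size
A = rng.standard_normal((m, p))     # discretized operator (assumption)
dictionary = np.eye(p)              # trial "functions" = canonical unit vectors
f_true = np.zeros(p)
f_true[:5] = 1.0
y = A @ f_true                      # synthetic noise-free data
lam = 1e-3                          # regularization parameter lambda

f = np.zeros(p)                     # current approximation F_n
residual = y.copy()                 # R^n = y - A f
for _ in range(30):
    best_val, best_idx, best_alpha = -np.inf, 0, 0.0
    for idx in range(p):
        d = dictionary[:, idx]
        Ad = A @ d
        num = residual @ Ad - lam * (f @ d)   # <R^n, Fd> - lam <F_n, d>_H
        den = Ad @ Ad + lam * (d @ d)         # ||Fd||^2 + lam ||d||^2_H
        if num**2 / den > best_val:
            best_val, best_idx, best_alpha = num**2 / den, idx, num / den
    d = dictionary[:, best_idx]
    f += best_alpha * d                       # F_{n+1} = F_n + alpha d
    residual -= best_alpha * (A @ d)          # update the residual
print("relative error:", np.linalg.norm(f - f_true) / np.linalg.norm(f_true))
```

After a moderate number of iterations the residual norm drops far below the data norm, which mirrors the convergence statement cited above.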

ROFMP
The regularized orthogonal functional matching pursuit (ROFMP) is an advancement of the RFMP from the previous section. The basic idea is to project the residual onto the span of the images of the chosen dictionary elements,
$$V_n := \operatorname{span}\{Fd_1, \dots, Fd_n\} \subset \mathbb{R}^l,$$
and then adjust the previously chosen coefficients in such a way that the residual is afterwards contained in the orthogonal complement of this span. Since this so-called backfitting (cf. [22,32]) might not be optimal, we implement the so-called prefitting (cf. [45]), where the next function and all coefficients are chosen simultaneously to guarantee optimality at every single stage of the algorithm. Moreover, let $W_n := V_n^{\perp}$, and let the orthogonal projections onto $V_n$ and $W_n$ be denoted by $P_{V_n}$ and $P_{W_n}$, respectively. All in all, our aim is to find, in each step, a new dictionary element $d_{n+1}$ and a coefficient $\alpha_{n+1}$ which are optimal in this prefitting sense; the corresponding quantities are defined in (11) and (12), and the updated coefficients for the expansion at step $n+1$ are given by (13) and (14). The ROFMP algorithm can be summarized as follows.
Algorithm 2.2. Let a dictionary D ⊂ H, a data vector y ∈ R l and an operator F : H → R l (linear and continuous) be given.
(2) Iteration Choose a function $d_{n+1} \in D$ and calculate the corresponding coefficient $\alpha_{n+1}$, where $B_n(d)$ is defined according to (11) and (12), and update the coefficients of the previously chosen dictionary elements according to (13) and (14). (3) Stopping criterion $F_{n+1}$ is the output, if the stopping criterion is fulfilled.
Otherwise, increase n and go to step 2.
For practical details of the implementation, see [42].
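The splitting of the residual into its parts in $V_n$ and $W_n$ can be illustrated in a finite-dimensional setting. In this sketch, the matrix holding the images $Fd_1, \dots, Fd_n$ as columns and the residual vector are synthetic; it only demonstrates the orthogonal decomposition, not the full prefitting step.

```python
import numpy as np

# Illustrative data: V_n is spanned by the columns of Fd.
rng = np.random.default_rng(1)
l, n_chosen = 30, 4
Fd = rng.standard_normal((l, n_chosen))   # columns = F d_1, ..., F d_n
residual = rng.standard_normal(l)

# Orthogonal projection onto V_n via least squares; recomputing the
# coefficients of the chosen functions (backfitting) puts the new
# residual into the orthogonal complement W_n.
coeffs, *_ = np.linalg.lstsq(Fd, residual, rcond=None)
proj_Vn = Fd @ coeffs                     # P_{V_n} residual
proj_Wn = residual - proj_Vn              # P_{W_n} residual

# The two parts are orthogonal and recompose the residual.
print(abs(proj_Vn @ proj_Wn))
```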
Remark 2.3. If we choose $d_i \in D$ and $\alpha_i$ as in Algorithm 2.2 and update the coefficients as in (13) and (14), we obtain for the regularized case ($\lambda > 0$) that $R_n$ is, in general, not orthogonal to $V_n$ for all $n \in \mathbb{N}_0$.

In [42], it was shown that, with the assumptions from Remark 2.3, there exists a number $N := N(\lambda)$ such that the norm of the residual stagnates for all iterations beyond $N$. This is a problem for the ROFMP, because we cannot reconstruct the part of the signal which lies in $V_n$. Therefore, we have to modify the algorithm towards an iterated Tikhonov-Phillips regularization. That means we run the algorithm for a given number of iterations (in our case, $K > 0$), then break off the process and start the algorithm again with the previous residual $R_K$. This is called the restart or repetition. For this process, we first need an additional notation: we add a further subscript $j$ to the expansion $F_n$. Note that we have two levels of iterations here. The upper level is associated with the restart procedure and is enumerated by the second subscript $j$. The lower iteration level is the previously described ROFMP iteration with the first subscript $n$. We denote the current expansion by $F_{n,j}$, where $F_{0,1} := 0$ and $F_{0,j} := F_{K,j-1}$. In analogy to the previous definitions, the residual $R_{n,j}$ is defined with respect to this expansion. That means, after $K$ iterations, we keep the previously chosen coefficients fixed and restart the ROFMP with the residual of the step before; in each restart, the same optimization as in Algorithm 2.2 has to be solved and only the coefficients of the current restart are updated. The approximation produced by the ROFMP after $m$ restarts is the expansion $F_{K,m}$. In analogy to the RFMP, we obtain a similar convergence result for the ROFMP: under certain technical conditions, the sequence of approximations converges in the Sobolev space $H$. For further details, we refer to [27,42].

Introduction
The Earth Gravitational Model 2008 (EGM2008, see [33]) is a spherical harmonics model of the gravitational potential of the Earth up to degree 2190 and order 2159. We use this model up to degree 100 for the solution $F$ in our numerical tests. For checking the parameter choice methods, we generate different test cases, that means test scenarios which vary in the satellite height, the noise-to-signal ratio and the data grid. Based on the chosen function $F$, our dictionary contains all spherical harmonics up to degree 100, that means our approximation $F$ from the algorithm has the representation
$$F = \sum_{n=0}^{100} \sum_{j=-n}^{n} \alpha_{n,j}\, Y_{n,j}.$$
This is a strong limitation, but higher degrees would essentially enlarge the computational expenses. Moreover, for the stabilization of the solution, we use the norm of the Sobolev space $H := H((a_n)_n; \Omega)$, which is constructed with the sequence $(a_n)_n$, see [11]. This Sobolev space contains all functions $F \in L^2(\Omega)$ which fulfil
$$\sum_{n=0}^{\infty} \sum_{j=-n}^{n} a_n^2\, \langle F, Y_{n,j} \rangle_{L^2(\Omega)}^2 < \infty.$$
The inner product of functions $F, G \in H$ is given by
$$\langle F, G \rangle_{H} = \sum_{n=0}^{\infty} \sum_{j=-n}^{n} a_n^2\, \langle F, Y_{n,j} \rangle_{L^2(\Omega)}\, \langle G, Y_{n,j} \rangle_{L^2(\Omega)}.$$
The particular sequence $(a_n)_n = ((n + \frac{1}{2})^2)_n$ was chosen, because preliminary numerical experiments showed that the associated regularization term yielded results with an appropriate smoothness.
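A minimal sketch of the resulting penalty term, assuming the $H$-norm weights the degree-$n$ spherical-harmonic coefficients by $a_n = (n + 1/2)^2$ as described above; the function name and the coefficient layout are our own.

```python
import numpy as np

def sobolev_norm(coeffs_by_degree):
    """||F||_H with a_n = (n + 1/2)^2.

    coeffs_by_degree[n] holds the 2n+1 coefficients <F, Y_{n,j}>.
    """
    total = 0.0
    for n, c in enumerate(coeffs_by_degree):
        a_n = (n + 0.5) ** 2
        total += a_n**2 * np.sum(np.asarray(c, dtype=float) ** 2)
    return np.sqrt(total)

# A function with all its L2-energy at degree 0 is penalized mildly ...
low = sobolev_norm([[1.0]])
# ... while the same energy at degree 10 is penalized much more strongly,
# which is exactly the smoothing effect of the regularization term.
high = sobolev_norm([[0.0]] * 10 + [[1.0] + [0.0] * 20])
print(low, high)
```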
In our test scenarios, we use a finite set $\{\lambda_k\}_{k=1,\dots,100}$ of 100 regularization parameters (for details, see Section 4.4). The approximate solution of the inverse problem as an output of the RFMP/ROFMP corresponding to the regularization parameter $\lambda_k$ and the data vector $y$ is denoted by $x_k$. This notation is introduced to avoid confusion with the functions $F_n$ which occur at the $n$-th step of the iteration within the RFMP.
In practice, we deal with noisy data $y^\varepsilon$, where the noise level $\varepsilon$ is defined via the length $l$ of the data vector $y$ and the noise-to-signal ratio N2S. The corresponding result of the RFMP/ROFMP for the regularization parameter $\lambda_k$ and the noisy data vector $y^\varepsilon$ is called $x_k^\varepsilon$. Due to the convergence results for the RFMP/ROFMP (see (8)), we introduce the linear regularization operators $R_k: \mathbb{R}^l \to H$ and assume $x_k$ to be $R_k y$ and $x_k^\varepsilon$ to be $R_k y^\varepsilon$, though this could certainly only be guaranteed for an infinite number of iterations.
Due to the importance of the regularization parameter, we summarize in the next section some methods for the choice of this parameter $\lambda$. For the comparison of the methods, we have to define the optimal regularization parameter $\lambda_{k_{\mathrm{opt}}}$. We do this by minimizing the difference between the exact solution $x$ and the regularized solution $x_k^\varepsilon$ corresponding to the parameter $\lambda_k$ and noisy data, i.e.
$$k_{\mathrm{opt}} := \operatorname*{arg\,min}_{k} \|x - x_k^\varepsilon\|_{L^2(\Omega)}.$$
Then we evaluate the results by computing the so-called inefficiency
$$\frac{\|x - x_{k^*}^\varepsilon\|_{L^2(\Omega)}}{\|x - x_{k_{\mathrm{opt}}}^\varepsilon\|_{L^2(\Omega)}}, \qquad (31)$$
where $\lambda_{k^*}$ is the regularization parameter selected by the considered parameter choice method. For the computation of the inefficiency, we use the $L^2(\Omega)$-norm, since our numerical results led to a better distinction of the different inefficiencies than by using the $H$-norm. However, the tendency regarding 'good' and 'bad' parameters was the same in both cases. The closer the obtained inefficiency is to 1, the better the parameter choice method performs. The norms which occur in the several parameter choice methods can be computed with the help of the singular value decomposition. However, we use the singular values of $\Psi$ (see (1) and (3)) for this purpose, because the singular value decomposition of $F$ is unavailable. This certainly causes an inaccuracy in our calculations, but appears to be unavoidable for the sake of practicability. Table 1 shows the different parameter choice methods we tested. The tuning parameters are chosen in accordance with [1,2]. For the choice of the maximal index $\hat{K}$, see Section 4.5.
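The inefficiency measure can be sketched as follows; the error values are hypothetical, and `errors[k]` stands for the $L^2$-difference between the exact and the regularized solution for the parameter $\lambda_k$.

```python
import numpy as np

def inefficiency(errors, k_star):
    """Error of the method's chosen parameter over the error of the best
    parameter on the grid; a value of 1 means the method picked the optimum."""
    k_opt = int(np.argmin(errors))          # index defining lambda_{k_opt}
    return errors[k_star] / errors[k_opt]

errors = np.array([9.0, 4.0, 1.0, 2.0, 6.0])   # hypothetical error curve
print(inefficiency(errors, k_star=3))           # method picked k=3, optimum is k=2
```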

Specifications for the algorithm
In Sections 2.1 and 2.2, we mentioned that we need to define stopping criteria for our algorithm. We state the following stopping criteria for the RFMP and ROFMP (see also Algorithms 2.1 and 2.2).

[Table 1 lists the selection criterion and the specifications of each tested parameter choice method; for example, the discrepancy principle (DP) chooses the first index $k$ such that the residual norm falls below a multiple of the noise level, with tuning parameter $\tau > 1$.]
In the case of the ROFMP, we choose $K = 200$ for the restart.

Data grids

Fig. 1 shows two data grids which we use for our experiments. First, the Reuter grid (see [38]) is an example of a regular data grid on the sphere. Second, we have a set of irregularly distributed data points which we refer to as the scattered grid in the following and which was first used in [42]. The latter grid imitates the distribution of measurements along the tracks of a satellite. It possesses additional shorter tracks and, thus, a higher accumulation of data points near the poles, and only a few tracks in a belt around the equator.

Noise generation
For our various scenarios, we obtain noisy data by adding either white noise or coloured noise that is generated by an autoregressive process. Additionally, we test a local noise scenario.

White noise scenario
For white noise, we add Gaussian noise corresponding to a certain noise-to-signal ratio N2S to the particular value of each datum, that means we get our noisy data by
$$y_i^\varepsilon = y_i \left(1 + \mathrm{N2S} \cdot \varepsilon_i\right), \qquad (32)$$
where $y_i$ are the components of $y$ and $\varepsilon_i \sim \mathcal{N}(0,1)$, that means every $\varepsilon_i$ is a standard normally distributed random variable.
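A sketch of this noise model, assuming the multiplicative reading "relative to the particular value"; the data values, sample size and seed are synthetic.

```python
import numpy as np

# White noise: each datum is perturbed relative to its own value,
# y_i^eps = y_i * (1 + N2S * eps_i) with eps_i ~ N(0, 1).
rng = np.random.default_rng(42)
y = rng.uniform(1.0, 2.0, size=1000)   # synthetic data values
n2s = 0.05                             # noise-to-signal ratio of 5%
y_noisy = y * (1.0 + n2s * rng.standard_normal(y.size))

# The mean relative perturbation is close to N2S * sqrt(2/pi) ~ 0.04.
rel_err = np.abs(y_noisy - y) / np.abs(y)
print("mean relative perturbation:", rel_err.mean())
```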

Coloured noise scenario
Since our scattered grid imitates tracks of satellites, we can assume that we have a chronology of the data points for each track. To obtain some sort of coloured noise, we use an autoregressive process of order 1 (briefly: AR(1)-process, see [6]) with which we simulate correlated noise. A stochastic process $\{\varepsilon_i,\, i \in \mathbb{Z}\}$ is called an autoregressive process of order 1 if
$$\varepsilon_i = \alpha\, \varepsilon_{i-1} + \eta_i$$
with independent and identically distributed $\eta_i \sim \mathcal{N}(0,1)$. In the case of our simulation, we start with $\varepsilon_1 \sim \mathcal{N}(0,1)$ and run the recursion for a fixed $\alpha \in (-1,1)$, which we determined at random.
For each track of the scattered grid, we apply this autoregressive process (for the tracks, see Fig. 2) and obtain $y_i^\varepsilon$ as in (32) using the $\varepsilon_i$ from above.
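The AR(1) recursion for one track can be sketched as follows; the track length, seed and the value $\alpha = 0.54$ (taken from the example run reported later) are illustrative.

```python
import numpy as np

def ar1_noise(n, alpha, rng):
    """Coloured noise along one track: eps_1 ~ N(0,1) and
    eps_i = alpha * eps_{i-1} + eta_i with i.i.d. eta_i ~ N(0,1)."""
    eps = np.empty(n)
    eps[0] = rng.standard_normal()
    for i in range(1, n):
        eps[i] = alpha * eps[i - 1] + rng.standard_normal()
    return eps

rng = np.random.default_rng(0)
eps = ar1_noise(100_000, alpha=0.54, rng=rng)

# Unlike white noise, neighbouring values along the track are correlated;
# the empirical lag-1 autocorrelation is close to alpha.
lag1 = np.corrcoef(eps[:-1], eps[1:])[0, 1]
print("lag-1 autocorrelation:", lag1)
```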

Local noise scenario
For the local noise, we choose a certain area and add white noise with an N2S = 5% relative to the particular value to each data point. To the values of the remaining data points we add white noise with an N2S = 1%. We choose this area as illustrated in Fig. 3. The choice of this area is a very rough approximation of the domain of the South Atlantic Anomaly, where a dip in the Earth's magnetic field exists (see e.g. [17]). Since only a few points of our grid would have been in the actual domain, we extended the area towards the South pole.

Regularization parameters
We constructed the admissible values $\lambda_k$ for the parameter choice as a monotonically decreasing sequence of 100 values from $\lambda_1 = 1$ to $\lambda_{100} = 10^{-14}$ with a logarithmically equidistant spacing:
$$\lambda_k = \lambda_0 \cdot q_\lambda^{\,k}, \qquad k = 1, \dots, 100. \qquad (33)$$
Here, $\lambda_0 = 1.3849$ and $q_\lambda = 0.7221$. The test scenarios are chosen such that the parameter range lies between 1 and $10^{-14}$ and the optimal parameter stays away from the boundaries $\lambda_1$ and $\lambda_{100}$. For the choice of the parameters $\lambda_k$, we refer to [1,2].
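The construction can be checked numerically; the snippet below is a direct transcription of the formula above.

```python
import numpy as np

# Geometric grid of regularization parameters: lambda_k = lambda_0 * q^k,
# chosen so that lambda_1 is approximately 1 and lambda_100 approximately 1e-14.
lambda_0, q = 1.3849, 0.7221
k = np.arange(1, 101)
lambdas = lambda_0 * q**k

print(f"lambda_1   = {lambdas[0]:.4f}")
print(f"lambda_100 = {lambdas[-1]:.3e}")
```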

Maximal index
Most parameter choice methods either increase the index $k$ until a certain condition is satisfied or minimize a certain function for all regularization parameters $\lambda$, i.e. after our discretization (see (33)) they minimize it over all $k$ (see Table 1). For some methods, like the quasi-optimality criterion, the values of $k$ have to be constrained by a suitable maximal index $\hat{K}$, which must be chosen such that $k_{\mathrm{opt}} < \hat{K}$. To increase computational efficiency, such a maximal index can be used for other methods as well without changing their performance. As in [1,2], we define this maximal index $\hat{K}$ via the variance $\mathbb{E}\|x_k - x_k^\varepsilon\|^2 = \varepsilon^2 \rho^2(k)$ of the regularized solution corresponding to noisy data, where $\varepsilon^2 \rho^2(\infty)$ is its largest value. It is well known that, in the case of white noise, $\rho(k)$ for the Tikhonov-Phillips regularization is generally given by
$$\rho^2(k) = \sum_{j} \frac{\sigma_j^2}{(\sigma_j^2 + \lambda_k)^2}.$$
Since our singular values occur with a multiplicity of $2n+1$ and we restrict our tests to $n = 0, \dots, 100$, the sum above is, in our tests, given by
$$\rho^2(k) = \sum_{n=0}^{100} (2n+1)\, \frac{\sigma_n^2}{(\sigma_n^2 + \lambda_k)^2}.$$
For any coloured noise, we use an estimate of $\varepsilon^2 \rho^2(k)$ (cf. [2]) based on two independent data sets $y_1^\varepsilon, y_2^\varepsilon$ for the same regularization parameter $\lambda_k$. Note that $x_{k,1}^\varepsilon, x_{k,2}^\varepsilon$ are the regularized solutions corresponding to the parameter $\lambda_k$ and the noisy data sets $y_1^\varepsilon, y_2^\varepsilon$.
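The truncated variance factor $\rho^2(k)$ can be sketched with the downward-continuation singular values $\sigma_n = r^{-n}$, each of multiplicity $2n+1$; the orbit radius $r$ below is an illustrative value corresponding roughly to a 500 km orbit.

```python
import numpy as np

# rho^2(k) = sum_{n=0}^{100} (2n+1) * sigma_n^2 / (sigma_n^2 + lambda_k)^2
# with sigma_n = r^{-n} (r is illustrative, ~500 km orbit in Earth radii).
r = 6871.0 / 6371.0
n = np.arange(0, 101)
sigma2 = r ** (-2 * n)                      # sigma_n^2

def rho2(lam):
    return np.sum((2 * n + 1) * sigma2 / (sigma2 + lam) ** 2)

# rho^2(k) grows as lambda_k decreases, approaching its largest value
# rho^2(inf), which is attained in the limit lambda -> 0.
print(rho2(1e-2), rho2(1e-8))
```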

Comparison of the methods
For the error comparison, we compute the inefficiency (see (31)) in each scenario (see Table 2 for an overview) for each parameter choice method and compare the inefficiencies. We generate 32 data sets for each of the eleven scenarios, i.e. we run each algorithm 352 times for a single regularization parameter. Figs. 4 to 14 show the inefficiencies, grouped by parameter choice method. The red middle band in each box is the median and the red + symbols mark outliers. The boxplots of our results are plotted at a logarithmic scale.

Discrepancy principle (DP)
We can see from Fig. 4 that the DP leads to results which are in the range from good to acceptable in all test cases. It yields better results with a more uniformly distributed grid.

Transformed discrepancy principle (TDP)

The results for the TDP (see Fig. 5) are rather poor for all test cases. We can remark that the results get better with a more uniformly distributed data grid. Furthermore, the coloured noise leads to slightly bigger boxes than the white noise.

Quasi-optimality criterion (QOC)
In Fig. 6, the inefficiencies of the QOC show that the performance of this method is rather poor. In the case of the Reuter grid, the results range from good to mediocre, in contrast to the scattered grid.

L-curve method (LC)
The LC (see Fig. 7) yields good results in all test cases. We can remark that there are a few outliers and bigger boxes for the test cases with coloured noise and the scattered grid.

Extrapolated Error method (EEM)
The EEM yields acceptable to rather poor results (see Fig. 8). We cannot observe any dependency on the grid or the kind of noise among the acceptable results. Moreover, in the test case with a height of 300 km and an N2S of 5% with coloured noise, we have some outliers for the RFMP and a large box for the ROFMP.

Residual method (RM)
The results for the RM (see Fig. 9) are good to acceptable in all test cases. We only have a few minor outliers.

Generalized maximum likelihood (GML)
In Fig. 10, we can see that the GML leads to acceptable results only in the case of the Reuter grid. In all cases of the scattered grid, its performance is rather bad.

Generalized cross validation (GCV)
From Fig. 11, we can observe that the GCV yields good results in all test cases. We only have, in the case of the ROFMP, some minor outliers. It yields the best results with a more regularly distributed data grid.

Robust generalized cross validation (RGCV)
The RGCV yields good to acceptable results (see Fig. 12) which get slightly worse and show a larger variance for a higher N2S or coloured noise scenarios.

Strong robust generalized cross validation (SRGCV)
The SRGCV (see Fig. 13) has good to acceptable results in all the test cases which are a little bit worse than for the RGCV. The Reuter grid leads to good results whereas the scattered grid seems to be more difficult to handle by the method.

Modified generalized cross validation (MGCV)
The inefficiencies for the MGCV (see Fig. 14) are good for the test cases with white noise and the Reuter grid. In several of the cases with coloured noise, however, the boxes are so big that they partially do not fit in the figure; the spread of the inefficiencies is very large there. These cases seem to be very hard to handle for this method.

Plots of the results
In this section, we briefly show the approximations of the gravitational potential which we obtain by the RFMP and the ROFMP for one typical noisy data set, considering both a good and a rather poor parameter choice. For the test case (500 km, 5%, coloured noise, scattered grid) with $\alpha = 0.54$ for the AR(1)-process, Fig. 15 shows the approximation which we obtain by the RFMP for the optimal regularization parameter $\lambda_{29}$ and the difference to the EGM2008 up to degree 100. Fig. 16 shows the approximation belonging to the regularization parameter $\lambda_{22}$, which is chosen by the GCV. In Fig. 17, we can see the approximation belonging to the parameter $\lambda_{43}$, which is chosen by the MGCV. We can see that the MGCV chooses too small a regularization parameter, so that we obtain an under-regularized solution. North-south oriented anomalies occur in the reconstruction, which appear to be artefacts due to the noise along the simulated satellite tracks. In contrast, the approximation of the potential for the GCV-based parameter is only slightly worse than the result for the optimal parameter. Furthermore, we show the same test case as above, but with the approximation from the ROFMP with $\alpha = 0.56$ in the AR(1)-process. Fig. 18 shows the approximation for the optimal parameter $\lambda_{29}$ and the difference to EGM2008. In Fig. 19, we see the approximation which belongs to the parameter $\lambda_{22}$, which is chosen by the GCV. Fig. 20 shows the approximation with the regularization parameter $\lambda_{9}$, which is chosen by the GML.
Here, the GML chooses a regularization parameter which is too large, that means our approximation is over-regularized. We get less information and fewer details about the gravitational potential. Essential details such as signals due to the Andes or the region around Indonesia occur in the difference plot. Further plots show the error of the approximation for the different regularization parameters which were chosen by the considered strategies. The horizontal axis states the index $k$ of the regularization parameter $\lambda_k$. The plots refer to the same scenario as Figures 15 to 20. The arrows show the parameters which are chosen by the methods. The diagrams confirm our observations that the GCV and the LC yield parameters which are closest to the (theoretically) optimal parameter. We obtain almost equally good results for the DP, the RM and the RGCV.

Conclusion and outlook
We tested parameter choice methods for the regularized (orthogonal) functional matching pursuit (RFMP/ROFMP). For the evaluation of the parameter choice methods, we constructed eleven different test cases with different satellite heights, data grids, noise types and noise-to-signal ratios (see Table 2) for the RFMP and ROFMP. For each test case, we generated 32 noisy data sets. Altogether, we ran each algorithm for 352 data sets and, for each data set, for 100 different regularization parameters, that means each algorithm was applied 35200 times.
Our study shows that the GCV, the LC, the RM, the RGCV and the SRGCV yield the best results in all test cases. The DP provides good to acceptable results. The performance of the QOC depends strongly on the data grid: a less regularly distributed grid does not lead to good results. In our experiments, the QOC achieved good results only with the Reuter grid. The MGCV also obtains both good and rather poor results, depending on the grid and the kind of noise we used. Here, the irregularly distributed scattered grid and the coloured noise did not yield good results. Finally, the TDP, the EEM and the GML did not consistently lead to good results in our test cases.
We want to remark that, on average, our results were better than those in [1] and [2] for all methods. There are several possible reasons for this. First, the coloured noise in our test cases was different and maybe easier for the methods to handle, because we only used an AR(1)-process. Second, there is a difference in the problem itself: here, a data grid was given, which corresponds to a spatial discretization of the problem. Furthermore, the RFMP and the ROFMP are iterative methods and use stopping criteria, which also act as a kind of regularization. Since we stop the algorithm at a certain point, we do not obtain the approximation of the potential in the limit. For these reasons, the outcomes of our experiments and of those in [1,2] are not directly comparable.
The purpose of this paper is to provide a first guideline for the parameter choice for the RFMP and the ROFMP. Certainly, further experiments should be designed in the future. For instance, the distribution of our regularization parameters $\lambda_k$ could be refined such that the relevant parameters are not too far apart; perhaps the interval from 1 to $10^{-14}$ should be chosen smaller such that the parameters lie closer together.
Future changes in the implementation could include the use of other stopping criteria for the RFMP. Furthermore, an enhancement could be the extension of the dictionary to localized trial functions. In addition, the coloured noise could, for example, be generated by an AR(k)-process with k > 1, or completely different types of noise could be considered. Finally, we can test other tuning parameters for the methods where required. Besides, it is possible that the performance of the investigated parameter choice methods in the RFMP/ROFMP depends on the considered inverse problem.