Performance Analysis of Trust Region Subproblem Solvers for Limited-Memory Distributed BFGS Optimization Method

Gao, Guohua; Florez, Horacio; Vink, Jeroen C.; Wells, Terence J.; Saaf, Fredrik; Blom, Carl P. A.

doi:10.3389/fams.2021.673412

ORIGINAL RESEARCH article

Front. Appl. Math. Stat., 24 May 2021

Sec. Optimization

Volume 7 - 2021 | https://doi.org/10.3389/fams.2021.673412

This article is part of the Research TopicData Science Applications to Inverse and Optimization Problems in Earth ScienceView all 9 articles

Performance Analysis of Trust Region Subproblem Solvers for Limited-Memory Distributed BFGS Optimization Method

Guohua Gao¹*

Horacio Florez¹

Jeroen C. Vink²

Terence J. Wells²

Fredrik Saaf¹

Carl P. A. Blom²

¹Shell Global Solutions (United States) Inc., Houston, TX, United States
²Shell Global Solutions International B.V., Amsterdam, Netherlands

The limited-memory Broyden–Fletcher–Goldfarb–Shanno (L-BFGS) optimization method performs very efficiently for large-scale problems. A trust region search method generally performs more efficiently and robustly than a line search method, especially when the gradient of the objective function cannot be accurately evaluated. The computational cost of an L-BFGS trust region subproblem (TRS) solver depend mainly on the number of unknown variables (n) and the number of variable shift vectors and gradient change vectors (m) used for Hessian updating, with m << n for large-scale problems. In this paper, we analyze the performances of different methods to solve the L-BFGS TRS. The first method is the direct method using the Newton-Raphson (DNR) method and Cholesky factorization of a dense n × n matrix, the second one is the direct method based on an inverse quadratic (DIQ) interpolation, and the third one is a new method that combines the matrix inversion lemma (MIL) with an approach to update associated matrices and vectors. The MIL approach is applied to reduce the dimension of the original problem with n variables to a new problem with m variables. Instead of directly using expensive matrix-matrix and matrix-vector multiplications to solve the L-BFGS TRS, a more efficient approach is employed to update matrices and vectors iteratively. The L-BFGS TRS solver using the MIL method performs more efficiently than using the DNR method or DIQ method. Testing on a representative suite of problems indicates that the new method can converge to optimal solutions comparable to those obtained using the DNR or DIQ method. Its computational cost represents only a modest overhead over the well-known L-BFGS line-search method but delivers improved stability in the presence of inaccurate gradients. When compared to the solver using the DNR or DIQ method, the new TRS solver can reduce computational cost by a factor proportional to $n^{2} / m$ for large-scale problems.

Introduction

Decision-making tools based on optimization procedures have been successfully applied to solve practical problems in a wide range of areas. An optimization problem is generally defined as minimizing (or maximizing) an objective function $f (x)$ within a user defined search domain $x \in Ω$ , and subject to some linear or nonlinear constraints, where x is an n-dimensional vector that contains all controllable variables.

In the oil and gas industry, an optimal business development plan requires robust production optimization because of the considerable uncertainty of subsurface reservoir properties and volatile oil prices. Many papers have been published on the topic of robust optimization and their applications to cyclic CO₂ flooding through the Gas-Assisted Gravity Drainage process [1], well placement optimization in geologically complex reservoirs [2], optimal production schedules of smart wells for water flooding [3] and in naturally fractured reservoirs [4], just mentioning a few of them as examples.

Some researchers formulated the optimization problem under uncertainty as a single objective optimization problem, e.g., only maximizing the mean of net present value (NPV) for simplicity by neglecting the associated risk. However, it is recommended to formulate robust optimization as a bi-objective optimization problem for consistency and completeness. A bi-objective optimization problem is generally defined as minimizing (or maximizing) two different objective functions, $f_{1} (x)$ and $f_{2} (x)$ , within a user-defined search domain $x \in Ω$ , and subject to some linear or nonlinear constraints. For example, we may maximize the mean value, denoted by $f_{1} (x)$ , and minimize the standard deviation, denoted by $f_{2} (x)$ , of NPV. For a bi-objective optimization problem, the optimal solutions are defined as the Pareto optimal solutions (or Pareto front). It is a very challenging task to find multiple optimal solutions on the Pareto front [5, 6].

It is also a very challenging task to properly characterize the uncertainty of reservoir properties (e.g., porosity and permeability in each grid-block) and reliably quantify the ensuing uncertainty of production forecasts (e.g., production rates of oil, gas, and water phases) by conditioning to historical production data and 4D seismic data [7, 8], which requires generating multiple conditional realizations by minimizing a properly defined objective function, e.g., using the randomized maximum likelihood (RML) method [9].

When an adjoint-based gradient of the objective function is available [10–12], we may apply a gradient-based optimization method, e.g., the Broyden–Fletcher–Goldfarb–Shanno (BFGS) optimization method [13, 14]. However, the adjoint-based gradient is not available for many commercial simulators and not available for integer type or discrete variables (e.g., well locations defined by grid-block indices). In such a case, we must apply a derivative-free optimization (DFO) method [15–18]. For problems with smooth objective functions and continuous variables, model-based DFO methods using either radial-basis function [19] or quadratic model [16, 20, 21] performs more efficiently than other DFO methods such as direct pattern search methods [22] and stochastic search methods [23].

Traditional optimization methods only locate a single optimal solution, and they are referred to as single-thread optimization methods in this paper. It is unacceptably expensive to use a single-thread optimization method to locate multiple optimal solutions on the Pareto front or generate multiple RML samples [24]. To overcome the limitations of single-thread optimization methods, Gao et al. [17] developed a local-search distributed Gauss-Newton (DGN) DFO method to find multiple best matches concurrently. Later, Chen et al. [25] modified the local-search DGN optimization method and generalized the DGN optimizer for global search. They also integrated the global-search DGN optimization method with the RML method to generate multiple RML samples in parallel.

Because multiple search points are generated in each iteration, finally, multiple optimal solutions can be found in parallel. These distributed optimization methods are referred to as multiple-thread optimization methods. Both the local- and global-search DGN optimization methods are only applicable to a specific optimization problem, i.e., history matching problem or least-squares optimization problem, of which the objective function can be expressed as a form of least-squares, $f (x) = \frac{1}{2} α x^{T} x + \frac{1}{2} y^{T} (x) y (x)$ , but they cannot be applied to other type or generic optimization problems where the objective function cannot be expressed as a form of least-squares. Furthermore, the DGN optimization methods may become less efficient for history matching problems when both the number of variables (n) and the number of observed data (m) are large (e.g., in the order of thousands or more).

Recently, Gao et al. [18] developed a well-paralleled distributed quasi-Newton (DQN) DFO method for generic optimization problems by introducing a generalized form of the objective function, $F (x, y) = f (x)$ . Here, x represents the vector of controllable variables or model parameters to be optimized (explicit variables) and y(x) denotes the vector of simulated responses (implicit variables) of a reservoir using explicit variables x. Using the generalized form of the objective function, the gradient of the objective function can be evaluated analytically by,

g (x) = \nabla_{x} F (x, y) + J^{T} (x) \nabla_{y} F (x, y) (1)

In Eq. 1, $J^{T} (x) = \nabla_{x} y^{T} (x)$ is the transpose of the sensitivity matrix. Using the generalized expression of the objective function F(x,y), different objective functions can be evaluated using the same y(x) that is simulated from the same reservoir model, e.g., for multi-objective optimization problems. A specific case of the generalized expression is $F (x, y) = y = f (x)$ , where the transpose of the sensitivity matrix becomes the gradient of the objective function.

The DQN optimization algorithm runs N_e optimization threads in parallel. In the k-th iteration, there are $N_{LC}^{(k)} \leq N_{e}$ non-converged threads. The DQN optimizer generates a new search points $x_{T}^{(i, k)} = x^{(i, k)} + s^{(i, k)}$ for each non-converged thread, where the search step $s^{(i, k)}$ is solved from a trust region subproblem (TRS). The $N_{LC}^{(k)}$ simulation cases are submitted to a high-performance computing (HPC) cluster in parallel, to generate corresponding simulated responses $y_{T}^{(i, k)} = y (x_{T}^{(i, k)})$ . Then, we evaluate the objective function $f_{T}^{(i, k)} = F (x_{T}^{(i, k)}, y_{T}^{(i, k)})$ and its associated partial derivatives $\nabla_{x} F^{(i, k)}$ and $\nabla_{y} F^{(i, k)}$ . If the new search point $x_{T}^{(i, k)}$ improves the objective function, i.e., $f_{T}^{(i, k)} < f^{(i, k)} = f (x^{(i, k)})$ , then we update $x^{(i, k + 1)} = x_{T}^{(i, k)}$ and $f^{(i, k + 1)} = f_{T}^{(i, k)}$ .

All simulation results generated by all DQN optimization threads in previous and current iterations are recorded in a training data set, and they are shared among all DQN optimization threads. The sensitivity matrix $J^{(i, k + 1)} = J (x^{(i, k + 1)})$ is approximated using a modified QR-method proposed by Gao, et al. [18] by linear interpolation of training data points that are closest to x^(i,k+1). Then, we approximate the gradient $g^{(i, k + 1)} = g (x^{(i, k + 1)})$ using Eq. 1. Finally, the Hessian H^(i,k+1) for each thread is updated using either the BFGS method or the symmetric rank-1 (SR1) method [14]. For simplicity, we will drop the superscript “i” (the optimization thread index) in the following discussions.

Both DGN and DQN optimization methods approximate the objective function by a quadratic model, and they are designed for problems with smooth objective function and continuous variables. Although their convergence is not guaranteed for problems with integer type variables (e.g., well location optimization), if those integers can be treated as truncated continuous variables, our numerical tests indicate that these distributed optimization methods can improve the objective function significantly and locate multiple suboptimal solutions for problems with integer type variables in only a few iterations.

The number of variables for real-world problems may vary from a few to thousands or even more, depending on the problem and the parameterization techniques employed. For example, the number of variables could be in the order of millions if we do not apply any parameter reduction techniques to reduce the dimension of some history matching problems (e.g., to tune permeability and porosity in each grid block). Because reservoir properties are generally correlated with each other with long correlation lengths, we may reduce the number of variables to be tuned to only a few hundred, e.g., using the principal component analysis (PCA) or other parameter reduction techniques [26].

In this paper, our focus is on performance analysis of different methods to solve the TRS formulated with the limited-memory BFGS (L-BFGS) Hessian updating method for unconstrained optimization problems. In the future, we will further integrate the new TRS solver with our newly proposed limited-memory distributed BFGS (L-DBFGS) optimization algorithm and then apply the new optimizer to some realistic oil/gas field optimization cases and benchmark its overall performance against other distributed optimization methods such as those only using the gradient [27]. We will also continue our investigation in the future for constrained optimization problems (e.g., variables with lower- and/or upper-bounds and problems with nonlinear constraints). Gao et al. [18] applied the popular Newton-Raphson method [28] to directly solve the TRS using matrix factorization, the DNR method in short. The DNR method is quite expensive when it is applied to solve $N_{e}$ TRSs of a distributed optimization method, especially for large-scale problems with thousands or more variables [29]. This paper follows the ideas and concepts presented in the book “Matrix Computation” [30]. Flops are used to quantify the volume of work associated with a computation, a count for floating-point operations of add, subtract, multiply, or divide. Computational cost (flops) of some commonly used algebraic operations and numerical methods are summarized in Table 1 for reference.

TABLE 1

TABLE 1. A summary of computational costs of commonly used algebraic operations.

For completeness, we discuss the compact representation of the L-BFGS Hessian updating formulation [31] and the algorithm to directly update the Hessian in the next section directly. In the third section, we present three different methods to solve the L-BFGS TRS: the DNR method, the direct method using inverse quadratic (DIQ) interpolation approach proposed by Gao et al. [32], and the technique using matrix inversion lemma (MIL) together with an efficient matrix updating algorithm. Some numerical tests and performance comparisons are discussed in the fourth section. We finally draw some conclusions in the last section.

The Limited-Memory Hessian Updating Formulation

Let x^(k) be the best solution obtained in the current (k-th) iteration, and f^(k), g^(k), and H^(k), respectively, the objective function, its gradient and Hessian evaluated at x^(k). The Hessian H^(k+1) is updated using the BFGS formulation as follows,

H^{(k +1)} = H^{(k)} + \frac{z^{(k)} {[z^{(k)}]}^{T}}{{[z^{(k)}]}^{T} s^{(k)}} - \frac{H^{(k)} s^{(k)} {[s^{(k)}]}^{T} H^{(k)}}{{[s^{(k)}]}^{T} H^{(k)} s^{(k)}} (2)

In Eq. 2, $s^{(k)} = x^{(k + 1)} - x^{(k)}$ and $z^{(k)} = g^{(k + 1)} - g^{(k)}$ . The Hessian H^(k+1) updated using Eq. 2 is guaranteed positive definite if the condition $ε^{(k)} = {[z^{(k)}]}^{T} s^{(k)} > c_{3} z^{(k)} s^{(k)}$ is satisfied where c₃>0 is a small positive number.

Compact Representation

To save memory usage and computational cost, the L-BFGS Hessian updating method limits the maximum history size or the maximum number of pairs (s^(k), z^(k)) used for the BFGS Hessian updating to L_M > 1. The recommended value of L_M ranges from 5 to 20, using smaller number for problems with more variables.

Let 1 ≤ l_k ≤ L_M denote the number of pairs of variable shift vectors and gradient change vectors, (s^(j), z^(j)) for j = 1,2,...l_k, used to update the Hessian using the L-BFGS method in the k-th iteration. Let $S^{(k)} = [s^{(1)}, s^{(2)}, \dots s^{(l_{k})}]$ be the n × l_k variable shift matrix and $Z^{(k)} = [z^{(1)}, z^{(2)}, \dots z^{(l_{k})}]$ the n × l_k gradient change matrix. Both matrices S^(k) and Z^(k) are updated iteration by iteration. Let m=2l_k and $V^{(k)} = [S^{(k)}, Z^{(k)}]$ is an n × m matrix. Let $A^{(k)} = {[S^{(k)}]}^{T} S^{(k)}$ and $B^{(k)} = {[S^{(k)}]}^{T} Z^{(k)}$ . We decompose B^(k) into three parts: the strictly lower triangular part L^(k), the strictly upper triangular part U^(k), and the diagonal part E^(k) that contains the main diagonals of B^(k), i.e.,

B^{(k)} = L^{(k)} + E^{(k)} + U^{(k)} (3)

The Hessian can be updated using the compact form of the L-BFGS Hessian updating formulation [31, 33–35] as follows,

H^{(k + 1)} = α^{(k)} I_{n} - V^{(k)} W^{(k)} {[V^{(k)}]}^{T} (4)

In Eq. 4, I_n is an n × n identity matrix and W^(k) is an m×m symmetric matrix defined as,

{[W^{(k)}]}^{- 1} = \frac{1}{α^{(k)}} {\begin{matrix} A^{(k)} L^{(k)} \\ {[L^{(k)}]}^{T} - α^{(k)} E^{(k)} \end{matrix}} (5)

In Eq. 4 and Eq. 5, $α^{(k)} = \frac{{‖ z^{(l_{k})} ‖}^{2}}{{[z^{(l_{k})}]}^{T} s^{(l_{k})}}$ is a scaling factor. It is required that a^(k) > a_cr > 0 using Eq. 4 and Eq. 5 to update the Hessian, where a_cr is a small positive number.

The Algorithm to Directly Update the Hessian Using the L-BFGS Compact Representation

Given L_M ≥ 1, k ≥ 1, l_k−1 ≥ 1, the thresholds c₃ > 0 and a_cr > 0, search step $s = x^{(k + 1)} - x^{(k)}$ and gradient change $z = g^{(k + 1)} - g^{(k)}$ , the two n × l_k−1 matrices, S^(k−1) and Z^(k−1), we can update S^(k), Z^(k) and directly compute the Hessian H^(k+1) using the compact representation of the L-BFGS Hessian updating Eq. 4, as summarized in Algorithm-1.

Algorithm-1: Updating the Hessian directly using the L-BFGS compact representation

1. Compute ||s||, ||z||, $ε^{(k)} = z^{T} s$ and $α^{(k)} = \frac{‖ z^{2} ‖}{ε^{(k)}}$ if $ε^{(k)} > c_{3} ‖ s ‖ z ‖$ ;

2. If $ε^{(k)} \leq c_{3} ‖ s ‖ z ‖$ or $α^{(k)} \leq α_{cr}$ , set $l_{k} = l_{k - 1}$ , $S^{(k)} = S^{(k - 1)}$ , $Z^{(k)} = Z^{(k - 1)}$ , and goto step 4;

3. Otherwise,

a. Set $l_{k} = L_{M}$ and remove the first column fromS^(k−1)andZ^(k−1)if $l_{k - 1} = L_{M}$ ;else set $l_{k} = l_{k - 1} + 1$ ;

b. Update $S^{(k)} = [S^{(k - 1)} s]$ and $Z^{(k)} = [Z^{(k - 1)} z]$ ;

4. Form $V^{(k)} = [S^{(k)} Z^{(k)}]$ ;

5. Compute $A^{(k)} = {[S^{(k)}]}^{T} S^{(k)}$ and $B^{(k)} = {[S^{(k)}]}^{T} Z^{(k)}$ ;

6. Form the matrix $W_{I}^{(k)} = {[W^{(k)}]}^{- 1}$ usingEq. 5;

7. Compute $W^{(k)} = {[W_{I}^{(k)}]}^{- 1}$ using Cholesky factorization;

8. Compute $U^{(k)} = W^{(k)} {[V^{(k)}]}^{T}$ ;

9. Compute $T^{(k)} = V^{(k)} U^{(k)}$ ;

10. Update $H^{(k + 1)} = α^{(k)} I_{n} - T^{(k)}$ .

11. Stop.

The computational cost to directly update the Hessian using the L-BFGS method as described in Algorithm-1 mainly depends on the number of variables (n) and the number of vectors used to update the Hessian (m = 2l_k). We should reemphasize that m is updated iteratively but limited to m ≤ 2L_k. The computational cost is $c_{1} = 2 m n^{2} + (7 + 3 m^{2}) n + O (m^{3})$ (flops), which includes the following seven operations: 1) computing ‖s‖, ‖z‖, and $ε^{(k)} = z^{T} s$ in step 1 (6n flops); 2) computing A^(k) and B^(k) in step 5 (m²n flops); 3) forming the matrix $W_{I}^{(k)}$ in step 6 (m² flops); 4) computing W^(k) in step 7 (O(m³) flops); 5) compute U^(k) in step 8 (2m²n flops); 6) compute T^(k) in step 9 ( $2 m n^{2}$ flops); and 7) updating $H^{(k + 1)}$ in step 10 (n flops).

Solving the L-BFGS Trust Region Subproblem

The Trust Region Subproblem

In the neighborhood of the best solution x^(k), the objective function f(x) can be approximated by a quadratic function of $s = x - x^{(k)}$ ,

q^{(k)} (s) = f^{(k)} + {[g^{(k)}]}^{T} s + \frac{1}{2} s^{T} H^{(k)} s (6)

If the Hessian $H^{(k)}$ is positive definite, then $q^{(k)} (s)$ has a unique minimum s^*^(k), which is the solution of $H^{(k)} s = - g^{(k)}$ ,

s^{* (k)} = - {[H^{(k)}]}^{- 1} g^{(k)} (7)

When a line search strategy is applied, we accept the full search step $s^{* (k)}$ by setting $x^{(k + 1)} = x^{(k)} + s^{* (k)}$ if it improves the objective function sufficiently (e.g., satisfying the two Wolfe conditions). Otherwise, we can find a better search step size $0 < γ^{(k)} \leq 1$ such that $x^{(k + 1)} = x^{(k)} + γ^{(k)} s^{* (k)}$ improves the objective function sufficiently. If the gradient $g^{(k)}$ can be accurately evaluated, it has theoretically proved that a line search strategy can converge [14]. However, for most real-world optimization problems, the gradient cannot be accurately evaluated, and as discussed by Gao et al. [16], a trust region search strategy performs more robustly and more efficiently than a line search strategy.

The trust region search step $s^{* (k)}$ for the next iteration is the global minimum of the quadratic model $q^{(k)} (s)$ within a ball-shaped trust region with radius $Δ^{(k)} \geq Δ_{\min} > 0$ which is solved from the following trust region subproblem (TRS) [28],

\min_{s} q^{(k)} (s) = f^{(k)} + {[g^{(k)}]}^{T} s + \frac{1}{2} s^{T} H^{(k)} s subject to s_{2} \leq Δ^{(k)} (8)

If the solution $s^{* (k)}$ given by Eq. 7 satisfies ${‖ s^{* (k)} ‖}_{2} \leq Δ^{(k)}$ , then $s^{* (k)}$ is the solution of the TRS defined in Eq. 8. Otherwise, the solution of the TRS must lie on the boundary of $‖ s ‖_{2} = Δ^{(k)}$ , and we should solve the Lagrange multiplier $λ^{* (k)}$ together with the search step $s^{* (k)}$ from the following nonlinear TRS equations,

[H^{(k)} + λ I_{n}] s (λ) = - g^{(k)} (9)

π (λ) = {[s (λ)]}^{T} s (λ) = {[Δ^{(k)}]}^{2} (10)

For the trivial case with $H^{(k)} = α I_{n}$ , we have $s (λ) = - \frac{g^{(k)}}{α + λ}$ and $π (λ) = \frac{{‖ g^{(k)} ‖}^{2}}{{(α + λ)}^{2}}$ . The solution of the TRS defined in Eq. 8 is $λ^{*} = 0$ and $s^{* (k)} = - g^{(k)} / α$ if ${‖ g^{(k)} ‖}^{2} \leq α Δ^{(k)}$ or $λ^{*} = \frac{{‖ g^{(k)} ‖}^{2}}{Δ^{(k)}} - α$ and $s^{* (k)} = - g^{(k)} \frac{Δ^{(k)}}{{‖ g^{(k)} ‖}^{2}}$ otherwise.

We may apply the popular DNR method to solve the TRS when $H^{(k)}$ is a dense matrix. However, the DNR method is quite expensive for large-scale problems and we should seek and apply a more efficient method to solve the TRS.

Solving the L-BFGS TRS Directly Using the Newton-Raphson Method

The DNR method solves the TRS defined in Eq. 8 using the Cholesky decomposition. It applies the Newton-Raphson method to directly solve the nonlinear TRS equation [28], $ϕ (λ) = \frac{1}{s (λ)} - \frac{1}{Δ} = 0$ , iteratively, which requires computing the first order derivative $ϕ^{'} (λ) = - \frac{v {(λ)}^{2}}{s {(λ)}^{3}}$ where v is solved from $L^{T} v = s$ together with the Cholesky factorization of $D = H^{(k)} + λ I_{n} = L L^{T}$ or LU-decomposition.

Given the trust region size $Δ = Δ^{(k)} > 0$ , threshold of convergence $δ_{cr} > 0$ , maximum number of iterations allowed $N_{TRS, \max} > 0$ , Hessian $H = H^{(k)}$ and gradient $g = g^{(k)}$ evaluated at the current best solution $x^{(k)}$ , both the Lagrange multiplier $λ^{*}$ and the trust region search step $s^{*} = s (λ^{*})$ can be solved from the TRS defined in Eq. 8, using the DNR method as summarized in Algorithm-2.

Algorithm-2: Solving the L-BFGS TRS Using the DNR Method

1. Initialize $l = 0$ , $λ_{0} = 0$ ;

2. Compute $D = H + λ_{0} I_{n}$ and Cholesky decomposition $D = L L^{T}$ ;

3. Solve $u^{*}$ from $Lu = - g$ , $s^{*}$ from $L^{T} s = u^{*}$ , and $v^{*}$ from $L^{T} v = s^{*}$ ;

4. Compute $‖ s^{*} ‖$ and $‖ v^{*} ‖$ ;

5. If $‖ s^{*} ‖ \leq Δ$ , then accept $λ^{*} = λ_{0}$ and go to step 8;

6. Set $δ = | 1 - \frac{‖ s^{*} ‖}{Δ} |$ ;

7. Repeat steps (a) through (f) below, until convergence ( $δ \leq δ_{cr}$ or $l > N_{TRS, \max}$ ):

a. Update $λ_{l + 1} = λ_{l} + {(\frac{‖ s^{*} ‖}{‖ v^{*} ‖})}^{2} \frac{‖ s^{*} ‖ - Δ}{Δ}$ ;

b. Compute $D = H + λ_{l + 1} I_{n}$ and Cholesky decomposition $D = L L^{T}$ ;

c. Solve $u^{*}$ from $Lu = - g$ , $s^{*}$ from $L^{T} s = u^{*}$ , and $v^{*}$ from $L^{T} v = s^{*}$ ;

d. Compute $‖ s^{*} ‖$ and $‖ v^{*} ‖$ ;

e. Update $δ = | 1 - \frac{‖ s^{*} ‖}{Δ} |$ ;

f. Set $l = l + 1$ .

8. Stop.

Let $N_{DNR}$ denote the number of iterations required for the DNR TRS solver to converge. The total computational cost (flops) to solve the TRS using Algorithm-2 is, $c_{2} = (5 n + 3 n^{2} + \frac{n^{3}}{3}) (N_{DNR} + 1)$ , including the following three operations: (1) computing $D = H + λ_{k + 1} I_{n}$ and Cholesky decomposition $D = L L^{T}$ in step 2 and step 7(b) ( $n + \frac{n^{3}}{3}$ flops); (2) solving $u^{*}$ from $Lu = - g$ , $s^{*}$ from $L^{T} s = u^{*}$ , and $v^{*}$ from $L^{T} v = s^{*}$ in step 3 and step 7(c) ( $3 n^{2}$ flops); (3) computing $‖ s^{*} ‖$ and $‖ v^{*} ‖$ in step 4 and step 7(d) ( $4 n$ flops).

The total computational cost (flops) used to solve the L-BFGS TRS using the DNR method (Algorithm-2) together with the L-BFGS Hessian updating method (Algorithm-1) is,

c_{DNR} = c_{1} + c_{2} = 2 m n^{2} + (7 + 3 m^{2}) n + (5 n + 3 n^{2} + \frac{n^{3}}{3}) (N_{DNR} + 1) + O (m^{3}) (11)

Our numerical results indicate that the DNR TRS solver may fail when tested on the well-known Rosenbrock function, especially for problems with large $n$ . The root cause for failure of convergence using the DNR method is the same as in the GNTRS solver using the traditional Newton-Raphson method as discussed by Gao et al. [36]. Very small value of $| ϕ^{'} (λ) |$ may result in a very large search step and thus result in failure of converging. Gao et al. [36] proposed integrating the DNR method with a bisection line search to overcome this issue.

The Inverse Quadratic Model Interpolation Method to Directly Solve the L-BFGS TRS

Instead of applying the Newton-Raphson method which requires evaluating $ϕ^{'} (λ)$ , Gao et al. [32] proposed a method to directly solve the TRS using inverse quadratic model interpolation (called the DIQ method), i.e., approximating $π (λ) = {[s (λ)]}^{T} s (λ)$ by the following inverse quadratic function,

q_{INV} (λ) = \frac{b}{{(λ + a)}^{2}} (12)

Both coefficients of $a$ and $b$ in Eq. 12 can be determined by interpolating the values of $π (λ)$ evaluated at two different points $π_{1} = π (λ_{1})$ and $π_{2} = π (λ_{2})$ with $0 \leq λ_{1} < λ_{2}$ .

In the first iteration, we set $λ_{1} = λ_{\min} = 0$ and accept $λ_{\min} = 0$ as the solution if $π_{1} \leq Δ^{2}$ . Otherwise, $π_{1} > Δ^{2}$ holds and we set $λ_{2} = λ_{\max} = \max [0, \frac{‖ g^{(k)} ‖}{Δ} - α]$ and $π_{2} < Δ^{2}$ holds if it is not accepted as the solution. In the following iteration, either $λ_{1}$ or $λ_{2}$ will be updated accordingly such that the two conditions $π_{1} > Δ^{2}$ and $π_{2} < Δ^{2}$ always hold.

Let $ρ = \frac{\sqrt{π_{1}}}{\sqrt{π_{1}} - \sqrt{π_{2}}} \geq 1$ and the following solution of $q_{INV} (λ) = Δ^{2}$ will be used as the trial search point for the next iteration,

λ^{*} = λ_{2} - ρ (1 - \frac{\sqrt{π_{2}}}{Δ}) (λ_{2} - λ_{1}) (13)

We either update $λ_{1} = λ^{*}$ if $π (λ^{*}) > Δ^{2}$ or update $λ_{2} = λ^{*}$ if $π (λ^{*}) < Δ^{2}$ iteratively until convergence. We accept $λ^{*}$ as the desired solution and terminate the iterative process either when $| \frac{\sqrt{π (λ^{*})}}{Δ} - 1 | < δ_{cr}$ , the tolerance for convergence, or when the number of iterations used to solve the TRS ( $N_{DIQ}$ ) reaches the user specified maximum iteration number ( $N_{TRS, \max}$ ), i.e., $N_{DIQ} \geq N_{TRS, \max}$ .

Given the scaling factor $α > 0$ , trust region size $Δ > 0$ , threshold of convergence $δ_{cr} > 0$ , maximum number of iterations allowed $N_{TRS, \max} > 0$ , Hessian $H$ and gradient $g$ , both the Lagrange multiplier $λ^{*}$ and the trust region search step $s^{*} = s (λ^{*})$ can be solved from the TRS defined in Eq. 8, using the DIQ method as summarized in Algorithm-3.

Algorithm-3: Solving the L-BFGS TRS Using the DIQ Method

(1) Compute $‖ g ‖$ ;

(2) Calculate $λ_{\max} = \max {0, \frac{g}{Δ} - α}$ ;

(3) Initialize $k = 0$ , $λ_{1} = λ_{\min} = 0$ and $λ_{2} = λ_{\max}$ ;

(4) Compute $D = H + λ_{1} I_{n}$ and Cholesky decomposition $D = L L^{T}$ ;

(5) Solve $u_{1}$ from $Lu = - g$ and $s_{1}$ from $L^{T} s = u_{1}$ ;

(6) Compute $π_{1} = π (λ_{1}) = s_{1}^{T} s_{1}$ ;

(7) If $π_{1} \leq Δ^{2}$ , then accept $λ^{*} = λ_{1}$ and go to step 12;

(8) If $π_{1} > Δ^{2}$ , then

a. Compute $D = H + λ_{2} I_{n}$ and Cholesky decomposition $D = L L^{T}$ ;

b. Solve $u_{2}$ from $Lu = - g$ and $s_{2}$ from $L^{T} s = u_{2}$ ;

c. Compute $π_{2} = π (λ_{2}) = s_{2}^{T} s_{2}$ ;

(9) If $| 1 - \frac{\sqrt{π_{1}}}{Δ} | < | 1 - \frac{\sqrt{π_{2}}}{Δ} |$ , then $δ = | 1 - \frac{\sqrt{π_{1}}}{Δ} |$ , $λ^{*} = λ_{1}$ ;

(10) Otherwise, $δ = | 1 - \frac{\sqrt{π_{2}}}{Δ} |$ , $λ^{*} = λ_{2}$ ;

(11) Repeat steps (a) through (i) below, until convergence ( $δ \leq δ_{cr}$ or $k > N_{TRS, \max}$ ):

a. Calculate $ρ = \frac{\sqrt{π_{1}}}{\sqrt{π_{1}} - \sqrt{π_{2}}}$ ;

b. Calculate $λ^{*} = λ_{2} - ρ (1 - \frac{\sqrt{π_{2}}}{Δ}) (λ_{2} - λ_{1})$ ;

c. Compute $D = H + λ^{*} I_{n}$ and Cholesky decomposition $D = L L^{T}$ ;

d. Solve $u^{*}$ from $Lu = - g$ and $s^{*}$ from $L^{T} s = u^{*}$ ;

e. Compute $π^{*} = π (λ^{*}) = {[s^{*}]}^{T} s^{*}$ ;

f. Update $δ = | 1 - \frac{\sqrt{π^{*}}}{Δ} |$ ;

g. If $\sqrt{π^{*}} > Δ$ , update $λ_{1} = λ^{*}$ and $π_{1} = π^{*}$ ;

h. Otherwise, update $λ_{2} = λ^{*}$ and $π_{2} = π^{*}$ ;

i. $k \leftarrow k + 1$

(12) Stop.

Let $N_{DIQ}$ denote the number of iterations required for the DIQ TRS solver to converge. The total computational cost to solve the L-BFGS TRS using Algorithm-3 together with Algorithm-1 is,

c_{IQ} = 2 m n^{2} + (7 + 3 m^{2}) n + (5 n + 2 n^{2} + \frac{n^{3}}{3}) (N_{DIQ} + 1) + O (m^{3}) (14)

Generally, the DIQ method converges faster than the DNR method and it performs more efficiently and robustly than the DNR method, especially for large scale problems. Both the DNR method and the DIQ method become quite expensive for large-scale problems.

Using Matrix Inversion Lemma (MIL) to Solve the L-BFGS TRS

To save both memory usage and computational cost, Gao, et al. [32, 36, 37] proposed an efficient algorithm to solve the Gauss-Newton TRS (GNTRS) for large-scale history matching problems using the matrix inversion lemma (or the Woodbury matrix identity). With appropriate normalization of both parameters and residuals, the Hessian of the objective function for a history matching problem can be approximated by the well-known Gauss-Newton equation as follows,

H^{(k + 1)} = α I_{n} + V^{(k)} {[V^{(k)}]}^{T} (15)

In Eq. 15, $V^{(k)} = {[J^{(k + 1)}]}^{T}$ is an $n \times m$ matrix, the transpose of the sensitivity matrix $J^{(k + 1)}$ that is evaluated at the current best solution $x^{(k + 1)}$ . Here, $m$ denotes the number of observed data to be matched and it is not the same $m$ as used in the L-BFGS Hessian updating Eq. 4.

The GNTRS solver proposed by Gao, et al. [32, 36, 37] using the matrix inversion lemma (MIL) has been implemented and integrated with the distributed Gauss-Newton (DGN) optimizer. Because the compact representation of the L-BFGS Hessian updating formula Eq. 4 is similar to Eq. 15, we follow the similar idea as proposed by Gao, et al. [32, 36, 37] to compute ${[H^{(k + 1)} + λ I_{n}]}^{- 1}$ by applying the matrix inversion lemma and then solve the L-BFGS trust region search step $s (λ)$ .

{[H^{(k + 1)} + λ I_{n}]}^{- 1} = \frac{1}{α^{(k)} + λ} I_{n} - \frac{1}{{[α^{(k)} + λ]}^{2}} V^{(k)} {\frac{1}{α^{(k)} + λ} {[V^{(k)}]}^{T} V^{(k)} - {[W^{(k)}]}^{- 1}}^{- 1} {[V^{(k)}]}^{T} (16)

Let $P^{(k)} = {[V^{(k)}]}^{T} V^{(k)}$ and $u^{(k)} = {[V^{(k)}]}^{T} g^{(k)}$ . We first solve $v (λ)$ from,

{\frac{1}{α^{(k)} + λ} P^{(k)} - {[W^{(k)}]}^{- 1}} v (λ) = u^{(k)} . (17)

Then, we compute the trust region search step $s (λ)$ and $π (λ) = ‖ s (λ) ‖^{2}$ as,

s (λ) = - {[H^{(k + 1)} + λ I_{n}]}^{- 1} g^{(k)} = - \frac{1}{α^{(k)} + λ} g^{(k)} + \frac{1}{{[α^{(k)} + λ]}^{2}} V^{(k)} v (λ), (18)

π (λ) = \frac{{‖ g^{(k)} ‖}^{2}}{{[α^{(k)} + λ]}^{2}} + \frac{{[v (λ)]}^{T} P^{(k)} v (λ)}{{[α^{(k)} + λ]}^{4}} - \frac{2 {[u^{(k)}]}^{T} v (λ)}{{[α^{(k)} + λ]}^{3}} . (19)

From Eq. 15, we have $P^{(k)} v (λ) = [α^{(k)} + λ] {u^{(k)} + {[W^{(k)}]}^{- 1} v (λ)}$ and Eq. 19 can be rewritten as,

π (λ) = \frac{{‖ g^{(k)} ‖}^{2}}{{[α^{(k)} + λ]}^{2}} + \frac{{[v (λ)]}^{T} {[W^{(k)}]}^{- 1} v (λ) - {[u^{(k)}]}^{T} v (λ)}{{[α^{(k)} + λ]}^{3}} . (20)

We first try $λ = λ_{\min} = 0$ . If $π (0) \leq Δ^{2}$ holds, we accept $λ^{*} = 0$ and $s^{*} = s (0)$ as the solution of the TRS defined in Eq. 8. Otherwise, the solution is on the boundary defined in Eq. 10, which can be solved by the Newton-Raphson (NR) method iteratively.

Let $ϕ (λ) = \frac{1}{\sqrt{π (λ)}} - \frac{1}{Δ}$ and we have $ϕ^{'} (λ) = - \frac{1}{2} \frac{π^{'} (λ)}{{[π (λ)]}^{3 / 2}}$ where $π^{'} (λ)$ is computed by,

π^{'} (λ) = \frac{{‖ g^{(k)} ‖}^{2}}{{[α^{(k)} + λ]}^{3}} - \frac{3 π (λ)}{α^{(k)} + λ} + \frac{2 {[v (λ)]}^{T} {[W^{(k)}]}^{- 1} w (λ) - {[u^{(k)}]}^{T} w (λ)}{{[α^{(k)} + λ]}^{3}} . (21)

In Eq. 21, $w (λ) = v^{'} (λ)$ is solved from,

{\frac{1}{α^{(k)} + λ} P^{(k)} - {[W^{(k)}]}^{- 1}} w (λ) = \frac{1}{{[α^{(k)} + λ]}^{2}} P^{(k)} v (λ) . (22)

Because ${\frac{1}{α^{(k)} + λ} P^{(k)} - {[W^{(k)}]}^{- 1}}$ is an $m \times m$ symmetric and positive definite matrix, it only requires. $\frac{m^{3}}{3} + 6 m^{2}$ flops to solve $v (λ)$ and $w (λ$ ) from Eq. 17 and Eq. 22 using the Cholesky factorization.

Given the $n \times l_{k}$ variable shift matrix $S^{(k)}$ and gradient change matrix $Z^{(k)}$ , we may directly compute the following three $l_{k} \times l_{k}$ matrices, $A^{(k)} = {[S^{(k)}]}^{T} S^{(k)}$ , $B^{(k)} = {[S^{(k)}]}^{T} Z^{(k)}$ and $C^{(k)} = {[Z^{(k)}]}^{T} Z^{(k)}$ , and then form the $m \times m$ matrix $P^{(k)} = {[V^{(k)}]}^{T} V^{(k)}$ as,

P^{(k)} = {\begin{matrix} A^{(k)} B^{(k)} \\ {[B^{(k)}]}^{T} C^{(k)} \end{matrix}} (23)

Directly computing the three $l_{k} \times l_{k}$ matrices, $A^{(k)}$ , $B^{(k)}$ and $C^{(k)}$ , requires $6 l_{k}^{2} n = 1.5 m^{2} n$ flops. We also need to compute the vector $u^{(k)} = {[V^{(k)}]}^{T} g^{(k)}$ ( $2 mn$ flops) and ${‖ g^{(k)} ‖}^{2}$ ( $2 n$ flops). To further reduce the computational cost, the three matrices $A^{(k)}$ , $B^{(k)}$ and $C^{(k)}$ and the vector $u^{(k)}$ should be updated iteratively instead.

Updating Matrices and Vectors Used for Solving the L-BFGS TRS

We first initialize $k = 0$ and $H^{(0)} = I_{n}$ . If the trial search point $x_{T}^{(0)} = x^{(0)} + s^{* (0)}$ does not improve the objective function, i.e., $f (x_{T}^{(0)} (\geq f (x^{(0)})$ , we set $x^{(k + 1)} = x^{(0)}$ and $H^{(k + 1)} = H^{(0)}$ , recompute the sensitivity matrix $J^{(k + 1)} = J (x^{(0)})$ and the gradient $g^{(k + 1)} = g (x^{(0)})$ when needed (e.g., using updated training data points), and shrink the trust region size $Δ^{(k + 1)} = γ_{d} Δ^{(k)}$ with $0 < γ_{d} < 1$ . We repeatedly generate trial search point $x_{T}^{(k)} = x^{(k)} + s^{* (k)}$ until $x_{T}^{(k)}$ improves the objective function, i.e., $f (x_{T}^{(k)}) < f (x^{(0)})$ .

If $f (x_{T}^{(k)}) < f (x^{(0)})$ , we set $l_{k} = 1$ , $x^{(k + 1)} = x_{T}^{(k)}$ , evaluate $g^{(k + 1)} = g (x^{(k + 1)})$ , and compute $s = x^{(k + 1)} - x^{(k)}$ and $z = g^{(k + 1)} - g^{(k)}$ . In the following iterations, we update $l_{k} \geq 1$ and associated matrices and vectors used for solving the L-BFGS TRS for different cases accordingly. We use $Sfla g^{(k)}$ to indicate whether the new search point $x_{T}^{(k)}$ improves the objective function ( $Sfla g^{(k)} = " True "$ ) or not ( $Sfla g^{(k)} = " False "$ ).

The L-BFGS Hessian updating formulation of Eq. 4 requires $ε^{(k)} = z^{T} s > c_{3} ‖ z ‖ ‖ s ‖$ and $α^{(k)} = \frac{z^{T} z}{z^{T} s} > α_{cr} > 0$ where $c_{3}$ and $α_{cr}$ are small positive numbers. If $ε^{(k)} \leq c_{3} ‖ z ‖ ‖ s ‖$ or $α^{(k)} \leq α_{cr}$ , we simply set $l_{k} = l_{k - 1}$ , $s^{(l_{k})} = s^{(l_{k - 1})}$ , $z^{(l_{k})} = z^{(l_{k - 1})}$ , $α^{(k)} = α^{(k - 1)}$ , $S^{(k)} = S^{(k - 1)}$ , $Z^{(k)} = Z^{(k - 1)}$ , $A^{(k)} = A^{(k - 1)}$ , $B^{(k)} = B^{(k - 1)}$ , $C^{(k)} = C^{(k - 1)}$ , and $P^{(k)} = P^{(k - 1)}$ , with no updating. However, we need to compute $u^{(k)} = {[S^{(k)}, Z^{(k)}]}^{T} g^{(k + 1)}$ using the new gradient $g^{(k + 1)}$ . In the following cases, we assume that both conditions $ε^{(k)} > c_{3} ‖ z ‖ ‖ s ‖$ and $α^{(k)} > α_{cr}$ are satisfied.

Case-1: Sflag^(k) = “False”

Although the best solution $x^{(k + 1)}$ does not change when Sflag^(k) = “False”, the sensitivity matrix $J^{(k + 1)}$ and thus the gradient of the objective function $g^{(k + 1)}$ evaluated at $x^{(k + 1)}$ using simulation results and training data points updated in the $k + 1$ (current) iteration may be different from those evaluated at the same point but using simulation results and training data points obtained in the $k$ (previous) iteration. To use the right gradient, we should replace the last column in $Z^{(k)}$ with $z$ .

Let ${\tilde{Z}}^{(k - 1)}$ be the submatrix of $Z^{(k - 1)}$ by removing its last column ${\tilde{z}}^{(l_{k - 1})}$ , ${\tilde{B}}^{(k - 1)} = {[S^{(k - 1)}]}^{T} {\tilde{Z}}^{(k - 1)}$ the submatrix of $B^{(k - 1)} = {[S^{(k - 1)}]}^{T} Z^{(k - 1)}$ by removing its last column, and ${\tilde{C}}^{(k - 1)} = {[{\tilde{Z}}^{(k - 1)}]}^{T} {\tilde{Z}}^{(k - 1)}$ the submatrix of $C^{(k - 1)} = {[Z^{(k - 1)}]}^{T} Z^{(k - 1)}$ by removing its last row and last column. We first compute the following two vectors $p_{2}^{(k)} = {[S^{(k - 1)}]}^{T} z$ and ${\tilde{p}}_{4}^{(k)} = {[{\tilde{Z}}^{(k)}]}^{T} z$ and two scalars $μ^{(k)} = z^{T} z$ and $τ^{(k)} = z^{T} g^{(k + 1)} - {[{\tilde{z}}^{(l_{k - 1})}]}^{T} g^{(k)}$ , with computational cost $2 mn$ flops.

We set $l_{k} = l_{k - 1}$ , $s^{(l_{k})} = s^{(l_{k - 1})}$ , $z^{(l_{k})} = z$ , $S^{(k)} = S^{(k - 1)}$ , and $A^{(k)} = A^{(k - 1)}$ , and Update $Z^{(k)}$ as,

Z^{(k)} = [{\tilde{Z}}^{(k - 1)} z] . (24)

It is straightforward to derive the following equations to update $B^{(k)}$ and $C^{(k)}$ accordingly,

B^{(k)} = {[S^{(k - 1)}]}^{T} [{\tilde{Z}}^{(k - 1)} z] = [{\tilde{B}}^{(k - 1)} p_{2}^{(k)}], (25)

C^{(k)} = {\begin{matrix} {[{\tilde{Z}}^{(k - 1)}]}^{T} \\ z^{T} \end{matrix}} [{\tilde{Z}}^{(k - 1)} z] = {\begin{matrix} {\tilde{C}}^{(k - 1)} {\tilde{p}}_{4}^{(k)} \\ {[{\tilde{p}}_{4}^{(k)}]}^{T} μ^{(k)} \end{matrix}} . (26)

By definition,

u^{(k - 1)} = {[S^{(k - 1)}, Z^{(k - 1)}]}^{T} g^{(k)} = {[S^{(k - 1)}, {\tilde{Z}}^{(k - 1)}, {\tilde{z}}^{(l_{k - 1})}]}^{T} g^{(k)}

u^{(k)} = [S^{(k)}, Z^{(k)}] g^{(k + 1)} = {[S^{(k - 1)}, {\tilde{Z}}^{(k - 1)}, z]}^{T} g^{(k + 1)}

Thus, we have,

u^{(k)} = u^{(k - 1)} + {\begin{matrix} {[S^{(k - 1)}, {\tilde{Z}}^{(k)}]}^{T} z \\ z^{T} g^{(k + 1)} - {[{\tilde{z}}^{(l_{k - 1})}]}^{T} g^{(k)} \end{matrix}} = u^{(k - 1)} + [\begin{matrix} p_{2}^{(k)} \\ {\tilde{p}}_{4}^{(k)} \\ τ^{(k)} \end{matrix}] . (27)

Case-2: Sflag^(k) = “True” and l_k−1 < L_M

We set $l_{k} = l_{k - 1} + 1$ and $s^{(l_{k})} = s$ and $z^{(l_{k})} = z$ , compute four $l_{k - 1}$ -dimensional vectors $p_{1}^{(k)} = {[S^{(k - 1)}]}^{T} s$ and $p_{2}^{(k)} = {[S^{(k - 1)}]}^{T} z$ and ${[p_{3}^{(k)}]}^{T} = s^{T} Z^{(k - 1)}$ and $p_{4}^{(k)} = {[Z^{(k - 1)}]}^{T} z$ and five scalars, $β^{(k)} = s^{T} s$ and $ε^{(k)} = s^{T} z$ and $μ^{(k)} = z^{T} z$ and $γ_{s}^{(k)} = s^{T} g^{(k + 1)}$ and $γ_{z}^{(k)} = z^{T} g^{(k + 1)}$ , with computational cost $4 mn$ flops. Both $S^{(k)}$ and $Z^{(k)}$ are updated by,

S^{(k)} = [S^{(k - 1)} s] (28)

Z^{(k)} = [Z^{(k - 1)} z] (29)

It is straightforward to derive the following equations to update A^(k), B^(k), and C^(k),

A^{(k)} = {[S^{(k - 1)} s]}^{T} [S^{(k - 1)} s] = {\begin{matrix} A^{(k - 1)} p_{1}^{(k)} \\ {[p_{1}^{(k)}]}^{T} β^{(k)} \end{matrix}} (30)

B^{(k)} = {[S^{(k - 1)} s]}^{T} [Z^{(k - 1)} z] = {\begin{matrix} B^{(k - 1)} p_{2}^{(k)} \\ {[p_{3}^{(k)}]}^{T} ε^{(k)} \end{matrix}} (31)

C^{(k)} = {[Z^{(k - 1)} Z]}^{T} [Z^{(k - 1)} z] = {\begin{matrix} C^{(k - 1)} p_{4}^{(k)} \\ {[p_{4}^{(k)}]}^{T} μ^{(k)} \end{matrix}} (32)

We first split the $2 l_{k - 1}$ -dimensional vector $u^{(k - 1)}$ into two $l_{k - 1}$ -dimensional sub-vectors $u_{S}^{(k - 1)}$ and $u_{Z}^{(k - 1)}$ , i.e.,

u^{(k - 1)} = {[S^{(k - 1)}, Z^{(k - 1)}]}^{T} g^{(k)} = [\begin{matrix} u_{S}^{(k - 1)} \\ u_{Z}^{(k - 1)} \end{matrix}] .

where $u_{S}^{(k - 1)} = {[S^{(k - 1)}]}^{T} g^{(k)}$ is composed of the first $l_{k - 1}$ rows in $u^{(k - 1)}$ and $u_{Z}^{(k - 1)} = {[Z^{(k - 1)}]}^{T} g^{(k)}$ is composed of the last $l_{k - 1}$ rows in $u^{(k - 1)}$ .

Because $u^{(k)} = [S^{(k)}, Z^{(k)}] g^{(k + 1)} = {[S^{(k - 1)}, s Z^{(k - 1)}, z]}^{T} g^{(k + 1)}$ and $g^{(k + 1)} = g^{(k)} + z$ , thus we have,

u^{(k)} = [\begin{matrix} u_{S}^{(k - 1)} + p_{2}^{(k)} \\ γ_{s}^{(k)} \\ u_{Z}^{(k - 1)} + p_{4}^{(k)} \\ γ_{z}^{(k)} \end{matrix}] . (33)

Case-3: Sflag^(k) = “True” and l_k−1 = L_M

We set $l_{k} = l_{k - 1} = L_{M}$ . Let ${\hat{S}}^{(k - 1)}$ and ${\hat{Z}}^{(k - 1)}$ denote the submatrices of $S^{(k - 1)}$ and $Z^{(k - 1)}$ by removing the first column $s^{(1)}$ and $z^{(1)}$ from $S^{(k - 1)}$ and $Z^{(k - 1)}$ , respectively. Let ${\hat{A}}^{(k - 1)}$ and ${\hat{B}}^{(k - 1)}$ and ${\hat{C}}^{(k - 1)}$ denote the submatrices of A^(k), B^(k), and C^(k) by removing their first row and first column.

$We$ apply Eq. 28 through Eq. 32 to update S^(k), Z^(k), A^(k), B^(k), and C^(k) by simply replacing S^(k−1), Z^(k−1), A^(k−1), B^(k−1), and C^(k−1) with ${\hat{S}}^{(k - 1)}$ and ${\hat{Z}}^{(k - 1)}$ and ${\hat{A}}^{(k - 1)}$ and ${\hat{B}}^{(k - 1)}$ and ${\hat{C}}^{(k - 1)}$ , respectively. We can also apply Eq. 33 to update $u^{(k)}$ by replacing $u_{S}^{(k - 1)}$ with ${\hat{u}}_{S}^{(k - 1)} = {[{\hat{S}}^{(k - 1)}]}^{T} g^{(k)}$ that is composed of the second row through the $l_{k - 1}$ -th rows in $u^{(k - 1)}$ and replacing $u_{Z}^{(k - 1)}$ with ${\hat{u}}_{Z}^{(k - 1)} = {[{\hat{Z}}^{(k - 1)}]}^{T} g^{(k)}$ that is composed of the last $l_{k - 1} - 1$ rows in $u^{(k - 1)}$ .

The Algorithm to Update Matrices and Vectors Used for Solving the L-BFGS TRS

Given L_M ≥ 1, k > 1, l_k−1 ≥ 1, the thresholds c₃ > 0 and a_cr > 0, $n$ -dimensional vectors g^(k+1), s and z, l_k−1-dimensional vector u^(k−1), n × l_k−1 matrices S^(k−1) and Z^(k−1), l_k−1 × l_k−1 matrices A^(k−1), B^(k−1) and C^(k−1), 2l_k−1 × 2l_k−1 matrices P^(k−1) and $W_{I}^{(k - 1)} = {[W^{(k - 1)}]}^{- 1}$ , we can update $l_{k}$ and the $l_{k}$ -dimensional vector u^(k), n × l_k matrices S^(k) and Z^(k), l_k × l_k matrices A^(k), B^(k), and C^(k), and 2l_k × 2l_k matrices $P^{(k)}$ and $W_{I}^{(k)} = {[W^{(k)}]}^{- 1}$ , using the algorithm as summarized in Algorithm-4.

Algorithm-4: Updating Matrices and Vectors Used for Solving the L-BFGS TRS

1. Compute $ε^{(k)} = z^{T} s$ , $μ^{(k)} = z^{T} z$ , $β^{(k)} = s^{T} s$ ;

2. Compute the scaling factor $α^{(k)} = μ^{(k)} / ε^{(k)}$ if $ε^{(k)} > c_{3} \sqrt{β^{(k)} μ^{(k)}}$ , or set $α^{(k)} = 0$ otherwise;

3. If $α^{(k)} \leq α_{cr}$ or $ε^{(k)} \leq c_{3} \sqrt{β^{(k)} μ^{(k)}}$ , then

a. Set $α^{(k)} = α^{(k - 1)}$ and $l_{k} = l_{k - 1}$ ;

b. Update $S^{(k)} = S^{(k - 1)}$ and $Z^{(k)} = Z^{(k - 1)}$ ;

c. Update $A^{(k)} = A^{(k - 1)}$ , $B^{(k)} = B^{(k - 1)}$ , and $C^{(k)} = C^{(k - 1)}$ ;

d. Update $P^{(k)} = P^{(k - 1)}$ and $W I^{(k)} = W I^{(k - 1)}$ ;

e. Compute $u^{(k)} = {[S^{(k)}, Z^{(k)}]}^{T} g^{(k + 1)}$ ;

f. Goto step 10.

4. Else if $Sfla g^{(k)} = " False "$ (for Case-1), then

a. Set $l_{k} = l_{k - 1}$ ;

b. Update $S^{(k)} = S^{(k - 1)}$ ;

c. Set ${\tilde{z}}^{(l_{k - 1})}$ to the last column of $Z^{(k - 1)}$ ;

d. Set ${\tilde{Z}}^{(k)}$ by removing the last column ${\tilde{z}}^{(l_{k - 1})}$ from $Z^{(k - 1)}$ ;

e. Update $Z^{(k)} = [{\tilde{Z}}^{(k)} z]$ ;

f. Compute $τ^{(k)} = z^{T} g^{(k + 1)} - {[{\tilde{z}}^{(l_{k - 1})}]}^{T} g^{(k)}$ ;

g. Compute $p_{2}^{(k)} = {[S^{(k - 1)}]}^{T} z$ and ${\tilde{p}}_{4}^{(k)} = {[{\tilde{Z}}^{(k)}]}^{T} z$ ;

h. Compute $u^{(k)}$ usingEq. 27;

i. Update $A^{(k)} = A^{(k - 1)}$ ;

j. Set ${\tilde{B}}^{(k - 1)}$ by removing the first column from $B^{(k - 1)}$ ;

k. Set ${\tilde{C}}^{(k - 1)}$ by removing the first row and the first column from $C^{(k - 1)}$ ;

l. Update $B^{(k)}$ and $C^{(k)}$ usingEq. 25andEq. 26;

5. Else if $l_{k - 1} < L_{M}$ (for Case-2), then

a. Set $l_{k} = l_{k - 1} + 1$ ;

b. Compute $p_{1}^{(k)} = {[S^{(k - 1)}]}^{T} s$ , $p_{2}^{(k)} = {[S^{(k - 1)}]}^{T} z$ , $p_{3}^{(k)} = {[Z^{(k - 1)}]}^{T} s$ , and $p_{4}^{(k)} = {[Z^{(k - 1)}]}^{T} z$ ;

c. Compute $γ_{s} = s^{T} g^{(k + 1)}$ and $γ_{z} = z^{T} g^{(k + 1)}$ ;

d. Update $S^{(k)}$ , $Z^{(k)}$ , $A^{(k)}$ , $B^{(k)}$ , and $C^{(k)}$ usingEq. 28throughEq. 32, respectively;

e. Set $u_{S}^{(k - 1)}$ to the first $l_{k - 1}$ rows in $u^{(k - 1)}$ and $u_{Z}^{(k - 1)}$ to the last $l_{k - 1}$ rows in $u^{(k - 1)}$ ;

f. Update $u^{(k)}$ usingEq. 31.

6. Else (for Case-3)

a. Set $l_{k} = L_{M}$ ;

b. Remove the first column from $S^{(k - 1)}$ and $Z^{(k - 1)}$ ;

c. Remove the first row and first column from $A^{(k - 1)}$ , $B^{(k - 1)}$ , and $C^{(k - 1)}$ ;

d. Compute $p_{1}^{(k)} = {[S^{(k - 1)}]}^{T} s$ , $p_{2}^{(k)} = {[S^{(k - 1)}]}^{T} z$ , $p_{3}^{(k)} = {[Z^{(k - 1)}]}^{T} s$ , and $p_{4}^{(k)} = {[Z^{(k - 1)}]}^{T} z$ ;

e. Compute $γ_{s} = s^{T} g^{(k + 1)}$ and $γ_{z} = z^{T} g^{(k + 1)}$ ;

f. Update $S^{(k)}$ , $Z^{(k)}$ , $A^{(k)}$ , $B^{(k)}$ , and $C^{(k)}$ usingEq. 28 throughEq. 32present, respectively;

g. Set $u_{S}^{(k - 1)}$ to the second row through the $l_{k - 1}$ -th rows in $u^{(k - 1)}$ ;

h. Set $u_{Z}^{(k - 1)}$ to the last $l_{k - 1} - 1$ rows in $u^{(k - 1)}$ ;

i. Update $u^{(k)}$ usingEq. 33.

7. Form the matrix $P^{(k)}$ usingEq. 23;

8. Decompose $B^{(k)} = L^{(k)} + E^{(k)} + U^{(k)}$ ;

9. Form the matrix $W_{I}^{(k)}$ usingEq. 5;

10. End

The computational cost to update matrices and vectors used for solving the L-BFGS TRS as described in Algorithm-4 is at most $c_{4} = 2 n + 4 mn$ (taking Case-2 in step 5 as an example), including the following 4 operations: (1) computing $ε^{(k)} = z^{T} s$ , $μ^{(k)} = z^{T} z$ , and $β^{(k)} = s^{T} s$ in step 1 ( $6 n$ flops); (2) computing $p_{1}^{(k)} = {[S^{(k - 1)}]}^{T} s$ , $p_{2}^{(k)} = {[S^{(k - 1)}]}^{T} z$ , $p_{3}^{(k)} = {[Z^{(k - 1)}]}^{T} s$ , and $p_{4}^{(k)} = {[Z^{(k - 1)}]}^{T} z$ in step 5(b) ( $8 l_{k - 1} n = 4 mn - 8 n$ flops),; (3) computing $γ_{s} = s^{T} g^{(k + 1)}$ and $γ_{z} = z^{T} g^{(k + 1)}$ in step 5(c) ( $4 n$ flops).

In this paper, we apply the same idea of reducing the dimension from $n$ to $m$ using the matrix inversion lemma as presented in the three papers [32, 36, 37]. However, the implementation is quite different. For the DGN optimization method, the $n \times m$ matrix $V^{(k)}$ (or the transpose of the sensitivity matrix) in Eq. 15 is directly computed. Its dimensions (both the number of variables $n$ and the number of observed data $m$ ) are fixed, i.e., they remain the same for all iterations. Therefore, we have to directly compute the $m \times m$ matrix $P^{(k)} = {[V^{(k)}]}^{T} V^{(k)}$ and the $m$ -dimensional vector $u^{(k)} = {[V^{(k)}]}^{T} g^{(k)}$ , with the computational cost $2 m^{2} n + 2 mn$ flops. Using the algorithm as summarized in Algorithm-4, the computational cost can be further reduced to $2 n + 4 mn$ , by a factor of $l_{k} = m / 2$ , 5 to 20 roughly.

The Algorithm to Solve the L-BFGS TRS Using the Matrix Inversion Lemma

Given the trust region size $Δ = Δ^{(k)} > 0$ , threshold of convergence $δ_{cr} > 0$ , maximum number of iterations allowed $N_{TRS, \max} > 0$ , the scaling factor $α > α_{cr} > 0$ , $m \geq 0$ , $n \times m$ matrix $V = [S^{(k)}, Z^{(k)}]$ , $m \times m$ matrices $P = P^{(k)}$ and $W_{I} = {[W^{(k)}]}^{- 1}$ , $m$ -dimensional vector $u = u^{(k)}$ and $n$ -dimensional gradient vector $g = g^{(k)}$ evaluated at the current best solution $x^{(k)}$ , both the Lagrange multiplier $λ^{*}$ and the trust region search step $s^{*} = s (λ^{*})$ can be solved from the TRS defined in Eq. 8 using the matrix inversion lemma (MIL) as summarized in Algorithm-5.

Algorithm-5: Solving the L-BFGS TRS Using the Matrix Inversion Lemma

1. Compute $‖ g ‖$ ;

2. Initialize $l = 0$ , $λ_{0} = 0$ ;

3. If $m = 0$ , then

a. Set $λ^{*} = 0$ and $s^{*} = - g / α$ if $‖ g ‖ \leq α Δ$ ;

b. Set $λ^{*} = \frac{‖ g ‖}{Δ} - α$ and $s^{*} = - g \frac{Δ}{‖ g ‖}$ if $‖ g ‖ > α Δ$ ;

c. Goto step 11.

4. Compute $D = \frac{1}{α + λ_{0}} P - W_{I}$ and Cholesky decomposition $D = L L^{T}$ ;

5. Solve $p^{*}$ from $Lp = u$ and $v^{*}$ from $L^{T} v = p^{*}$ ;

6. Compute $u^{T} v^{*}$ , $p = W_{I} v^{*}$ , $p^{T} v^{*}$ , and $π_{0} = π (λ_{0})$ usingEq. 20;

7. If $π_{0} \leq Δ^{2}$ , then accept $λ^{*} = λ_{0}$ and goto step 10;

8. Set $δ = | 1 - \frac{\sqrt{π_{0}}}{Δ} |$ ;

9. Repeat steps (a) through (j) below, until convergence ( $δ \leq δ_{cr}$ or $l > N_{TRS, \max}$ ):

a. Compute $q = \frac{1}{{(α + λ_{l})}^{2}} P v^{*}$ ;

b. Solve $p^{*}$ from $Lp = q$ and $w^{*}$ from $L^{T} w = p^{*}$ ;

c. Compute $u^{T} w^{*}$ , $p = W_{I} w^{*}$ , $p^{T} v^{*}$ , and $π_{l}^{'} = π^{'} (λ_{l})$ usingEq. 21;

d. Compute $ϕ_{l} = \frac{1}{\sqrt{π_{l}}} - \frac{1}{Δ}$ and $ϕ_{l}^{'} = - \frac{1}{2} \frac{π_{l}^{'}}{{(π_{l})}^{3 / 2}}$ ;

e. Update $λ_{l + 1} = λ_{l} - \frac{ϕ_{l}}{ϕ_{l}^{'}}$ ;

f. Set $l = l + 1$ ;

g. Compute $D = \frac{1}{α + λ_{l}} P - W_{I}$ and Cholesky decomposition $D = L L^{T}$ ;

h. Solve $p^{*}$ from $Lp = u$ and $v^{*}$ from $L^{T} v = p^{*}$ ;

i. Compute $u^{T} v^{*}$ , $p = W_{I} v^{*}$ , $p^{T} v^{*}$ , and $π_{l} = π (λ_{l})$ usingEq. 20;

j. Set $δ = | 1 - \frac{\sqrt{π_{l}}}{Δ} |$ .

10. Compute $p^{*} = V v^{*}$ and $s^{*} = - \frac{1}{α + λ^{*}} g + \frac{1}{{(α + λ^{*})}^{2}} p^{*}$ ;

11. End

Let $N_{MIL}$ denote the number of iterations required for the L-BFGS TRS solver to converge. The total computational cost (flops) to solve the L-BFGS TRS using Algorithm-5 is, $c_{5} = 4 n + 2 mn + (9 m + 12 m^{2} + \frac{m^{3}}{3}) (N_{MIL} + 1)$ , including the following eight operations: (1) computing $‖ g ‖$ in step 1 ( $2 n$ flops); (2) computing $D = \frac{1}{α + λ_{l}} P - W_{I}$ and Cholesky decomposition $D = L L^{T}$ in step 4 or step 9(g) ( $2 m^{2} + \frac{m^{3}}{3}$ flops); (3) solving $p^{*}$ from $L p = u$ , and $v^{*}$ from $L^{T} v = p^{*}$ in step 5 or step 9(h) ( $2 m^{2}$ flops); (4) computing $u^{T} v^{*}$ , $p = W_{I} v^{*}$ and $p^{T} v^{*}$ in step 6 or step 9(i) ( $4 m + 2 m^{2}$ flops); (5) computing $q = \frac{1}{{(α + λ_{l})}^{2}} P v^{*}$ in step 9(a) ( $m + 2 m^{2}$ flops); (6) Solving $p^{*}$ from $Lp = q$ and $w^{*}$ from $L^{T} w = p^{*}$ in step 9(b) ( $2 m^{2}$ flops); and (7) computing $u^{T} w^{*}$ , $p = W_{I} w^{*}$ , and $p^{T} v^{*}$ in step 9(c) ( $4 m + 2 m^{2}$ flops); and (8) computing $p^{*} = V z^{*}$ and $s^{*} = - \frac{1}{α + λ^{*}} g + \frac{1}{{(α + λ^{*})}^{2}} p^{*}$ in step 10 ( $2 mn + 2 n$ ).

The total computational cost (flops) used to solve the L-BFGS TRS using Algorithm-5 together with Algorithm-4 is,

c_{MIL} = c_{4} + c_{5} = 6 n + 6 mn + (9 m + 12 m^{2} + \frac{m^{3}}{3}) (N_{MIL} + 1) (34)

It is straightforward to compute the normalized computational cost, or the ratio of computational cost of the DNR method (Algorithm-2) or the DIQ method (Algorithm-3) together with the algorithm that directly updates Hessian (Algorithm-1) over the MIL TRS solver using the matrix inversion lemma (Algorithm-5) and the efficient matrix updating algorithm (Algorithm-4),

β = \frac{c_{DIQ}}{c_{MIL}} = \frac{2 m n^{2} + (7 + 3 m^{2}) n + (5 n + 2 n^{2} + \frac{n^{3}}{3}) (N_{DIQ} + 1) + O (m^{3})}{6 n + 6 mn + (9 m + 12 m^{2} + \frac{m^{3}}{3}) (N_{MIL} + 1)} (35)

Because $n ≫ m$ for large-scale problems, Eq. 35 can be further simplified as,

β = \frac{c_{DIQ}}{c_{MIL}} \approx \frac{N_{DIQ} + 1}{18} \frac{n^{2}}{m} (36)

For some problems the solution $s^{*} = s (0)$ is accepted with $N_{DIQ} = N_{MIL} = 0$ . For other problems, it takes roughly $N_{DIQ} \approx N_{MIL} = 5 \sim 15$ iterations for a TRS solver to converge. Figure 1A,B illustrate the plots of $β$ vs. $n^{2} / m$ for different $m$ by fixing $N_{DIQ} = N_{MIL} = 0$ and $N_{DIQ} = N_{MIL} = 10$ , respectively. As shown in both Figure 1A and Figure 1B, the MIL TRS solver may reduce the computational cost by a factor $β = 10 \sim 10^{5}$ for $\frac{n^{2}}{m} = 100 \sim 10^{6}$ .

FIGURE 1

FIGURE 1. Plots of the ratio of computational cost β vs. n²/m, computed using Eq. 35 by fixing $N_{DIQ} = N_{MIL} = 0$ in (A) and $N_{DIQ} = N_{MIL} = 10$ in (B).

Numerical Validation

We benchmarked the MIL TRS solver against the DNR (or DIQ) TRS solver on two well-known analytic optimization problems using analytical gradients: the Rosenbrock function and the Sphere function defined as follows,

f {(x)}_{R o s e n b r o c k} = \sum_{i = 1}^{n - 1} [100 {(x_{i + 1} - x_{i}^{2})}^{2} + {(1 - x_{i})}^{2}], (37)

f {(x)}_{S p h e r e} = \sum_{i = 1}^{n} x_{i}^{2} (38)

The Rosenbrock function defined in Eq. 37 has one local minimum located at $x_{1} = - 1$ and $x_{i} = 1$ for $i = 2,3, \dots n$ with the objective function being 4 (approximately), and one global minimum located at $x_{i} = 1$ for $i = 1,2, \dots n$ with the objective function being 0. Occasionally, a local-search optimizer may converge to the local minimum.

We implemented different algorithms described in this paper in our in-house C++ optimization library (OptLib). We employ “Armadillo” as our foundational template-based C++ library for linear algebra (http://arma.sourceforge.net/). We ran all test cases reported on Tables 2–5 on a virtual machine computer equipped with an Intel Xeon Platinum Processor at 2.60 MHz and 16.0 GB of RAM. We varied the number of parameters, $n$ as powers of two and fixed twice the maximum number of snapshots, $M = 2 L_{M} = 10$ . Tables 2–5 summarize the preliminary numerical results, including the value of the objective function, the gradient norm, the number of iterations, and the elapsed CPU time (s). The last column displays the resulting speedup, namely, the ratio of CPU times using the DNR (or DIQ) method over the MIL method. We prescribe the tolerance to stop the L-BFGS TRS solver for all the numerical experiments in this subsection as $δ_{cr} = 10^{- 4}$ and the maximum number of iterations as $N_{TRS, \max} = 16$ . We set the initial trust region size at ∆ = 0.5.

TABLE 2

TABLE 2. Performance comparison of DNR vs. MIL methods testing on Rosenbrock function.

TABLE 3

TABLE 3. Performance comparison of DIQ vs. MIL methods testing on Rosenbrock function.

TABLE 4

TABLE 4. Performance comparison of DNR vs. MIL methods testing on Sphere function.

TABLE 5

TABLE 5. Performance comparison of DIQ vs. MIL methods testing on Sphere function.

We noticed that the matrix $D$ may become ill-conditioned for a few cases. Indeed, we could observe that the first entries on the diagonal are nearly zero. The same situation occurs for different optimization steps where $m$ grows from 0, 2, 4, 6, up to ten. The Cholesky decomposition of such a matrix will fail, i.e., Armadillo throws an exception. We advise replacing the former with an LU-decomposition to handle the pivots, namely, entries on the diagonal of the lower triangular matrix, to be nearly zero. The rest of the algorithm remains the same.

Table 2 compares performance of the DNR method vs. the MIL methods for the Rosenbrock problem encompassing 8, 16, 32, and 48 parameters, respectively. Beyond those problem sizes, the DNR method could not converge to the solution. Indeed, the DNR method becomes relatively slow, i.e., renders very large lambda values that imply tiny shifts. Often, after 2,048 iterations, the objective function is still greater than one. It seems that the DNR method is not robust enough to tackle the Rosenbrock function with more than 48 parameters. It appears that the MIL method slightly outperforms the DNR method for problems with fewer parameters.

Table 3 compares performance of the DIQ method vs. the MIL method. These numerical results confirm that the MIL method outperforms the DIQ method. As expected, the speedup improves with the increase of the problem size n.

Table 4 shows similar results for the Sphere function problem. We could benchmark the DNR method against the MIL method with ranks up to 2,048 parameters. Again, the speedup increases with problem size. Finally, Table 5 benchmarks the DIQ method against the MIL method testing on the Sphere function. The speedup reduces when compared to Table 4 for the same ranks, but we still clearly see that the performance of the MIL solver exceeds its DIQ counterpart. Results shown in Table 4 and Table 5 also confirm that the DIQ method performs better than the DNR method. We believe that the DIQ TRS solver converges faster with smaller number of iterations than the DNR TRS solver (i.e., $N_{DIQ} < N_{DNR}$ ).

The computational costs (in terms of CPU time) summarized in Tables 2–5 are the overall costs for all iterations, including the cost of computing both value and gradient of the objective function in addition to the cost of solving the L-BFGS TRS. In contrast, the theoretical analysis results of computational cost for β in Eq. 35 only count for the cost of solving the L-BFGS TRS in just one iteration. Therefore, numerical results listed in Tables 2–5 are not quantitatively in agreement with the theoretically derived β.

Conclusion

We can draw the following conclusions based on theoretical analysis and numerical tests:

1) Using the compact representation of the L-BFGS formulation in Eq. 4, the updated Hessian $H^{(k + 1)}$ is composed of a diagonal matrix $λ I_{n}$ and a matrix $V^{(k)} W^{(k)} {[V^{(k)}]}^{T}$ with low rank when m < n.

2) Using the matrix inversion lemma, we are able to reduce the cost of solving the L-BFGS TRS by transforming the original equation with $n$ variables to a new equation with $m$ variables, by taking advantage of the low rank matrix updating feature.

3) Further reduction in computational cost is achieved by updating the vector u^(k) and matrices such as S^(k) and Z^(k), A^(k), B^(k), and C^(k), P^(k), and ${[W^{(k)}]}^{- 1}$ iteratively, which effectively avoids expensive operations such as matrix-matrix and matrix-vector multiplications.

4) Through seamless integration of matrix inversion lemma with the iterative matrix and vector updating approach, the newly proposed TRS solver for the limited-memory distributed BFGS optimization method performs much more efficiently than the DNR TRS solver.

5) The MIL TRS solver can obtain the right solution with acceptable accuracy, without increasing memory usage but reducing the computational cost significantly, as confirmed by numerical results on the Rosenbrock function and Sphere function.

Data Availability Statement

The original contributions presented in the study are included in the article/Supplementary Material, further inquiries can be directed to the corresponding author.

Author Contributions

GG, HF, and JV contributed to conception and algorithmic analysis of the study. HF, TW, FS, and CB contributed to design and numerical validations. GG and FH wrote the first draft of the article. All authors contributed to article revision, read, and approved the submitted version.

Conflict of Interest

Authors GG, HF, and FS were employed by Shell Global Solutions (United States) Inc. Authors JV, TW and CB were employed by Shell Global Solutions International B.V.

References

1. Ai-Mudhafar, WJ, Rao, DN, and Srinivasan, S. Robust Optimization of Cyclic CO2 Flooding through the Gas-Assisted Gravity Drainage Process under Geological Uncertainties. J Pet Sci Eng (2018). 166:490–509. doi:10.1016/j.petrol.2018.03.044

Google Scholar

2. Alpak, FO, Jin, L, and Ramirez, BA. Robust Optimization of Well Placement in Geologically Complex Reservoirs. in: Paper SPE-175106-MS Presented at the SPE Annual Technical Conference and Exhibition; September28–30 2015; Houston, TX (2015).

Google Scholar

3. van Essen, GM, Zandvliet, MJ, Den Hof, PMJ, Bosgra, OH, Jansen, JD, et al. Robust Optimization of Oil Reservoir Flooding. in: Paper Presented at the IEEE International Conference on Control Applications; Oct 4-6 2006; Munich, Germany (2006). p. 4–6.

Google Scholar

4. Liu, Z, and Forouzanfar, F. Ensemble Clustering for Efficient Robust Optimization of Naturally Fractured Reservoirs. Comput Geosci (2018). 22:283–96. doi:10.1007/s10596-017-9689-1

CrossRef Full Text | Google Scholar

5. Liu, X, and Reynolds, AC. Augmented Lagrangian Method for Maximizing Expectation and Minimizing Risk for Optimal Well-Control Problems with Nonlinear Constraints. SPE J (2016). 21(5). doi:10.1007/s10596-017-9689-1

CrossRef Full Text | Google Scholar

6. Liu, X, and Reynolds, AC. Gradient-based Multi-Objective Optimization with Applications to Waterflooding Optimization. Comput Geosci (2016). 20:677–93. doi:10.1007/s10596-015-9523-6

CrossRef Full Text | Google Scholar

7. Oliver, DS, Reynolds, AC, and Liu, N. Inverse Theory for Petroleum Reservoir Characterization and History Matching. Cambridge, United Kingdom: Cambridge University Press (2008). doi:10.1017/cbo9780511535642

CrossRef Full Text

8. Tarantola, A. Inverse Problem Theory and Methods for Model Parameter Estimation. Philadelphia, PA: SIAM (2005). doi:10.1137/1.9780898717921

CrossRef Full Text

9. Oliver, DS. On Conditional Simulation to Inaccurate Data. Math Geol (1996). 28:811–7. doi:10.1007/bf02066348

CrossRef Full Text | Google Scholar

10. Jansen, JD. Adjoint-based Optimization of Multi-phase Flow through Porous Media - A Review. Comput Fluids (2011). 46(1):40–51. doi:10.1016/j.compfluid.2010.09.039

CrossRef Full Text | Google Scholar

11. Li, R, Reynolds, AC, and Oliver, DS. History Matching of Three-phase Flow Production Data. SPE J (2003). 8(4):328–40. doi:10.2118/87336-pa

CrossRef Full Text | Google Scholar

12. Sarma, P, Durlofsky, L, and Aziz, K. Implementation of Adjoint Solution for Optimal Control of Smart Wells. in: Paper SPE 92864 Presented at the SPE Reservoir Simulation Symposium Held in the Woodlands; Jan.-2 Feb; Woodlands, TX (2005). p. 31.

Google Scholar

13. Gao, G, and Reynolds, AC. An Improved Implementation of the LBFGS Algorithm for Automatic History Matching. SPE J (2006). 11(1):5–17. doi:10.2118/90058-pa

CrossRef Full Text | Google Scholar

14. Nocedal, J, and Wright, SJ. Numerical Optimization. New York City: Springer (1999).

15. Chen, C, et al. Assisted History Matching Using Three Derivative-free Optimization Algorithms. in: Paper SPE-154112-MS Presented at the SPE Europec/EAGE Annual Conference Held in Copenhagen; June 4–7 2012; Copenhagen, Denmark (2012). p. 4–7.

Google Scholar

16. Gao, G, Vink, JC, Chen, C, Alpak, FO, and Du, K. A Parallelized and Hybrid Data-Integration Algorithm for History Matching of Geologically Complex Reservoirs. SPE J (2016). 21(6):2155–74. doi:10.2118/175039-pa

CrossRef Full Text | Google Scholar

17. Gao, G, Vink, JC, Chen, C, El Khamra, Y, Tarrahi, M, et al. Distributed Gauss-Newton Optimization Method for History Matching Problems with Multiple Best Matches. Comput Geosciences (2017). 21(5-6):1325–42. doi:10.1007/s10596-017-9657-9

CrossRef Full Text | Google Scholar

18. Gao, G, Wang, Y, Vink, J, Wells, T, Saaf, F, et al. Distributed Quasi-Newton Derivative-free Optimization Method for Optimization Problems with Multiple Local Optima. in: 17th European Conference on the Mathematics of Oil Recovery ECMOR XVII; September 2020; Edinburgh, United Kingdom (2020). p. 14–7.

Google Scholar

19. Wild, SM. Derivative Free Optimization Algorithms for Computationally Expensive Functions. [PhD thesis]. Ithaca (new york): Cornell University (2009).

20. Powell, MJD. Least Frobenius Norm Updating of Quadratic Models that Satisfy Interpolation Conditions. Math Programming (2004). 100(1):183–215. doi:10.1007/s10107-003-0490-7

CrossRef Full Text | Google Scholar

21. Zhao, H, Li, G, Reynolds, AC, and Yao, J. Large-scale History Matching with Quadratic Interpolation Models. Comput Geosci (2012). 17:117–38. doi:10.1007/s10596-012-9320-4

CrossRef Full Text | Google Scholar

22. Audet, C, and Dennis, JE Mesh Adaptive Direct Search Algorithms for Constrained Optimization. SIAM J Optim (2006). 17:188–217. doi:10.1137/040603371

CrossRef Full Text | Google Scholar

23. Spall, JC. Introduction to Stochastic Search and Optimization. Hoboken, NJ: John Wiley & Sons (2003). doi:10.1002/0471722138

CrossRef Full Text

24. Chen, C, et al. EUR Assessment of Unconventional Assets Using Parallelized History Matching Workflow Together with RML Method. In: Paper Presented the 2016 Unconventional Resources Technology Conference; August 1–3 2016; San Antonio, TX(2016).

CrossRef Full Text | Google Scholar

25. Chen, C, Gao, G, Li, R, Cao, R, Chen, T, Vink, JC, et al. Global-Search Distributed-Gauss-Newton Optimization Method and its Integration with the Randomized-Maximum-Likelihood Method for Uncertainty Quantification of Reservoir Performance. SPE J (2018). 23(05):1496–517. doi:10.2118/182639-pa

CrossRef Full Text | Google Scholar

26. Chen, C, Gao, G, Ramirez, BA, Vink, JC, and Girardi, AM. Assisted History Matching of Channelized Models by Use of Pluri-Principal-Component Analysis. SPE J (2016). 21(05):1793–812. doi:10.2118/173192-pa

CrossRef Full Text | Google Scholar

27. Armacki, A, Jakovetic, D, Krejic, N, Krklec Jerinkic, N, et al. Distributed Trust-Region Method with First Order Models. in Paper Presented at the IEEE EUROCON-2019, 18th International Conference on Smart Technologies July 1–14 2019 Novi Sad, Serbia (2019). p. 1–4.

Google Scholar

28. Moré, JJ, and Sorensen, DC. Computing a Trust Region Step. SIAM J Sci Stat Comput (1983). 4(3):553–72. doi:10.1137/0904038

CrossRef Full Text | Google Scholar

29. Mohamed, AW. Solving Large-Scale Global Optimization Problems Using Enhanced Adaptive Differential Evolution Algorithm. Complex Intell Syst (2017). 3:205–31. doi:10.1007/s40747-017-0041-0

CrossRef Full Text | Google Scholar

30. Golub, GH, and Van Loan, CF. Matrix Computations. 4th ed. The Johns Hopkins University Press (2013).

31. Brust, JJ, Leyffer, S, and Petra, CG. Compact Representations of BFGS Matrices. Preprint ANL/MCS-P9279-0120. Lemont, IL: ARGONNE NATIONAL LABORATORY (2020).

32. Gao, G, Jiang, H, Vink, JC, van Hagen, PPH, Wells, TJ, et al. Performance Enhancement of Gauss-Newton Trust-Region Solver for Distributed Gauss-Newton Optimization Method. Comput Geosci (2020). 24(2):837–52. doi:10.1007/s10596-019-09830-x

CrossRef Full Text | Google Scholar

33. Burdakov, O, Gong, L, Zikrin, S, and Yuan, Y. On Efficiently Combining Limited-Memory and Trust-Region Techniques. Math Prog Comp (2017). 9:101–34. doi:10.1007/s12532-016-0109-7

CrossRef Full Text | Google Scholar

34. Burke, JV, Wiegmann, A, and Xu, L. Limited Memory BFGS Updating in a Trust-Region framework Tech. Rep. Seattle, WA: University of Washington (2008).

35. Byrd, RH, Nocedal, J, and Schnabel, RB. Representations of Quasi-Newton Matrices and Their Use in Limited Memory Methods. Math Programming (1994). 63:129–56. doi:10.1007/bf01582063

CrossRef Full Text | Google Scholar

36. Gao, G, Jiang, H, van Hagen, P, Vink, JC, and Wells, T. A Gauss-Newton Trust Region Solver for Large Scale History Matching Problems. SPE J (2017b). 22(6):1999–2011. doi:10.2118/182602-pa

CrossRef Full Text | Google Scholar

37. Gao, G, Saaf, F, Vink, J, Krymskaya, M, Wells, T, et al. Gauss-Newton Trust Region Search Optimization Method for Ill-Conditioned Least Squares Problem. in: ECMOR XVII 17th European Conference on the Mathematics of Oil Recovery; 14 September 2020; Edinburgh, UK (2020). p. 14–17.

Google Scholar

Keywords: limited-memory BFGS method, trust region search optimization method, trust region subproblem, matrix inversion lemma, low rank matrix update

Citation: Gao G, Florez H, Vink JC, Wells TJ, Saaf F and Blom CPA (2021) Performance Analysis of Trust Region Subproblem Solvers for Limited-Memory Distributed BFGS Optimization Method. Front. Appl. Math. Stat. 7:673412. doi: 10.3389/fams.2021.673412

Received: 27 February 2021; Accepted: 29 April 2021;
Published: 24 May 2021.

Edited by:

Olwijn Leeuwenburgh, Netherlands Organisation for Applied Scientific Research, Netherlands

Reviewed by:

Andreas Størksen Stordal, Norwegian Research Institute (NORCE), Norway
Sarfraz Ahmad, COMSATS University Islamabad, Pakistan

Copyright © 2021 Gao, Florez, Vink, Wells, Saaf and Blom. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Guohua Gao, Z3VvaHVhLmdhb0BzaGVsbC5jb20=

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.

Performance Analysis of Trust Region Subproblem Solvers for Limited-Memory Distributed BFGS Optimization Method

Introduction

The Limited-Memory Hessian Updating Formulation

Compact Representation

The Algorithm to Directly Update the Hessian Using the L-BFGS Compact Representation

Solving the L-BFGS Trust Region Subproblem

The Trust Region Subproblem

Solving the L-BFGS TRS Directly Using the Newton-Raphson Method

The Inverse Quadratic Model Interpolation Method to Directly Solve the L-BFGS TRS

Using Matrix Inversion Lemma (MIL) to Solve the L-BFGS TRS

Updating Matrices and Vectors Used for Solving the L-BFGS TRS

Case-1: Sflag(k) = “False”

Case-2: Sflag(k) = “True” and lk−1 < LM

Case-3: Sflag(k) = “True” and lk−1 = LM

The Algorithm to Update Matrices and Vectors Used for Solving the L-BFGS TRS

The Algorithm to Solve the L-BFGS TRS Using the Matrix Inversion Lemma

Numerical Validation

Conclusion

Data Availability Statement

Author Contributions

Conflict of Interest

References

Case-1: Sflag^(k) = “False”

Case-2: Sflag^(k) = “True” and l_k−1 < L_M

Case-3: Sflag^(k) = “True” and l_k−1 = L_M