An Ultraweak Variational Method for Parameterized Linear Differential-Algebraic Equations

We investigate an ultraweak variational formulation with respect to the time variable for (parameterized) linear differential-algebraic equations, which yields an optimally stable system. This is used within a Petrov-Galerkin method to derive a certified detailed discretization, which provides an approximate solution in an ultraweak setting, as well as for model reduction with respect to time in the spirit of the Reduced Basis Method. A computable sharp error bound is derived. Numerical experiments show that this method yields a significant reduction and can be combined with well-known system theoretic methods such as Balanced Truncation to reduce the size of the DAE.


1. Introduction
Differential-Algebraic Equations (DAEs) are widely used to model processes in science, engineering, medicine and other fields. Theory and numerical approximation methods have been studied intensively in the literature, see e.g. [5,14,18,29], or [13,24], the first two books in a forum series on DAEs. Quite often, the dimension of DAEs modeling realistic problems is so large that an efficient numerical solution (in particular in real-time environments or within optimal control) is impossible. To address this issue, Model Order Reduction (MOR) techniques have been developed and successfully applied. There is a huge amount of literature; we mention only [3,4,8,20,25,26].
All methods described in those references address a reduction of the dimension of the system, whereas the temporal discretization is left untouched. This is where the present paper starts. We have been working on space-time variational formulations for (parameterized) partial differential equations (PPDEs) over the last decade. One particular issue has been the stability of the arising discretization, which admits tight error-residual relations and thus builds the backbone for model reduction. It turns out that an ultraweak formulation is the right tool to achieve this goal. In [10], we used this framework to derive an optimally stable variational formulation of linear time-invariant (LTI) systems. In this paper, we extend the ultraweak framework to (parameterized) DAEs and show that it can be combined with system theoretic methods such as Balanced Truncation (BT, [25]) to reduce both the system dimension and the size of the time discretization.
In order to ensure well-posedness (in an appropriate manner), we shall always assume that the initial value x₀ is consistent with the right-hand side f, which means that there exists some x̂₀ ∈ ℝⁿ such that E x̂₀ − A x₀ = lim_{t→0+} f(t) holds. Finally, we assume that the matrix pencil {E, A} is regular (i.e., det(λE − A) ≠ 0 for some λ ∈ ℝ) with index ind{E, A} =: ν ∈ ℕ, [12].

1.2. Parameterized DAEs (PDAEs). We are particularly interested in the situation where one does not have to solve the above DAE only once, but several times and highly efficiently (e.g., in real-time or optimal-control settings, or on low-power computing devices) for different data. In order to describe that situation, we consider a parameterized DAE (PDAE) as follows. For some parameter vector μ ∈ P, P ⊂ ℝᵖ being a compact set, we seek x_μ : I → ℝⁿ such that

    E ẋ_μ(t) = A_μ x_μ(t) + f_μ(t),  t ∈ I,    x_μ(0) = x_{0,μ},

where A_μ, f_μ and x_{0,μ} are a parameter-dependent matrix, right-hand side and initial condition, respectively, whereas E is assumed to be independent of μ, see below. In order to be able to solve such a PDAE highly efficiently for many parameters, it is quite standard to assume that parameters and variables can be separated, see e.g. [17]. This is done by assuming a so-called affine decomposition of the data, i.e., A_μ ≡ A is (for simplicity of exposition) assumed to be parameter-independent and

    f_μ(t) = Σ_{q=1}^{Q} θ_q(μ) f_q(t).    (1.2)

If such a decomposition is not given, we may produce an affinely decomposed approximation by means of the (Discrete) Empirical Interpolation Method ((D)EIM, [2,7]); see also [8] for a system theoretic MOR for such PDAEs. For well-posedness, we assume that the matrix pencil {E, A_μ} is regular with index ind{E, A_μ} = ν for all μ ∈ P.
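Regularity of the pencil can be probed numerically: the pencil {E, A} is regular iff det(λE − A) is not identically zero, so a nonzero determinant at a random sample certifies regularity. A minimal sketch (the helper name `pencil_is_regular` and the sampling strategy are ours, not from the paper):

```python
import numpy as np

def pencil_is_regular(E, A, n_samples=5, seed=0, tol=1e-12):
    """Heuristic regularity check for the pencil {E, A}: a nonzero
    det(lambda*E - A) at some sampled lambda certifies regularity;
    returning False is only heuristic evidence of singularity."""
    rng = np.random.default_rng(seed)
    for lam in rng.uniform(-10.0, 10.0, size=n_samples):
        if abs(np.linalg.det(lam * E - A)) > tol:
            return True
    return False

# singular E, yet a regular pencil (a small index-1 example, our choice):
E = np.array([[1.0, 0.0], [0.0, 0.0]])
A = np.array([[0.0, 1.0], [1.0, 1.0]])
```

Here det(λE − A) = −λ − 1, so the pencil is regular although E is singular.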
1.3. Reduction to homogeneous initial conditions. Using some standard arguments, (1.1) can be reduced to homogeneous initial conditions x(0) = 0. To this end, construct some smooth extension x̄ of the initial data x₀ and solve (1.1) with f replaced by f̂ := f − E ẋ̄ + A x̄ and homogeneous initial condition x̂(0) = 0; then x := x̂ + x̄ solves the original problem (1.1). If the PDAE possesses an affine decomposition (1.2), it is readily seen that the modified right-hand side f̂ also admits an affine decomposition. Hence, we can always restrict ourselves to the case of homogeneous initial conditions x(0) = 0, keeping in mind that variable initial conditions can be realized by different right-hand sides.
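As a minimal illustration, take the constant extension x̄(t) ≡ x₀ (so ẋ̄ = 0); the shifted right-hand side then reads f̂(t) = f(t) + A x₀. A sketch in Python (the names `homogenize`, `A`, `f`, `x0` are illustrative only):

```python
import numpy as np

def homogenize(A, f, x0):
    """Shift to homogeneous initial conditions using the constant
    extension xbar(t) = x0 (so xbar' = 0):
        fhat(t) = f(t) - E*xbar'(t) + A@xbar(t) = f(t) + A@x0.
    One then solves E xhat' = A xhat + fhat with xhat(0) = 0 and
    recovers x = xhat + x0. (The paper allows any smooth extension.)"""
    def f_hat(t):
        return f(t) + A @ x0
    return f_hat
```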
1.4. Organization of the material. The remainder of this paper is organized as follows. In Section 2, we derive an ultraweak variational formulation of (1.1) and prove its well-posedness. Section 3 is devoted to a corresponding Petrov-Galerkin discretization and the numerical solution, which is then used in Section 4 to derive a certified reduced model. In Section 5, we report results of our numerical experiments and end with conclusions and an outlook in Section 6.

2. An ultraweak variational formulation
It is well known that, for any fixed parameter μ ∈ P, the problem (1.1) admits a unique classical solution x_μ ∈ C¹(Ī)ⁿ for consistent initial conditions, provided that f_μ ∈ C^{ν−1}(Ī)ⁿ, see e.g. [18, Lemma 2.8]. This is a severe regularity assumption, which is one of the reasons why we are interested in a variational formulation. As we shall see, an ultraweak setting is appropriate in order to prove well-posedness, in particular stability. It turns out that this setting is also particularly useful for model reduction of (1.1) w.r.t. the time variable in the spirit of the reduced basis method, see §4 below.
2.1. Ultraweak formulation of PPDEs. In order to describe an ultraweak variational formulation for the above PDAE, we first review such formulations for parametric partial differential equations (PPDEs). In particular, we are going to follow [9], in which well-posed (ultraweak) variational forms for transport problems have been introduced, see also [6,16,30]. We will then transfer this framework to PDAEs in §2.2.
Let Ω ⊂ ℝᵈ be some open and bounded domain. We consider a classical linear operator B∘ on Ω with classical domain D(B∘) and aim at solving B∘u = f. Note that the definition of B∘ also incorporates essential homogeneous boundary conditions (in the case of the PDAE described below, this is the initial condition, which is independent of the parameter).

Proof. See [9, Proposition 3.1 and Corollary 3.2].

2.2. An ultraweak formulation of PDAEs.
We are now going to apply the framework of §2.1 to the classical form (1.1) of the PDAE. Again, w.l.o.g. we restrict ourselves to homogeneous initial conditions x(0) = 0, as stated in §1.3.
Hence, we set ⦀v⦀_μ := ‖B*_μ v‖_{L₂(I)} and choose trial and test spaces as in (2.3), obtaining the following result.

Lemma 2.4. Under the above assumptions, we have Y_μ ≡ Y for all μ ∈ P.

Proof. Clearly C¹(Ī)ⁿ ⊂ H¹(I)ⁿ, so that Y_μ ⊆ Y for all μ ∈ P. Now, let v ∈ Y = H¹(I)ⁿ; then, by density, there is a sequence (v_k)_{k∈ℕ} ⊂ C¹(Ī)ⁿ such that ‖v − v_k‖_{H¹(I)} → 0 as k → ∞. Since P is compact, we also obtain Y ⊆ Y_μ, which proves the claim.

The latter result must be properly interpreted. It says that Y_μ and Y coincide as sets. However, the norm ⦀⋅⦀_μ (and thus the topology) still depends on the parameter. The same holds true for the dual space of Y_μ induced by the L₂-inner product. In particular, we have a generalized Cauchy-Schwarz inequality (w, v)_{L₂} ≤ ‖w‖_{Y′_μ} ⦀v⦀_μ.

Proof. The existence of a unique solution x_μ ∈ X (with unit inf-sup and continuity constants) is an immediate consequence of Lemma 2.1. It only remains to show that x_μ satisfying (2.6) is a weak solution of (1.1). To this end, let f̃ ∈ C(Ī)ⁿ be given such that there exists a classical solution x̃ ∈ C¹(Ī)ⁿ with E x̃̇(t) = A x̃(t) + f̃(t) for all t ∈ I and x̃(0) = 0. Then, define g ∈ Y′_μ by g(v) := (f̃, v)_{L₂}. We need to show that the classical solution x̃ of (1.1) is also the unique solution of (2.6). First, for v ∈ C¹(Ī)ⁿ, integration by parts yields (x̃, B*_μ v)_{L₂} = (f̃, v)_{L₂}, so that (2.6) holds for x̃.
For the ultraweak PDAE (2.6), we need a right-hand side g ∈ Y′_μ. However, typically, the right-hand side is given within the context of (1.1) as a function of time, i.e., f_μ : I → ℝⁿ. Then, we simply define g ∈ Y′_μ by g(v) := (f_μ, v)_{L₂(I)}.
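In a discrete setting, the functionals g(σ_i) = (f, σ_i)_{L₂(I)} against hat functions σ_i can be approximated by elementwise quadrature. A sketch with the midpoint rule per element (`load_vector` is a hypothetical helper, not from the paper):

```python
import numpy as np

def load_vector(f, K, T):
    """Approximate g_i = (f, sigma_i)_{L2(0,T)} for hat functions on K equal
    intervals with the midpoint rule on each interval; each interval's mass
    is split evenly between its left and right hat function, whose value at
    the midpoint is 1/2. Exact for constant f."""
    dt = T / K
    mids = (np.arange(K) + 0.5) * dt
    fm = np.array([f(t) for t in mids])
    g = np.zeros(K + 1)
    g[:-1] += 0.5 * dt * fm   # left hat on each interval
    g[1:] += 0.5 * dt * fm    # right hat on each interval
    return g
```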

3. Petrov-Galerkin discretization
The next step towards a numerical method for solving an ultraweak operator equation is to introduce finite-dimensional trial and test spaces yielding a Petrov-Galerkin discretization.In this section, we shall first review Petrov-Galerkin methods in general terms and then detail the specification for PDAEs.
3.1. Petrov-Galerkin method. In order to determine a numerical approximation, we are going to construct an appropriate finite-dimensional trial space X^δ ⊂ X = L₂(I)ⁿ and a parameter-independent test space Y^δ ⊂ Y of finite (but possibly large) dimension N^δ ∈ ℕ. Then, we seek x^δ_μ ∈ X^δ such that b(x^δ_μ, v) = g(v) for all v ∈ Y^δ, which leads to solving a linear system B^δ x^δ_μ = g^δ in ℝ^{N^δ}. (b) If one defines the trial space according to X^δ := B*_μ Y^δ, then it is easily seen that the discrete problem (3.1) is well-posed and optimally conditioned, [6], i.e., the discrete inf-sup constant is unity,

    β^δ_μ := inf_{w ∈ X^δ} sup_{v ∈ Y^δ} b(w, v) / (‖w‖_X ⦀v⦀_μ) = 1.

(c) The Xu-Zikatanov lemma ([31]) ensures that the Petrov-Galerkin error is comparable with the error of the best approximation, so that the Petrov-Galerkin approximation is the best approximation (i.e., an identity holds) for β^δ = γ^δ = 1.
The Petrov-Galerkin framework induces a residual-based error estimation in a straightforward manner. To describe it, let us recall that the residual is defined for some x̃ ∈ L₂(I)ⁿ as r^δ(x̃) := g − b(x̃, ⋅) ∈ (Y^δ)′. Then, it is a standard estimate that the error is bounded by the dual norm of the residual, and Δ^δ is a residual-based error estimator. Note that for β^δ = 1 we even have an error-residual identity.

Figure 1. Piecewise linear temporal discretization (hat functions).
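The discrete analogue of the residual bound is plain linear algebra: for an invertible system matrix with smallest singular value β (playing the role of the inf-sup constant), the error of any approximation is bounded by ‖r‖/β, with equality when all singular values are 1. A sketch under these assumptions (all names illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
m = 6
B = rng.standard_normal((m, m)) + m * np.eye(m)  # well-conditioned stand-in for B^delta
x_star = rng.standard_normal(m)                   # "exact" discrete solution
g = B @ x_star

x_tilde = x_star + 0.01 * rng.standard_normal(m)  # some perturbed approximation
residual = g - B @ x_tilde                        # discrete residual r = g - B x~

beta = np.linalg.svd(B, compute_uv=False).min()   # discrete inf-sup constant
error = np.linalg.norm(x_star - x_tilde)
# residual-based bound: since r = B (x_star - x_tilde), ||error|| <= ||r|| / beta
assert error <= np.linalg.norm(residual) / beta + 1e-12
```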

3.2. PDAE Petrov-Galerkin discretization.
We are now going to specify the above general framework for PDAEs. This means that we need to introduce a suitable discretization in time. We fix a constant time step size Δt := T/K (i.e., K ∈ ℕ is the number of time intervals) and choose for simplicity equidistant nodes t_k := k Δt, k = 0, ..., K, in Ī. Denote by σ_k, k = 0, ..., K, the piecewise linear splines (hat functions) corresponding to the nodes t_{k−1}, t_k and t_{k+1}, see Figure 1. For k ∈ {0, K}, the hat functions are restricted to the interval Ī. For realizing a discretization of higher order, one could simply use splines of higher degree.
As in [6], we start by defining the test space and then construct inf-sup optimal trial spaces. To this end, let m := dim(ker E) and assume that we have a basis {k₁, …, k_m} of ker E at hand; form a matrix K_E := (k₁, ..., k_m) ∈ ℝ^{n×m} by arranging the vectors as columns. Then, we construct Y^δ ⊂ Y = H¹(I)ⁿ independent of the parameter and choose the trial space as X^δ := B*_μ Y^δ, which will then guarantee that β^δ_μ = 1.
As already noted above, of course, one could use different discretizations (e.g. higher order or different discretizations for and the test functions) and we choose the described one just for simplicity.
The efficient numerical solution of this linear system requires a solver that takes the specific structure into account. For similar systems arising from space-time (ultraweak) variational formulations of heat, transport and wave equations, such specific efficient solvers have been introduced in [15,16]. The structure of the system above is different, and we will consider the development of efficient solvers in future research, see Section 6.
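To make the structure concrete: with piecewise linear test functions, the normal-equation (Gram) matrix of the ultraweak system is a sum of Kronecker products of small temporal matrices with E and A. The following is a sketch under simplifying assumptions (adjoint B*v = −Eᵀv̇ − Aᵀv, no kernel-basis enrichment of the test space; all symbol names are ours):

```python
import numpy as np

def spacetime_gram(E, A, K, T):
    """Gram matrix of the functions B*(sigma_i e_k) for hat functions
    sigma_0..sigma_K, assuming B*v = -E^T v' - A^T v: a sum of Kronecker
    products of temporal mass/coupling/stiffness matrices with E, A."""
    dt = T / K
    e = np.ones(K)
    # temporal mass matrix M_ij = (sigma_i, sigma_j)
    M = dt / 6.0 * (4.0 * np.eye(K + 1) + np.diag(e, 1) + np.diag(e, -1))
    M[0, 0] = M[-1, -1] = dt / 3.0
    # temporal coupling C_ij = (sigma_i', sigma_j)
    C = 0.5 * (np.diag(e, -1) - np.diag(e, 1))
    C[0, 0], C[-1, -1] = -0.5, 0.5
    # temporal stiffness S_ij = (sigma_i', sigma_j')
    S = (2.0 * np.eye(K + 1) - np.diag(e, 1) - np.diag(e, -1)) / dt
    S[0, 0] = S[-1, -1] = 1.0 / dt
    return (np.kron(S, E @ E.T) + np.kron(M, A @ A.T)
            + np.kron(C, E @ A.T) + np.kron(C.T, A @ E.T))

# singular E, regular pencil (small index-1 example, our choice):
E = np.array([[1.0, 0.0], [0.0, 0.0]])
A = np.array([[0.0, 1.0], [1.0, 1.0]])
G = spacetime_gram(E, A, K=10, T=1.0)
```

Being a Gram matrix, G is symmetric positive semi-definite by construction; only the four small temporal matrices and the n × n blocks need to be stored.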
Then, we get that the parameter dimension p, which is determined by the number of time steps and the dimension of the system, might be large. The right-hand side thus also admits an affine decomposition, whose number of terms Q scales in the same way, which might be an issue concerning efficiency. Nevertheless, if the number of control degrees of freedom is much smaller than the detailed dimension, we still have Q ≪ N^δ.
Moreover, in the fully linear case, the matrix A_μ ≡ A is independent of the parameter, which means (among other facts) that trial and test spaces are parameter-independent, both as sets and w.r.t. their topology. Note that this is the most common case for system theoretic MOR methods (like BT), which are often even restricted to this case, [20,25], with the exception of [8]. Our setting is more flexible in this regard, and fully linear DAEs are just a special case.

4. Model reduction: The Reduced Basis Method
The Reduced Basis Method (RBM) is a model order reduction technique originally constructed for parameterized partial differential equations (PPDEs), see e.g. [3,17,22]. In an offline training phase, a reduced basis of size N ≪ N^δ is constructed (typically in a greedy manner, see Algorithm 1 below) from sufficiently detailed approximations x^δ_μ for certain parameter samples (also called "truth" approximations or snapshots), which are computed e.g. by a Petrov-Galerkin method as described above. In particular, N^δ is assumed to be sufficiently large to ensure that x^δ_μ is (at least numerically) indistinguishable from the exact state x_μ, which explains the name "truth". As long as an efficient solver for the detailed problem is available, we may assume that the snapshots can be computed in O(N^δ) complexity.
Given some parameter value μ, the reduced approximation x_N(μ) is then computed by solving a reduced system of dimension N. (For all quantities of the reduced system, we write the parameter as an argument in order to clearly distinguish the detailed approximation x^δ_μ from the reduced approximation x_N(μ) for the same parameter.) Thanks to the affine decomposition (1.2), several quantities for the reduced system can be precomputed and stored, so that a reduced approximation is determined in O(N³) operations, independent of N^δ (which is called online efficient). Moreover, an a posteriori error estimator Δ_N(μ) certifies the quality of the reduced approximation.
We are going to use this framework for PDAEs of the form (1.1). Model reduction of (1.1) may be concerned (at least) with the following quantities:
• the size n of the system,
• the dimension of the temporal discretization,
where we have in mind to solve (1.1) extremely fast for several values of the parameter μ. As mentioned earlier, the first issue has been studied extensively in the literature, e.g. by system theoretic methods, in particular for fully linear DAEs (3.5). This can be done independently from the subsequent reduction w.r.t. time (both for parameterized and non-parameterized versions), so that we might even assume that such Model Order Reduction (MOR) techniques have already been applied in a preprocessing step. We mention [8] for a system theoretic MOR for parameter-dependent DAEs. Here, we are going to consider the reduction w.r.t. time using the RBM based upon a variational formulation w.r.t. the time variable.
We restrict ourselves to the reduction of the fully linear case (3.5), as it readily shows how the RBM-inspired model reduction can be combined with existing system theoretic approaches to reduce the size of the system (e.g. in a preprocessing step). In the fully linear case, the matrix A_μ ≡ A and hence all operators and bilinear forms on the left-hand side are parameter-independent. This implies in addition that the ansatz space and the norm ⦀⋅⦀ inducing the topology on the test space are parameter-independent as well, which of course simplifies the framework. Parameter-dependent matrices can be treated similarly to the RBM for ultraweak formulations of PPDEs as described e.g. in [6,16,30]. Moreover, we note that the RB approach also allows the treatment of more general PDAEs and is not restricted to fully linear systems (3.5), in particular w.r.t. the right-hand side.
The idea of the RBM can be described as follows: One determines sample values S_N := {μ^(1), ..., μ^(N)} ⊂ P of the parameters in an offline training phase by a greedy procedure described in Algorithm 1 below. Then, for each μ ∈ S_N, we determine a sufficiently detailed "snapshot" x^δ_μ ∈ X^δ by the ultraweak Petrov-Galerkin discretization as in §3.2 and obtain a reduced space of dimension N by setting X_N := span{x^δ_μ : μ ∈ S_N} =: span{ξ₁, ..., ξ_N} ⊂ X^δ.
We also need a reduced test space for the Petrov-Galerkin method. Recalling that the operator is parameter-independent here (B_μ ≡ B) and that the trial space is also independent of μ, we can easily identify the optimal test space. In fact, for each snapshot ξ_i there exists a unique η_i ∈ Y^δ such that ξ_i = B* η_i. Then, we define Y_N := span{η₁, ..., η_N}. Given a new parameter value μ ∈ P, one then determines the reduced approximation x_N(μ) ∈ X_N by solving (recall that here b_μ ≡ b) b(x_N(μ), v) = g_μ(v) for all v ∈ Y_N.
If N ≪ N^δ, we can compute a reduced approximation with significantly less effort as compared to the Petrov-Galerkin (or a time-stepping) method. To determine the reduced approximation x_N(μ), we have to solve a linear system of the form B_N c_N(μ) = g_N(μ), where the stiffness matrix is given by [B_N]_{i,j} = b(ξ_j, η_i), i, j = 1, ..., N, recalling that the bilinear form is parameter-independent. Hence, B_N ∈ ℝ^{N×N} can be computed and stored in the offline phase. For the right-hand side, we use the affine decomposition (1.2) and get [g_N(μ)]_i = Σ_q θ_q(μ) (f_q, η_i)_{L₂}. The quantities (f_q, η_i)_{L₂} can be precomputed and stored in the offline phase, so that g_N(μ) is computed online efficiently in O(QN) operations. Having obtained the coefficient vector c_N(μ), the reduced approximation results as x_N(μ) = Σ_{i=1}^N [c_N(μ)]_i ξ_i. Note that the matrix B_N is typically densely populated, so that the numerical solution requires in general O(N³) operations.
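The offline/online splitting just described can be sketched as follows (`offline`, `online`, `b_form` and `f_terms` are hypothetical names; snapshots are represented as plain coefficient vectors):

```python
import numpy as np

def offline(trial_snaps, test_snaps, b_form, f_terms):
    """Precompute the reduced stiffness matrix [B_N]_{i,j} = b(xi_j, eta_i)
    and the inner products (f_q, eta_i) of the affine rhs terms."""
    B_N = np.array([[b_form(xi, eta) for xi in trial_snaps]
                    for eta in test_snaps])
    G = np.array([[fq @ eta for eta in test_snaps] for fq in f_terms])  # (Q, N)
    return B_N, G

def online(B_N, G, theta):
    """Assemble g_N(mu) from theta_q(mu) in O(Q N) and solve in O(N^3),
    independent of the detailed dimension N^delta."""
    g_N = theta @ G
    return np.linalg.solve(B_N, g_N)
```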
The announced greedy selection of the samples is based upon the residual error estimate (here an identity) Δ in (3.3) resp. (3.4) for the reduced system, described as follows: In a similar manner as for Δ in (3.3), we get a residual-based error estimator Δ_N(μ) for the reduced approximation, since the bilinear form and the norm in Y are parameter-independent here. Hence, the inf-sup constant is parameter-independent as well, i.e., β_μ ≡ β, and it is unity by Remark 3.1, so that the estimator is in fact an error identity. Its computation can be done in an online efficient manner in O(QN) operations by determining Riesz representations in the offline phase, see [3,17,22]. We use this error identity in the greedy method in Algorithm 1 below.
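The greedy sampling of Algorithm 1 follows the standard weak greedy pattern; a generic sketch (`solve_detailed` and `estimator` stand in for the Petrov-Galerkin solver and the online estimator Δ_N(μ), and are not from the paper):

```python
import numpy as np

def weak_greedy(mu_train, solve_detailed, estimator, N_max, tol):
    """Repeatedly pick the training parameter with the largest
    (residual-based) error estimate and enrich the reduced basis
    with the corresponding detailed snapshot."""
    S, basis = [], []
    for _ in range(N_max):
        errs = [estimator(mu, basis) for mu in mu_train]
        k = int(np.argmax(errs))
        if errs[k] <= tol:
            break  # certified accuracy reached on the training set
        S.append(mu_train[k])
        basis.append(solve_detailed(mu_train[k]))
    return S, basis
```

A toy run with a distance-based surrogate estimator selects the farthest-from-basis parameter in each step, mimicking the typical greedy behavior.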

5. Numerical experiments
In this section, we report on results of some of our numerical experiments. Our main focus is on the numerical solution of the ultraweak form of the PDAE, the error estimation and the quantitative reduction. We solve the arising linear systems for the Petrov-Galerkin and the reduced system by MATLAB's backslash operator, see also our remarks in Section 6 below. The code for producing the subsequent results is available via https://github.com/mfeuerle/Ultraweak_PDAE.

5.1. Serial RLC circuit. We start with a standard problem which (in some cases) admits a closed formula for the analytical solution. This allows us to monitor the exact error and to compare with standard time-stepping methods. Our particular interest is the approximation property of the ultraweak approach, which yields an L₂-approximation.
The serial RLC circuit consists of a resistor with resistance R, an inductor with inductance L, a capacitor with capacitance C and a voltage source fed by a voltage curve u_V : Ī → ℝ. Kirchhoff's circuit laws and further laws from electrical engineering yield a DAE. The solution consists of the electric current I and the voltages V_C at the capacitor, V_L at the inductor and V_R at the resistor.
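One possible way to assemble such a DAE (our choice of state ordering and row scaling, not necessarily the one used in the paper), together with an implicit Euler reference solver:

```python
import numpy as np

# state x = (I, V_C, V_L, V_R); one possible formulation E x' = A x + f:
#   L I'   = V_L                     (inductor)
#   C V_C' = I                       (capacitor)
#   0      = R I - V_R               (resistor, algebraic)
#   0      = V_C + V_L + V_R - u(t)  (Kirchhoff voltage law, algebraic)
R, L, C = 1.0, 1.0, 1.0
E = np.diag([L, C, 0.0, 0.0])
A = np.array([[0.0, 0.0, 1.0, 0.0],
              [1.0, 0.0, 0.0, 0.0],
              [R,   0.0, 0.0, -1.0],
              [0.0, 1.0, 1.0, 1.0]])

def u(t):                      # illustrative source voltage
    return np.sin(4.0 * t)

def f(t):
    return np.array([0.0, 0.0, 0.0, -u(t)])

def implicit_euler(E, A, f, x0, T, K):
    """Reference time stepper: (E - dt*A) x_{k+1} = E x_k + dt f(t_{k+1});
    the algebraic rows are enforced exactly at every step."""
    dt = T / K
    x = np.array(x0, dtype=float)
    lhs = E - dt * A
    for k in range(1, K + 1):
        x = np.linalg.solve(lhs, E @ x + dt * f(k * dt))
    return x
```

Because the scheme is fully implicit, the two algebraic constraints (Ohm's law and the voltage loop) hold at machine precision at every time step.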
Convergence of the Petrov-Galerkin scheme. In Figure 2, we compare the exact solution with approximations generated by a standard time-stepping scheme (using MATLAB's fully implicit variable-order solver with adaptive step size control, ode15i, [19]) and by our ultraweak formulation from §3.2. We choose two specific examples for the source voltage, namely a smooth and a discontinuous one,

    u_smooth(t) := sin(4t),    u_disc(t) := sign(cos(4t)).
For the smooth right-hand side (left graph in Figure 2), both ode15i and the ultraweak method give good results. Concerning the deviations of the ultraweak approach at the start and end time, we recall that the ultraweak form yields an approximation in L₂, so that pointwise comparisons are not necessarily meaningful. In the discontinuous case, the existence of a classical solution cannot be guaranteed by the above arguments. In particular, there is no closed solution formula. As we see in the right graph in Figure 2, ode15i stops at the first jump. This is to be expected, since u_disc ∉ C⁰(Ī), so that the solution lacks sufficient regularity to guarantee convergence of a time-stepping scheme like ode15i (even though it is an adaptive variable-order method). We could resolve the jumps even better by choosing more time steps, while ode15i still fails. We conclude that the ultraweak method also converges for problems lacking regularity.
Convergence rate. Next, we investigate the rate of convergence of the ultraweak form.
To that end, we use u_smooth, since the analytical solution x* is known and we can thus compute the relative error ‖x* − x^δ‖_{L₂} / ‖x*‖_{L₂}. Using the lowest order discretization as mentioned above (namely piecewise linear test functions, which yield discontinuous trial functions via B*), we can only hope for first order convergence (w.r.t. the number of time steps), which we see in Figure 3 and which was observed in all cases we considered. We obtain higher order convergence by choosing test functions of higher order, provided the solution has sufficient smoothness. Moreover, we compare the exact relative error with our error estimator (see §3.1). Figure 3 shows a perfect match, confirming the error-residual identity (3.4) also for the numerically computed error estimator.
In order to combine system theoretic model reduction with the Reduced Basis Method from §4, we use the system theoretic model order reduction package [21]. In particular, we use Balanced Truncation (BT) from [27] in a preprocessing step to reduce the above system of dimension n to a system of dimension n̂ ≪ n with matrices Ê, Â and reduced state x̂ : Ī → ℝ^n̂. We note that the resulting reduced system typically provides regular matrices Ê, Â. Then, the reduced system is a linear time-invariant (LTI) system, which is an easier problem than a DAE and in fact a special case. Hence, our presented approach is still valid, even though designed for PDAEs. For an ultraweak formulation of LTI systems, we refer to [10].
Remark 5.1. We use the RBM here for deriving a certified reduced approximation of the state x. If we wanted to control the output y along with a corresponding error estimator Δ_y, it would be fairly standard in the theory of the RBM to use a primal-dual approach with a second (dual) reduced basis, e.g. [17,22]. For simplicity of exposition, we leave this to future work and compute the output from the state by ŷ = Ĉ x̂(μ), resp. y = C x(μ). ⋄

Discretization of the control within the RBM. Since we use a variational approach, we are in principle free to choose any discretization for the control (we only need to compute inner products with the test basis functions). We tested piecewise linear discretizations as described in §3.2 for different step sizes T/K_u, where K_u might differ from the number K of time steps chosen for discretizing the state. Doing so, the parameter reads μ = (u(t₀), …, u(t_{K_u})) ∈ P ≡ ℝ^{K_u+1}, i.e., the parameter dimension is p = K_u + 1, which might be large. Large parameter dimensions are potentially an issue for the RBM, since the curse of dimensionality occurs. Hence, we investigate whether we can reduce p within the RBM.
In order to answer this question, we apply Algorithm 1 to the time-dependent Stokes problem (5.1a) (without BT), setting the tolerance to 0, N_max = Q from (1.2) (i.e., Q = K_u + 1 for the fully linear system with parameter-independent initial value) and a training set Ξ_train consisting of 500 random vectors for K_u ≡ K ∈ {75, 150, 300}, i.e., N^δ = 48 524, 96 824, 193 424, where n = 224. For these three cases, we investigate the maximal greedy training error, i.e., max_{μ ∈ Ξ_train} Δ_N(μ). The results in Figure 4 show an exponential decay w.r.t. the dimension N of the reduced system, with slower decay as K grows. This is to be expected, as the discretized control space is much richer for growing K and the reduced model has to be able to represent this variety. However, in relative terms (i.e., reduced size N compared with full size N^δ), we see that the compression rates are almost the same. This shows that the RBM can effectively reduce the system no matter how strong the influence of the control on the state is. It is expected that this potential is even more pronounced if a primal-dual RBM is used for the output.
Next, we note that for K_u ≡ K as in (5.1a), the reduced model is always exact for N ≥ p, which explains the drop-off of the curves in Figure 4. For fully linear DAEs, a reduced model of dimension equal to the parameter dimension is always exact. Hence, if the number of inputs is much smaller than the system dimension (here a scalar control), we obtain an exact reduced model of dimension significantly smaller than N^δ. Even though this seems attractive for low-dimensional outputs, we stress the fact that the reduced dimension still depends on the temporal dimension K_u, which might be large. Hence, a combination of a possibly small discretization of the control and a RBM seems necessary. Let us comment on the error decay of the RBM produced by the greedy method using the error estimator derived from the ultraweak formulation of the PDAE. We obtain exponential decay of the error, which in fact shows the potential of the RBM. The question whether a given PDAE permits a fast decay of the greedy RBM error is well known to be linked to the decay of the Kolmogorov N-width, [3,17,30], which is a property only of the problem at hand. In other words, if a PDAE can be reduced w.r.t. time, the greedy method will detect this.
The results in Figure 4 use K_u = K. The next question is how the error behaves for K_u < K. To this end, we determine the error in the state w.r.t. the full resolution, i.e., we compare the state derived from the control with K_u + 1 degrees of freedom with the state of the fully resolved control. In Figure 5, we display the errors for different values of K_u. We obtain fast convergence, which again shows the significant potential for a reduced temporal discretization of the control.

Combination with BT / RBM error decay. Next, we wish to investigate whether a system theoretic MOR (here BT) and an RBM-like reduction w.r.t. time can be combined. To this end, we fix the temporal resolution (i.e., the number of time steps, here K_u = K = 300) and determine the RBM error using Algorithm 1 for the full and the BT-reduced system. We use [21] to compute the BT from [27] and obtain an LTI system of dimension n̂ = 5.
The results are shown in Figure 6, where we again show the maximal training error. As we see, the error for the BT-reduced system is smaller than for the original one⁹, which in fact indicates that we can combine both methods. We get similar results for other choices of K. This shows that there is as much "reduction potential" in the reduced system (5.2a) as in the original system (5.1a). In other words, a combination of BT and RBM shows significant compression potential.

Figure 6. Maximal RBM relative error decay over the reduced dimension N for the full system (5.1a) (blue) and for the BT-reduced system (5.2a) with K = 300 (red).

6. Conclusions and outlook
In this paper, we introduced a well-posed ultraweak formulation for DAEs and an optimally stable Petrov-Galerkin discretization, which admits a sharp error bound. The scheme shows the expected order of convergence depending on the regularity of the solution and the smoothness of the trial functions. The scheme also converges in low-regularity cases, where standard time-stepping schemes fail. Moreover, the stability of the Petrov-Galerkin scheme allows us to choose any temporal discretization without having to satisfy additional stability criteria like a CFL condition.
Based upon the ultraweak framework, we introduced a model order reduction in terms of the Reduced Basis Method with an error/residual identity. We obtained fast convergence and the possibility to combine the RBM for a reduction w.r.t. time with system theoretic methods such as Balanced Truncation to reduce the size of the system.

⁹ This remains true even after additionally normalizing the training error by the dimensions of the DAE (n and n̂, respectively) or the dimension of the resulting linear systems, respectively.
There are several open issues for future research. We already mentioned a primal-dual RBM for an efficient reduction of the output, the generalization to parameter-dependent matrices and to more general DAEs (not only fully linear ones). We also mentioned that the system matrix is a sum of Kronecker products of high dimension, which calls for specific solvers as in [16] for the (parameterized) wave equation. Another issue in that direction is the need for a basis of ker(E), which might be a problem for high-dimensional systems.
Remark 3.1. (a) If one chose a discretization with dim(Y^δ) > dim(X^δ), one would need to solve a least squares problem ‖B^δ x^δ − g^δ‖₂ → min.

Figure 2. Serial RLC circuit, exact voltage at the inductor; comparison of the time-stepping (ode15i, blue) and ultraweak (red) approximations for the smooth (left, including the analytical solution) and discontinuous (right) right-hand side.

Figure 5. Maximal error for control discretizations of size K_u < K.