Regression and Classification With Spline-Based Separable Expansions

We introduce a supervised learning framework for target functions that are well approximated by a sum of (few) separable terms. The framework approximates each component function by a B-spline, resulting in an approximant whose underlying coefficient tensor of the tensor product expansion admits a low-rank polyadic decomposition parametrization. By exploiting the multilinear structure, as well as the sparsity pattern of the compactly supported B-spline basis terms, we demonstrate how such an approximant is well suited for regression and classification tasks by using the Gauss–Newton algorithm to train the parameters. Various numerical examples analyze the effectiveness of the approach.


EVALUATION OF THE GRADIENT AND GRAMIAN-VECTOR PRODUCTS
The gradient and Gramian-vector products associated with the objective functions in sections 3 and 4 can be evaluated efficiently by exploiting both the multilinear structure resulting from the low-rank approximation of the coefficient tensor and the compact support of the B-spline basis functions. In this appendix, we first develop vectorized expressions for auxiliary variables. Next, these expressions are used to compute the gradient and Gramian-vector products. Finally, we discuss the complexity of the algorithm.

Vectorized expressions for the gradient and Gramian
To facilitate further computations, we first construct a matrix $A^{(d)}$, $d = 1, \ldots, D$, which contains the set of $M_d$ B-spline basis functions evaluated in the samples $x_i^{(d)}$, $i = 1, \ldots, I$, as its columns:

$$A^{(d)} = \begin{bmatrix} b_1^{(d)}\big(x_1^{(d)}\big) & \cdots & b_{M_d}^{(d)}\big(x_1^{(d)}\big) \\ \vdots & & \vdots \\ b_1^{(d)}\big(x_I^{(d)}\big) & \cdots & b_{M_d}^{(d)}\big(x_I^{(d)}\big) \end{bmatrix} \in \mathbb{R}^{I \times M_d}.$$

The gradient blocks $g^{(d)}$ of the quadratic objective function in the regression case, or of the logarithmic objective function in the classification case, can then be expressed in vectorized form as

$$g_r^{(d)} = A^{(d)\mathsf{T}}\Big(\eta \ast \mathop{\ast}\limits_{e \neq d} A^{(e)} w_r^{(e)}\Big), \qquad r = 1, \ldots, R.$$

Similarly, the required Gramian-vector products, i.e., the products of the Gramian built using the blocks $G_{r,r'}^{(d,d')}$ and the vector built from the $z_r^{(d)}$, can be expressed as

$$\sum_{d'=1}^{D}\sum_{r'=1}^{R} G_{r,r'}^{(d,d')} z_{r'}^{(d')} = A^{(d)\mathsf{T}}\Big(\xi \ast \Big(\mathop{\ast}\limits_{e \neq d} A^{(e)} w_r^{(e)}\Big) \ast \sum_{d'=1}^{D}\sum_{r'=1}^{R} \Big(\mathop{\ast}\limits_{e \neq d'} A^{(e)} w_{r'}^{(e)}\Big) \ast A^{(d')} z_{r'}^{(d')}\Big).$$

By introducing the variables

$$Q^{(d)} = \mathop{\ast}\limits_{e \neq d} A^{(e)} W^{(e)} \in \mathbb{R}^{I \times R} \tag{S1}$$

and

$$X^{(d)}(:, r) = A^{(d)} z_r^{(d)}, \qquad r = 1, \ldots, R, \tag{S2}$$

we can simplify these expressions to

$$g^{(d)} = \operatorname{vec}\Big(A^{(d)\mathsf{T}}\big(\eta \mathbf{1}_R^{\mathsf{T}} \ast Q^{(d)}\big)\Big) \tag{S3}$$

and

$$\sum_{d'=1}^{D}\sum_{r'=1}^{R} G_{r,r'}^{(d,d')} z_{r'}^{(d')} = A^{(d)\mathsf{T}}\Big(\big(\xi \ast t\big)\mathbf{1}_R^{\mathsf{T}} \ast Q^{(d)}\Big)(:, r), \qquad t = \sum_{d'=1}^{D}\big(Q^{(d')} \ast X^{(d')}\big)\mathbf{1}_R, \tag{S4}$$

where the parenthesis $(:, r)$ denotes the $r$-th column of a matrix such as $X^{(d)}$, and $\ast$ denotes the element-wise (Hadamard) product. For the quadratic objective function in section 3, we have that

$$\eta = \hat{y} - y \qquad \text{and} \qquad \xi = \mathbf{1}_I, \tag{S5}$$

where

$$\hat{y} = \Big(\mathop{\ast}\limits_{d=1}^{D} A^{(d)} W^{(d)}\Big)\mathbf{1}_R \tag{S6}$$

collects the model evaluations in the $I$ samples. In the case of the logarithmic objective function in section 4, we have the expressions

$$\eta = \alpha\big(\sigma_\alpha(\hat{y}) - y\big) \qquad \text{and} \qquad \xi = \alpha^2\, \sigma_\alpha(\hat{y}) \ast \big(1 - \sigma_\alpha(\hat{y})\big). \tag{S7}$$
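To make these formulas concrete, the following NumPy sketch evaluates (S1)–(S4). It is a minimal illustration under our own naming conventions: the lists `A`, `W`, `Z` of per-dimension matrices and the helpers `hadamard_except`, `gradient_blocks`, and `gramian_vec` are assumptions of this sketch, not notation or code from the paper.

```python
import numpy as np

def hadamard_except(AW, d):
    """Q^(d) of (S1): Hadamard product of all A^(e) @ W^(e) with e != d."""
    Q = np.ones_like(AW[0])
    for e, P in enumerate(AW):
        if e != d:
            Q = Q * P
    return Q

def gradient_blocks(A, W, eta):
    """Gradient blocks g^(d) of (S3), vectorized column-major as in vec(.)."""
    D = len(A)
    AW = [A[d] @ W[d] for d in range(D)]                    # each I x R
    return [(A[d].T @ (eta[:, None] * hadamard_except(AW, d))).ravel(order="F")
            for d in range(D)]

def gramian_vec(A, W, xi, Z):
    """All Gramian-vector product blocks of (S4); the vectors z_r^(d)
    are stored as the columns of Z[d]."""
    D = len(A)
    AW = [A[d] @ W[d] for d in range(D)]
    # (S1) per dimension; a cumulative-product scheme would avoid the
    # extra factor D here, but is omitted for clarity.
    Q = [hadamard_except(AW, d) for d in range(D)]
    X = [A[d] @ Z[d] for d in range(D)]                     # (S2)
    t = sum((Q[d] * X[d]).sum(axis=1) for d in range(D))    # length-I vector
    return [(A[d].T @ ((xi * t)[:, None] * Q[d])).ravel(order="F")
            for d in range(D)]
```

Forming each $Q^{(d)}$ by an explicit product over $e \neq d$ avoids dividing the full Hadamard product by $A^{(d)} W^{(d)}$, which would be unstable at the many zeros created by the compactly supported basis functions.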

Summary of computational steps
In each GN or generalized GN iteration, the gradient and $it_{\text{CG}}$ Gramian-vector products are required, in which $it_{\text{CG}}$ is the number of CG iterations needed to solve the linear systems described at the end of sections 3.2 and 4.2. The steps for evaluating the gradient and the Gramian-vector products can then be summarized as:

Step 1. Precompute $Q^{(d)} \in \mathbb{R}^{I \times R}$ as defined in (S1).
Step 2. Evaluate $\hat{y}$ via (S6) and the vectors $\eta$ and $\xi$ via (S5) or (S7).
Step 3. Evaluate the gradient blocks $g^{(d)}$, $d = 1, \ldots, D$, via (S3).
Step 4. In every CG iteration, form the matrices $X^{(d)}$ via (S2) and evaluate the Gramian-vector products via (S4).
Algorithm 1 provides pseudocode for how these formulas can be evaluated in practice.
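As a complement to Algorithm 1 (whose pseudocode is not reproduced here), the sketch below strings the steps together into a single GN iteration for the quadratic objective of section 3, solving the linear system matrix-free with CG. It reuses `gradient_blocks` and `gramian_vec` from the previous sketch; the function name `gn_step` and the Tikhonov damping `lam` are illustrative assumptions and do not model the paper's trust-region strategy.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def gn_step(A, W, y, lam=1e-6, maxiter=50):
    """One GN iteration for the quadratic objective; lam adds illustrative
    Tikhonov damping to keep the CG system positive definite."""
    D = len(A)
    sizes = [Wd.size for Wd in W]
    offs = np.concatenate(([0], np.cumsum(sizes)))

    # Steps 1-2: model evaluations (S6) and residual weights (S5).
    yhat = np.multiply.reduce([A[d] @ W[d] for d in range(D)]).sum(axis=1)
    eta = yhat - y
    xi = np.ones_like(y)

    # Step 3: gradient via (S3).
    g = np.concatenate(gradient_blocks(A, W, eta))

    # Step 4: each CG iteration applies the Gramian via (S2) and (S4).
    def matvec(z):
        Z = [z[offs[d]:offs[d + 1]].reshape(W[d].shape, order="F")
             for d in range(D)]
        return np.concatenate(gramian_vec(A, W, xi, Z)) + lam * z

    G = LinearOperator((offs[-1], offs[-1]), matvec=matvec, dtype=np.float64)
    p, _ = cg(G, -g, maxiter=maxiter)

    # Update the factor matrices with the computed step.
    return [W[d] + p[offs[d]:offs[d + 1]].reshape(W[d].shape, order="F")
            for d in range(D)]
```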

Complexity
As the number of iterations of the (generalized) GN algorithm is highly variable and depends, among other things, on the initialization, we derive the per-iteration complexity here. In every iteration, the gradient is computed once, the function value $it_{\text{TR}}$ times (usually once), and a Gramian-vector product is computed $it_{\text{CG}}$ times, each time for different vectors $z_r^{(d)}$. In the derivation of the complexity, we take $N = \max_d N_d$, which typically satisfies $N \leq 4$. The number of spline basis functions $M_d$, which grows linearly with the number of knots, is captured by $M = \max_d M_d$. The number of samples is $I$ and the rank of the coefficient tensor is chosen to be $R$. In this paper, we propose to use B-splines instead of general basis functions. This has a computational advantage, as the matrices $A^{(d)}$ are then sparse, having approximately $N$ nonzeros per sample, and this sparsity can also be exploited when forming the matrices $X^{(d)}$ in (S2). By exploiting the sparsity, the complexity can effectively be reduced to $\mathcal{O}(it_{\text{CG}} D I N R)$ flops per GN iteration. Hence, the dependence on the potentially large number of basis functions $M$ is removed and replaced by the low order $N$ of the B-splines.
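The sparsity of $A^{(d)}$ is easy to verify numerically. The sketch below builds one such basis matrix with SciPy's `BSpline.design_matrix` (available from SciPy 1.8 onward); the knot vector and sample count are illustrative assumptions.

```python
import numpy as np
from scipy.interpolate import BSpline

I, k = 10_000, 3                       # samples and spline degree (N = k + 1 = 4)
# Clamped knot vector on [0, 1] with 10 interior intervals: M = len(t) - k - 1 = 13.
t = np.concatenate((np.zeros(k), np.linspace(0.0, 1.0, 11), np.ones(k)))
x = np.sort(np.random.default_rng(0).uniform(0.0, 1.0, I))

Ad = BSpline.design_matrix(x, t, k)    # sparse CSR matrix, I x M
print(Ad.shape, Ad.nnz / I)            # about N = 4 nonzeros per sample (row)
```

A product $A^{(d)\mathsf{T}} v$ with such a matrix costs $\mathcal{O}(IN)$ instead of $\mathcal{O}(IM)$ flops, which is precisely where the reduction from $M$ to $N$ in the per-iteration complexity originates.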