Minimal Cycle Representatives in Persistent Homology Using Linear Programming: An Empirical Study With User’s Guide

Cycle representatives of persistent homology classes can be used to provide descriptions of topological features in data. However, the non-uniqueness of these representatives creates ambiguity and can lead to many different interpretations of the same set of classes. One approach to solving this problem is to optimize the choice of representative against some measure that is meaningful in the context of the data. In this work, we provide a study of the effectiveness and computational cost of several ℓ1 minimization optimization procedures for constructing homological cycle bases for persistent homology with rational coefficients in dimension one, including uniform-weighted and length-weighted edge-loss algorithms as well as uniform-weighted and area-weighted triangle-loss algorithms. We conduct these optimizations via standard linear programming methods, applying general-purpose solvers to optimize over column bases of simplicial boundary matrices. Our key findings are: 1) optimization is effective in reducing the size of cycle representatives, though the extent of the reduction varies according to the dimension and distribution of the underlying data, 2) the computational cost of optimizing a basis of cycle representatives exceeds the cost of computing such a basis, in most data sets we consider, 3) the choice of linear solvers matters a lot to the computation time of optimizing cycles, 4) the computation time of solving an integer program is not significantly longer than the computation time of solving a linear program for most of the cycle representatives, using the Gurobi linear solver, 5) strikingly, whether requiring integer solutions or not, we almost always obtain a solution with the same cost and almost all solutions found have entries in {‐1,0,1} and therefore, are also solutions to a restricted ℓ0 optimization problem, and 6) we obtain qualitatively different results for generators in Erdős-Rényi random clique complexes than in real-world and synthetic point cloud data.

The first step in Algorithms 1 and 2 is to compute a persistent homology cycle basis. The standard method to compute such a basis invokes a so-called R = DV decomposition of the boundary matrices ∂ n Cohen-Steiner et al. [2006]. Here we provide a brief review of this process; further details may be found in Cohen-Steiner et al. [2006]; de Silva et al. [2011].
To begin, we must place a total order on each set S n (K), in ascending order of birth. This naturally allows us to regard ∂ n as an element of G |S n−1 (K)|×|Sn(K)| . The low function on a matrix A ∈ G k×l is defined by We say that A is reduced if low is injective. An R = DV decomposition is a matrix equation where R is reduced and V is invertible and upper triangular.
Suppose that R n = ∂ n V n is such a decomposition for each n. Let low n be the low function of R n , and let Γ n = {(j, low(j)) : R[:, j] = 0} be the graph low n . It can then be shown (Cohen-Steiner et al. [2006];de Silva et al. [2011]) that 1. Each set S n (K) partitions into three disjoint subsets: B n B * n H n , where B n is the image of low n+1 and B * n is the domain of low n . 2. If B * n (t) denotes the set of simplices in B n born by time t, then ∂ n [:, B * n (t)] is a basis for the space of boundaries B n (K t ). 3. LetΓ n denote the subset of Γ n consisting of those pairs (σ, τ ) such that Birth(σ) = Birth(τ ), and let E n = {τ : (σ, τ ) ∈Γ} ∪ H n . Then V n [:, H n ] ∪ R n [:, E n ] is a persistent homology cycle basis. The lifespan of V n [:, σ] is [Birth(σ), ∞) and the lifespan of R n [:, τ ] is [Birth(σ), Birth(τ )), where σ is the unique simplex such that (σ, τ ) ∈ Γ.
4. In particular, the barcode of K • may be read off from the R = DV decompositions.

CORRECTNESS OF ALGORITHMS 1 AND 2
Here we provide proofs of correctness for Algorithms 1 and 2. As the details are primarily technical in nature, the exposition is fairly terse. The arguments are primarily self-contained, however we begin by recalling one result from Henselman and Ghrist [2016], which will be used in the proof of Algorithm 1.

Review: characterization of persistent homology bases
We will invoke a result from Henselman and Ghrist [2016] and Henselman-Petrusek [2017] which requires one new definition and one new notational convention. For notation, let n be given, fix i < j , and set We then define q i,j as the quotient map THEOREM 2.2 (Henselman and Ghrist [2016]; Henselman-Petrusek [2017]). A family of n-cycles E is a persistent homology cycle basis iff is an (i, j) basis for all i and j.
Remark 2.3. A special case of Theorem 2.2 (for simplex-wise filtrations) also appeared in [Wu et al., 2017, Theorem 1]. This construction is also closely related to that of pair groups, c.f. Edelsbrunner and Harer [2008].

Correctness of Algorithm 1
We restate Algorithm 1 for ease of reference.
Algorithm 1 Edge-loss persistent cycle minimization 1: Compute a persistent homology basis B for homology in dimension 1, with coefficients in Q, using the standard matrix decomposition procedure described in the Supplementary Material. Arrange the elements of B into an ordered sequence Z 0 = (z 0,1 , . . . , z 0,m ). 2: for j = 0, . . . , m − 1 do 3: Solve Program (14) to optimize the j + 1th element of Z j . Let x denote the solution to this problem, and define Z j+1 by replacing the j + 1th element of Z j with x. Concretely, z j+1,j+1 = x, and z j+1,k = z j,k for k = j. 4: end for 5: Return B * := {z m,1 , . . . , z m,m }, the set of elements in Z m .
Recall that Program (14) optimizes the jth element of an ordered sequence of cycle representatives Z = (z 1 , . . . , z m ). In particular, it seeks to minimize x Orig := z j . To define this program, we first construct a matrix A such that A[:, i] = z i for i = 1, . . . , m. We then define three index sets, P, Q, R such that (14) can then be defined as follows.
THEOREM 2.4. For each k, the family of cycles {z k,1 , . . . , z k,m } constructed in Algorithm 1 is a persistent homology cycle basis. Moreover, lifespans are preserved, in the sense that for all k and l.
An optimal solution to Program (14) can then be expressed in form Since w ∈ span({z t : t ∈ T }), it is easily argued that span(F ) = span(F ). Thus span(q i,j (F )) = span(q i,j (F )) = span(q i,j (F )) = Q i,j . Dimension counting thus implies that q i,j (F ) is an (i, j)-basis.
Given this observation, it is straightforward to verify that {z k+1,1 , . . . , z k+1,m } is a bona-fide cycle basis and L(x) = L(x Orig ). The desired conclusion follows. This establishes our primary objective: THEOREM 2.5. The set B * returned by Algorithm 1 is a bona fide persistent homology cycle basis of the filtered simpicial complex K • .

Correctness of Algorithm 2
Recall that Obayashi [2018] defines a persistent volume for a birth-death pair (σ b i , σ d i ) as an (n + 1) where is the family of l-simplices whose birth time lies strictly between b i and d i . The linear program associated to (σ b i , σ d i ) in Obayashi [2018] can then be summarized as Let us restate Algorithm 2 and the corresponding optimization problem, Program (15), for ease of reference.
Recall that we refer to Program (15) as the general triangle-loss problem.
Algorithm 2 Triangle-loss persistent cycle minimization 1: Place a filtration-preserving linear order ≤ (l) on S l (K) for each l. 2: Compute an R = ∂ n+1 V decomposition as described in Cohen-Steiner et al. [2006] and the Supplementary Material. We then obtain a set Γ of birth/death pairs (σ, τ ). 3: For each (σ, τ ) ∈ Γ such that Birth(σ) < Birth(τ ), put To verify that Algorithm 2 returns a bona-fide persistent homology cycle basis, let us begin by placing the elements of K into a sequence (σ 1 , . . . , σ |K| ) by ordering simplices first by birth time, second (that is, breaking ties when birth times agree) by dimension, and finally (breaking ties when birth times and dimensions agree) by the chosen linear orders ≤ (l) . It is simple to verify that this rule defines a unique linear order on K, and that the filtration K • defined by THEOREM 2.6. Let (σ, τ ) be a birth-death pair, and choose b i , d i such that (σ b i , σ d i ) = (σ, τ ). Then the sets F n and F n+1 defined in Algorithm 2 both satisfy (S5). Consequently, Program (15) is a special case of Program (10).
PROOF. The proof is a straightforward exercise in definition checking. Now letB be a set containing the boundary of one optimal solution x σ,τ to Program (15) for each birth-death pair (σ, τ ) = (σ b i , σ d i ) (even if Birth(σ) = Birth(τ ))). Let N be a persistent homology cycle basis for K • , and letB = {z ∈ N : Death(z) = ∞} be the collection of cycle representatives in N that never die. Then, by combining Theorem 2.6 with Theorem 5 of Obayashi [2018], we find that B :=B ∪B is a persistent homology cycle basis of K • .
If we assume that the bounding volumes used to obtainB are the same as those used to obtainD in Algorithm 2, and, likewise, thatB =D (this will be true provided we use the order ≤ (l) when ordering columns of ∂ l for the K • calculation) then where Birth and Death are the birth and death functions of K • , not K • .
From here, it remains only to verify that D is a bona fide persistent homology cycle basis K • . This follows from the following general observation. THEOREM 2.7. Let L • be a refinement of a filtration L • on a simplicial complex L, and let B be a persistent homology cycle bases for L • . If Birth and Death are the birth and death functions of L • and D := {z ∈ B : Birth(z) < Death(z)}, then D is a persistent homology cycle basis of L • .
PROOF. Recall that, by definition, a set E is a persistent homology cycle basis of K • iff two criteria hold: (i) L(z) must be nonempty for each z ∈ E, and (ii) {[z] ∈ H n (K i ; G) : i ∈ L(z)} is a basis for H n (K i ; G) for each i. Since B is a persistent homology cycle basis for L • , it is a straightforward exercise in definition checking to verify that D is a persistent homology cycle basis for L • . This establishes our primary objective: THEOREM 2.8 (Correctness of Algorithm 2). Algorithm 2 returns a bona-fide persistent homology cycle basis of K • . Figure S1: Computation time of the GLPK linear solver (red) and the Gurobi linear solver (green) to solve the uniform/length-weighted edge-loss minimal problems in Algorithm 1. We perform experiments on 90 data sets, 10 for each dimension 2-10, generated from the normal distribution. The horizontal axis is the dimension of the data set, and the vertical axis is the time it takes to solve an optimization problem. We observe that the Gurobi solver is consistently faster than the GLPK solver and that computation time seems fairly constant across dimension.