METHODS article

Front. Appl. Math. Stat., 01 June 2022
Sec. Mathematics of Computation and Data Science
Volume 8 - 2022 | https://doi.org/10.3389/fams.2022.806549

A Practical Guide to the Numerical Implementation of Tensor Networks I: Contractions, Decompositions, and Gauge Freedom

  • School of Physics, Georgia Institute of Technology, Atlanta, GA, United States

We present an overview of the key ideas and skills necessary to begin implementing tensor network methods numerically, which is intended to facilitate the practical application of tensor network methods for researchers who are already versed in their theoretical foundations. These skills include an introduction to the contraction of tensor networks, to optimal tensor decompositions, and to the manipulation of gauge degrees of freedom in tensor networks. The topics presented are of key importance to many common tensor network algorithms such as DMRG, TEBD, TRG, PEPS, and MERA.

1. Introduction

Tensor networks have been developed as a useful formalism for the theoretical understanding of quantum many-body wavefunctions [1–10], especially in regards to entanglement [11–13], and are also applied as powerful numeric tools and simulation algorithms. Although developed primarily for the description of quantum many-body systems, they have since found use in a plethora of other applications such as quantum chemistry [14–18], holography [19–24], machine learning [25–29], and the simulation of quantum circuits [30–35].

There currently exist many useful references designed to introduce newcomers to the underlying theory of tensor networks [1–10]. Similarly, for established tensor network methods, there often exist instructional or review articles that address the particular method in great detail [36–40]. Nowadays, many research groups have also made available tensor network code libraries [41–48]. These libraries typically allow other researchers to make use of highly optimized tensor network routines for practical purposes (such as for the numerical simulation of quantum many-body systems).

Comparatively few resources are intended to help researchers that already possess a firm theoretical grounding to begin writing their own numerical implementations of tensor network codes. Yet such numerical skills are essential in many areas of tensor network research: new algorithmic proposals typically require experimentation, testing and bench-marking using numerics. Furthermore, even researchers solely interested in the application of tensor network methods to a problem of interest may be required to program their own version of a method, as a pre-built package may not contain the necessary features to be suitable for the unique problem under consideration. The purpose of our present work is to help fill this aforementioned gap: to aid students and researchers, who are assumed to possess some prior theoretical understanding of tensor networks, in learning the practical skills required to program their own tensor network codes and libraries. Indeed, our intent is to arm the interested reader with the key knowledge that would allow them to implement their own versions of algorithms such as the density matrix renormalization group (DMRG) [49–51], time-evolving block decimation (TEBD) [52, 53], projected entangled pair states (PEPS) [54–56], the multi-scale entanglement renormalization ansatz (MERA) [57], the tensor renormalization group (TRG) [58, 59], or tensor network renormalization (TNR) [60]. Furthermore, this manuscript is designed to complement online tensor network tutorials [61], which have a focus on code implementation, with more detailed explanations of tensor network theory.

2. Preliminaries

2.1. Prior Knowledge

As stated above, the goal of this manuscript is to help readers that already possess some understanding of tensor network theory to apply this knowledge toward numeric calculations. Thus we assume that the reader has some basic knowledge of tensor networks, specifically that they understand what a tensor network is and have some familiarity with the standard diagrammatic notation used to represent them. An overview of these concepts is presented in Figure 1, otherwise more comprehensive introductions to tensor network theory can be found in [1–10].

Figure 1. (A–C) Diagrammatic representations of a vector Ai (or order-1 tensor), a matrix Bij (or order-2 tensor) and an order-3 tensor Cijk. (D) A contraction, or summation over an index, between two tensors is represented by a line joining two tensors.

Note that we shall not assume prior knowledge of quantum many-body physics, which is the most common application of tensor network algorithms. The skills and ideas that we introduce in this manuscript are intended to be general for the tensor network formalism, rather than for their use in a specific application, and thus can also carry over to other areas in which tensor networks have proven useful such as quantum chemistry [14–18], holography [19–24], machine learning [25–29], and the simulation of quantum circuits [30–35].

2.2. Software Libraries

Currently there exists a wide variety of tensor network code libraries, which include [41–48]. Many of these libraries differ greatly in not only their functioning but also their intended applications, and may have their own specific strengths and weaknesses (which we will not attempt to survey in the present manuscript). Almost all of these libraries contain tools to assist in the tasks described in this manuscript, such as contracting, decomposing and re-gauging tensor networks. Additionally, many of these libraries also contain full-featured versions of complete tensor network algorithms, such as DMRG or TEBD. For a serious numerical calculation involving tensor networks, one where high performance is required, most researchers would be well-advised to utilize an existing library.

However, even if the ultimate intent is to use an existing library, it is still desirable that one should understand the fundamental tensor network manipulations used in numerical calculations. Indeed, this understanding is necessary to properly discern the limitations of various tensor network tools, to ensure that they are applied in an appropriate way, and to customize the existing tools if necessary. Moreover, exploratory research into new tensor network ansätze, algorithms and applications often requires non-standard operations and tensor manipulations which may not be present in any existing library, and thus may require the development of extensive new tensor code. In this setting it can be advantageous to minimize or to forgo the usage of an existing library (unless one is already intimately familiar with its inner workings), given the inherent challenge of extending a library beyond its intended function and the possibility of unintended behavior that this entails.

In the remainder of this manuscript we aim to describe the key tensor network operations (namely contracting, decomposing, and re-gauging tensor networks) in sufficient detail to allow the interested reader to implement these tasks numerically without needing to rely on an existing code library.

2.3. Programming Language

Before attempting to implement tensor network methods numerically one must, of course, decide on which programming language to use. High-level languages with a focus on scientific computation, such as MATLAB, Julia, and Python (with Numpy), are well-suited for implementing tensor network methods as they have native support for multi-dimensional arrays (i.e., tensors), providing simple and convenient syntax for common operations on these arrays (indexing, slicing, scalar operations) as well as providing a plethora of useful functions for manipulating these tensor objects. Alternatively, some tensor network practitioners may prefer to use lower-level languages such as Fortran or C++ when implementing tensor network algorithms, usually in order to maximize performance. However, in many tensor network codes the bulk of the computation time is spent performing large matrix-matrix multiplications, for which even interpreted languages (like MATLAB) still have competitive performance with compiled languages as they call the same underlying BLAS routines. Nonetheless, there are some particular scenarios, such as in dealing with tensor networks in the presence of global symmetries [62–68], where a compiled language may offer a significant performance advantage. In this circumstance Julia, which is a compiled language, or Python, in conjunction with various frameworks which allow it to achieve some degree of compilation, may be appealing options.

2.4. Terminology

Before proceeding, let us establish the terminology that we will use when discussing tensor networks. We define the order of a tensor as the number of indices it has, i.e., such that a vector is order-1 and a matrix is order-2. The term rank (or decomposition rank) of a tensor will refer to the number of non-zero singular values with respect to some bi-partition of the tensor indices. Note that many researchers also use the term rank to describe the number of tensor indices; here we use the alternative term order specifically to avoid the confusion that arises from the double usage of rank. The number of values a tensor index can take will be referred to as the dimension of an index (or bond dimension), which is most often denoted by χ but can also be denoted by m, d, or D. In most cases, the use of d or D to denote a bond dimension is less preferred, as this can be confused with the spatial dimension of the problem under consideration (e.g., when considering a model on a 1D or 2D lattice geometry).

3. Tensor Contractions

The foundation of all tensor network routines is the contraction of a network containing multiple tensors into a single tensor. An example of the type of problem that we consider is depicted in Figure 2A, where we wish to contract the network of tensors {A, B, C} to form an order-3 tensor F, with components defined as

F_{ijk} = \sum_{l,m,n} A_{ljm} B_{iln} C_{nmk}.    (1)

Note that a convention for tensor index ordering is required for the figure to be unambiguously translated to an equation; here we assumed that indices progress clock-wise on each tensor starting from 6 o'clock. Perhaps the most obvious way to evaluate Equation (1) numerically would be through a direct summation over the indices (l, m, n), which could be implemented using a set of nested “FOR” loops. While this approach of summing over all internal indices of a network will produce the correct answer, there are numerous reasons why this is not the preferred approach for evaluating tensor networks. The foremost reason is that it is not the most computationally efficient approach (excluding, perhaps, certain contractions involving sparse tensors, which we will not consider here). Let us analyse the contraction cost for the example given in Equation (1), assuming all tensor indices are χ-dimensional. A single element of tensor F, which has χ³ elements in total, is given through a sum over indices (l, m, n), which requires O(χ³) operations. Thus the total cost of evaluating tensor F through a direct summation over all internal indices is O(χ⁶).
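
For concreteness, a minimal NumPy sketch of this direct-summation approach to Equation (1) is given below; the tensors here are arbitrary examples (not taken from any particular figure) with all indices of dimension χ, and the index ordering follows the clockwise convention stated above.

```python
import numpy as np

chi = 6  # example bond dimension; all indices are chi-dimensional
A = np.random.rand(chi, chi, chi)  # A_{ljm}
B = np.random.rand(chi, chi, chi)  # B_{iln}
C = np.random.rand(chi, chi, chi)  # C_{nmk}

# Direct summation over the internal indices (l, m, n) using nested loops:
# each of the chi^3 elements of F requires a chi^3 sum, for a total cost of O(chi^6).
F = np.zeros((chi, chi, chi))
for i in range(chi):
    for j in range(chi):
        for k in range(chi):
            for l in range(chi):
                for m in range(chi):
                    for n in range(chi):
                        F[i, j, k] += A[l, j, m] * B[i, l, n] * C[n, m, k]
```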

Figure 2. (A) The internal indices (l, m, n) of the network {A, B, C} are contracted to give tensor F. (B) The network is contracted via a sequence of two pairwise tensor contractions, the first of which results in the intermediate tensor D.

Now, let us instead consider the evaluation of tensor F broken up into two steps, where we first compute an intermediate tensor D as depicted in Figure 2B,

D_{ijmn} = \sum_{l} A_{ljm} B_{iln},    (2)

before performing a second contraction to give the final tensor F,

F_{ijk} = \sum_{m,n} D_{ijmn} C_{nmk}.    (3)

Through similar logic as before, one finds that the cost of evaluating the intermediate tensor D scales as O(χ⁵), whilst the subsequent evaluation of F in Equation (3) is also O(χ⁵). Thus breaking the network contraction down into a sequence of smaller contractions, each involving only a pair of tensors (which we refer to as a pairwise tensor contraction), is as computationally cheap or cheaper for any non-trivial bond dimension (χ > 1). This is true in general: for any network of 3 or more (dense) tensors it is always at least as efficient (and usually vastly more efficient) to break the network contraction into a sequence of pairwise contractions, as opposed to directly summing over all the internal indices of the network.
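
The same contraction, evaluated instead as the sequence of two pairwise contractions of Equations (2) and (3), might look as follows in NumPy (again with arbitrary example tensors); each tensordot call costs O(χ⁵).

```python
import numpy as np

chi = 6
A = np.random.rand(chi, chi, chi)  # A_{ljm}
B = np.random.rand(chi, chi, chi)  # B_{iln}
C = np.random.rand(chi, chi, chi)  # C_{nmk}

# Step 1, Eq. (2): D_{ijmn} = sum_l A_{ljm} B_{iln}, cost O(chi^5).
D = np.tensordot(A, B, axes=[[0], [1]])        # contract l; result is indexed (j, m, i, n)
D = D.transpose(2, 0, 1, 3)                    # reorder to (i, j, m, n)

# Step 2, Eq. (3): F_{ijk} = sum_{m,n} D_{ijmn} C_{nmk}, cost O(chi^5).
F = np.tensordot(D, C, axes=[[2, 3], [1, 0]])  # contract m and n with C

# Check against a direct evaluation of Eq. (1).
assert np.allclose(F, np.einsum('ljm,iln,nmk->ijk', A, B, C))
```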

Two natural questions arise at this point. (i) What is the optimal way to implement a single pairwise tensor contraction? (ii) Does the chosen sequence of pairwise contractions affect the total computational cost and, if so, how does one decide what sequence to use? We begin by addressing the first question.

3.1. Pairwise Tensor Contractions

Let us consider the problem of evaluating a pairwise tensor contraction, denoted (A × B), between tensors A and B that are connected by one or more common indices. A straight-forward method to evaluate such contractions, as in the examples of Equations (2) and (3), is by using nested “FOR” loops to sum over the shared indices. The computational cost of this evaluation, in terms of the number of scalar multiplications required, can be expressed concisely as

\mathrm{cost}(A \times B) = \frac{|\dim(A)| \cdot |\dim(B)|}{|\dim(A \cap B)|},    (4)

with |dim(A)| the total dimension of A (i.e., the product of its index dimensions) and |dim(A ∩ B)| the total dimension of the indices shared between A and B.

Alternatively, one can recast a pairwise contraction as a matrix multiplication as follows: first reorder the free indices and contracted indices on each of A and B such that they appear sequentially (which can be achieved in MATLAB using the “permute” function) and then group the free-indices and the contracted indices each into a single index (which can be achieved in MATLAB using the “reshape” function). After these steps the contraction is evaluated using a single matrix-matrix multiplication, although the final product may also need to be reshaped back into a tensor. Recasting as a matrix multiplication does not reduce the formal computational cost from Equation (4). However, modern computers, leveraging highly optimized BLAS routines, typically perform matrix multiplications significantly more efficiently than the equivalent “FOR” loop summations. Thus, especially in the limit of tensors with large total dimension, recasting as a matrix multiplication is most often the preferred approach to evaluate pairwise tensor contractions, even though this requires some additional computational overhead from the necessity of rearranging tensor elements in memory when using “permute”. Note that the “tensordot” function in the Numpy module for Python conveniently evaluates a pairwise tensor contraction using this matrix multiplication approach.
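
A sketch of this permute-and-reshape strategy for the pairwise contraction of Equation (2), written in NumPy (where transpose and reshape play the roles of MATLAB's “permute” and “reshape”), is given below; the example tensors are again arbitrary.

```python
import numpy as np

chi = 6
A = np.random.rand(chi, chi, chi)  # A_{ljm}: contracted index l, free indices (j, m)
B = np.random.rand(chi, chi, chi)  # B_{iln}: contracted index l, free indices (i, n)

# Permute so that free and contracted indices are grouped, then reshape to matrices.
Amat = A.transpose(1, 2, 0).reshape(chi**2, chi)   # rows: combined (j, m), cols: l
Bmat = B.transpose(1, 0, 2).reshape(chi, chi**2)   # rows: l, cols: combined (i, n)

# Evaluate via a single matrix-matrix multiplication, then reshape back to a tensor.
D = (Amat @ Bmat).reshape(chi, chi, chi, chi)      # indexed (j, m, i, n)
D = D.transpose(2, 0, 1, 3)                        # reorder to (i, j, m, n)

# numpy.tensordot performs these same permute / reshape / matmul steps internally.
assert np.allclose(D, np.tensordot(A, B, axes=[[0], [1]]).transpose(2, 0, 1, 3))
```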

3.2. Contraction Sequence

It is straight-forward to establish that, when breaking a network contraction into a sequence of binary contractions, the choice of sequence can affect the total computational cost. As an example, we consider the product of two matrices A, B with vector C,

F_{i} = \sum_{j,k} A_{ij} B_{jk} C_{k},    (5)

where all indices are assumed to be of dimension χ, see also Figure 3. If we evaluate this expression by first performing the matrix-matrix multiplication, i.e., as F = (A × B) × C, then the leading order computational cost scales as O(χ³) by Equation (4). Alternatively, if we evaluate the expression by first performing the matrix-vector multiplication, i.e., as F = A × (B × C), then the leading order computational cost scales as O(χ²). Thus it is evident that the sequence of binary contractions needs to be properly considered in order to minimize the overall computational cost.

Figure 3. (A) A product of three tensors {A, B, C} is contracted to a tensor F, where all indices are assumed to be of dimension χ. (B,C) The total computational cost of contracting the network depends on the sequence of pairwise contractions; the cost of following the sequence in (B) scales as O(χ³ + χ²), as compared to the cost of (C), which scales as O(2χ²).

So how does one find the optimal contraction sequence for some tensor network of interest? For the networks that arise in common algorithms (such as DMRG, MERA, PEPS, TRG and TNR) it is relatively easy, with some practice, to find the optimal sequence through manual inspection or trial-and-error. This follows as most networks one needs to evaluate contain fewer than 10 tensors and the tensor index dimensions take only a single or a few distinct values within a network, which limits the number of viable contraction sequences that need be considered. More generally, determination of optimal contraction sequences is known to be an NP-hard problem [69], such that it is very unlikely that an algorithm which scales polynomially with the number of tensors in the network will ever be found to exist. However, numerical methods based on exhaustive searches and/or heuristics can typically find optimal sequences for networks with fewer than 20 tensors in a reasonable amount of time [69–72], and larger networks are seldom encountered in practice.

Note that many tensor network optimization algorithms are based on an iterative sweep, where the same network diagrams are contracted at each iteration (although perhaps containing different tensors and with different bond dimensions). The usual approach in this setting is to determine the optimal sequences once, before beginning the iterative sweeps, using the initial bond dimensions, and then to cache the sequences for re-use in later iterations. The contraction sequences are then only recomputed if the bond dimensions stray too far from their initial values.
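
As a concrete illustration of an automated sequence search (here using NumPy's einsum_path merely as a stand-in for the dedicated search tools of [69–72]), the network of Figure 3 can be analyzed and the resulting sequence cached for re-use:

```python
import numpy as np

# Network of Figure 3: F_i = sum_{j,k} A_{ij} B_{jk} C_k, with all indices of dimension chi.
chi = 100
A = np.random.rand(chi, chi)
B = np.random.rand(chi, chi)
C = np.random.rand(chi)

# Search for an optimized pairwise sequence and report its estimated cost.
path, info = np.einsum_path('ij,jk,k->i', A, B, C, optimize='optimal')
print(path)   # e.g. ['einsum_path', (1, 2), (0, 1)]: contract (B x C) first, then with A
print(info)   # includes naive vs. optimized FLOP estimates

# The cached path can then be re-used in later iterations of a sweep.
F = np.einsum('ij,jk,k->i', A, B, C, optimize=path)
```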

3.3. Network Contraction Routines

Although certainly feasible, manually writing the code for each tensor network contraction as a sequence of pairwise contractions is not recommended. Not only is substantial programming effort required, but this also results in code which is error-prone and difficult to check. There is also a more fundamental problem: contracting a network by manually writing a sequence of binary contractions requires specifying a particular contraction sequence at the time of coding. However, in many cases the index dimensions within networks are variable, and the optimal sequence can change depending on the relative sizes of dimensions. For instance, one may have a network which contains indices of dimensions χ₁ and χ₂, where the optimal contraction sequence changes depending on whether χ₁ is larger or smaller than χ₂. In order to have a program which works efficiently in both regimes, one would have to write code separately for both contraction sequences.

Given the considerations above, the use of an automated contraction routine, such as the “ncon” (Network-CONtractor) function [61, 73] or something similar from an existing tensor network library [41–48], is highly recommended. Automated contraction routines can evaluate any network contraction in a single call by appropriately generating and evaluating a sequence of binary contractions, hence greatly reducing both the programming effort required and the risk of programming errors occurring. Most contraction routines, such as “ncon”, also remove the need to fix a contraction sequence at the time of writing the code, as the sequence can be specified as an input variable to the routine and thus can be changed without the need to rewrite any code. This can also allow the contraction sequence to be adjusted dynamically at run-time to ensure that the sequence is optimal given the specific index dimensions in use.
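
A sketch of such a single-call contraction of the network from Figure 2A is shown below, using numpy.einsum as a stand-in for a dedicated routine; the commented line indicates how the equivalent ncon-style call is typically labeled (following the convention of [61, 73], with positive integers for contracted indices and negative integers for open indices), though the exact syntax depends on the implementation.

```python
import numpy as np

chi = 6
A = np.random.rand(chi, chi, chi)  # A_{ljm}
B = np.random.rand(chi, chi, chi)  # B_{iln}
C = np.random.rand(chi, chi, chi)  # C_{nmk}

# Single call: the routine internally chooses and evaluates a sequence of pairwise
# contractions (here controlled by 'optimize', which can be changed at run-time).
F = np.einsum('ljm,iln,nmk->ijk', A, B, C, optimize=True)

# The analogous ncon-style call would label contracted indices with positive integers
# (l=1, m=2, n=3) and open indices with negative integers (i=-1, j=-2, k=-3), e.g.:
#   F = ncon([A, B, C], [[1, -2, 2], [-1, 1, 3], [3, 2, -3]])
```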

3.4. Summary: Contractions

In evaluating a network of multiple tensors, it is always more efficient to break the contraction into a sequence of pairwise tensor contractions, each of which should (usually) be recast into a matrix-matrix multiplication in order to achieve optimal computational performance. The total cost of evaluating a network can depend on the particular sequence of pairwise contractions chosen. While there is no known method for determining an optimal contraction sequence that is efficiently scalable in the size of the network, manual inference or brute-force numeric searches are usually viable for the relatively small networks encountered in common tensor network algorithms. When coding a tensor network program it is useful to utilize an automated network contraction routine which can evaluate a tensor network in a single call by properly chaining together a sequence of pairwise contractions. This not only reduces the programming effort required, but also grants a program more flexibility in allowing a contraction sequence to be easily changed.

4. Matrix Factorizations

Another key operation common in tensor network algorithms, complementary to the tensor contractions considered previously, is the factorization of tensors. In this section, we will discuss some of the various means by which a higher-order tensor can be split into a product of lower-order tensors. In particular, the means that we consider involve applying standard matrix decompositions [74, 75] to tensor unfoldings, such that this section may serve as a review of the linear algebra necessary before consideration of more sophisticated network decompositions. Specifically, we recount the spectral decomposition, QR decomposition and singular value decomposition, and outline their usefulness in the context of tensor networks, particularly in achieving optimal low-rank tensor approximations. Before discussing decompositions, we define some special types of tensor.

4.1. Special Tensor Types

A d-by-d matrix U is said to be unitary if it has orthonormal rows and columns, which implies that it annihilates to the identity when multiplied with its conjugate,

U U^{\dagger} = U^{\dagger} U = I,    (6)

where I is the d-by-d identity matrix. We define a tensor (whose order is greater than 2) as unitary with respect to a particular bi-partition of indices if the tensor can be reshaped into a unitary matrix according to this partition. Similarly, a d₁-by-d₂ matrix W, with d₁ > d₂, is said to be an isometry if

W^{\dagger} W = I,    (7)

with I the d₂-by-d₂ identity matrix. Likewise we say that a tensor (order greater than 2) is isometric with respect to a particular bi-partition of indices if the tensor can be reshaped into an isometric matrix. Notice that, rather than equalling the identity, the reverse-order product instead evaluates to a projector P,

W W^{\dagger} = P,    (8)

where projectors are defined as Hermitian matrices that square to themselves,

P^{\dagger} = P, \qquad P^{2} = P.    (9)

4.2. Useful Matrix Decompositions

A commonly used matrix decomposition is the spectral decomposition (or eigen-decomposition). In the context of tensor network codes it is most often used for Hermitian, positive semi-definite matrices, such as the density matrices used to describe quantum states. If H is a d × d Hermitian matrix, or a tensor that can be reshaped into such, then the spectral decomposition yields

H = U D U^{\dagger},    (10)

where U is a d × d unitary matrix and D is a diagonal matrix of eigenvalues, see also Figure 4A. The numerical cost of performing the decomposition scales as O(d³). In the context of tensor network algorithms the spectral decomposition is often applied to approximate a Hermitian tensor with one of smaller rank, as will be discussed in Section 4.4.
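
A minimal sketch of this use of the spectral decomposition, applied to an order-4 Hermitian tensor across the bi-partition of Figure 4A, is given below; the tensor is an arbitrary positive semi-definite example with all indices of dimension d.

```python
import numpy as np

d = 4
# Construct an order-4 tensor H that is Hermitian (and positive semi-definite) across
# the bi-partition between its first two and last two indices.
M = np.random.rand(d**2, d**2) + 1j * np.random.rand(d**2, d**2)
H = (M @ M.conj().T).reshape(d, d, d, d)

# Spectral decomposition across the chosen bi-partition: H = U D U^dagger.
# The effective matrix dimension is d^2, so the cost here scales as O(d^6).
evals, U = np.linalg.eigh(H.reshape(d**2, d**2))
D = np.diag(evals)

# U reshaped back to an order-3 tensor is unitary with respect to the same bi-partition.
Utensor = U.reshape(d, d, d**2)

# Check the reconstruction of H.
assert np.allclose((U @ D @ U.conj().T).reshape(d, d, d, d), H)
```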

Figure 4. Depiction of some common matrix decompositions. All indices are assumed to be of dimension d unless otherwise indicated. (Ai) The spectral decomposition is applied to the order-4 Hermitian tensor H across the partition indicated by the dashed line, yielding a diagonal matrix of eigenvalues D and a unitary U. (Aii) The unitary tensor U annihilates to identity with its conjugate, as per Equation (6). (Bi) The QR decomposition is applied to the order-3 tensor A across the partition indicated, yielding an isometry Q and an upper triangular matrix R. (Bii) The isometry Q annihilates to identity with its conjugate as per Equation (7), while the R matrix is upper triangular. (Ci) The singular value decomposition (SVD) is applied to the order-3 tensor A across the partition indicated, yielding an isometry U, a diagonal matrix of singular values S, and a unitary V. (Cii) Depiction of the constraints satisfied by the isometry U and unitary V.

Another useful decomposition is the QR decomposition. Let A be an arbitrary d₁ × d₂ matrix with d₁ > d₂, or a tensor that can be reshaped into such; then the QR decomposition gives

A=QR,    (11)

see also Figure 4B. Here Q is a d₁ × d₂ isometry, such that Q†Q = I, where I is the d₂ × d₂ identity matrix, and R is a d₂ × d₂ upper-triangular matrix. Note that we are considering the so-called economical decomposition (which is most often used in tensor network algorithms); otherwise the full decomposition gives Q as a d₁ × d₁ unitary and R as a d₁ × d₂ matrix. The numerical cost of the economical QR decomposition scales as the larger dimension times the square of the smaller dimension, O(d₁d₂²), as opposed to O(d₁²d₂) for the full decomposition. The QR decomposition is one of the most computationally efficient ways to obtain an orthonormal basis for the column space of a matrix, thus a common application is in orthogonalizing tensors within a network (i.e., transforming them into isometries), which will be discussed further in Section 5.3.1.
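
A sketch of the economical QR decomposition of an order-3 tensor, taken across the partition between its first two indices and its last index as in Figure 4B, is given below (arbitrary example tensor; NumPy's default mode is the economical one).

```python
import numpy as np

d = 5
A = np.random.rand(d, d, d)                      # order-3 tensor A_{ijk}

# Economical QR across the bi-partition (i, j) | k, i.e. a d^2-by-d matrix unfolding.
Q, R = np.linalg.qr(A.reshape(d**2, d), mode='reduced')
Qtensor = Q.reshape(d, d, d)                     # isometric across the same bi-partition

# Isometric constraint Q^dagger Q = I and exact reconstruction A = Q R.
assert np.allclose(Q.conj().T @ Q, np.eye(d))
assert np.allclose((Q @ R).reshape(d, d, d), A)
```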

The final decomposition that we consider is the singular value decomposition (SVD), which is also widely used in many areas of mathematics, statistics, physics and engineering. The SVD allows an arbitrary d₁ × d₂ matrix A, where we assume for simplicity that d₁ ≥ d₂, to be decomposed as

A = U S V^{\dagger},    (12)

where U is a d₁ × d₂ isometry (or unitary if d₁ = d₂), S is a diagonal d₂ × d₂ matrix of positive elements (called singular values), and V is a d₂ × d₂ unitary matrix, see also Figure 4C. Similar to the economical QR decomposition, we have also considered the economical form of the SVD; the full SVD would otherwise produce U as a d₁ × d₁ unitary and S as a rectangular d₁ × d₂ matrix padded with zeros. The numerical cost of the economical SVD scales as O(d₁d₂²), identical to that of the economical QR decomposition. The rank of a tensor (across a specified bi-partition) is defined as the number of non-zero singular values that appear in the SVD. A common application of the SVD is in finding an approximation to a tensor by another of smaller rank, which will be discussed further in Section 4.4.
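
The analogous sketch for the economical SVD of an order-3 tensor across the partition of Figure 4C (again an arbitrary example tensor):

```python
import numpy as np

d = 5
A = np.random.rand(d, d, d)                            # order-3 tensor A_{ijk}

# Economical SVD across the bi-partition (i, j) | k:  A = U S V^dagger.
U, S, Vh = np.linalg.svd(A.reshape(d**2, d), full_matrices=False)
Utensor = U.reshape(d, d, d)                           # isometry across the same bi-partition

# Constraints of Figure 4Cii and exact reconstruction of A.
assert np.allclose(U.conj().T @ U, np.eye(d))          # U^dagger U = I
assert np.allclose(Vh.conj().T @ Vh, np.eye(d))        # V unitary
assert np.allclose((U @ np.diag(S) @ Vh).reshape(d, d, d), A)

# The number of non-zero entries of S is the decomposition rank of A
# across this bi-partition.
```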

Notice that for any matrix A the spectral decompositions of AA† and A†A are related to the SVD of A; more precisely, the eigenvectors of AA† and A†A are equivalent to the singular vectors in U and V, respectively, of the SVD in Equation (12). Furthermore, the (non-zero) eigenvalues of AA† or A†A are the squares of the singular values in S. It can also be seen that, for a Hermitian positive semi-definite matrix H, the spectral decomposition is equivalent to an SVD.

4.3. Tensor Norms

The primary use for matrix decompositions, such as the SVD, in the context of tensor networks is in accurately approximating a higher-order tensor as a product of lower-order tensors. However, before discussing tensor approximations, it is necessary to define the tensor norm in use. A tensor norm that is particularly convenient is the Frobenius norm (or Hilbert-Schmidt norm). Given a tensor Aijk, the Frobenius norm of A, denoted ||A||, is defined as the square root of the sum of the squared magnitudes of its elements,

||A|| = \sqrt{\sum_{ijk} |A_{ijk}|^{2}}.    (13)

This can equivalently be expressed in terms of the tensor trace of A with its conjugate,

||A|| = \sqrt{\mathrm{Ttr}(A, A^{\dagger})},    (14)

where the tensor trace, Ttr(A, A†), represents the contraction of tensor A with its conjugate over all matching indices, see Figure 5. It also follows that the Frobenius norm is related to the singular values sₖ of A across any chosen bi-partition,

||A|| = \sqrt{\sum_{k} (s_{k})^{2}}.    (15)

Notice that Equation (14) implies that the difference ε = ||A − B|| between two tensors A and B of equal dimension can equivalently be expressed as

||A - B||^{2} = \mathrm{Ttr}(A, A^{\dagger}) - 2\,|\mathrm{Ttr}(A, B^{\dagger})| + \mathrm{Ttr}(B, B^{\dagger}).    (16)
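
The equivalences of Equations (13)–(15), together with the difference formula of Equation (16), can be checked numerically as sketched below; the tensors here are arbitrary element-wise positive real examples, for which the overlap Ttr(A, B†) is positive so that the modulus in Equation (16) is immaterial.

```python
import numpy as np

A = np.random.rand(4, 5, 6)
B = np.random.rand(4, 5, 6)

# Eq. (13): square root of the sum of squared magnitudes.
norm13 = np.sqrt(np.sum(np.abs(A)**2))

# Eq. (14): square root of the tensor trace of A with its conjugate over all indices.
norm14 = np.sqrt(np.tensordot(A, A.conj(), axes=3))

# Eq. (15): square root of the sum of squared singular values (any bi-partition works;
# here the first index versus the remaining two).
s = np.linalg.svd(A.reshape(4, 5 * 6), compute_uv=False)
norm15 = np.sqrt(np.sum(s**2))

assert np.allclose(norm13, norm14) and np.allclose(norm13, norm15)

# Eq. (16): squared Frobenius distance between two tensors of equal dimensions.
lhs = np.linalg.norm(A - B)**2
rhs = (np.tensordot(A, A, axes=3) - 2 * abs(np.tensordot(A, B, axes=3))
       + np.tensordot(B, B, axes=3))
assert np.allclose(lhs, rhs)
```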

Figure 5. For any tensor A the tensor trace Ttr of A with its conjugate A† (drawn with opposite vertical orientation) is obtained by contracting over all matching indices. The Frobenius norm can be defined as the square root of this tensor trace, see Equation (14).

4.4. Optimal Low-Rank Approximations

Given some matrix A, or a higher-order tensor viewed as a matrix across a chosen bi-partition of its indices, we now focus on the problem of finding the tensor B that best approximates A according to the Frobenius norm (i.e., that which minimizes the difference in Equation 16), assuming B has a fixed rank r. Let us assume, without loss of generality, that tensor A is equivalent to a d₁ × d₂ matrix (with d₁ ≥ d₂) under a specified bi-partition of its indices, and that A has the singular value decomposition,

A_{ij} = \sum_{k=1}^{d_{2}} U_{ik} \, s_{k} \, (V^{\dagger})_{kj},    (17)

where the singular values are assumed to be in descending order, sₖ ≥ sₖ₊₁. Then the optimal rank-r tensor B that approximates A is known from the Eckart–Young–Mirsky theorem [76], which states that B is given by truncating to retain only the r largest singular values and their corresponding singular vectors,

B_{ij} = \sum_{k=1}^{r} U_{ik} \, s_{k} \, (V^{\dagger})_{kj},    (18)

see also Figure 6. It follows that the error of approximation ε = ||A − B||, as measured in the Frobenius norm, is related to the discarded singular values as

\varepsilon = \sqrt{\sum_{k>r} (s_{k})^{2}}.    (19)

If the spectrum of singular values is sharply decaying then the error is well-approximated by the largest of the discarded singular values, ε ≈ sᵣ₊₁.
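
A sketch of this truncation, applied to an arbitrary example tensor across a chosen bi-partition, together with a check of the error formula of Equation (19):

```python
import numpy as np

d = 4
A = np.random.rand(d, d, d, d, d)                      # order-5 tensor

# SVD across the bi-partition between the first two and last three indices (as in Figure 6).
U, S, Vh = np.linalg.svd(A.reshape(d**2, d**3), full_matrices=False)

# Retain only the r largest singular values (np.linalg.svd returns them in descending order).
r = 8
B = (U[:, :r] @ np.diag(S[:r]) @ Vh[:r, :]).reshape(A.shape)   # optimal rank-r approximation

# Eq. (19): the truncation error equals the norm of the discarded singular values.
eps = np.linalg.norm(A - B)
assert np.allclose(eps, np.sqrt(np.sum(S[r:]**2)))
```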

Figure 6. (A) The singular value decomposition is taken on tensor A across a bi-partition between its top two and bottom three indices, and is assumed to produce d non-zero singular values. (B) Tensor B is now defined by truncating the matrix of singular values, S → S̃, to retain only r < d of the largest singular values, while similarly truncating the matrices of singular vectors, U → Ũ and V → Ṽ, to retain only the corresponding singular vectors. By the Eckart–Young–Mirsky theorem [76] it is known that B is the optimal rank-r approximation to A (across the chosen bi-partition of tensor indices).

Notice that, in the case that the tensor A under consideration is Hermitian and positive definite across the chosen bi-partition, the spectral decomposition could instead be used in Equation (17). The low-rank approximation obtained by truncating the smallest eigenvalues would still be guaranteed optimal, as the spectral decomposition is equivalent to the SVD in this case.

4.5. Summary: Decompositions

In this section we have described how special types of matrices, such as unitary matrices, isometries and projectors, can be generalized to the case of tensors (which can always be viewed as a matrix across a chosen bi-partition of their indices). Similarly we have shown how several concepts from matrices, such as the trace and the norm, are also generalized to tensors. Finally, we have described how the optimal low-rank approximation to a tensor can be obtained via the SVD.

5. Gauge Freedom

All tensor networks possess gauge degrees of freedom; exploiting the gauge freedom allows one to change the tensors within a network whilst leaving the final product of the network unchanged. In this section, we describe methods for manipulating the gauge degrees of freedom and discuss the utility of fixing the gauge in a specific manner.

5.1. Tree Tensor Networks

In this manuscript we shall restrict our considerations of gauge manipulations to focus only on tensor networks that do not possess closed loops (i.e., networks that correspond to acyclic graphs), which are commonly referred to as tree tensor networks (TTN) [77, 78]. Figure 7A presents an example of a tree tensor network. If we select a single tensor to act as the center (or root node) then we can understand the tree tensor network as being composed of a set of distinct branches extending from this chosen tensor. For instance, Figure 7B depicts the three branches (excluding the single trivial branch) extending from the order-4 tensor A. It is important to note that connections between the different branches are not possible in networks without closed loops; this restriction is required for the re-gauging methods considered in this manuscript. However, these methods can (mostly) be generalized to the case of networks containing closed loops by using a more sophisticated formalism as shown in [79].

Figure 7. (A) An example of a tree tensor network (TTN), here composed of 7 tensors. (B) The three (non-trivial) branches of the tree with respect to choosing A as the root tensor.

5.2. Gauge Transformations

Consider a tensor network of multiple tensors that, under contraction of all internal indices, evaluates to some product tensor H. We now pose the following question: is there a different choice of tensors with the same network geometry that also evaluates to H? Clearly the answer to this question is yes! As shown below in Figure 8A, on any of the internal indices of the network one can introduce a resolution of the identity (i.e., a pair of matrices X and X⁻¹) which, by construction, does not change the final product that the network evaluates to. However, absorbing one of these matrices into each adjoining tensor changes the individual tensors, see Figure 8B, while leaving the product of the network unchanged. It follows that there are infinitely many choices of tensors such that the network product evaluates to some fixed output tensor, since the gauge change matrix X can be any invertible matrix. This ability to introduce an arbitrary resolution of the identity on an internal index, while leaving the product of the network unchanged, is referred to as the gauge freedom of the network.
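
A sketch of this gauge freedom on a simple three-tensor chain (a plain matrix product standing in for the network of Figure 8; the gauge matrices are arbitrary invertible matrices):

```python
import numpy as np

d = 6
A = np.random.rand(d, d)
B = np.random.rand(d, d)
C = np.random.rand(d, d)
D = A @ B @ C                           # product of the original network

# Insert resolutions of the identity, X X^{-1} and Y Y^{-1}, on the two internal indices.
X = np.random.rand(d, d)                # any invertible matrix is a valid gauge change
Y = np.random.rand(d, d)
Anew = A @ X
Bnew = np.linalg.inv(X) @ B @ Y
Cnew = np.linalg.inv(Y) @ C

# The individual tensors have changed, but the network product is unchanged.
assert np.allclose(Anew @ Bnew @ Cnew, D)
```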

Figure 8. (A) Given a network of three tensors {A, B, C}, one can introduce gauge change matrices X and Y (together with their inverses) on the internal indices of the network while leaving the final product D of the network unchanged. (B) Definitions of the new tensors {Ã, B̃, C̃} after the change of gauge.

While in some respects the gauge freedom could be considered bothersome, as it implies tensor decompositions are never unique, it can also be exploited to simplify many types of operations on tensor networks. Indeed, most tensor network algorithms require fixing the gauge in a prescribed manner in order to function correctly. In the following sections we discuss ways to fix the gauge degrees of freedom so as to create an orthogonality center, and the benefits of doing so.

5.3. Orthogonality Centers

A given tensor A within a network is said to be an orthogonality center if every branch connected to tensor A annihilates to the identity when contracted with its conjugate, as shown in Figure 9. Equivalently, each branch must (collectively) form an isometry across the bi-partition between its open indices and the index connected to tensor A. By properly manipulating the gauge degrees of freedom, it is possible to turn any tensor within a tree tensor network into an orthogonality center [80]. We now discuss two different methods for achieving this: a “pulling through” approach, which was a key ingredient in the original formulation of DMRG [49–51], and a “direct orthogonalization” approach, which was an important part of the TEBD algorithm [52, 53].

Figure 9. (A) An example of a tree tensor network. (B) A depiction of the constraints required for the tensor A to be an orthogonality center: each of the branches must annihilate to the identity when contracted with its conjugate.

5.3.1. Creating an Orthogonality Center via “Pulling Through”

Here we describe a method for setting a tensor A within a network as an orthogonality center through iterative use of the QR decomposition. The idea behind this method is very simple: if each individual tensor within a branch is transformed into a (properly oriented) isometry, then the entire branch collectively becomes an isometry and thus the tensor at the center of the branches becomes an orthogonality center. Let us begin by orienting each index of the network by drawing an arrow pointing toward the desired center A. Then, starting at the tip of each branch, we perform a QR decomposition on each tensor based on a bi-partition between its incoming and outgoing indices. The R part of the decomposition is then absorbed into the next tensor in the branch (i.e., closer to the root tensor A), and the process repeated, as depicted in Figures 10A–C. At the final step the R part of the QR decomposition from each branch is absorbed into the central tensor A and the process is complete, see also Figure 10D.
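
A minimal sketch of this “pulling through” procedure on a small hypothetical chain (a single branch C–B feeding into a chosen center A, with open indices of dimension p and internal bonds of dimension χ):

```python
import numpy as np

p, chi = 4, 3

C = np.random.rand(p, chi)          # tip of the branch:     C[s1, a]
B = np.random.rand(chi, p, chi)     # middle of the branch:  B[a, s2, b]
A = np.random.rand(chi, p)          # chosen center:         A[b, s3]

def network(C, B, A):
    """Contract the chain to its product tensor H[s1, s2, s3]."""
    return np.einsum('ia,ajb,bk->ijk', C, B, A)

H = network(C, B, A)

# Step 1: QR on the tip tensor C (incoming open index versus outgoing bond index),
# then absorb the R factor into the next tensor along the branch.
Qc, Rc = np.linalg.qr(C)                            # C = Qc Rc, with Qc a (p x chi) isometry
B = np.tensordot(Rc, B, axes=[[1], [0]])

# Step 2: QR on B across (incoming indices) | (outgoing bond index); absorb R into the center.
Qb, Rb = np.linalg.qr(B.reshape(chi * p, chi))
Qb = Qb.reshape(chi, p, chi)
A = np.tensordot(Rb, A, axes=[[1], [0]])

# The network product is unchanged, and the whole branch is now an isometry,
# i.e. A is an orthogonality center.
assert np.allclose(network(Qc, Qb, A), H)
branch = np.tensordot(Qc, Qb, axes=[[1], [0]]).reshape(p * p, chi)
assert np.allclose(branch.conj().T @ branch, np.eye(chi))
```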

Figure 10. A depiction of how the tensor A can be made into an orthogonality center of the network from Figure 7 via the “pulling-through” approach. (A,B) Tensors D and E, which reside at the tips of a branch, are decomposed via the QR decomposition. (C) The R components of the previous QR decompositions are absorbed into the B tensor higher on the branch, which is then itself decomposed via the QR decomposition. (D) Following this procedure, all tensors in the network are orthogonalized (with respect to their incoming vs. outgoing indices) such that A′ becomes an orthogonality center of the network.

Note that the SVD could be used as an alternative to the QR decomposition: instead of absorbing the R part of the QR decomposition into the next tensor in the branch, one could absorb the product of the S and V† parts of the SVD from Equation (12). However, in practice, the QR decomposition is most often preferable as it is computationally cheaper than the SVD.

5.3.2. Creating an Orthogonality Center via “Direct Orthogonalization”

Here we describe a method for setting a tensor A within a network as an orthogonality center based on use of a single decomposition for each branch, as depicted in Figure 11. (i) We begin by computing the positive-definite matrix ρ for each branch (with respect to the center tensor A) by contracting the branch with its Hermitian conjugate. (ii) The principal square root X of each matrix ρ is then computed, i.e., such that ρ = XX†, which can be obtained via the spectral decomposition if necessary. (iii) Finally, a change of gauge is made on each of the indices of tensor A using the appropriate X matrix and its corresponding inverse X⁻¹, with the X part absorbed into tensor A and the X⁻¹ absorbed into the leading tensor of the branch, such that the branch matrix transforms as ρ → ρ̃ = X⁻¹ρ(X⁻¹)†. It follows that the tensor A is now an orthogonality center, as each branch matrix ρ̃ of the transformed network evaluates to the identity,

\tilde{\rho} = X^{-1} \rho \, (X^{-1})^{\dagger} = X^{-1} X X^{\dagger} (X^{-1})^{\dagger} = I.    (20)

Note that, for simplicity, we have assumed that the branch matrices ρ do not have zero eigenvalues, such that their inverses exist. Otherwise, if zero eigenvalues are present, the current method is not valid unless the index dimensions are first reduced by truncating any zero eigenvalues.
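
The “direct orthogonalization” procedure, sketched for the same kind of single-branch chain as above (again a hypothetical example):

```python
import numpy as np

p, chi = 4, 3

C = np.random.rand(p, chi)          # branch tensors C -- B, feeding into the center A
B = np.random.rand(chi, p, chi)
A = np.random.rand(chi, p)
H = np.einsum('ia,ajb,bk->ijk', C, B, A)

# (i) Branch matrix rho: contract the branch with its conjugate over all open indices.
branch = np.tensordot(C, B, axes=[[1], [0]]).reshape(p * p, chi)   # branch[(s1, s2), b]
rho = branch.conj().T @ branch                                     # chi x chi, pos. semi-definite

# (ii) Principal square root X of rho (via the spectral decomposition), and its inverse.
evals, W = np.linalg.eigh(rho)
X = W @ np.diag(np.sqrt(evals)) @ W.conj().T      # rho = X X^dagger (here X is Hermitian)
Xinv = np.linalg.inv(X)

# (iii) Gauge change on the bond: absorb X into the center A and X^{-1} into the branch.
Anew = np.tensordot(X, A, axes=[[1], [0]])
Bnew = np.tensordot(B, Xinv, axes=[[2], [0]])

# The network is unchanged and A is now an orthogonality center.
assert np.allclose(np.einsum('ia,ajb,bk->ijk', C, Bnew, Anew), H)
branch_new = np.tensordot(C, Bnew, axes=[[1], [0]]).reshape(p * p, chi)
assert np.allclose(branch_new.conj().T @ branch_new, np.eye(chi))
```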

Figure 11. A depiction of how the tensor A from the network of Figure 7 can be made into an orthogonality center via the “direct orthogonalization” approach. (A) A change of gauge is made on every (non-trivial) branch connected to A, such that A becomes an orthogonality center. (B–D) The gauge change matrices {X1, X2, X3} are obtained by contracting each branch with its Hermitian conjugate and then taking the principal square root.

5.3.3. Comparison of Methods for Creating Orthogonality Centers

Each of the two methods discussed for creating an orthogonality center has its own advantages and disadvantages, such that the preferred method may depend on the specific application under consideration. In practice, the “direct orthogonalization” is typically computationally cheaper and easier to execute, since this method only requires changing the gauge on the indices connected to the center tensor. In contrast, the “pulling through” method involves changing the gauge on all indices of the network. Additionally, the “direct orthogonalization” approach can easily be employed in networks of infinite extent, such as infinite MPS [52, 53], if the matrix ρ associated to a branch of infinite extent can be computed by solving for a dominant eigenvector. While “pulling through” can also potentially be employed for networks of infinite extent, i.e., through successive decompositions until sufficient convergence is achieved, this is likely to be more computationally expensive. However, the “pulling through” approach can be advantageous if the branch matrices ρ are ill-conditioned, as the errors due to floating-point arithmetic are smaller; this follows since the “direct orthogonalization” requires one to square the tensors in each branch. The “pulling-through” approach also results in transforming every tensor in the network (with the exception of the center tensor) into an isometry, which may be desirable in certain applications.

5.4. Decompositions of Tensors Within Networks

In Section 4, it was described how the SVD can be applied to find the optimal low-rank approximation to a tensor, in terms of minimizing the Frobenius norm between the original tensor and the approximation. In the present section we extend this concept further and detail how, by first creating an orthogonality center, a tensor within a network can be optimally decomposed so as to minimize the global error coming from consideration of the entire network.

Let us consider a tree tensor network of tensors {A, B, C, D, E, F, G} that evaluates to a tensor H, as depicted in Figure 12A. We now replace a single tensor A from this network by a new tensor A′, such that the network now evaluates to a tensor H′, as depicted in Figure 12B. Our goal is to address the following question: how can we find the optimal low-rank approximation A′ to tensor A such that the error from the full network, ||H − H′||, is minimized? Notice that if we follow the method from Section 4 and simply truncate the smallest singular values of A, see Figure 12C, then this will only ensure that the local error, ||A − A′||, is minimized. The key to resolving this issue is the creation of an orthogonality center, which can reduce the global norm of a network to the norm of a single tensor. Specifically, if tensor A is an orthogonality center of a network that evaluates to a final tensor H, then it follows from the definition of an orthogonality center that ||H|| = ||A||, as depicted in Figure 13A. Thus it can also be seen that, under replacement of the center tensor A with a new tensor A′, such that the network now evaluates to a new tensor H′, the difference between the tensors, ||A − A′||, is precisely equal to the global difference between the networks, ||H − H′||. This follows as the overlap of H and H′ equals the overlap of A and A′, as depicted in Figure 13B. In other words, by appropriately manipulating the gauge degrees of freedom in a network, the global difference resulting from changing a single tensor in a network can become equivalent to the local difference between the single tensors. The solution to the problem of finding the optimal low-rank approximation A′ to a tensor A within a network thus becomes clear: we should first adjust the gauge such that A becomes an orthogonality center, after which we can follow the method from Section 4 and create the optimal global approximation (i.e., that which minimizes the global error) by truncating the smallest singular values of A. The importance of this result in the context of tensor network algorithms cannot be overstated; this understanding of how to optimally truncate a single tensor within a tensor network, see also [81], is a key aspect of the DMRG algorithm [49–51], the TEBD algorithm [52, 53], and many other tensor network algorithms.
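
A sketch combining these ideas on a small hypothetical chain: the center tensor is first made into an orthogonality center (here via “pulling through”), its SVD is then truncated, and the resulting local truncation error is seen to equal the global error ||H − H′||.

```python
import numpy as np

p, chi = 6, 5

L = np.random.rand(p, chi)          # left branch:   L[s1, a]
A = np.random.rand(chi, p, chi)     # center tensor: A[a, s2, b]
R = np.random.rand(chi, p)          # right branch:  R[b, s3]

def network(L, A, R):
    return np.einsum('ia,ajb,bk->ijk', L, A, R)

H = network(L, A, R)

# Make A an orthogonality center by orthogonalizing both branches ("pulling through").
Ql, Rl = np.linalg.qr(L)                      # left branch becomes an isometry
Qr, Rr = np.linalg.qr(R.T)                    # right branch likewise (transposed so that
A = np.einsum('xa,ajb,yb->xjy', Rl, A, Rr)    #   the bond index is the column index)
L, R = Ql, Qr.T
assert np.allclose(network(L, A, R), H)

# Truncate the center across the bi-partition (a, s2) | b, keeping the r largest singular values.
r = 2
U, S, Vh = np.linalg.svd(A.reshape(chi * p, chi), full_matrices=False)
Anew = (U[:, :r] @ np.diag(S[:r]) @ Vh[:r, :]).reshape(A.shape)

# Global error of the network equals the local truncation error of the center tensor.
global_err = np.linalg.norm(H - network(L, Anew, R))
local_err = np.sqrt(np.sum(S[r:]**2))
assert np.allclose(global_err, local_err)
```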

Figure 12. (A) A network of 7 tensors {A, B, C, D, E, F, G} contracts to give a tensor H. (B) After replacing a single tensor AA′ the network contracts to a different tensor H′. (C) The tensor A′ is decomposed into a pair of tensors AL and AR, leaving the final tensor H′ unchanged.

Figure 13. (A) In the network from Figure 12A, if the tensor A is an orthogonality center then it follows that the norm of the network ||H|| is equal to the norm of the center tensor ||A||. (B) Similarly it follows that, in changing only the center tensor AA′, the global overlap between the networks H and H′ is equal to the local overlap between the center tensors A and A′.

5.5. Summary: Gauge Freedom

In the preceding section, we discussed manipulations of the gauge degrees of freedom in a tensor network and described two methods that can be used to create an orthogonality center. The proper use of an orthogonality center was then demonstrated to allow one to decompose a tensor within a network in such a way as to minimize the global error. Note that while the results in this section were described only for tree tensor networks (i.e., networks based on acyclic graphs), they can be generalized to arbitrary networks by using more sophisticated methodology [79].

6. Conclusions

Network contractions and decompositions are the twin pillars of all tensor network algorithms. In this manuscript we have recounted the key theoretical considerations required for performing these operations efficiently and also discussed aspects of their implementation in numeric codes. We expect that a proper understanding of these results could facilitate an individual's efforts to implement many common tensor network algorithms, such as DMRG, TEBD, TRG, PEPS and MERA, and also further aid researchers in the design and development of new tensor network algorithms.

However, there are still a wide variety of additional general ideas and methods, not covered in this manuscript, that are necessary for the implementation of more advanced tensor network algorithms. These include (i) strategies for performing variational optimization, (ii) methods for dealing with decompositions in networks containing closed loops, and (iii) the use of approximations in tensor network contractions. We shall address several of these topics in a follow-up work.

Data Availability Statement

The original contributions presented in the study are included in the article/supplementary material, further inquiries can be directed to the corresponding author/s.

Author Contributions

GE was solely responsible for the preparation of this manuscript.

Funding

This work was funded by Institutional Startup Funds.

Conflict of Interest

The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher's Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

1. Cirac JI, Verstraete F. Renormalization and tensor product states in spin chains and lattices. J Phys A Math Theor. (2009) 42:504004. doi: 10.1088/1751-8113/42/50/504004

2. Evenbly G, Vidal G. Tensor network states and geometry. J Stat Phys. (2011) 145:891–918. doi: 10.1007/s10955-011-0237-4

3. Orus R. A practical introduction to tensor networks: matrix product states and projected entangled pair states. Ann Phys. (2014) 349:117. doi: 10.1016/j.aop.2014.06.013

4. Bridgeman JC, Chubb CT. Hand-waving and interpretive dance: an introductory course on tensor networks. J Phys A Math Theor. (2017) 50:223001. doi: 10.1088/1751-8121/aa6dc3

5. Montangero S. Introduction to tensor network methods. In: Numerical Simulations of Low-Dimensional Many-body Quantum Systems. Berlin: Springer (2018). doi: 10.1007/978-3-030-01409-4

6. Orus R. Tensor networks for complex quantum systems. Nat Rev Phys. (2019) 1:538–50. doi: 10.1038/s42254-019-0086-7

7. Silvi P, Tschirsich F, Gerster M, Junemann J, Jaschke D, Rizzi M, et al. The tensor networks anthology: simulation techniques for many-body quantum lattice systems. SciPost Phys. (2019) 8. doi: 10.21468/SciPostPhysLectNotes.8

8. Ran S-J, Tirrito E, Peng C, Chen X, Tagliacozzo L, Su G, et al. Tensor Network Contractions Methods and Applications to Quantum Many-Body Systems. Springer (2020). doi: 10.1007/978-3-030-34489-4

9. Cirac JI, Perez-Garcia D, Schuch N, Verstraete F. Matrix product states and projected entangled pair states: concepts, symmetries, theorems. Rev Mod Phys. (2021) 93:045003. doi: 10.1103/RevModPhys.93.045003

10. Kolda TG, Bader BW. Tensor decompositions and applications. SIAM Rev. (2009) 51:455–500. doi: 10.1137/07070111X

11. Vidal G, Latorre JI, Rico E, Kitaev A. Entanglement in quantum critical phenomena. Phys Rev Lett. (2003) 90:227902. doi: 10.1103/PhysRevLett.90.227902

12. Hastings MB. An area law for one-dimensional quantum systems. J Stat Mech. (2007) 2007:P08024. doi: 10.1088/1742-5468/2007/08/P08024

13. Eisert J, Cramer M, Plenio M. Area laws for the entanglement entropy - a review. Rev Mod Phys. (2010) 82:277–306. doi: 10.1103/RevModPhys.82.277

14. Chan GKL, Sharma S. The density matrix renormalization group in quantum chemistry. Annu Rev Phys Chem. (2011) 62:465. doi: 10.1146/annurev-physchem-032210-103338

15. Keller S, Dolfi M, Troyer M, Reiher M. An efficient matrix product operator representation of the quantum chemical Hamiltonian. J Chem Phys. (2015) 143:244118. doi: 10.1063/1.4939000

16. Szalay S, Pfeffer M, Murg V, Barcza G, Verstraete F, Schneider R, et al. Tensor product methods and entanglement optimization for ab initio quantum chemistry. Int J Quant Chem. (2015) 115:1342. doi: 10.1002/qua.24898

17. Chan G K-L, Keselman A, Nakatani N, Li Z, White SR. Matrix product operators, matrix product states, and ab initio density matrix renormalization group algorithms. J Chem Phys. (2016) 145:014102. doi: 10.1063/1.4955108

18. Zhai H, Chan GK-L. Low communication high performance ab initio density matrix renormalization group algorithms. J Chem Phys. (2021) 154:224116. doi: 10.1063/5.0050902

19. Swingle B. Entanglement renormalization and holography. Phys Rev D. (2012) 86:065007. doi: 10.1103/PhysRevD.86.065007

20. Miyaji M, Numasawa T, Shiba N, Takayanagi T, Watanabe K. Continuous multiscale entanglement renormalization ansatz as holographic surface-state correspondence. Phys Rev Lett. (2015) 115:171602. doi: 10.1103/PhysRevLett.115.171602

21. Pastawski F, Yoshida B, Harlow D, Preskill J. Holographic quantum error-correcting codes: toy models for the bulk/boundary correspondence. J High Energy Phys. (2015) 6:149. doi: 10.1007/JHEP06(2015)149

22. Hayden P, Nezami S, Qi X-L, Thomas N, Walter M, Yang Z. Holographic duality from random tensor networks. J High Energy Phys. (2016) 11:009. doi: 10.1007/JHEP11(2016)009

23. Czech B, Lamprou L, McCandlish S, Sully J. Tensor networks from kinematic space. J High Energy Phys. (2016) 07:100. doi: 10.1007/JHEP07(2016)100

24. Evenbly G. Hyperinvariant tensor networks and holography. Phys Rev Lett. (2017) 119:141602. doi: 10.1103/PhysRevLett.119.141602

25. Stoudenmire EM, Schwab DJ. Supervised learning with tensor networks. Adv Neural Inf Process Syst. (2016) 29:4799–807.

26. Martyn J, Vidal G, Roberts C, Leichenauer S. Entanglement and tensor networks for supervised image classification. arXiv preprint arXiv:2007.06082. (2020). doi: 10.48550/arXiv.2007.06082

27. Cheng S, Wang L, Zhang P. Supervised learning with projected entangled pair states. Phys Rev B. (2021) 103:125117. doi: 10.1103/PhysRevB.103.125117

28. Liu J, Li S, Zhang J, Zhang P. Tensor networks for unsupervised machine learning. arXiv preprint arXiv:2106.12974. (2021). doi: 10.48550/arXiv.2106.12974

29. Liu Y, Li W-J, Zhang X, Lewenstein M, Su G, Ran SJ. Entanglement-based feature extraction by tensor network machine learning. Front Appl Math Stat. (2021) 7:716044. doi: 10.3389/fams.2021.716044

30. Fried ES, Sawaya NPD, Cao Y, Kivlichan ID, Romero J, Aspuru-Guzik A. qTorch: the quantum tensor contraction handler. PLoS ONE. (2018) 13:e0208510. doi: 10.1371/journal.pone.0208510

31. Villalonga B, Boixo S, Nelson B, Henze C, Rieffel E, Biswas R, et al. A flexible high-performance simulator for verifying and benchmarking quantum circuits implemented on real hardware. NPJ Quantum Inf . (2019) 5:86. doi: 10.1038/s41534-019-0196-1

32. Schutski R, Lykov D, Oseledets I. Adaptive algorithm for quantum circuit simulation. Phys Rev A. (2020) 101:042335. doi: 10.1103/PhysRevA.101.042335

33. Pan F, Zhang P. Simulation of quantum circuits using the big-batch tensor network method. Phys Rev Lett. (2022) 128:030501. doi: 10.1103/PhysRevLett.128.030501

34. Levental M. Tensor networks for simulating quantum circuits on FPGAs. arXiv preprint arXiv:2108.06831. (2021). doi: 10.48550/arXiv.2108.06831

35. Vincent T, O'Riordan LJ, Andrenkov M, Brown J, Killoran N, Qi H, et al. Jet: fast quantum circuit simulations with parallel task-based tensor-network contraction. Quantum. (2022) 6:709. doi: 10.22331/q-2022-05-09-709

36. Evenbly G, Vidal G. Algorithms for entanglement renormalization. Phys Rev B. (2009) 79:144108. doi: 10.1103/PhysRevB.79.144108

37. Zhao H-H, Xie Z-Y, Chen Q-N, Wei Z-C, Cai JW, Xiang T. Renormalization of tensor-network states. Phys Rev B. (2010) 81:174411. doi: 10.1103/PhysRevB.81.174411

38. Schollwoeck U. The density-matrix renormalization group in the age of matrix product states. Ann Phys. (2011) 326:96. doi: 10.1016/j.aop.2010.09.012

39. Phien HN, Bengua JA, Tuan HD, Corboz P, Orus R. The iPEPS algorithm, improved: fast full update and gauge fixing. Phys Rev B. (2015) 92:035142. doi: 10.1103/PhysRevB.92.035142

40. Evenbly G. Algorithms for tensor network renormalization. Phys Rev B. (2017) 95:045117. doi: 10.1103/PhysRevB.95.045117

41. Fishman M, White SR, Stoudenmire EM. The ITensor software library for tensor network calculations. arXiv:2007.14822. (2020). doi: 10.48550/arXiv.2007.14822

42. Kao Y-J, Hsieh Y-D, Chen P. Uni10: an open-source library for tensor network algorithms. J Phys Conf Ser. (2015) 640:012040. doi: 10.1088/1742-6596/640/1/012040

43. Haegeman J. TensorOperations. (2022). Available online at: https://github.com/Jutho/TensorOperations.jl (accessed April 14, 2022).

44. Hauschild J, Pollmann F. Efficient numerical simulations with Tensor Networks: Tensor Network Python (TeNPy). SciPost Phys. (2018). doi: 10.21468/SciPostPhysLectNotes.5

45. Al-Assam S, Clark SR, Jaksch D. The tensor network theory library. J Stat Mech. (2017) 2017:093102. doi: 10.1088/1742-5468/aa7df3

46. Olivares-Amaya R, Hu W, Nakatani N, Sharma S, Yang J, Chan G K-L. The ab-initio density matrix renormalization group in practice. J Chem Phys. (2015) 142:034102. doi: 10.1063/1.4905329

47. Roberts C, Milsted A, Ganahl M, Zalcman A, Fontaine B, Zou Y, et al. TensorNetwork: a library for physics and machine learning. arXiv preprint arXiv:1905.01330. (2019). doi: 10.48550/arXiv.1905.01330

48. Oseledets V. TT Toolbox. (2014). Available online at: https://github.com/oseledets/TT-Toolbox (accessed July 7, 2021).

49. White SR. Density matrix formulation for quantum renormalization groups. Phys Rev Lett. (1992) 69:2863. doi: 10.1103/PhysRevLett.69.2863

50. White SR. Density-matrix algorithms for quantum renormalization groups. Phys Rev B. (1993) 48:10345. doi: 10.1103/PhysRevB.48.10345

51. Schollwoeck U. The density-matrix renormalization group. Rev Mod Phys. (2005) 77:259. doi: 10.1103/RevModPhys.77.259

52. Vidal G. Efficient classical simulation of slightly entangled quantum computations. Phys Rev Lett. (2003) 91:147902. doi: 10.1103/PhysRevLett.91.147902

53. Vidal G. Efficient simulation of one-dimensional quantum many-body systems. Phys Rev Lett. (2004) 93:040502. doi: 10.1103/PhysRevLett.93.040502

54. Verstraete F, Cirac JI. Renormalization algorithms for quantum-many-body systems in two and higher dimensions. arXiv preprint arXiv:cond-mat/0407066. (2004). doi: 10.48550/arXiv.cond-mat/0407066

55. Verstraete F, Cirac JI, Murg V. Matrix product states, projected entangled pair states, and variational renormalization group methods for quantum spin systems. Adv Phys. (2008) 57:143. doi: 10.1080/14789940801912366

56. Jordan J, Orus R, Vidal G, Verstraete F, Cirac JI. Classical simulation of infinite-size quantum lattice systems in two spatial dimensions. Phys Rev Lett. (2008) 101:250602. doi: 10.1103/PhysRevLett.101.250602

57. Vidal G. A class of quantum many-body states that can be efficiently simulated. Phys Rev Lett. (2008) 101:110501. doi: 10.1103/PhysRevLett.101.110501

58. Levin M, Nave CP. Tensor renormalization group approach to two-dimensional classical lattice models. Phys Rev Lett. (2007) 99:120601. doi: 10.1103/PhysRevLett.99.120601

59. Xie Z-Y, Chen J, Qin MP, Zhu JW, Yang LP, Xiang T. Coarse-graining renormalization by higher-order singular value decomposition. Phys Rev B. (2012) 86:045139. doi: 10.1103/PhysRevB.86.045139

60. Evenbly G, Vidal G. Tensor network renormalization. Phys Rev Lett. (2015) 115:180405. doi: 10.1103/PhysRevLett.115.180405

61. Evenbly G. Tensors.net Website. (2019). Available online at: https://www.tensors.net (accessed May 10, 2022).

62. Singh S, Pfeifer RNC, Vidal G. Tensor network decompositions in the presence of a global symmetry. Phys Rev A. (2010) 82:050301. doi: 10.1103/PhysRevA.82.050301

63. Singh S, Pfeifer RNC, Vidal G. Tensor network states and algorithms in the presence of a global U(1) symmetry. Phys Rev B. (2011) 83:115125. doi: 10.1103/PhysRevB.83.115125

64. Weichselbaum A. Non-abelian symmetries in tensor networks: a quantum symmetry space approach. Ann Phys. (2012) 327:2972–3047. doi: 10.1016/j.aop.2012.07.009

65. Sharma S. A general non-Abelian density matrix renormalization group algorithm with application to the C2 dimer. J Chem Phys. (2015) 142:024107. doi: 10.1063/1.4905237

66. Keller S, Reiher M. Spin-adapted matrix product states and operators. J Chem Phys. (2016) 144:134101. doi: 10.1063/1.4944921

67. Nataf P, Mila F. Density matrix renormalization group simulations of SU(N) Heisenberg chains using standard Young tableaus: fundamental representation and comparison with a finite-size Bethe ansatz. Phys Rev B. (2018) 97:134420. doi: 10.1103/PhysRevB.97.134420

68. Schmoll P, Singh S, Rizzi M, Orus R. A programming guide for tensor networks with global SU(2) symmetry. Ann Phys. (2020) 419:168232. doi: 10.1016/j.aop.2020.168232

69. Pfeifer RNC, Haegeman J, Verstraete F. Faster identification of optimal contraction sequences for tensor networks. Phys Rev E. (2014) 90:033315. doi: 10.1103/PhysRevE.90.033315

70. Pfeifer RNC, Evenbly G. Improving the efficiency of variational tensor network algorithms. Phys Rev B. (2014) 89:245118. doi: 10.1103/PhysRevB.89.245118

71. Dudek JM, Duenas-Osorio L, Vardi MY. Efficient contraction of large tensor networks for weighted model counting through graph decompositions. arXiv preprint arXiv:1908.04381v2. (2019). doi: 10.48550/arXiv.1908.04381

72. Gray J, Kourtis S. Hyper-optimized tensor network contraction. Quantum. (2021) 5:410. doi: 10.22331/q-2021-03-15-410

73. Pfeifer RNC, Evenbly G, Singh S, Vidal G. NCON: a tensor network contractor for MATLAB. arXiv preprint arXiv:1402.0939. (2014). doi: 10.48550/arXiv.1402.0939

74. Horn RA, Johnson CR. Matrix Analysis. Cambridge: Cambridge University Press (1985). doi: 10.1017/CBO9780511810817

75. Horn RA, Johnson CR. Topics in Matrix Analysis. Cambridge: Cambridge University Press (1991). doi: 10.1017/CBO9780511840371

76. Eckart C, Young G. The approximation of one matrix by another of lower rank. Psychometrika. (1936) 1:211–8. doi: 10.1007/BF02288367

77. Shi Y, Duan L, Vidal G. Classical simulation of quantum many-body systems with a tree tensor network. Phys Rev A. (2006) 74:022320. doi: 10.1103/PhysRevA.74.022320

78. Tagliacozzo L, Evenbly G, Vidal G. Simulation of two-dimensional quantum systems using a tree tensor network that exploits the entropic area law. Phys Rev B. (2009) 80:235127. doi: 10.1103/PhysRevB.80.235127

79. Evenbly G. Gauge fixing, canonical forms, and optimal truncations in tensor networks with closed loops. Phys Rev B. (2018) 98:085155. doi: 10.1103/PhysRevB.98.085155

80. Holtz S, Rohwedder T, Schneider R. The alternating linear scheme for tensor optimization in the tensor train format. SIAM J Sci Comput. (2012) 34:A683–713. doi: 10.1137/100818893

81. Zhang Y, Solomonik E. On stability of tensor networks and canonical forms. arXiv preprint arXiv:2001.01191. (2020). doi: 10.48550/arXiv.2001.01191

Keywords: tensor network algorithm, MPS, tensor contraction, DMRG, quantum many-body theory

Citation: Evenbly G (2022) A Practical Guide to the Numerical Implementation of Tensor Networks I: Contractions, Decompositions, and Gauge Freedom. Front. Appl. Math. Stat. 8:806549. doi: 10.3389/fams.2022.806549

Received: 31 October 2021; Accepted: 02 May 2022;
Published: 01 June 2022.

Edited by: Edoardo Angelo Di Napoli, Julich Research Center, Helmholtz Association of German Research Centres (HZ), Germany

Reviewed by: Jutho Haegeman, Ghent University, Belgium; Jiajia Li, College of William & Mary, United States

Copyright © 2022 Evenbly. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Glen Evenbly, glen.evenbly@gmail.com
