A Practical Guide to the Numerical Implementation of Tensor Networks I: Contractions, Decompositions, and Gauge Freedom

We present an overview of the key ideas and skills necessary to begin implementing tensor network methods numerically, which is intended to facilitate the practical application of tensor network methods for researchers that are already versed with their theoretical foundations. These skills include an introduction to the contraction of tensor networks, to optimal tensor decompositions, and to the manipulation of gauge degrees of freedom in tensor networks. The topics presented are of key importance to many common tensor network algorithms such as DMRG, TEBD, TRG, PEPS, and MERA.

Comparatively few are resources intended to help researchers that already possess a firm theoretical grounding to begin writing their own numerical implementations of tensor network codes.Yet such numerical skills are essential in many areas of tensor network research: new algorithmic proposals typically require experimentation, testing and bench-marking using numerics.Furthermore, even researchers solely interested in the application of tensor network methods to a problem of interest may be required to program their own version of a method, as a pre-built package may not contain the necessary features as to be suitable for the unique problem under consideration.The purpose of our present work is to help fill this aforementioned gap: to aid students and researchers, who are assumed to possess some prior theoretical understanding of tensor networks, to learn the practical skills required to program their own tensor networks codes and libraries.Indeed, our intent is to arm the interested reader with the key knowledge that would allow them to implement their own versions of algorithms such as the density matrix renormalization group (DMRG) [49][50][51] , time-evolving block decimation (TEBD) 52,53 , projected entangled pair states (PEPS) [54][55][56] , multi-scale entanglement renormalization ansatz (MERA) 57 , tensor renormalization group (TRG) 58,59 or tensor network renormalization (TNR) 60 .Furthermore, this manuscript is designed to compliment online tensor network tutorials 61 , which have a focus on code implementation, with more detailed explanations on tensor network theory.

II. PRELIMINARIES A. Prior knowledge
As stated above, the goal of this manuscript is to help readers that already possess some understanding of tensor network theory to apply this knowledge towards numeric calculations.Thus we assume that the reader has some basic knowledge of tensor networks, specifically that they understand what a tensor network is and have some familiarity with the standard diagrammatic notation used to represent them.An overview of these concepts is presented in Fig. 1, otherwise more comprehensive introductions to tensor network theory can be found in Refs.[1-10].

B. Software Libraries
Currently there exists a wide variety of tensor network code libraries, which include Refs.41 but also their intended applications, and may have their own specific strengths and weaknesses (which we will not attempt to survey in the present manuscript).Almost all of these libraries contain tools to assist in the tasks described in this manuscript, such as contracting, decomposing and re-gauging tensor networks.Additionally many of these libraries also contain full featured versions of complete tensor network algorithms, such as DMRG or TEBD.For a serious numerical calculation involving tensor networks, one where high performance is required, most researchers would be well advised to utilise an existing library.
However, even if ultimate intent is to use existing library, it is still desirable that one should understand the fundamental tensor network manipulations used in numerical calculations.Indeed, this understanding is necessary to properly discern the limitations of various tensor network tools, to ensure that they are applied in an appropriate way, and to customize the existing tools if necessary.Moreover, exploratory research into new tensor network ansatz, algorithms and applications often requires non-standard operations and tensor manipulations which may not be present in any existing library, thus may require the development of extensive new tensor code.In this setting it can be advantageous to minimize or to forgo the usage of an existing library (unless one was already intimately familiar with its inner workings), given the inherent challenge of extending a library beyond its intended function and the possibility of unintended behavior that this entails.
In the remaining manuscript we aim to describe key tensor network operations (namely contracting, decomposing and re-gauging tensor networks) with sufficient detail that would allow the interested reader to implement tasks numerically without the need to rely on an existing code library.

C. Programming language
Before attempting to implement tensor methods numerically one must, of course, decide on which programming language to use.High-level languages with a focus on scientific computation, such as MATLAB, Julia and Python (with Numpy) are well-suited for implementing tensor network methods as they have native support for multi-dimensional arrays (i.e.tensors), providing simple and convenient syntax for common operations on these arrays (indexing, slicing, scalar operations) as well as providing a plethora of useful functions for manipulating these tensor objects.Alternatively, some tensor network practitioners may prefer to use lower-level languages such as Fortran or C ++ when implementing tensor network algorithms usually for the reason of maximising performance.However, in many tensor network codes the bulk of the computation time is spent performing large matrix-matrix multiplications, for which even interpreted languages (like MATLAB) still have competitive performance with compiled languages as they call the same underlying BLAS routines.Nonetheless, there are some particular scenarios, such as in dealing with tensor networks in the presence of global symmetries [62][63][64][65][66][67][68] , where a complied language may offer a significant performance advantage.In this circumstance Julia, which is a compiled language, or Python, in conjunction with various frameworks which allow it to achieve some degree of compilation, may be appealing options.

D. Terminology
Before proceeding, let us establish the terminology that we will use when discussing tensor networks.We define the order of a tensor as the number of indices it has, i.e. such that a vector is order-1 and a matrix is order-2.The term rank (or decomposition rank) of a tensor will refer to the number of non-zero singular values with respect to a some bi-partition of the tensor indices.Note that many researchers also use the term rank to describe the number of tensor indices; here we use the alternative term order specifically to avoid the confusion that arises from the double usage of rank.The number of values a tensor index can take will be referred to as the dimension of an index (or bond dimension), which is most often denoted by χ but can also be denoted by m, d or D. In most cases, the use of d or D to denote a bond dimension is less preferred, as this can be confused which the spatial dimension of the problem under consideration (e.g. when considering a model on a 1D or 2D lattice geometry).

III. TENSOR CONTRACTIONS
The foundation of all tensor networks routines is the contraction of a network containing multiple tensors into a single tensor.An example of the type problem that we consider is depicted in Fig. 2(a), where we wish to contract the network of tensors {A, B, C} to form an order-3 tensor F , which has components defined Note that a convention for tensor index ordering is required for the figure to be unambiguously translated to an equation; here we assumed that indices progress clockwise on each tensor starting from 6 o'clock.Perhaps the most obvious way to evaluate Eq. 1 numerically would be through a direct summation over the indices (l, m, n), which could be implemented using a set of nested 'FOR' loops.While this approach of summing over all internal indices of a network will produce the correct answer, there are numerous reasons why this is not the preferred approach for evaluating tensor networks.The foremost reason is that it is not the most computationally efficient approach (excluding, perhaps, certain contractions involving sparse tensors, which we will not consider here).
Let us analyse the contraction cost for the example given in Eq. 1, assuming all tensor indices are χ-dimensional.
A single element of tensor F , which has χ 3 elements in total, is given through a sum over indices (l, m, n), which requires O(χ 3 ) operations.Thus the total cost of evaluating tensor F through a direct summation over all internal indices is O(χ 6 ).Now, let us instead consider the evaluation of tensor F broken up into two steps, where we first compute an intermediate tensor D as depicted in Fig. 2(b), before performing a second contraction to give the final tensor F , Through similar logic as before, one finds that the cost of evaluating intermediate tensor D scales as O(χ 5 ), whilst the subsequent evaluation of F in Eq. 3 is also O(χ 5 ).Thus breaking the network contraction down into a sequence of smaller contractions each only involving a pair of tensors (which we refer to as a pairwise tensor contraction) is as computationally cheap or cheaper for any non-trivial bond dimension (χ > 1).This is true in general: for any network of 3 or more (dense) tensors it is always at least as efficient (and usually vastly more efficient) to break network contraction into sequence of pairwise contractions, as opposed to directly summing over all the internal indices of the network.Two natural questions arise at this point.(i) What is optimal way to implement a single pairwise tensor contraction?(ii) Does the chosen sequence of pairwise contractions affect the total computational cost and, if so, how does one decide what sequence to use?We begin by addressing the first question.

A. Pairwise tensor contractions
Let us consider the problem of evaluating a pairwise tensor contraction, denoted (A × B), between tensors A and B that are connected by one or more common indices.A straight-forward method to evaluate such contractions, as in the examples of Eq. 2 and Eq. 3, is by using nested 'FOR' loops to sum over the shared indices.The computational cost of this evaluation, in terms of the number of scalar multiplications required, can be expressed concisely as cost : with |dim(A)| as the total dimension of A (i.e. the product of its index dimensions) and |dim(A ∩ B)| as the total dimension of the shared indices.Alternatively, one can recast a pairwise contraction as a matrix multiplication as follows: first reorder the free indices and contracted indices on each of A and B such that they appear sequentially (which can be achieved in MATLAB using the 'permute' function) and then group the free-indices and the contracted indices each into a single index (which can be achieved in MATLAB using the 'reshape' function).After these steps the contraction is evaluated using a single matrix-matrix multiplication, although the final product may also need to be reshaped back into a tensor.Recasting as a matrix multiplication does not reduce the formal computational cost from Eq. 4.However, modern computers, leveraging highly optimized BLAS routines, typically perform matrix multiplications significantly more efficiently than the equivalent 'FOR' loop summations.Thus, especially in the limit of tensors with large total dimension, recasting as a matrix multiplication is most often the preferred approach to evaluate pairwise tensor contractions, even though this requires some additional computational overhead from the necessity of rearranging tensor elements in memory when using 'permute'.Note that the 'tensordot' function in the Numpy module for Python conveniently evaluates a pairwise tensor contraction using this matrix multiplication approach.

B. Contraction sequence
It is straight-forward to establish that, when breaking a network contraction into a sequence of binary contractions, the choice of sequence can affect the total computational cost.As an example, we consider the product of two matrices A, B with vector C, where all indices are assumed to be dimension χ, see also Fig. 3.If we evaluate this expression by first performing the matrix-matrix multiplication, i.e. as F = (A×B)×C, then the leading order computational cost scales as O(χ 3 ) by Eq. 4. Alternatively, if we evaluate the expression by first performing the matrix-vector multiplication, i.e. as F = A × (B × C), then the leading order computational cost scales as O(χ 2 ).Thus it is evident that the sequence of binary contractions needs to be properly considered in order to minimize the overall computational cost.So how does one find the optimal contraction sequence for some tensor network of interest?For the networks that arise in common algorithms (such as DMRG, MERA, PEPS, TRG and TNR) it is relatively easy, with some practice, to find the optimal sequence through manual inspection or trial-and-error.This follows as most networks one needs to evaluate contain fewer than 10 tensors and the tensor index dimensions take a only single or a few distinct values within a network, which limits the number of viable contraction sequences that need be considered.More generally, determination of optimal contraction sequences is known to be an NP-hard problem 69 , such that it is very unlikely that an algorithm which scales polynomially with the number of tensors in the network will ever be found to exist.However, numerical methods based on exhaustive searches and/or heuristics can typically find optimal sequences for networks with fewer than 20 tensors in a reasonable amount of time [69][70][71][72] , and larger networks are seldom encountered in practice.
Note that many tensor network optimisation algorithms based on an iterative sweep, where the same network diagrams are contracted each iteration (although perhaps containing different tensors and with different bond dimensions).The usual approach in this setting is to determine the optimal sequences once, before beginning the iterative sweeps, using the initial bond dimensions and then cache the sequences for re-use in later iterations.The contraction sequences are then only recomputed the if the bond dimensions stray too far from the initial values.

C. Network contraction routines
Although certainly feasible, manually writing the code for each tensor network contraction as a sequence of pairwise contractions is not recommended.Not only is substantial programming effort required, but this also results in code which is error-prone and difficult to check.There is also a more fundamental problem: contracting a network by manually writing a sequence of binary contractions requires specifying a particular contraction sequence at the time of coding.However, in many cases the index dimensions within networks are variable, and the optimal sequence can change depending on the relative sizes of dimensions.For instance, one may have a network which contains indices of dimensions χ 1 and χ 2 , where the optimal contraction sequence changes dependant of whether χ 1 is larger or smaller than χ 2 .In order to have a program which works efficiently in both regimes, one would have to write code separately for both contraction sequences.
Given the considerations above, the use of an automated contraction routine, such as the 'ncon' (Network-CONtractor) function 61,73 or something similar from an existing tensor network library [41][42][43][44][45][46][47][48] , is highly recommended.Automated contraction routines can evaluate any network contraction in a single call by appropriately generating and evaluating a sequence of binary contractions, hence greatly reducing both the programming effort required and the risk of programming errors occurring.Most contraction routines, such as 'ncon', also remove the need to fix a contraction sequence at the time of writing the code, as the sequence can be specified as an input variable to the routine and thus can be changed without the need to rewrite any code.This can also allow the contraction sequence to be adjusted dynamically at run-time to ensure that the sequence is optimal given the specific index dimensions in use.

D. Summary: contractions
In evaluating a network of multiple tensors, it is always more efficient to break the contraction into a sequence of pairwise tensor contractions, each of which should (usually) be recast into a matrix-matrix multiplication in order to achieve optimal computational performance.The total cost of evaluating a network can depend on the particular sequence of pairwise contractions chosen.While there is no known method for determining an optimal contraction sequence that is efficiently scalable in the size of the network, manual inference or brute-force numeric searches are usually viable for the relatively small networks encountered in common tensor network algorithms.When coding a tensor network program it is useful to utilise an automated network contraction routine which can evaluate a tensor network in a single call by properly chaining together a sequence of pairwise contractions.This not only reduces the programming effort required, but also grants a program more flexibility in allowing a contraction sequence to be easily changed.

IV. MATRIX FACTORIZATIONS
Another key operation common in tensor network algorithms, complimentary to the tensor contractions considered previously, are factorizations.In this section we will discuss some of the various means by which a higherorder tensor can be split into a product of fewer-order tensors.In particular, the means that we consider involve applying standard matrix decompositions 74,75 , to tensor unfoldings, such that this section may serve as a review of the linear algebra necessary before consideration of more sophisticated network decompositions.Specifically we recount the spectral decomposition, QR decomposition and singular value decomposition and outline their usefulness in the context of tensor networks, particular in achieving optimal low-rank tensor approximations.Before discussing decompositions, we define some special types of tensor.

A. Special tensor types
A d-by-d matrix U is said to be unitary if it has orthonormal rows and columns, which implies that it annihilates to the identity when multiplied with its conjugate, where I is the d-by-d identity matrix.We define a tensor (whose order is greater than 2) as unitary with respect to a particular bi-partition of indices if the tensor can be reshaped into a unitary matrix according to this partition.Similarly an d 1 -by-d 2 matrix W , with d 1 > d 2 is said to be an isometry if with I the d 2 -by-d 2 identity matrix.Likewise we say that a tensor (order greater than 2) is isometric with respect to a particular bi-partition of indices if the tensor can be reshaped into a isometric matrix.Notice that, rather than equalling identity, the reverse order product does now evaluate to a projector P , where projectors are defined as Hermitian matrices that square to themselves, B. Useful matrix decompositions: A commonly used matrix decomposition is the spectral decomposition (or eigen-decomposition).In the context of tensor network codes it is most often used for Hermitian, positive semi-definite matrices, such as for the density matrices used to describe quantum states.If H is a d × d Hermitian matrix, or tensor that can be reshaped into such, then the spectral decomposition yields where U is d × d unitary matrix and D is diagonal matrix of eigenvalues, see also Fig. 4(a).The numerical cost of performing the decomposition scales as O(d 3 ).In the context of tensor network algorithms the spectral decomposition is often applied to approximate a Hermitian tensor with one of smaller rank, as will be discussed in Sect.IV D. Another useful decomposition is the QR decomposition.If A be an arbitrary d 1 × d 2 matrix with d 1 > d 2 , or tensor that can be reshaped into such, then the QR decomposition gives see also Fig. 4 ), as opposed to cost O(d 1 2 d 2 ) for the full decomposition.The QR decomposition is one of the most computationally efficient ways to obtain an orthonormal basis for the column space of a matrix, thus a common application is in orthogonalizing tensors within a network (i.e.transforming them into isometries), which will be discussed further in Sect.V C 1.
The final decomposition that we consider is the singular value decomposition (SVD), which is also widely used in many areas of mathematics, statistics, physics and engineering.The SVD allows an arbitrary d 1 × d 2 matrix A, where we assume for simplicity that d 1 ≥ d 2 , to be decomposed as where ), identical to that of the economical QR decomposition.The rank of a tensor (across a specified bi-partition) is defined as the number of non-zero singular values that appear in the SVD.A common application of the SVD is in finding an approximation to a tensor by another of smaller rank, which will be discussed further in Sect.IV D.
Notice that for any matrix A the spectral decompositions of AA † and A † A are related to the SVD of A; more precisely, the eigenvectors of AA † and A † A are equivalent to the singular vectors in U and V respectively of the SVD in Eq. 12. Furthermore the (non-zero) eigenvalues in AA † or A † A are the squares of the singular values in S. It can also be seen that, for a Hermitian positive semi-definite matrix H, the spectral decomposition is equivalent to an SVD.

C. Tensor Norms
The primary use for matrix decompositions, such as the SVD, in the context of tensor networks is in accurately approximating a higher-order tensor as a product lower-order tensors.However, before discussing tensor approximations, it is necessary to define the tensor norm in use.A tensor norm that is particularly convenient is the Frobenius norm (or Hilbert-Schmidt norm).Given a tensor A ijk... the Frobenius norm for A, denoted as A , is defined as the square-root of the sum of the magnitude of each element squared, This can be equivalently expressed as the tensor trace of A multiplied by its conjugate, where the tensor trace, Ttr(A † , A), represents the contraction of tensor A with its conjugate over all matching indices, see Fig. 5.It also follows that Frobenius norm is related to the singular values s k of A across any chosen bi-partition, Notice that Eq. 14 implies that the difference ε = A − B between two tensors A and B of equal dimension can equivalently be expressed as

D. Optimal Low-rank Approximations
Given some matrix A, or higher-order tensor that viewed as a matrix across a chosen bi-partition of its indices, we now focus on the problem of finding the tensor B that best approximates A according to the Frobenius norm (i.e. that which minimizes the difference in Eq. 16), assuming B has a fixed rank r.Let us assume, without loss of generality, that tensor A is equivalent to a d 1 × d 2 matrix (with d 1 ≥ d 2 ) under a specified bi-partition of its indices, and that A has singular value decomposition, where the singular values are assumed to be in descending order, s k ≥ s k+1 .Then the optimal rank r tensor B that approximates A is known from the Eckart-Young-Mirsky theorem 76 , which states that B is given by truncating to retain only the r largest singular values and their corresponding singular vectors, It follows that the error of approximation ε = A − B , as measured in the Frobenius norm, is related to the discarded singular values as If the spectrum of singular values is sharply decaying then the error is well approximated by the largest of the discarded singular values, ε ≈ s (r+1) .Notice that, in the case that the tensor A under consideration is Hermitian and positive definite across the chosen bi-partition, that the spectral decomposition could instead be used in Eq. 17.The low-rank approximation obtained by truncating the smallest eigenvalues would still be guaranteed optimal, as the spectral decomposition is equivalent to the SVD in this case.

E. Summary: decompositions
In this section we have described how special types of matrices, such as unitary matrices and projections, can be generalized to the case of tensors (which can always be viewed as a matrix across a chosen bi-partition of their indices).Similarly we have shown how several concepts from matrices, such as the trace and the norm, are also generalized to tensors.Finally, we have described how the optimal low-rank approximation to a tensor can be obtained via the SVD.

V. GAUGE FREEDOM
All tensor networks possess gauge degrees of freedom; exploiting the gauge freedom allows one to change the tensors within a network whilst leaving the final product of the network unchanged.In this section we describe methods for manipulating the gauge degrees of freedom and discuss the utility of fixing the gauge in a specific manner.

A. Tree tensor networks
In this manuscript we shall restrict our considerations of gauge manipulations to focus only on tensors networks that do not possess closed loops (i.e.networks that correspond to acyclic graphs), which are commonly referred to as tree tensor networks (TTN) 77,78 .Fig. 7(a) presents an example of a tree tensor network.If we select a single tensor to act as the center (or root node) then we can understand the tree tensor network as being composed of a set of distinct branches extending from this chosen tensor.For instance, Fig. 7(b) depicts the three branches (excluding the single trivial branch) extending from the 4 th -order tensor A. It is important to note that connections between the different branches are not possible in networks without closed loops; this restriction is required for the re-gauging methods considered in this manuscript.However these methods can (mostly) be generalized to the case of networks containing closed loops by using a more sophisticated formalism as shown in Ref. 79.

B. Gauge transformations
Consider a tensor network of multiple tensors that, under contraction of all internal indices, evaluates to some product tensor H.We now pose the following question: is there a different choice of tensors with the same network geometry that also evaluates to H? Clearly the answer to this question is yes!As shown below in Fig. 8(a), on any of the internal indices of the network one can introduce a resolution of the identity (i.e. a pair of matrices X and X −1 ) which, by construction, does not change the final product that the network evaluates to.However, absorbing one of these matrices into each adjoining tensor changes the individual tensors while leaving the product of the network unchanged.It follows that there are infinitely many choices of tensors such that the network product evaluates to some fixed output tensor, since the gauge change matrix X can be any invertible matrix.This ability to introduce an arbitrary resolution of the identity on an internal index, while leaving the product of the network unchanged, is referred to as the gauge freedom of the network.
While in some respects the gauge freedom could be considered bothersome, as it implies tensor decompositions are never unique, it can also be exploited to simplify many types of operations on tensor networks.Indeed, most tensor network algorithms require fixing the gauge in a prescribed manner in order to function correctly.In the following sections we discuss ways to fix the gauge degree of freedom as to create an orthogonality center and the benefits of doing so.

C. Orthogonality centers
A given tensor A within a network is said to be an orthogonality center if every branch connected to tensor A annihilates to the identity when contracted with its conjugate as shown in Fig. 9. Equivalently, each branch must (collectively) form an isometry across the bi-partition between its open indices and the index connected to tensor A. By properly manipulating the gauge degrees of freedom, it is possible to turn any tensor with a tree tensor network into an orthogonality center 80 .We now discuss two different methods for achieving this: a 'pulling through' approach, which was a key ingredient in the original formulation of DMRG [49][50][51] , and a 'direct orthogonalization' approach, which was an important part of the TEBD algorithm 52,53 .

Creating an orthogonality center via 'pulling through'
Here we describe a method for setting a tensor A within a network as an orthogonality center through iterative use of the QR decomposition.The idea behind his method is very simple: if each individual tensor within a branch is transformed into a (properly oriented) isometry, then the entire branch collectively becomes an isometry and thus the tensor at center of the branches becomes an orthogonality center.Let us begin by orienting each index of the network by drawing an arrow pointing towards the desired center A. Then, starting at the tip of each branch, we should perform a QR decomposition on each tensor based on a bi-partition between its incoming and outgoing indices.The R part of the decomposition should then be absorbed into the next tensor in the branch (i.e.closer to the root tensor A), and the process repeated as depicted in Fig. 10(a-c).At the final step an R part of the QR decomposition from each branch is absorbed into the central tensor A and the process is complete, see also Fig. 10(d).
Note that the SVD could be used as an alternative to the QR decomposition: instead of absorbing the R part of the QR decomposition into the next tensor in the branch one could absorb the product of the S and V part of the SVD from Eq. 12.However, in practice, the QR decomposition is most often preferable as it computationally cheaper than the SVD.

Creating an orthogonality center via 'direct orthogonalization'
Here we describe a method for setting a tensor A within a network as an orthogonality center based on use of a single decomposition for each branch, as depicted in Fig. 11.(i) We begin by computing the positive-definite matrix ρ for each branch (with respect to the center tensor A) by contracting the branch with its Hermitian conjugates.(ii) The principle square root X of each of the matrices ρ is then computed, i.e. such that ρ = XX † , which can be computed using the spectral decomposition if necessary.(iii) Finally, a change of gauge is made on each of the indices of tensor A using the appropriate X matrix and its corresponding inverse X −1 , with the X part absorbed in tensor A and the X −1 absorbed in the leading tensor of the branch such that the branch matrix transforms as ρ → ρ = X −1 ρ(X −1 ) † .It follows that the tensor A is now an orthogonality center as each branch matrix ρ of the transformed network evaluates as the identity, Note that, for simplicity, we have assumed that the branch matrices ρ do not have zero eigenvalues, such that their inverses exist.Otherwise, if zero eigenvalues are present, the current method is not valid unless the index dimensions are first reduced by truncating any zero eigenvalues.

Comparison of methods for creating orthogonality centers
Each of the two methods discussed to create an orthogonality center have their own advantages and disadvantages, such that the preferred method may depend on the specific application under consideration.In practice, the 'direct orthogonalization' is typically computationally cheaper and easier to execute, since this method only requires changing the gauge on the indices connected to the center tensor.In contrast the 'pulling through' method involves changing the gauge on all indices of the network.Additionally, the 'direct orthogonalization' approach can easily be employed in networks of infinite extent, such as infinite MPS 52,53 , if the matrix ρ associated to a branch of infinite extent can be computed using by solving for a dominant eigenvector.While 'pulling through' can also potentially be employed for networks of infinite extent, i.e. through successive decompositions until sufficient convergence is achieved, this is likely to be more computationally expensive.However the 'pulling through' approach can be advantageous if the branch matrices ρ are ill-conditioned as the errors due to floatingpoint arithmetic are lesser.This follows since the 'direct orthogonalization' requires one to square the tensors in each branch.The 'pulling-through' approach also results in transforming every tensor in the network (with the exception of the center tensor) into an isometry, which may be desirable in certain applications.

D. Decompositions of tensors within networks
In Sect.IV it was described how the SVD could be applied to find the optimal low-rank approximation to a tensor in terms of minimizing the Frobenius norm between the original tensor and the approximation.In the present section we extend this concept further and detail how, by first creating an orthogonality center, a tensor within a network can be optimally decomposed as to minimize the global error coming from consideration of the entire network.
Let us consider a tree tensor network of tensors {A, B, C, D, E, F, G} that evaluates to a tensor H, as depicted in Fig. 12(a).We now replace a single tensor A from this network by a new tensor A such that the network now evaluates to a tensor H as depicted in Fig. 12(b).Our goal is to address the following question: how can we find the optimal low-rank approximation A to tensor A such that the error from the full network, H − H , is minimised?Notice that if we follow the method from Sect.IV and simple truncate the smallest singular values of A, then this will only ensure that the local error, A − A , is minimised.The key to resolving this issue is through creation of an orthogonality center, which can reduce the global norm of a network to the norm of a single tensor.Specifically if tensor A is an orthogonality center of a network that evaluates to a final tensor H then it follow from the definition of an orthogonality center that H = A , as depicted in Fig. 13(a).Thus it also can be seen that under replacement of the center tensor A with a new tensor A , such that the network now evaluates to a new tensor H , that the difference between the tensors A − A is precisely equal to the global difference between the networks H − H .This follows as the overlap of H and H equals the overlap of A and A , as depicted in Fig. 13(b).In other words, by appropriately manipulating the gauge degrees of freedom in a network, the global difference resulting from changing a single tensor in a network can become equivalent to the local difference between the single tensors.The solution to the problem of finding the optimal low-rank approximation A to a tensor A within a network thus becomes clear; we should first adjust the gauge such that A becomes an orthogonality center, after which we can follow the method from Sect.IV and create the optimal global approximation (i.e. that which minimizes the global error) by truncating the smallest singular values of A. The importance of this result in the context tensor network algorithms cannot be overstated; this understanding for how to optimally truncate a single tensor within a tensor network, see also Ref. 81, is a key aspect of the DMRG algorithm [49][50][51] , the TEBD algorithm 52,53 and many other tensor network algorithms.

E. Summary: gauge freedom
In the preceding section we discussed manipulations of the gauge degrees of freedom in a tensor network and described two methods that can be used to create an orthogonality center.The proper use of an orthogonality center was then demonstrated to allow one to decompose a tensor within a network in such a way as to minimize the global error.Note that while the results in this section were described only for tree tensor networks (i.e.networks based on acyclic graphs), they can be generalized to arbitrary networks by using more sophisticated methodology 79 .

VI. CONCLUSIONS
Network contractions and decompositions are the twin pillars of all tensor network algorithms.In this manuscript we have recounted the key theoretical considerations required for performing these operations effi-ciently and also discussed aspects of their implementation in numeric codes.We expect that a proper understanding of these results could facilitate an individuals effort to implement many common tensor network algorithms, such as DMRG, TEBD, TRG, PEPS and MERA, and also further aid researchers in the design and development of new tensor network algorithms.
However, there are still a wide variety of additional general ideas and methods, not covered in this manuscript, that are necessary for the implementation of more advanced tensor network algorithms.These include (i) strategies for performing variational optimization, (ii) methods for dealing with decompositions in networks containing closed loops, (iii) the use of approximations in tensor network contractions.We shall address several of these topics in a follow-up work.
FIG. 1. (a-c) Diagrammatic representations of a vector Ai (or order-1 tensor), a matrix Bij (or order-2 tensor) and an order-3 tensor C ijk .(d) A contraction, or summation over an index, between two tensors is represented by a line joining two tensors.

FIG. 2 .
FIG. 2. (a) The internal indices (l, m, n) of the network {A, B, C} are contracted to give tensor F .(b) The network is contracted via a sequence of two pairwise tensor contractions, the first of which results in the intermediate tensor D.

FIG. 3 .
FIG. 3. (a) A product of three tensors {A, B, C} is contracted to a tensor F , where all indices are assumed to be of dimension χ. (b-c) The total computational cost of contracting the network depends on the sequence of pairwise contractions; the cost from following the sequence in (b) scales as (χ 3 + χ 2 ) as compared to the cost from (c) which scales as (2χ 2 ).

FIG. 4 .
FIG. 4. Depiction of some common matrix decompositions.All indices are assumed to be of dimension d unless otherwise indicated.(a-i) The spectral decomposition is applied to the order-4 Hermitian tensor H across the partition indicated by the dashed line, yielding a diagonal matrix of eigenvalues D and a unitary U .(a-ii) The unitary tensor U annihilates to identity with its conjugate, as per Eq.6.(b-i) The QR decomposition is applied to the order-3 tensor A across the partition indicated, yielding an isometry Q and an upper triangular matrix R. (b-ii) The isometry Q annihilates to identity with its conjugate as per Eq.7, while the R matrix is upper triangular.(c-i) The singular value decomposition (SVD) is applied to the order-3 tensor A across the partition indicated, yielding an isometry U , a diagonal matrix of singular values S, and a unitary V .(c-ii) Depiction of the constraints satisfied by the isometry U and unitary V .
FIG.5.For any tensor A the tensor trace Ttr of A with its conjugate A † (drawn with opposite vertical orientation) is obtained by contracting over all matching indices.The Frobenius norm can be defined as the root of this tensor trace, see Eq. 14.

FIG. 6 .
FIG. 6.(a) The singular value decomposition is taken on tensor A across a bi-partition between its top two and bottom three indices, and is assumed to produce d non-zero singular values.(b) Tensor B is now defined by truncating the matrix of singular values S → S to retain only r < d of the largest singular values, while similarly truncating the matrices of singular vectors, U → Ũ and V → Ṽ , to retain only the corresponding singular vectors.By the Eckart-Young-Mirsky theorem76 it is known that B is the optimal rank-r approximation to A (across the chosen bi-partition of tensor indices).

FIG. 7 .
FIG. 7. (a) An example of a tree tensor network (TTN), here composed of 7 tensors.(b) The three (non-trivial) branches of the tree with respect to choosing A as the root tensor.

FIG. 8 .
FIG. 8. (a) Given a network of three tensors {A, B, C}, one can introduce gauge change matrices X and Y (together with their inverses) on the internal indices of the network while leaving the final product D of the network unchanged.(b) Definitions of the new tensors { Ã, B, C} after the change of gauge.

FIG. 10 .
FIG. 10.A depiction of how the tensor A can be made into an orthogonality center of the network from Fig. 7 via the "pulling-though" approach.(a-b) Tensors D and E, which reside at the tips of a branch, are decomposed via the QR decomposition.(c) The R components of the previous QR decompositions are absorbed into the B tensor higher on the branch, which is then itself decomposed via the QR decomposition.(d) Following this procedure, all tensors in the network are orthogonalized (with respect to their incoming versus outgoing indices) such that A becomes an orthogonality center of the network.

FIG. 11 .
FIG. 11.A depiction of how the tensor A from the network of Fig. 7 can be made into an orthogonality center via the "direct orthogonalization" approach.(a) A change of gauge is made on every (non-trivial) branch connected to A, such that A becomes an orthogonality center.(b-d) The gauge change matrices {X1, X2, X3} are obtained by contracting each branch with its Hermitian conjugate and then taking the principle root.

FIG. 12 .
FIG. 12. (a) A network of 7 tensors {A, B, C, D, E, F, G} contracts to give a tensor H.(b) After replacing a single tensor A → A the network contracts to a different tensor H . (c) The tensor A is decomposed into a pair of tensors AL and AR, leaving the final tensor H unchanged.

FIG. 13
FIG. 13.(a) In the network from Fig. 12(a), if the tensor A is an orthogonality center then it follows that the norm of the network H is equal to the norm of the center tensor A .(b) Similarly it follows that, in changing only the center tensor A → A , the global overlap between the networks H and H is equal to the local overlap between the center tensors A and A .