Linear to multi-linear algebra and systems using tensors

In the past few decades, tensor algebra, also known as multi-linear algebra, has been developed and customized as a tool for various engineering applications. In particular, with the help of a special form of tensor contracted product, known as the Einstein product, and its properties, many known concepts from linear algebra can be extended to a multi-linear setting. This enables the definition of a multi-linear system theory in which the input and output signals, as well as the system itself, are multi-domain in nature. This paper provides an overview of tensor algebra tools that can be seen as extensions of linear algebra, while highlighting the differences and advantages that the multi-linear setting brings forth. In particular, the notions of tensor inversion, tensor singular value decomposition and tensor eigenvalue decomposition using the Einstein product are explained. In addition, this paper introduces the notion of contracted convolution for both discrete and continuous multi-linear system tensors. Tensor Network representations of various tensor operations are also presented. Finally, the application of tensor tools in developing transceiver schemes for multi-domain communication systems is presented, with an example of MIMO CDMA systems. Thus this paper acts as an entry-point tutorial for graduate students whose research involves multi-domain or multi-modal signals and systems.


I. INTRODUCTION
Tensors are multi-way arrays that are indexed by multiple indices, and the number of indices is called the order of the tensor [1]. Subsequently, matrices and vectors can be seen as order-two and order-one tensors respectively. Higher-order tensors are inherently capable of mathematically representing processes and systems with dependency on more than two variables. Hence tensors are widely employed for several applications in many engineering and science disciplines. Tensors were initially introduced for applications in Physics during the early nineteenth century [2]. Later, with the work of Tucker [3], tensors were used in Psychometrics in the 1960s for extending two-way data analysis to higher-order datasets, and further in Chemometrics in the 1980s [4], [5].
When appropriately employed, tensors can help in developing models that capture interactions between the various parameters of multi-domain systems. Such tensor-based system representations can enhance the understanding of the mutual effects of the various system domains.
Given the wide scope of applications that tensors support, there have been many recent publications summarizing the essential topics in tensor algebra. One primary reference is [1], where fundamental tensor decompositions such as Tucker, PARAFAC and their variants are discussed in great detail with applications. Another useful reference is [2], which presents tensors as mappings from one linear space to another, along with a discussion of tensor ranks. A more signal-processing-oriented outlook on tensors is considered in [12], including applications such as Big Data storage and compressed sensing. A more recent and exhaustive tutorial-style paper is [11], which presents a detailed overview of up-to-date tensor decomposition algorithms, computations and applications in machine learning. Similarly, [19] presents such an overview with applications in multiple-input multiple-output (MIMO) wireless communications. Also, [20] provides a detailed review of many tensor decompositions with a focus on the needs of the data analytics community. However, none of these papers considers the notions of tensor contracted product and contracted convolution, which are the crux of this paper. With the help of a specific form of contracted product, known as the Einstein product of tensors, various tensor decompositions and properties can be established which may be viewed as intuitive and meaningful extensions of the corresponding linear algebra concepts.
The most popular and widely used decompositions in the case of matrices are the singular value and eigenvalue decompositions. In order to consider their extensions to higher-order tensors, it is important to note that there is no single generalization that preserves all the properties of the matrix case [21], [22]. The most commonly used generalization of the matrix singular value decomposition is known as the Higher-order Singular Value Decomposition (HOSVD), which is basically the same as the Tucker decomposition for higher-order tensors [23]. Similarly, several definitions exist in the literature for tensor eigenvalues as generalizations of the matrix eigenvalues [24]. More recently, in order to solve a set of multi-linear equations using tensor inversion, a specific notion of tensor singular value decomposition and eigenvalue decomposition was introduced in [25], which generalizes the matrix SVD and EVD to tensors through a fixed transformation of the tensors into matrices. The authors in [25] establish the equivalence between the Einstein product of tensors and the matrix product of the transformed tensors, thereby proving that a tensor group endowed with the Einstein product is structurally similar, or isomorphic, to a general linear group of matrices. This notion of equivalence is very crucial and relevant, as it helps in developing many tools and concepts from matrix theory, such as the matrix inverse, ranks, and determinants, for tensors. Hence, as a follow-up to [25], several other works explored different notions of linear algebra which can be extended to multi-linear algebra using the Einstein product [26], [27], [28], [29], [30], [31], [32].
The purpose of this paper is twofold. First, we intend to present an overview of tensor algebra concepts developed in the past decade using the Einstein product. Since there is a natural way of extending linear algebra concepts to tensors, the idea here is to present a summary of the most commonly used and relevant concepts, which can equip the reader with tools to define and prove other properties more specific to their intended applications. Second, this paper introduces the notion of contracted convolutions for both discrete and continuous system tensors. The theory of linear time invariant (LTI) systems has been an indispensable tool in various engineering applications such as communication systems and controls. Now, with the evolution of these subjects towards multi-domain communication systems and multi-linear systems theory, there is a need to better understand the classical topics in a multi-domain setting. This paper intends to provide such tools through a tutorial-style presentation of the subject matter, leading to a mechanism to develop more tools needed for research and applications in any multi-domain/multi-dimensional/multi-modal/multi-linear setting.
The organization of this paper is as follows: in section II, we present basic tensor definitions and operations, including the concepts of signal tensors and contracted convolutions. In section III, we present the Tensor Network representation of various tensor operations. Section IV presents some tensor decompositions based on the Einstein product. Section V defines the notion of multi-linear system tensors and discusses their stability in both time and frequency domains. It also includes a detailed discussion on the application of tensors to multi-linear system representation, with an example of a MIMO CDMA system. The paper is concluded in section VI.

II. FUNDAMENTALS OF TENSORS AND NOTATION
A tensor is a multi-way array whose elements are indexed by multiple indices. Each index may correspond to a different domain, dimension, or mode of the quantity being represented by the array. The order of the tensor is the number of such indices, domains, dimensions, or modes. A vector is often referred to as a tensor of order 1, a matrix as a tensor of order 2, and tensors of order greater than 2 are known as higher-order tensors.

A. Notations
In this paper we use lowercase underlined fonts to represent vectors, e.g. $\underline{x}$, uppercase fonts to represent matrices, e.g. $X$, and uppercase calligraphic fonts to represent tensors, e.g. $\mathcal{X}$. The individual elements of a tensor are denoted by indices in subscript, e.g. the $(i_1, i_2, i_3)$th element of a third-order tensor $\mathcal{X}$ is denoted by $\mathcal{X}_{i_1,i_2,i_3}$. A colon in the subscript for a mode corresponds to every element of that mode for fixed other modes. For instance, $\mathcal{X}_{:,i_2,i_3}$ denotes every element of tensor $\mathcal{X}$ corresponding to the $i_2$th second and $i_3$th third mode. The $n$th element in a sequence is denoted by a superscript in parentheses, e.g. $\mathcal{A}^{(n)}$ denotes the $n$th tensor in a sequence of tensors. We use $\mathbb{C}$ and $\mathbb{C}_k$ to denote the set of complex numbers and the set of complex numbers which are a function of $k$, respectively.

B. Definitions and tensor operations
Definition 1. Tensor Linear Space : The set of all tensors of size $I_1 \times \ldots \times I_K$ over $\mathbb{C}$ forms a linear space, denoted as $\mathbb{T}_{I_1,\ldots,I_K}(\mathbb{C})$. For $\mathcal{A}, \mathcal{B} \in \mathbb{T}_{I_1,\ldots,I_K}(\mathbb{C})$ and $\alpha \in \mathbb{C}$, the sum $\mathcal{A} + \mathcal{B}$ and the scalar multiple $\alpha\mathcal{A}$ are defined element-wise.

Definition 2. Fiber : A fiber is defined by fixing every index in a tensor but one. A matrix column is a mode-1 fiber and a matrix row is a mode-2 fiber. A third-order tensor similarly has column (mode-1), row (mode-2) and tube (mode-3) fibers [1].

Definition 3. Slices : Slices are two-dimensional sections of a tensor defined by fixing all but two indices.
Definition 4. Tensor Norm : The 2-norm or the Frobenius norm of $\mathcal{X} \in \mathbb{C}^{I_1 \times \ldots \times I_N}$ is defined as the square root of the sum of the squared absolute values of all its elements:
$$\|\mathcal{X}\|_2 = \sqrt{\sum_{i_1=1}^{I_1} \cdots \sum_{i_N=1}^{I_N} |\mathcal{X}_{i_1,\ldots,i_N}|^2}.$$
Also, the 1-norm and $\infty$-norm of a tensor are defined as:
$$\|\mathcal{X}\|_1 = \sum_{i_1,\ldots,i_N} |\mathcal{X}_{i_1,\ldots,i_N}|, \qquad \|\mathcal{X}\|_\infty = \max_{i_1,\ldots,i_N} |\mathcal{X}_{i_1,\ldots,i_N}|.$$

Definition 5. Kronecker Product of Matrices : The Kronecker product of two matrices $A$ of size $I \times J$ and $B$ of size $K \times L$, denoted by $A \otimes B$, is a matrix of size $(IK) \times (JL)$ defined as
$$A \otimes B = \begin{bmatrix} A_{1,1}B & \cdots & A_{1,J}B \\ \vdots & \ddots & \vdots \\ A_{I,1}B & \cdots & A_{I,J}B \end{bmatrix}.$$

Definition 6. Matricization Transformation : Let us denote the linear space of $P \times Q$ matrices over $\mathbb{C}$ as $\mathbb{M}_{P,Q}(\mathbb{C})$. For an order $K = N + M$ tensor $\mathcal{A} \in \mathbb{C}^{I_1 \times \ldots \times I_N \times J_1 \times \ldots \times J_M}$, the transformation $A = f_{I_1,\ldots,I_N | J_1,\ldots,J_M}(\mathcal{A})$ is defined component-wise as [25]:
$$A_{r,c} = \mathcal{A}_{i_1,\ldots,i_N,j_1,\ldots,j_M}, \quad r = i_N + \sum_{k=1}^{N-1}(i_k - 1)\prod_{l=k+1}^{N} I_l, \quad c = j_M + \sum_{k=1}^{M-1}(j_k - 1)\prod_{l=k+1}^{M} J_l. \qquad (6)$$
This transformation is referred to as matricization, or matrix unfolding, obtained by partitioning the indices into two disjoint subsets [33]. The vectorization operation as defined in [34] can be seen as a specific case of (6) by mapping all modes to rows, i.e. $N = K$ and $M = 0$. The bar in the subscript of $f_{I_1,\ldots,I_N | J_1,\ldots,J_M}$ represents the partitioning after $N$ modes of an $N+M$ order tensor, where the first $N$ modes correspond to the rows of the representing matrix and the last $M$ modes correspond to the columns. This mapping is bijective [28], and it preserves addition and scalar multiplication, i.e., for $\mathcal{A}, \mathcal{B} \in \mathbb{T}_{I_1,\ldots,I_N,J_1,\ldots,J_M}(\mathbb{C})$ and any scalar $\alpha \in \mathbb{C}$, we have $f_{I_1,\ldots,I_N|J_1,\ldots,J_M}(\mathcal{A} + \alpha\mathcal{B}) = f_{I_1,\ldots,I_N|J_1,\ldots,J_M}(\mathcal{A}) + \alpha f_{I_1,\ldots,I_N|J_1,\ldots,J_M}(\mathcal{B})$. Hence the linear spaces $\mathbb{T}_{I_1,\ldots,I_N,J_1,\ldots,J_M}(\mathbb{C})$ and $\mathbb{M}_{I_1\cdots I_N,\, J_1\cdots J_M}(\mathbb{C})$ are isomorphic, and the transformation $f_{I_1,\ldots,I_N|J_1,\ldots,J_M}$ is an isomorphism between the linear spaces. For a matrix, the transformation (6) makes no change when $N = M = 1$, creates a column vector when $N = 2, M = 0$, and a row vector when $N = 0, M = 2$.
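The matricization map and its inverse can be realized numerically with a plain reshape, since a row-major (C-order) reshape combines the first $N$ indices into a composite row index and the rest into a composite column index. A minimal numpy sketch follows; the helper names `matricize` and `dematricize` and the dimension sizes are ours, not from the paper.

```python
import numpy as np

def matricize(T, N):
    """f_{I1,...,IN | J1,...,JM} from (6): first N modes -> rows, rest -> columns."""
    rows = int(np.prod(T.shape[:N]))
    return T.reshape(rows, -1)

def dematricize(M_, shape):
    """Inverse map f^{-1}: fold a matrix back into a tensor of the given shape."""
    return M_.reshape(shape)

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3, 4, 5))       # order-4 tensor, N = 2, M = 2
Am = matricize(A, 2)                        # a 6 x 20 matrix
assert Am.shape == (6, 20)
# The map is bijective: folding back recovers the tensor exactly.
assert np.array_equal(dematricize(Am, A.shape), A)
```

Note that numpy's C-order reshape fixes one consistent choice of composite index ordering; any fixed ordering works as long as it is used consistently on both sides of the isomorphism.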
1) Tensor Products: Tensors have multiple modes, hence a product between two tensors can be defined in various ways.In this section, we present definitions of the most commonly used tensor products.
Definition 7. Contracted Product : The contracted product of two tensors sums over a chosen set of modes of one tensor paired with a chosen set of equal-sized modes of the other, with the contracted mode pairs listed in subscript; for example, $\{\mathcal{A}, \mathcal{B}\}_{\{2,4;1,3\}}$ contracts modes 2 and 4 of $\mathcal{A}$ with modes 1 and 3 of $\mathcal{B}$. It is important to note that the modes to be contracted need not be consecutive; however, the sizes of the corresponding dimensions must be equal. For matrices $A$ and $B$, contracting the second mode of $A$ with the first mode of $B$ gives $\{A, B\}_{\{2;1\}} = A \cdot B$, where $\cdot$ represents usual matrix multiplication. Several other tensor products can be defined as specific cases of the contracted product. One such commonly used tensor product is the Einstein product, where the modes to be contracted are at a fixed location, as defined next.
Definition 8. Einstein product : The Einstein product between tensors $\mathcal{A} \in \mathbb{C}^{I_1 \times \ldots \times I_P \times K_1 \times \ldots \times K_N}$ and $\mathcal{B} \in \mathbb{C}^{K_1 \times \ldots \times K_N \times J_1 \times \ldots \times J_M}$ is defined as a contraction between their $N$ common modes, denoted by $*_N$, as [25]:
$$(\mathcal{A} *_N \mathcal{B})_{i_1,\ldots,i_P,j_1,\ldots,j_M} = \sum_{k_1=1}^{K_1} \cdots \sum_{k_N=1}^{K_N} \mathcal{A}_{i_1,\ldots,i_P,k_1,\ldots,k_N}\, \mathcal{B}_{k_1,\ldots,k_N,j_1,\ldots,j_M}. \qquad (9)$$
In the Einstein product, the contraction is over $N$ consecutive modes and can also be written using the more general notation with the contracted modes in subscript; for sixth-order tensors $\mathcal{H}, \mathcal{S} \in \mathbb{C}^{I \times J \times K \times I \times J \times K}$, for instance, $\mathcal{H} *_3 \mathcal{S} = \{\mathcal{H}, \mathcal{S}\}_{\{4,5,6;1,2,3\}}$. Note that one can define the Einstein product for several specific mode orderings. For instance, in [17] the Einstein product is defined as a contraction over $N$ alternate modes rather than consecutive modes. However, this does not change the concepts presented here as long as we remain consistent with the definition.
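Because the Einstein product contracts the last $N$ modes of the first tensor with the first $N$ modes of the second, it maps directly onto `numpy.tensordot` with an integer `axes` argument. A small sketch with arbitrary dimension sizes:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 3, 4, 5))   # C^{2x3 x 4x5}: P = 2 free modes, N = 2 contracted
B = rng.standard_normal((4, 5, 6))      # C^{4x5 x 6}: N = 2 contracted, M = 1 free

C = np.tensordot(A, B, axes=2)          # the Einstein product A *_2 B
assert C.shape == (2, 3, 6)             # P + M = 3 remaining modes

# Element-wise check against the defining double sum over k1, k2.
manual = sum(A[..., k1, k2][..., None] * B[k1, k2, :]
             for k1 in range(4) for k2 in range(5))
assert np.allclose(C, manual)
```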
Definition 9. Inner Product : The inner product of two tensors $\mathcal{X}, \mathcal{Y} \in \mathbb{C}^{I_1 \times I_2 \times \ldots \times I_N}$ of the same order $N$ with all dimensions of the same size is given by:
$$\langle \mathcal{X}, \mathcal{Y} \rangle = \sum_{i_1=1}^{I_1} \cdots \sum_{i_N=1}^{I_N} \mathcal{X}_{i_1,\ldots,i_N}\, \mathcal{Y}_{i_1,\ldots,i_N}.$$
It can also be seen as the Einstein product of tensors where contraction is along all the dimen-sions, i.e. ⟨X, Y⟩ = X * N Y = Y * N X.
Definition 10. Outer Product : Consider two tensors $\mathcal{X} \in \mathbb{C}^{I_1 \times \ldots \times I_N}$ and $\mathcal{Y} \in \mathbb{C}^{J_1 \times \ldots \times J_M}$ of order $N$ and $M$ respectively. The outer product of $\mathcal{X}$ and $\mathcal{Y}$, denoted by $\mathcal{X} \circ \mathcal{Y}$, is a tensor of size $I_1 \times \ldots \times I_N \times J_1 \times \ldots \times J_M$ with individual elements
$$(\mathcal{X} \circ \mathcal{Y})_{i_1,\ldots,i_N,j_1,\ldots,j_M} = \mathcal{X}_{i_1,\ldots,i_N}\, \mathcal{Y}_{j_1,\ldots,j_M}.$$
It can also be seen as the special case of the Einstein product in (9) with $N = 0$, i.e. no modes contracted.
Definition 11. $n$-mode product : The $n$-mode product of a tensor $\mathcal{A} \in \mathbb{C}^{I_1 \times \ldots \times I_N}$ with a matrix $U \in \mathbb{C}^{J \times I_n}$ is denoted by $\mathcal{A} \times_n U$ and is defined as [23]:
$$(\mathcal{A} \times_n U)_{i_1,\ldots,i_{n-1},j,i_{n+1},\ldots,i_N} = \sum_{i_n=1}^{I_n} \mathcal{A}_{i_1,\ldots,i_N}\, U_{j,i_n}.$$
Each mode-$n$ fiber is multiplied by the matrix $U$. The result of the $n$-mode product is a tensor of the same order but with a new $n$th mode of size $J$, i.e. of size $I_1 \times \ldots \times I_{n-1} \times J \times I_{n+1} \times \ldots \times I_N$.

Definition 12. Square tensors : A tensor $\mathcal{A} \in \mathbb{C}^{I_1 \times \ldots \times I_N \times J_1 \times \ldots \times J_M}$ is called a square tensor if $N = M$ and $I_k = J_k$ for $k = 1, \ldots, N$ [28].
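The $n$-mode product can be sketched in numpy by contracting mode $n$ of the tensor with the second mode of the matrix and moving the new mode back into place; the helper name `mode_n_product` and the sizes are illustrative.

```python
import numpy as np

def mode_n_product(A, U, n):
    """A x_n U: multiply every mode-n fiber of A by the matrix U (J x I_n)."""
    out = np.tensordot(A, U, axes=([n], [1]))   # new mode of size J lands last
    return np.moveaxis(out, -1, n)              # move it back to position n

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 4, 5))
U = rng.standard_normal((7, 4))                 # acts on mode n = 1 (size 4 -> 7)
B = mode_n_product(A, U, 1)
assert B.shape == (3, 7, 5)
# Fiber-wise check: each mode-1 fiber is mapped by U.
assert np.allclose(B[0, :, 0], U @ A[0, :, 0])
```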
For square tensors $\mathcal{A}, \mathcal{B}$ of size $I \times J \times I \times J$, it was shown in [25] that $f_{I,J|I,J}(\mathcal{A} *_2 \mathcal{B}) = f_{I,J|I,J}(\mathcal{A}) \cdot f_{I,J|I,J}(\mathcal{B})$, where $\cdot$ refers to the usual matrix multiplication. This was further generalized to square or non-square tensors of any order and size in [32] as the following lemma:

Lemma 1. For tensors $\mathcal{A} \in \mathbb{C}^{I_1 \times \ldots \times I_N \times J_1 \times \ldots \times J_M}$ and $\mathcal{B} \in \mathbb{C}^{J_1 \times \ldots \times J_M \times K_1 \times \ldots \times K_P}$, under the transformation from (6) the following holds:
$$f_{I_1,\ldots,I_N | K_1,\ldots,K_P}(\mathcal{A} *_M \mathcal{B}) = f_{I_1,\ldots,I_N | J_1,\ldots,J_M}(\mathcal{A}) \cdot f_{J_1,\ldots,J_M | K_1,\ldots,K_P}(\mathcal{B}).$$

Definition 13. Pseudo-diagonal Tensors : Any tensor $\mathcal{D} \in \mathbb{C}^{I_1 \times \ldots \times I_N \times J_1 \times \ldots \times J_M}$ of order $N+M$ is called pseudo-diagonal if its transformation $D = f_{I_1,\ldots,I_N | J_1,\ldots,J_M}(\mathcal{D})$ yields a diagonal matrix, i.e. $D_{i,j}$ is non-zero only when $i = j$.
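Lemma 1 can be checked numerically: with a C-order reshape playing the role of the transformation (6), matricization carries the Einstein product to the ordinary matrix product. A sketch with arbitrary sizes:

```python
import numpy as np

def matricize(T, N):
    """f from (6): first N modes -> rows, remaining modes -> columns."""
    return T.reshape(int(np.prod(T.shape[:N])), -1)

rng = np.random.default_rng(3)
A = rng.standard_normal((2, 3, 4, 5))   # I1 x I2 x J1 x J2
B = rng.standard_normal((4, 5, 6, 7))   # J1 x J2 x K1 x K2

lhs = matricize(np.tensordot(A, B, axes=2), 2)   # f(A *_2 B)
rhs = matricize(A, 2) @ matricize(B, 2)          # f(A) . f(B)
assert np.allclose(lhs, rhs)                     # Lemma 1 holds numerically
```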
Since the transformation (6) is bijective, a diagonal matrix $D \in \mathbb{C}^{I \times J}$ under the inverse transformation $f^{-1}_{I_1,\ldots,I_N | J_1,\ldots,J_M}(D)$ yields a pseudo-diagonal tensor $\mathcal{D}$ in which all entries $\mathcal{D}_{i_1,\ldots,i_N,j_1,\ldots,j_N}$ are zero except when $i_1 = j_1, i_2 = j_2, \ldots, i_N = j_N$. In [25], [27] such a tensor is termed a diagonal tensor, and in [17] it is termed U-diagonal, but we call it pseudo-diagonal for our purpose of discussion, so as to make a clear distinction from the diagonal tensor definition more widely found in the literature, which states that a diagonal tensor is one whose entries are non-zero only when all its indices are equal. The latter is a stricter diagonal condition, as non-zero elements exist only when all the modes have the same index, whereas in a pseudo-diagonal tensor, say of order $2N$, elements are non-zero when every $i$th and $(i+N)$th mode have the same index for $i = 1, \ldots, N$. An illustration of an order-4 tensor showing the difference between diagonal and pseudo-diagonal structures can be found in [16]. For a matrix, which has just two modes, the diagonal and pseudo-diagonal structures are the same. Note that pseudo-diagonality is defined with respect to the partition after $N$ modes. For instance, if we refer to a third-order tensor as pseudo-diagonal, it is important to specify whether it is pseudo-diagonal with respect to the partition after the first mode or the second mode. To avoid overloading notation, whenever we write a tensor explicitly as an $N+M$ or $2N$ order tensor and call it pseudo-diagonal, pseudo-diagonality is with respect to the partition after $N$ modes.

Definition 14. Pseudo-Triangular Tensor : A square tensor $\mathcal{A} \in \mathbb{C}^{I_1 \times \ldots \times I_N \times I_1 \times \ldots \times I_N}$ is said to be pseudo-lower triangular if
$$\mathcal{A}_{i_1,\ldots,i_N,i'_1,\ldots,i'_N} = \begin{cases} a_{i_1,\ldots,i_N,i'_1,\ldots,i'_N} & \text{if } r(i_1,\ldots,i_N) \geq c(i'_1,\ldots,i'_N), \\ 0 & \text{otherwise,} \end{cases}$$
where $a_{i_1,\ldots,i_N,i'_1,\ldots,i'_N}$ are arbitrary scalars and $r(\cdot)$, $c(\cdot)$ are the composite row and column indices from (6). Similarly, the tensor is said to be pseudo-upper triangular if its non-zero entries satisfy $r(i_1,\ldots,i_N) \leq c(i'_1,\ldots,i'_N)$. An illustration of a pseudo-upper triangular tensor is presented in Figure 1, with its pseudo-upper triangular elements highlighted in gray and its pseudo-diagonal elements shown in black. A similar illustration of a pseudo-lower triangular tensor can be found in [15]. It can be readily seen that a pseudo-lower triangular tensor becomes a lower triangular matrix under the tensor-to-matrix transformation defined in (6), and a pseudo-upper triangular tensor becomes an upper triangular matrix.
Definition 15. Identity Tensor : An identity tensor $\mathcal{I} \in \mathbb{C}^{I_1 \times \ldots \times I_N \times I_1 \times \ldots \times I_N}$ is a pseudo-diagonal tensor in which all non-zero entries are 1, i.e.
$$\mathcal{I}_{i_1,\ldots,i_N,j_1,\ldots,j_N} = \begin{cases} 1 & \text{if } i_k = j_k \text{ for all } k = 1,\ldots,N, \\ 0 & \text{otherwise.} \end{cases}$$
2) Transpose, Hermitian and Inverse of a Tensor: The transpose of a matrix is a permutation of its two indices, corresponding to rows and columns. Since the elements of a higher-order tensor are indexed by multiple indices, there are several permutations of such indices, and hence there can be multiple ways to write the transpose or Hermitian of a tensor. Such permutation-dependent transpose of a tensor is defined in [35].
Assume the set $S_N = \{1, 2, \ldots, N\}$ and let $\sigma$ be a permutation of $S_N$. We denote $\sigma(j) = i_j$ for $j = 1, 2, \ldots, N$, where $\{i_1, i_2, \ldots, i_N\} = \{1, 2, \ldots, N\} = S_N$. Since $S_N$ is a finite set with $N$ elements, it has $N!$ different permutations. Hence, discounting the identity permutation, there are $N! - 1$ different transposes of a tensor with $N$ dimensions or modes. For a tensor $\mathcal{A} \in \mathbb{C}^{I_1 \times \ldots \times I_N}$, we define its transpose associated with a permutation $\sigma$, denoted $\mathcal{A}^{T_\sigma} \in \mathbb{C}^{I_{\sigma(1)} \times \ldots \times I_{\sigma(N)}}$, component-wise as $\mathcal{A}^{T_\sigma}_{i_{\sigma(1)},\ldots,i_{\sigma(N)}} = \mathcal{A}_{i_1,\ldots,i_N}$. Similarly, the Hermitian of a tensor $\mathcal{A} \in \mathbb{C}^{I_1 \times \ldots \times I_N}$ associated with a permutation $\sigma$ is defined as the conjugate of its transpose and is denoted $\mathcal{A}^{H_\sigma} = (\mathcal{A}^{T_\sigma})^*$. For example, a transpose of a third-order tensor $\mathcal{X} \in \mathbb{C}^{I_1 \times I_2 \times I_3}$ such that its third mode is transposed with the first can be written as $\mathcal{X}^{T_\sigma}$ where $\sigma = [3, 2, 1]$, with components $\mathcal{X}^{T_\sigma}_{i_3,i_2,i_1} = \mathcal{X}_{i_1,i_2,i_3}$. For two tensors $\mathcal{A}, \mathcal{B} \in \mathbb{C}^{I_1 \times \ldots \times I_N}$ we have [35] $\langle \mathcal{A}, \mathcal{B} \rangle = \langle \mathcal{A}^{T_\sigma}, \mathcal{B}^{T_\sigma} \rangle$.
Consider a tensor $\mathcal{Y} \in \mathbb{C}^{I_1 \times \ldots \times I_N \times J_1 \times \ldots \times J_M}$. A transposition such that the final $M$ modes are swapped with the first $N$ modes can be represented by the permutation $\sigma = [(N+1), \ldots, (N+M), 1, \ldots, N]$, giving $\mathcal{Y}^{T_\sigma} \in \mathbb{C}^{J_1 \times \ldots \times J_M \times I_1 \times \ldots \times I_N}$ with components $\mathcal{Y}^{T_\sigma}_{j_1,\ldots,j_M,i_1,\ldots,i_N} = \mathcal{Y}_{i_1,\ldots,i_N,j_1,\ldots,j_M}$. Since we will use tensors to define system theory elements with a fixed order-$M$ output and order-$N$ input, the most often encountered case of transpose or Hermitian in this paper is the one after $N$ modes of an $N+M$ or $2N$ order tensor, i.e. $\sigma = [(N+1), \ldots, (N+M), 1, \ldots, N]$. Henceforth, in such a case we drop the superscript $\sigma$ for ease of representation and denote such a transpose by $\mathcal{Y}^T$ and its conjugate by $\mathcal{Y}^H$. Further, a square tensor $\mathcal{A} \in \mathbb{C}^{I_1 \times \ldots \times I_N \times I_1 \times \ldots \times I_N}$ is said to be invertible if there exists a tensor $\mathcal{B}$ of the same size such that $\mathcal{A} *_N \mathcal{B} = \mathcal{B} *_N \mathcal{A} = \mathcal{I}$; such a $\mathcal{B}$ is denoted $\mathcal{A}^{-1}$ [28]. The inverse of a tensor exists if its transformation $f_{I_1,\ldots,I_N | I_1,\ldots,I_N}(\mathcal{A})$ is invertible [25]. Several algorithms using the Einstein product, such as the Higher-order Bi-conjugate Gradient method [25] or Newton's method [36], can be used to find the tensor inverse without actually transforming the tensor into a matrix.
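For small sizes, the tensor inverse can be sketched directly through the matricization isomorphism: invert $f(\mathcal{A})$ and fold the result back. (The iterative methods cited above avoid this explicit matricization; this is only the simplest numerical route.)

```python
import numpy as np

rng = np.random.default_rng(4)
I1, I2 = 2, 3
A = rng.standard_normal((I1, I2, I1, I2))   # square tensor in C^{I1 x I2 x I1 x I2}

n = I1 * I2
# Invert the matricization f(A) and fold back into a tensor.
Ainv = np.linalg.inv(A.reshape(n, n)).reshape(I1, I2, I1, I2)

# Check that A *_2 A^{-1} equals the identity tensor (pseudo-diagonal of ones).
Id = np.tensordot(A, Ainv, axes=2)
assert np.allclose(Id.reshape(n, n), np.eye(n), atol=1e-10)
```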
Based on the definitions of the tensor inverse, Hermitian, and the Einstein product, several tensor algebra relations and properties can be derived. Here we present a few properties which are often used; along similar lines, several more properties can be derived.
Commutativity : The Einstein product is not commutative in general. However, for the specific case where the product is taken over all the $N$ modes of one of the tensors, say for tensors $\mathcal{A} \in \mathbb{C}^{I_1 \times \ldots \times I_P \times J_1 \times \ldots \times J_N}$ and $\mathcal{B} \in \mathbb{C}^{J_1 \times \ldots \times J_N}$, the following can be established:
$$\mathcal{A} *_N \mathcal{B} = \mathcal{B} *_N \mathcal{A}^T,$$
where $\mathcal{A}^T \in \mathbb{C}^{J_1 \times \ldots \times J_N \times I_1 \times \ldots \times I_P}$, since both sides have components $\sum_{j_1,\ldots,j_N} \mathcal{A}_{i_1,\ldots,i_P,j_1,\ldots,j_N} \mathcal{B}_{j_1,\ldots,j_N}$.
(e) For invertible square tensors $\mathcal{A}, \mathcal{B} \in \mathbb{C}^{I_1 \times \ldots \times I_N \times I_1 \times \ldots \times I_N}$, we have $(\mathcal{A} *_N \mathcal{B})^{-1} = \mathcal{B}^{-1} *_N \mathcal{A}^{-1}$.

3) Function Tensors: A function tensor $\mathcal{A}(x) \in \mathbb{C}^{I_1 \times \ldots \times I_N}_x$ is an order $N$ tensor whose components are functions of $x$. Using a third-order function tensor as an example, each component of $\mathcal{A}(x)$ is written as $\mathcal{A}_{i,j,k}(x)$. If $x$ takes discrete values, we represent the function tensor using the square bracket notation $\mathcal{A}[x]$.
A generalization of the function tensor is the multivariate function tensor $\mathcal{A}(x_1, \ldots, x_p) \in \mathbb{C}^{I_1 \times \ldots \times I_N}_{x_1,\ldots,x_p}$, an order $N$ tensor whose components are functions of the continuous variables $x_1, \ldots, x_p$. If the variables take discrete values, we denote the function tensor as $\mathcal{A}[x_1, \ldots, x_p]$. Using the same example of a third-order tensor, each component can be written as $\mathcal{A}_{i,j,k}(x_1, x_2, \ldots, x_p)$.
A linear system is often expressed as $Ax = b$, where $A \in \mathbb{C}^{M \times N}$ is a matrix operating on the vector $x \in \mathbb{C}^N$ to produce another vector $b \in \mathbb{C}^M$ [25]. Essentially, the matrix defines a linear operator $L : \mathbb{C}^N \to \mathbb{C}^M$ between the two vector linear spaces $\mathbb{C}^N$ and $\mathbb{C}^M$. A multi-linear system can thus be defined as a linear operator between two tensor linear spaces $\mathbb{C}^{I_1 \times \ldots \times I_N}$ and $\mathbb{C}^{J_1 \times \ldots \times J_M}$, i.e. $ML : \mathbb{C}^{I_1 \times \ldots \times I_N} \to \mathbb{C}^{J_1 \times \ldots \times J_M}$. Multi-linear systems model several phenomena in various science and engineering applications. However, in the literature a multi-linear system is often degenerated into a linear system by mapping the tensor linear space $\mathbb{C}^{I_1 \times \ldots \times I_N}$ into a vector linear space $\mathbb{C}^{I_1 \cdots I_N}$ through vectorization. The vectorization process allows one to use tools from linear algebra for convenience, but it also leads to a representation in which the distinction between the different modes of the system is lost. Thus, possible hidden patterns, structures, and correlations cannot be explicitly identified in the vectorized tensor entities. With the help of the tensor contracted product, one can develop signal and system representations without having to rely on vectorization, while at the same time extending tools from the linear to the multi-linear setting intuitively.

C. Discrete Time Signal Tensors
A discrete time signal tensor $\mathcal{X}[k] \in \mathbb{C}^{I_1 \times \ldots \times I_N}_k$ is a function tensor whose components are functions of the discrete time index $k$. A multi-linear time invariant discrete system tensor is an order $N+M$ tensor sequence $\mathcal{H}[k] \in \mathbb{C}^{J_1 \times \ldots \times J_M \times I_1 \times \ldots \times I_N}_k$ that couples an order $N$ input signal tensor $\mathcal{X}[k]$ with an order $M$ output signal tensor $\mathcal{Y}[k] \in \mathbb{C}^{J_1 \times \ldots \times J_M}_k$ through a discrete contracted convolution defined as:
$$\mathcal{Y}_{j_1,\ldots,j_M}[k] = \sum_{n} \sum_{i_1,\ldots,i_N} \mathcal{H}_{j_1,\ldots,j_M,i_1,\ldots,i_N}[k-n]\, \mathcal{X}_{i_1,\ldots,i_N}[n]. \qquad (28)$$
Most often, the ordering of the modes while defining such system tensors is fixed, where the system tensor contracts over all the input modes. Hence, for a more compact notation, we can define the contracted convolution using the Einstein product as:
$$\mathcal{Y}[k] = \sum_{n} \mathcal{H}[k-n] *_N \mathcal{X}[n].$$
In scalar signals and systems notation, a convolution between two functions is often represented using an asterisk ($*$). However, to make a distinction with the Einstein product notation, which also uses the asterisk symbol, we denote the contracted convolution by $\bullet_N$, i.e. $\mathcal{Y} = \mathcal{H} \bullet_N \mathcal{X}$.
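For finite-length sequences, the discrete contracted convolution $\mathcal{Y}[k] = \sum_n \mathcal{H}[k-n] *_N \mathcal{X}[n]$ can be sketched with a direct double loop over time and an Einstein product per term; the helper name `contracted_conv` and the sizes are ours.

```python
import numpy as np

def contracted_conv(H, X, N):
    """H: (Lh, out modes..., in modes...), X: (Lx, in modes...) -> Y: (Lh+Lx-1, out modes...)."""
    Lh, Lx = H.shape[0], X.shape[0]
    M = H[0].ndim - N                             # number of output modes
    Y = np.zeros((Lh + Lx - 1,) + H.shape[1:1 + M])
    for k in range(Lh + Lx - 1):
        for n in range(max(0, k - Lh + 1), min(k, Lx - 1) + 1):
            Y[k] += np.tensordot(H[k - n], X[n], axes=N)   # H[k-n] *_N X[n]
    return Y

rng = np.random.default_rng(5)
H = rng.standard_normal((4, 2, 3, 4))   # length-4 sequence, M = 1 output, N = 2 input modes
X = rng.standard_normal((6, 3, 4))      # length-6 input signal tensor
Y = contracted_conv(H, X, N=2)
assert Y.shape == (9, 2)

# Independent check: component-wise scalar convolutions summed over input modes.
Y_ref = np.zeros_like(Y)
for j in range(2):
    for i1 in range(3):
        for i2 in range(4):
            Y_ref[:, j] += np.convolve(H[:, j, i1, i2], X[:, i1, i2])
assert np.allclose(Y, Y_ref)
```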
The complex frequency domain representation of discrete signal tensors can be given using the z-transform of the signal tensors, as discussed next.
Definition 16. z-transform of a Discrete Tensor Sequence: The z-transform of a tensor sequence $\mathcal{X}[k]$, denoted $\mathcal{X}(z) = \mathcal{Z}(\mathcal{X}[k])$, is the tensor of the z-transforms of its components, defined as
$$\mathcal{X}(z) = \sum_{k} \mathcal{X}[k]\, z^{-k}.$$
The discrete time Fourier transform, denoted by $\mathcal{X}(\omega) = \mathcal{F}(\mathcal{X}[k])$, of a tensor sequence can be found by substituting $z = e^{j\omega}$ in its z-transform. Taking the z-transform of (28) we get
$$\mathcal{Y}(z) = \mathcal{H}(z) *_N \mathcal{X}(z),$$
which shows that the discrete contracted convolution between two tensors in the time domain, as given by (28), leads to the Einstein product between the tensors in the z-domain.
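For finite sequences this z-domain relation is exact at any evaluation point, which gives a simple numerical check: convolve in time, then verify $\mathcal{Y}(z) = \mathcal{H}(z) *_N \mathcal{X}(z)$ at an arbitrary complex $z$. Sizes and the helper `ztrans` are illustrative.

```python
import numpy as np

def ztrans(S, z):
    """Component-wise z-transform of a finite tensor sequence: sum_k S[k] z^{-k}."""
    k = np.arange(S.shape[0])
    return np.tensordot(z ** (-k), S, axes=1)

rng = np.random.default_rng(6)
N = 2
H = rng.standard_normal((3, 2, 3, 4))   # length-3 system tensor, 1 output / 2 input modes
X = rng.standard_normal((5, 3, 4))      # length-5 input

# Time-domain contracted convolution, Y[k] = sum_n H[k-n] *_2 X[n].
Ly = 3 + 5 - 1
Y = np.zeros((Ly, 2))
for k in range(Ly):
    for n in range(max(0, k - 2), min(k + 1, 5)):
        Y[k] += np.tensordot(H[k - n], X[n], axes=N)

z = 0.8 * np.exp(1j * 0.5)                                # arbitrary point in the z-plane
lhs = ztrans(Y, z)                                        # Y(z)
rhs = np.tensordot(ztrans(H, z), ztrans(X, z), axes=N)    # H(z) *_2 X(z)
assert np.allclose(lhs, rhs)
```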

D. Continuous Time Signal Tensors
A continuous time signal tensor $\mathcal{X}(t) \in \mathbb{C}^{I_1 \times \ldots \times I_N}_t$ is a function tensor whose components are functions of the continuous time variable $t$.
A multi-linear time invariant continuous system tensor is an order $N+M$ function tensor $\mathcal{H}(t) \in \mathbb{C}^{J_1 \times \ldots \times J_M \times I_1 \times \ldots \times I_N}_t$ that couples an order $N$ continuous input tensor signal $\mathcal{X}(t) \in \mathbb{C}^{I_1 \times \ldots \times I_N}_t$ with an order $M$ output tensor signal $\mathcal{Y}(t) \in \mathbb{C}^{J_1 \times \ldots \times J_M}_t$ through a continuous contracted convolution defined as:
$$\mathcal{Y}_{j_1,\ldots,j_M}(t) = \int_{-\infty}^{\infty} \sum_{i_1,\ldots,i_N} \mathcal{H}_{j_1,\ldots,j_M,i_1,\ldots,i_N}(t-\tau)\, \mathcal{X}_{i_1,\ldots,i_N}(\tau)\, d\tau. \qquad (33)$$
In cases where the mode sequence is fixed, similar to the discrete case, we can define a more compact notation using the Einstein product as:
$$\mathcal{Y}(t) = \int_{-\infty}^{\infty} \mathcal{H}(t-\tau) *_N \mathcal{X}(\tau)\, d\tau = (\mathcal{H} \bullet_N \mathcal{X})(t).$$
The frequency domain representation of continuous signal tensors can be given using the Fourier transform of the signal tensor, as defined next.
Definition 17. Fourier transform: The Fourier transform of $\mathcal{X}(t) \in \mathbb{C}^{I_1 \times \ldots \times I_N}_t$, denoted by $\bar{\mathcal{X}}(\omega)$, is the tensor of the Fourier transforms of its components, with
$$\bar{\mathcal{X}}_{i_1,\ldots,i_N}(\omega) = \int_{-\infty}^{\infty} \mathcal{X}_{i_1,\ldots,i_N}(t)\, e^{-j\omega t}\, dt.$$
Using a similar line of derivation as for (31), it can be shown that (33) can be written in the frequency domain as $\bar{\mathcal{Y}}(\omega) = \bar{\mathcal{H}}(\omega) *_N \bar{\mathcal{X}}(\omega)$.

III. TENSOR NETWORKS
Tensor Network (TN) diagrams are a graphical way of illustrating tensor operations [38].A TN diagram uses a node to represent a tensor and each outgoing edge from a node represents a mode of the tensor.As such, a vector can be represented through a node with a single edge, a matrix through a node with double edges, and an order N tensor through a node with N edges.

A. Illustration of Contraction Products
A contraction between two modes of a tensor is represented in a TN by connecting the edges corresponding to the modes which are to be contracted. Hence the number of free edges represents the order of the resulting tensor. As such, any contracted product can be illustrated through a TN diagram, and a few examples are shown in Figure 3. In Figure 3(a), the mode-$n$ product of a tensor $\mathcal{A} \in \mathbb{C}^{I_1 \times \ldots \times I_N}$ with $U \in \mathbb{C}^{J \times I_n}$, i.e. $\mathcal{A} \times_n U$ from (13), is depicted, where the $n$th edge of $\mathcal{A}$ is connected with the second edge of $U$ to represent the contraction of these modes of the same size. Figure 3(b) shows the inner product of two tensors, where all the edges of both tensors are connected; since there is no free edge remaining, the result is a scalar. In Figure 3(c), a fifth-order tensor $\mathcal{A} \in \mathbb{C}^{I \times J \times K \times L \times M}$ contracts with a fourth-order tensor $\mathcal{B} \in \mathbb{C}^{J \times P \times L \times N}$ along its two common modes as $\{\mathcal{A}, \mathcal{B}\}_{\{2,4;1,3\}}$. The resulting tensor is of order 5, as there are 5 free edges in the diagram. Finally, Figure 3(d) shows the Einstein product between tensors from (9), where the common $N$ modes are connected and $P + M$ free edges remain.
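General TN contractions such as $\{\mathcal{A}, \mathcal{B}\}_{\{2,4;1,3\}}$ map directly onto `numpy.einsum` subscripts, where repeated labels are the connected edges and the remaining labels are the free edges. A sketch with arbitrary dimension sizes:

```python
import numpy as np

rng = np.random.default_rng(7)
I, J, K, L, M, P, Nn = 2, 3, 4, 5, 6, 7, 8
A = rng.standard_normal((I, J, K, L, M))    # C^{I x J x K x L x M}
B = rng.standard_normal((J, P, L, Nn))      # C^{J x P x L x N}

# Contract modes 2, 4 of A with modes 1, 3 of B: labels j and l are repeated.
C = np.einsum('ijklm,jpln->ikmpn', A, B)
assert C.shape == (I, K, M, P, Nn)          # 5 free edges -> order-5 result

# The same contraction via tensordot, pairing axes (1, 3) of A with (0, 2) of B.
C2 = np.tensordot(A, B, axes=([1, 3], [0, 2]))
assert np.allclose(C, C2)
```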

B. Illustration of Contracted Convolutions
A contracted convolution is an operation between function tensors. A function tensor in a TN diagram is represented by a node which is a function of a variable; to distinguish it, we use a rectangular node rather than a circular node. For a given value of the time index $k$, the contracted convolution from (28) can be depicted using a TN diagram. Very often, TN diagrams are used to represent tensor operations as they provide a better visual understanding, and thereby aid in developing algorithms to compute tensor operations by making use of elements from graph theory and data structures. Furthermore, a TN diagram can also be used to illustrate how a tensor is formed from several other component tensors. Hence, most tensor decompositions studied in the literature are often represented using a TN. In the next section, we discuss some tensor decompositions.

IV. TENSOR DECOMPOSITIONS
Tensor decompositions such as the Tucker or the Parallel Factor (PARAFAC) decomposition, the Tensor Train decomposition and many more have been extensively studied in the literature [1], [11], [20]. However, in the past decade, with the help of the Einstein product and its properties, generalizations of the matrix SVD and EVD have been proposed in the literature which have found applications in solving multi-linear systems of equations and in systems theory [18], [25], [26]. In this section, we present some such decompositions using the Einstein product of tensors.

A. Tensor Singular Value Decomposition (SVD)
Tucker decomposition of a tensor can be seen as a higher order SVD [23] and has found many applications particularly in extracting low rank structures in higher dimensional data [39].
A more specific version of tensor SVD is explored in [25] as a tool for finding tensor inverses and solving multi-linear systems. Note that [25] presents the SVD for square tensors only. The idea of SVD from [25] is further generalized to any even-order tensor in [27]. However, it can be further extended to any arbitrary order and size of tensor. We present a tensor SVD theorem here for any tensor of order $N + M$.
Theorem (Tensor SVD): Any tensor $\mathcal{A} \in \mathbb{C}^{I_1 \times \ldots \times I_N \times J_1 \times \ldots \times J_M}$ can be decomposed as
$$\mathcal{A} = \mathcal{U} *_N \mathcal{D} *_M \mathcal{V}^H, \qquad (35)$$
where $\mathcal{U} \in \mathbb{C}^{I_1 \times \ldots \times I_N \times I_1 \times \ldots \times I_N}$ and $\mathcal{V} \in \mathbb{C}^{J_1 \times \ldots \times J_M \times J_1 \times \ldots \times J_M}$ are unitary tensors and $\mathcal{D} \in \mathbb{C}^{I_1 \times \ldots \times I_N \times J_1 \times \ldots \times J_M}$ is a pseudo-diagonal tensor of singular values.

Proof. For tensors $\mathcal{A} \in \mathbb{C}^{I_1 \times \ldots \times I_N \times J_1 \times \ldots \times J_M}$ and $\mathcal{B} \in \mathbb{C}^{J_1 \times \ldots \times J_M \times K_1 \times \ldots \times K_P}$, from (14) we get:
$$f(\mathcal{A} *_M \mathcal{B}) = f_{I_1,\ldots,I_N | J_1,\ldots,J_M}(\mathcal{A}) \cdot f_{J_1,\ldots,J_M | K_1,\ldots,K_P}(\mathcal{B}). \qquad (36)$$
Denoting the matricizations of $\mathcal{A}$ and $\mathcal{B}$ as $A$ and $B$ respectively, i.e. substituting $f_{I_1,\ldots,I_N | J_1,\ldots,J_M}(\mathcal{A}) = A$ and $f_{J_1,\ldots,J_M | K_1,\ldots,K_P}(\mathcal{B}) = B$ in (36), gives us $f(\mathcal{A} *_M \mathcal{B}) = A \cdot B$. Hence if $A = U \cdot D \cdot V^H$ (obtained from the matrix SVD), then for an order $N+M$ tensor $\mathcal{A} \in \mathbb{C}^{I_1 \times \ldots \times I_N \times J_1 \times \ldots \times J_M}$ we have:
$$\mathcal{A} = \mathcal{U} *_N \mathcal{D} *_M \mathcal{V}^H,$$
where $\mathcal{U} = f^{-1}_{I_1,\ldots,I_N | I_1,\ldots,I_N}(U)$, $\mathcal{D} = f^{-1}_{I_1,\ldots,I_N | J_1,\ldots,J_M}(D)$ and $\mathcal{V} = f^{-1}_{J_1,\ldots,J_M | J_1,\ldots,J_M}(V)$. Note that for a tensor of order $N+M$ we will get different SVDs for different values of $N$ and $M$, but for a given $N$ and $M$ the SVD is unique and depends on the matrix SVD of $f_{I_1,\ldots,I_N | J_1,\ldots,J_M}(\mathcal{A}) = A$. A proof of this theorem for $2N$ order tensors with $N = M$, using the transformation defined in (6), is provided in [25]. This SVD can be seen as a specific case of the Tucker decomposition by expressing the unitary tensors in terms of the factor matrices obtained through Tucker decomposition. Consider the example of a fourth-order tensor $\mathcal{A} \in \mathbb{C}^{I_1 \times I_2 \times K_1 \times K_2}$, whose Tucker decomposition has the form:
$$\mathcal{A} = \mathcal{G} \times_1 B^{(1)} \times_2 B^{(2)} \times_3 B^{(3)} \times_4 B^{(4)}, \qquad (39)$$
where the $B^{(i)}$ are factor matrices along the four modes of the tensor, $\mathcal{G}$ is the core tensor, and $\times_n$ denotes the $n$-mode product. Now, (39) can be written in matrix form as follows [26]:
$$f_{I_1,I_2|K_1,K_2}(\mathcal{A}) = (B^{(1)} \otimes B^{(2)}) \cdot f(\mathcal{G}) \cdot (B^{(3)} \otimes B^{(4)})^T.$$
Using the transformation from (6), we can map the elements of the matrix $U = B^{(1)} \otimes B^{(2)}$ to a tensor $\mathcal{U}$. Since the Kronecker product, written element-wise, can be expressed as [12]
$$(B^{(1)} \otimes B^{(2)})_{(i_1-1)I_2 + i_2,\ (k_1-1)K_2 + k_2} = B^{(1)}_{i_1,k_1}\, B^{(2)}_{i_2,k_2},$$
this relation can be seen as the unitary tensor $\mathcal{U}$ being the outer product of the matrices $B^{(1)}$ and $B^{(2)}$ [25], but with a different mode permutation. A similar relation can be established for $\mathcal{V}$ in terms of $B^{(3)}$ and $B^{(4)}$.
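The constructive route in the proof can be sketched numerically: take the matrix SVD of the matricization, fold the factors back into tensors, and reassemble with two Einstein products. Sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(8)
I1, I2, J1, J2 = 2, 3, 4, 5                  # N = M = 2
A = rng.standard_normal((I1, I2, J1, J2))

r, c = I1 * I2, J1 * J2
Um, s, Vh = np.linalg.svd(A.reshape(r, c))   # matrix SVD of f(A)

U = Um.reshape(I1, I2, I1, I2)               # unitary order-4 tensor f^{-1}(U)
D = np.zeros((r, c))
np.fill_diagonal(D, s)                       # diagonal matrix of singular values
D = D.reshape(I1, I2, J1, J2)                # pseudo-diagonal tensor f^{-1}(D)
V_H = Vh.reshape(J1, J2, J1, J2)             # f^{-1}(V^H)

# Reassemble A = U *_2 D *_2 V^H with two Einstein products.
A_rec = np.tensordot(np.tensordot(U, D, axes=2), V_H, axes=2)
assert np.allclose(A_rec, A, atol=1e-10)
```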

B. Tensor Eigen Value Decomposition (EVD)
As a generalization of matrix eigenvalues to tensors, several definitions exist in the literature for tensor eigenvalues [24]. But most of these definitions apply to super-symmetric tensors, which are defined as a class of tensors that are invariant under any permutation of their indices [40]. Such an approach has applications in Physics and Mechanics [40]. But there is no single generalization to the tensor case that preserves all the properties of matrix eigenvalues [21]. Here we present a particular generalization from [26], [18], [28] which can be seen as the extension of the matrix spectral decomposition theorem and has applications in multi-linear system theory.
Definition 18. Eigenvalues of a Tensor : For a square tensor $\mathcal{A} \in \mathbb{C}^{I_1 \times \ldots \times I_N \times I_1 \times \ldots \times I_N}$, if there exist a non-zero tensor $\mathcal{X} \in \mathbb{C}^{I_1 \times \ldots \times I_N}$ and a scalar $\lambda \in \mathbb{C}$ such that $\mathcal{A} *_N \mathcal{X} = \lambda \mathcal{X}$, then we call $\mathcal{X}$ and $\lambda$ an eigentensor and eigenvalue of $\mathcal{A}$ respectively [26].
Theorem (Tensor EVD): A Hermitian square tensor $\mathcal{A} \in \mathbb{C}^{I_1 \times \ldots \times I_N \times I_1 \times \ldots \times I_N}$, i.e. $\mathcal{A} = \mathcal{A}^H$, can be decomposed as $\mathcal{A} = \mathcal{U} *_N \mathcal{D} *_N \mathcal{U}^H$, where $\mathcal{U}$ is a unitary tensor and $\mathcal{D}$ is a pseudo-diagonal tensor with the (real) eigenvalues of $\mathcal{A}$ on its pseudo-diagonal.

This theorem can be proven using Lemma 1; details are provided in [25], [28]. The eigenvalues of $\mathcal{A}$ are the same as the eigenvalues of $f_{I_1,\ldots,I_N | I_1,\ldots,I_N}(\mathcal{A})$ [41]. We will refer to a tensor $\mathcal{A} \in \mathbb{C}^{I_1 \times \ldots \times I_N \times I_1 \times \ldots \times I_N}$ as positive semi-definite, denoted by $\mathcal{A} \succeq 0$, if all its eigenvalues are non-negative, which is the same as $f_{I_1,\ldots,I_N | I_1,\ldots,I_N}(\mathcal{A})$ being positive semi-definite. Further, for any tensor $\mathcal{A}$ with SVD $\mathcal{A} = \mathcal{U} *_N \mathcal{D} *_M \mathcal{V}^H$, we have $\mathcal{A}^H *_N \mathcal{A} = \mathcal{V} *_M (\mathcal{D}^H *_N \mathcal{D}) *_M \mathcal{V}^H$, where $\mathcal{D}^H *_N \mathcal{D}$ is the pseudo-diagonal tensor with the eigenvalues of $\mathcal{A}^H *_N \mathcal{A}$ on its pseudo-diagonal, which are the squares of the singular values obtained from the SVD of $\mathcal{A}$.
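Definition 18 can be checked numerically for a Hermitian (real symmetric) square tensor: eigen-decompose the matricization, fold an eigenvector back into an eigentensor $\mathcal{X}$, and verify $\mathcal{A} *_N \mathcal{X} = \lambda \mathcal{X}$. Sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(9)
I1, I2 = 2, 3
n = I1 * I2
Am = rng.standard_normal((n, n))
Am = Am + Am.T                               # make f(A) symmetric, so A is Hermitian
A = Am.reshape(I1, I2, I1, I2)               # square tensor in C^{I1 x I2 x I1 x I2}

w, V = np.linalg.eigh(Am)                    # eigenvalues of f(A) = eigenvalues of A
X = V[:, 0].reshape(I1, I2)                  # eigentensor for lambda = w[0]
assert np.allclose(np.tensordot(A, X, axes=2), w[0] * X)
```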
Definition 19. Trace : The trace of a tensor $\mathcal{A} \in \mathbb{C}^{I_1 \times \ldots \times I_N \times I_1 \times \ldots \times I_N}$ is defined as the sum of its pseudo-diagonal entries:
$$\mathrm{tr}(\mathcal{A}) = \sum_{i_1=1}^{I_1} \cdots \sum_{i_N=1}^{I_N} \mathcal{A}_{i_1,\ldots,i_N,i_1,\ldots,i_N}.$$

Definition 20. Determinant : The determinant of a tensor $\mathcal{A} \in \mathbb{C}^{I_1 \times \ldots \times I_N \times I_1 \times \ldots \times I_N}$ is defined as the product of its eigenvalues, i.e., if $\lambda_1, \ldots, \lambda_L$ with $L = I_1 \cdots I_N$ are the eigenvalues of $\mathcal{A}$, then $\det(\mathcal{A}) = \prod_{l=1}^{L} \lambda_l$. The eigenvalues of $\mathcal{A}$ are the same as those of its matrix transformation, hence $\det(\mathcal{A}) = \det(f_{I_1,\ldots,I_N | I_1,\ldots,I_N}(\mathcal{A}))$. Note that other definitions of the determinant exist in the literature, based on how one chooses to define the eigenvalues of tensors [42]. The definition presented here is the same as the unfolding determinant in [28].
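Both quantities can be sketched through the matricization: the pseudo-diagonal entries of $\mathcal{A}$ are exactly the diagonal entries of $f(\mathcal{A})$, and the eigenvalues coincide. Sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(10)
I1, I2 = 2, 3
n = I1 * I2
A = rng.standard_normal((I1, I2, I1, I2))

# Trace: sum of pseudo-diagonal entries = trace of the matricization.
tr = sum(A[i1, i2, i1, i2] for i1 in range(I1) for i2 in range(I2))
assert np.isclose(tr, np.trace(A.reshape(n, n)))

# Determinant: product of eigenvalues of f(A) = det of f(A).
det = np.prod(np.linalg.eigvals(A.reshape(n, n)))
assert np.isclose(det, np.linalg.det(A.reshape(n, n)))
```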

Some properties of trace and determinant:
The following properties can be easily shown by writing the tensors component-wise or by using Lemma 1.
1) For two tensors $\mathcal{A}, \mathcal{B} \in \mathbb{C}^{I_1 \times \ldots \times I_N}$ of the same size and order $N$, the trace of their outer product equals their inner product: $\mathrm{tr}(\mathcal{A} \circ \mathcal{B}) = \langle \mathcal{A}, \mathcal{B} \rangle$.
2) For tensors $\mathcal{A} \in \mathbb{C}^{I_1 \times \ldots \times I_N \times J_1 \times \ldots \times J_M}$ and $\mathcal{B} \in \mathbb{C}^{J_1 \times \ldots \times J_M \times I_1 \times \ldots \times I_N}$, we have:
$$\mathrm{tr}(\mathcal{A} *_M \mathcal{B}) = \mathrm{tr}(\mathcal{B} *_N \mathcal{A}),$$
$$\det(\mathcal{I}_N + \mathcal{A} *_M \mathcal{B}) = \det(\mathcal{I}_M + \mathcal{B} *_N \mathcal{A}), \qquad (49)$$
where $\mathcal{I}_N$ and $\mathcal{I}_M$ are identity tensors of order $2N$ and $2M$ respectively. To prove (49), we can use Lemma 1 and Sylvester's matrix determinant identity [43].
3) For tensors $\mathcal{A}, \mathcal{B} \in \mathbb{C}^{I_1 \times \ldots \times I_N \times I_1 \times \ldots \times I_N}$, we have:
$$\det(\mathcal{A} *_N \mathcal{B}) = \det(\mathcal{A}) \det(\mathcal{B}).$$
4) The trace of a positive semi-definite tensor is the sum of its eigenvalues.
5) The absolute value of the determinant of a unitary tensor is 1 and the determinant of a square pseudo-diagonal tensor is the product of its pseudo-diagonal entries.

C. Tensor LU decomposition
LU decomposition is a powerful tool in Linear Algebra for solving systems of equations. In order to solve systems of multi-linear equations, [28] proposed an LU decomposition for tensors. For A ∈ C I 1 ×...×I N ×I 1 ×...×I N , the LU factorization takes the form A = L * N U, where L, U ∈ C I 1 ×...×I N ×I 1 ×...×I N are pseudo-lower and pseudo-upper triangular tensors respectively. In order to solve a system of multi-linear equations A * N X = B for X, the LU decomposition of A can be used to break the equation into two pseudo-triangular equations, L * N Z = B and U * N X = Z. These two equations can be solved using the forward and backward substitution algorithms proposed in [28]. When B is an identity tensor, this method can also be used to find the inverse of a tensor. More details on computing the LU decomposition, and the conditions required for its existence, can be found in [28].
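A minimal numerical sketch of this procedure (not the algorithm of [28]): the system is unfolded to a matrix, factored by a pivot-free Doolittle LU, and solved by forward and backward substitution. The sizes, the diagonal shift for conditioning, and the reshape-based unfolding are assumptions:

```python
import numpy as np

I1, I2 = 2, 3
n = I1 * I2
rng = np.random.default_rng(0)

# Identity tensor of order 4; added so the unfolding of A is well conditioned
# (this pivot-free sketch assumes non-singular leading minors).
Id = np.einsum('ik,jl->ijkl', np.eye(I1), np.eye(I2))
A = rng.normal(size=(I1, I2, I1, I2)) + n * Id      # order-4 system tensor
B = rng.normal(size=(I1, I2))                       # order-2 right-hand side

# Unfold A and compute a Doolittle LU factorization of the matrix
Am = A.reshape(n, n)
L = np.eye(n)
U = Am.copy()
for j in range(n - 1):
    for i in range(j + 1, n):
        L[i, j] = U[i, j] / U[j, j]
        U[i] -= L[i, j] * U[j]

# Forward substitution L z = b, then backward substitution U x = z
b = B.reshape(n)
z = np.zeros(n)
for i in range(n):
    z[i] = b[i] - L[i, :i] @ z[:i]
x = np.zeros(n)
for i in reversed(range(n)):
    x[i] = (z[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
X = x.reshape(I1, I2)

# A *_2 X recovers B
assert np.allclose(np.einsum('ijkl,kl->ij', A, X), B)
```

A production solver would add pivoting; the pseudo-triangular tensor algorithms of [28] operate on the tensor entries directly rather than on the unfolding.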
Note that all these tensor decompositions represent a given tensor as a contracted product between factor tensors. Hence they can all be represented using Tensor Network diagrams. For example, we show the TN diagram corresponding to the tensor SVD in Figure 6.

Fig. 6: TN representation of Tensor SVD from (35)

A detailed TN representation of several other tensor decompositions, such as Tucker, PARAFAC, and the Tensor Train decomposition, is also presented in [38].
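The tensor SVD can be emulated through the matrix SVD of the unfolding, folding the factors back into tensors. This is a sketch; the reshape-based unfolding and the small sizes are assumptions:

```python
import numpy as np

I1, I2, J1, J2 = 2, 3, 2, 3      # A in C^{I1 x I2 x J1 x J2}
m, n = I1 * I2, J1 * J2
rng = np.random.default_rng(0)

A = rng.normal(size=(I1, I2, J1, J2)) + 1j * rng.normal(size=(I1, I2, J1, J2))

# Matrix SVD of the unfolding, then fold the factors back into tensors
Um, s, Vh = np.linalg.svd(A.reshape(m, n))
U = Um.reshape(I1, I2, I1, I2)               # unitary order-4 tensor
D = np.zeros((I1, I2, J1, J2))               # pseudo-diagonal tensor of singular values
D.reshape(m, n)[np.arange(min(m, n)), np.arange(min(m, n))] = s
V = Vh.conj().T.reshape(J1, J2, J1, J2)

# Reassemble A = U *_2 D *_2 V^H (the Hermitian transpose swaps the mode pairs
# and conjugates)
VH = np.conj(np.einsum('ijkl->klij', V))
A_rec = np.einsum('ijkl,klmn,mnpq->ijpq', U, D, VH)
assert np.allclose(A_rec, A)
```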

V. MULTI-LINEAR TENSOR SYSTEMS
Using the tools presented so far, we will now present the notions of multi-linear system theory using tensors.

A. Discrete time multi-linear tensor systems
A discrete time multi-linear tensor system is characterized by an order N + M system tensor which produces an order M output tensor signal from an order N input tensor signal through a discrete contracted convolution as defined in (28). The system tensor can be seen as an impulse response tensor whose (j 1 , . . ., j M , i 1 , . . ., i N )th entry is the impulse response from the (i 1 , . . ., i N )th input to the (j 1 , . . ., j M )th output.
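A sketch of such a causal discrete contracted convolution for an order-2 input and order-1 output (the finite supports and all sizes are assumptions):

```python
import numpy as np

def contracted_conv(H, X):
    """Causal discrete contracted convolution Y[n] = sum_k H[k] *_2 X[n-k]."""
    T, J = X.shape[0], H.shape[1]
    Y = np.zeros((T, J))
    for n in range(T):
        for k in range(min(n + 1, H.shape[0])):
            Y[n] += np.einsum('jab,ab->j', H[k], X[n - k])
    return Y

rng = np.random.default_rng(0)
T, J1, I1, I2 = 8, 2, 3, 4
H = rng.normal(size=(T, J1, I1, I2))   # impulse response tensor H[k]
X = rng.normal(size=(T, I1, I2))       # order-2 input signal tensor X[n]
Y = contracted_conv(H, X)              # order-1 output signal, shape (T, J1)

# The all-scalar special case reduces to the ordinary discrete convolution
h = rng.normal(size=T)
x = rng.normal(size=T)
y = contracted_conv(h.reshape(T, 1, 1, 1), x.reshape(T, 1, 1))
assert np.allclose(y[:, 0], np.convolve(h, x)[:T])
```

For a finite-length impulse response the entries are trivially absolutely summable, so stability issues only arise for infinite-length responses.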
A system tensor is considered p-stable if, for every input of finite p-norm, the system produces an output which also has finite p-norm. When p → ∞, this notion is known as Bounded Input Bounded Output (BIBO) stability. The ∞-norm of a signal tensor X is essentially its peak amplitude evaluated over all the tensor components and all times, i.e.
Theorem 3. A discrete multi-linear time invariant system with order N input, order M output, and an order N + M impulse response system tensor H is BIBO stable if and only if every entry sequence of H is absolutely summable, i.e. if (53) holds.

Proof. If the input signal tensor X[k] satisfies ∥X∥ ∞ < ∞, then the output is given via the discrete contracted convolution, and bounding the magnitude of each output entry by the triangle inequality shows that the output is bounded whenever (53) is satisfied. To prove the converse, it suffices to exhibit a bounded input that leads to an unbounded output whenever (53) is not satisfied. For this, we can simply consider the case where the input and output are scalars, which is a special case of the tensor formulation. Equation (53) then translates to the BIBO condition for a SISO LTI system, i.e. Σ k |h[k]| < ∞. Hence it can be readily verified that the signum input x[n] = sgn(h[−n]), which is bounded, leads to an unbounded output if the impulse response sequence is not absolutely summable [44].
The BIBO stability condition for a MIMO LTI system requires that every element of the impulse response matrix be absolutely summable. The condition in (53) can be seen as its extension to the tensor case: every element of the impulse response tensor must be absolutely summable. Furthermore, we extend the definitions of poles and zeros from matrix-based systems to tensors. A matrix transfer function H(z) has a pole at frequency ν if some entry of H(z) has a pole at z = ν [45], and a zero at frequency γ if the rank of H(z) drops at z = γ [45]. Similarly, a tensor transfer function H(z) has a pole at frequency ν if some entry of H(z) has a pole at z = ν, and a zero at frequency γ if the rank of f J 1 ,...,J M |I 1 ,...,I N ( H(z)) drops at z = γ. This rank is sometimes referred to as the unfolding rank of the tensor [28], [18].
A tensor system is BIBO stable if all its components are BIBO stable. This implies that every pole of every entry of its transfer function has a magnitude less than 1, i.e. all the poles lie within the unit circle in the z-plane.

B. Continuous time multi-linear tensor systems
A continuous time multi-linear tensor system is characterized by an order N +M system tensor which produces an order M output tensor signal Y(t) ∈ C J 1 ×...×J M t from an input tensor signal X(t) ∈ C I 1 ×...×I N t through a contracted convolution as defined in (33).
The system tensor can be seen as an impulse response tensor whose (j 1 , . . ., j M , i 1 , . . ., i N )th entry is the impulse response from the (i 1 , . . ., i N )th input to the (j 1 , . . ., j M )th output.
Similar to the discrete case, a continuous system tensor is considered p-stable if, for every input of finite p-norm, the system produces an output which also has finite p-norm. This notion is known as Bounded Input Bounded Output (BIBO) stability when p = ∞. The ∞-norm of a continuous signal tensor X is essentially its peak amplitude evaluated over all the tensor components and all times, i.e.
Theorem 4. A continuous multi-linear time invariant system with order N input, order M output, and an order M + N impulse response system tensor is BIBO stable if and only if condition (61) holds. The condition in (61) implies that every element of the impulse response tensor must be absolutely integrable. The proof of Theorem 4 follows the same lines as the proof of Theorem 3.
Furthermore, a continuous system tensor with transfer function H(ω) is BIBO stable if all its components are BIBO stable. This implies that every pole of every entry of its transfer function has a real part less than 0.

C. Applications of multi-linear tensor systems
A tensor multi-linear (TML) system can be used to model and represent processes where two tensor signals are coupled through a multi-linear functional. The linearity of the system is reflected in the fact that any given edge of the channel is connected with a single edge of the input. Thus a TML system is easy to identify visually in a TN diagram by observing the presence of one-to-one edge connections between the system and the input.
Note that in a communication system, the input signal is often precoded before transmission by a transmit filter, and the output signal is processed via a receive filter. The transmit and receive filters can also be considered as system tensors. Thus the TML channel H(t) can be seen as a cascade of three system tensors. Let the transmit filter be represented by H T (t), which transforms the order N input into an order Q transmit signal. The physical channel between the source and destination is modelled as an order P + Q system tensor, and the receive filter is represented by H R (t) ∈ C J 1 ×...×J M ×L 1 ×...×L P t . In this case, the equivalent channel H(t) is obtained via a cascade of the three system tensors. A detailed derivation of such a channel representation can be found in [15]. A cascade of TML systems is conveniently represented in a TN diagram, as shown in Figure 8. Each user in a CDMA system is assigned a distinct pseudo-random spreading code to transmit information, which is used to distinguish the users at the receiver. More details on CDMA can be found in [46].
Consider an uplink scenario where K users transmit information to a single base station (BS). Assume a simple additive white Gaussian noise (AWGN) channel. Each user k is assigned a distinct spreading sequence, denoted by a vector s (k) ∈ C L of length L, which is used to transmit its symbol x (k) . The received signal at the BS can be written as [47]: y = Σ k x (k) s (k) + z, where z ∈ C L represents the noise vector. Now consider the extension of such a system model to a flat fading channel with multiple antennas. Assume K users, each with N T transmit antennas, transmit simultaneously to a BS with N R receive antennas. To allow multiple access, all the transmit antennas of all the users are assigned different spreading sequences of length L. Let s (k,i) ∈ C L denote the spreading vector for the data x (k,i) transmitted by the ith antenna of the kth user. Transmit symbols are assumed to have zero mean and energy E s , and the transmit vector from each user and each antenna is generated as x (k,i) s (k,i) . The MIMO communication channel between user k and the BS is defined as a matrix H (k) ∈ C N R ×N T , where the random channel matrix has independent and identically distributed (i.i.d.) zero mean circularly symmetric complex Gaussian entries with variance 1/N R .
The distribution is denoted as CN (0, 1/N R ). The received signal Y ∈ C N R ×L can be written as [47], [48]: Y = Σ k H (k) X (k) S (k) + Z, where X (k) is an N T × N T diagonal matrix defined as diag(x (k,1) , x (k,2) , . . ., x (k,N T ) ) and S (k) is an N T × L matrix defined as (s (k,1)T , s (k,2)T , . . ., s (k,N T )T ) T . Also, Z represents the N R × L noise matrix with i.i.d. components distributed as CN (0, N 0 ). In [47], a per-user matched filter receiver is considered for such a system, treating the interference from the other users as noise. It is shown in [47] that such a receiver under-performs compared to a multi-user receiver which detects the transmit symbols of all the users together. Hence several multi-user receivers are presented in [47] by rewriting the system model from (64) as in (65), where H = (H (1) , . . ., H (K) ) is the concatenation of the user channel matrices. Based on this, a multi-user receiver which aims to mitigate the effects of H (spatial interference) and S (multiple access interference) is considered. The received signal is linearly processed in two stages as Y → AY → AYB, and the transmit symbols are decoded as in [47], where D denotes the set of symbols in the transmit constellation map. Essentially, AYB represents an estimated version of the matrix X, whose diagonal element at index j is used to decode the transmitted symbol and map it back to index (k, i). The matrices A and B separately aim to mitigate the effects of spatial interference and multiple access interference on the received signal, and are defined in terms of an identity matrix I KN T of size K N T × K N T . The zero forcing (ZF) receiver ignores the impact of noise and only tries to counter the effect of the channel, while the linear minimum mean square error (LMMSE) receiver tries to reduce the noise while simultaneously aiming to mitigate the effect of the channel. The DECOR choice represents a multi-user decorrelator receiver.
At high SNR, i.e. as N 0 /E s → 0, the LMMSE options reduce to ZF and DECOR.
Such a receiver, based on jointly processing all the users, performs better than a per-user receiver [47]. However, it still has the drawback that it combats the spatial interference and the multiple access interference separately, in two stages. Moreover, while the input in (65) is represented as a matrix X, only its diagonal elements contain the transmit symbols, formed by concatenating the various x (k,i) . Thus such a system model does not fully exploit the multi-linearity of the system; it forces a linear structure by manipulating the entities involved in order to fit the well-known vector-based LMMSE, ZF, or DECOR solutions. In fact, the tensor framework can represent such a system model while keeping its natural structure intact, enabling a tensor multi-linear (TML) receiver.
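The matrix-based model above can be sketched numerically as follows; the sizes, BPSK symbols, and noise level are illustrative assumptions, not values from [47]:

```python
import numpy as np

rng = np.random.default_rng(0)
K, NT, NR, L = 3, 2, 4, 16            # users, tx/rx antennas, spreading length
N0 = 0.01                             # noise power (illustrative)

H = rng.normal(0, np.sqrt(1 / NR), (K, NR, NT))        # per-user channels H^(k)
x = rng.choice([-1.0, 1.0], (K, NT))                   # BPSK symbols x^(k,i)
S = rng.choice([-1.0, 1.0], (K, NT, L)) / np.sqrt(L)   # spreading rows s^(k,i)

# Y = sum_k H^(k) X^(k) S^(k) + Z, with X^(k) = diag(x^(k,1), ..., x^(k,NT))
Y = sum(H[k] @ np.diag(x[k]) @ S[k] for k in range(K))
Y = Y + np.sqrt(N0) * rng.normal(size=(NR, L))
print(Y.shape)
```

Note that the symbols enter only through the diagonal matrices X^(k), which is exactly the artificial linear structure the tensor formulation below avoids.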
Since the input symbol x (k,i) is indexed by the two indices k and i, it is natural to represent the input as a matrix X of size K × N T with elements X k,i = x (k,i) . Further, the input signal is transmitted as a length-L vector for each user index k and antenna index i.
Hence the transmitted signal through the channel can be represented as a third order tensor of size K × N T × L whose (k, i, l)th entry is x (k,i) l . To generate this tensor from X, we define the spreading sequences as an order 5 tensor S ∈ C K×N T ×L×K×N T with elements S k,i,l,k,i = s (k,i) l for all l, and 0 elsewhere. The transmitted tensor is then given by S * 2 X. Note that here we assume that the elements of X are mapped one-to-one with the spreading sequences; hence the entries of S corresponding to k ̸ = k ′ or i ̸ = i ′ are zero. In certain applications, a linear combination of the input symbols might be transmitted, in which case the structure of S, which represents a transmit filtering operation, will change accordingly. The channel matrices H (k) corresponding to each user can be represented as slices of a third order tensor H ∈ C N R ×K×N T with H :,k,: = H (k) . Thus the system model can be given as Y = H * 2 X + Z, as in (69), where H now denotes the equivalent fourth order TML channel in C N R ×L×K×N T between the order two input X and the order two output Y.
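This tensor formulation maps directly onto `np.einsum`; the following sketch builds the order-5 spreading tensor S and the equivalent fourth-order channel, with all sizes being illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
K, NT, NR, L = 3, 2, 4, 16

X = rng.choice([-1.0, 1.0], (K, NT))                   # input matrix X_{k,i}
s = rng.choice([-1.0, 1.0], (K, NT, L)) / np.sqrt(L)   # spreading vectors s^(k,i)
Hm = rng.normal(0, np.sqrt(1 / NR), (NR, K, NT))       # slices H_{:,k,:} = H^(k)

# Order-5 spreading tensor: S_{k,i,l,k',i'} = s^(k,i)_l if (k,i) = (k',i'), else 0
S = np.zeros((K, NT, L, K, NT))
for k in range(K):
    for i in range(NT):
        S[k, i, :, k, i] = s[k, i]

Xt = np.einsum('kilab,ab->kil', S, X)        # transmitted tensor S *_2 X
assert np.allclose(Xt, X[:, :, None] * s)    # each symbol spread by its own code

# Equivalent fourth-order channel and output Y = H *_2 X (noise omitted)
Heq = np.einsum('rki,kil->rlki', Hm, s)
Y = np.einsum('rlki,ki->rl', Heq, X)
# This matches the matrix model sum_k H^(k) X^(k) S^(k)
assert np.allclose(Y, sum(Hm[:, k, :] @ np.diag(X[k]) @ s[k] for k in range(K)))
```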
Note that the advantage of modelling the system through (69) is that all the associated entities retain their natural structure, and a joint TML receiver can be designed to combat the effect of the interference across all the users and domains simultaneously. A multi-linear minimum mean square error receiver which acts across all the domains simultaneously can be represented through a tensor R ∈ C K×N T ×N R ×L , which produces an estimate of the input X by acting upon the received tensor Y as X̂ = R * 2 Y. Thus each element of the estimated input at the receiver is a linear combination of all the elements of Y, where the coefficients of the linear combinations are encapsulated in R. An optimal choice of R which minimizes the mean square error between X̂ and X is given in [14], where I is an identity tensor of size K × N T × K × N T . The estimated X̂ can then be used to detect the transmit symbols. In the corresponding performance plots, dashed lines correspond to an SNR per bit of 8 dB. It can be clearly seen that, for a fixed SNR per bit, the BER and NMSE curves for the TML MMSE receiver remain almost flat as the number of users increases. On the other hand, the performance of LMMSE1 and LMMSE2 degrades significantly with an increasing number of users. As the number of users increases, the interference across domains also increases, and this is efficiently exploited only by the TML MMSE receiver; hence its performance is robust to the number of users.
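A sketch of such a TML MMSE receiver, computed through the matrix unfolding of the equivalent channel. The sizes, the noise and symbol-energy values, and the reshape-based unfolding are assumptions, and the regularized pseudo-inverse below is a standard MMSE form, not necessarily the exact expression of [14]:

```python
import numpy as np

rng = np.random.default_rng(0)
K, NT, NR, L = 3, 2, 4, 16
N0, Es = 0.01, 1.0                     # noise power and symbol energy (assumed)

Heq = rng.normal(size=(NR, L, K, NT))  # equivalent fourth-order channel
X = rng.choice([-1.0, 1.0], (K, NT))   # BPSK input symbols
Y = np.einsum('rlki,ki->rl', Heq, X) + np.sqrt(N0) * rng.normal(size=(NR, L))

# MMSE receiver through the unfolding f_{NR,L|K,NT}(H):
# R ~ (H^H *_2 H + (N0/Es) I)^{-1} *_2 H^H, computed as a matrix and folded back
Hm = Heq.reshape(NR * L, K * NT)
Rm = np.linalg.solve(Hm.T @ Hm + (N0 / Es) * np.eye(K * NT), Hm.T)
R = Rm.reshape(K, NT, NR, L)

X_hat = np.einsum('kirl,rl->ki', R, Y)  # estimate of X via R *_2 Y
symbols = np.sign(X_hat)                # nearest BPSK constellation point
```

Here R acts on every receive antenna and every chip jointly, which is what combats the spatial and multiple access interference in a single stage.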
Note that (69) can be rewritten as a system of linear equations by vectorizing the input, output and noise, and considering the channel as the concatenated matrix f N R ,L|K,N T ( H).
Subsequently, a joint receiver can also be designed using the transformed matrix channel, as presented in [47], which is conceptually equivalent to the TML MMSE approach presented in this paper. However, the concatenation of the various domains obscures the individual domain representations (indices) in the system. Such an approach makes it difficult to incorporate domain-specific constraints at the transceiver. For instance, a common transmit constraint is the power budget, which in most practical cases differs across users. Thus designing a transmission scheme with per-user power constraints becomes important, and can be achieved using the tensor framework since it maintains the identifiability of the domains. Such a consideration is presented in [16], [49].
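The equivalence between the tensor form and the vectorized form can be verified numerically (a sketch assuming row-major vectorization and a reshape-based unfolding):

```python
import numpy as np

rng = np.random.default_rng(0)
NR, L, K, NT = 4, 5, 3, 2
H = rng.normal(size=(NR, L, K, NT))
X = rng.normal(size=(K, NT))

Y = np.einsum('rlki,ki->rl', H, X)                  # tensor form Y = H *_2 X
y = H.reshape(NR * L, K * NT) @ X.reshape(K * NT)   # matrix form f(H) vec(X)
assert np.allclose(Y.reshape(-1), y)                # the two forms coincide
```

The computation is identical; what is lost in the matrix form is the separate identity of the user, antenna, receive and chip indices.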
3) Other Examples: In [15], several multi-domain communication systems such as MIMO OFDM, GFDM and FBMC are represented using the tensor contracted convolution and the contracted product. [15] also develops tensor based receiver equalization methods which combat interference in communication systems using the notion of tensor inversion. A tensor based receiver and precoder are presented in [50] for a MIMO GFDM system where the channel is represented as a sixth order tensor. The tensor SVD and EVD presented in this paper are used to design transmit coding operations and to perform an information theoretic analysis of the tensor channel [49], leading to the notion of a multi-domain water filling power allocation method. Also, the discrete multi-linear contracted convolution is used to design Tensor Partial Response Signaling (TPRS) systems to shape the spectrum and cross-spectrum of transmit signals [16].
The tensor inversion method can also be used to develop estimation techniques for various signal processing applications such as in big data or wireless communications as shown in [14].
Another example of the use of the tensor Einstein product is image restoration and reconstruction, where the objective is to retrieve an image affected by noise, a focal-field distribution, and an aperture function [26]. The image data is stored as a three dimensional tensor, and an order 6 tensor obtained from the point spread function acts as a channel [26], such that the output is given by the Einstein product between the input and the channel. Another area where the Einstein product properties have been used is multi-linear dynamical system theory [17], [18]. In [17], a generalized multi-linear time invariant system theory is developed using the Einstein product, applicable to dynamical systems arising in the human genome, social networks, and the cognitive sciences. The notion of tensor Eigenvalue decomposition presented in this paper is used in [17] to derive conditions for stability, reachability, and observability of dynamical systems.
The Einstein product has the distinct advantage that it lets us develop tensor algebra notions similar to those of linear algebra, while at the same time not disturbing or reshaping the structure of the tensors.
In addition, the more general tensor contracted product and contracted convolution can be used to model multi-domain systems with any mode ordering as well.

VI. SUMMARY AND CONCLUDING REMARKS
This paper presented a review of tensor algebra concepts developed using the contracted product, more specifically the Einstein Product, extending common notions of Linear Algebra to a multi-linear setting. In particular, the notions of tensor inverse, singular value and Eigenvalue decompositions, and LU decomposition were discussed. We also studied the Tensor Network representations of tensor contractions and convolutions. The notions of time invariant discrete and continuous multi-linear systems, defined using contracted convolutions, were also presented. We presented an application to a multi-domain communication system where the channel is modelled as a multi-linear system. The multi-linearity of the channel allowed us to develop a receiver which jointly combats interference across all the domains, thereby giving much better BER and MSE performance than linear receivers which act on one domain at a time. The tensor algebra notions discussed in this paper have extensive applications in fields such as communications, signals and systems, control, and image processing, to name a few. Given the several other tensor tutorial papers in the literature, this paper by no means intends to summarize all multi-linear algebra concepts, but provides a tutorial style introduction to the main concepts from a signals and systems perspective.
For example, tensors A ∈ C K×L×M ×N and B ∈ C K×M ×Q×R can be contracted along the first and third modes of A and the first and second modes of B as C = {A, B} {1,3;1,2} , where C ∈ C L×N ×Q×R . Matrix multiplication between A ∈ C I×J and B ∈ C J×K can be seen as a special case of the contracted product: A • B = {A, B} {2;1} .
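This contraction maps directly onto `np.einsum`; a sketch with the shapes from the example (the numeric sizes are assumptions):

```python
import numpy as np

K, L, M, N, Q, R = 2, 3, 4, 5, 6, 7
rng = np.random.default_rng(0)
A = rng.normal(size=(K, L, M, N))
B = rng.normal(size=(K, M, Q, R))

# C = {A, B}_{1,3;1,2}: sum over the shared K and M modes
C = np.einsum('klmn,kmqr->lnqr', A, B)
assert C.shape == (L, N, Q, R)

# Matrix multiplication is the special case {A, B}_{2;1}
Am = rng.normal(size=(3, 4))
Bm = rng.normal(size=(4, 5))
assert np.allclose(np.einsum('ij,jk->ik', Am, Bm), Am @ Bm)
```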

A discrete time signal tensor X[n] ∈ C I 1 ×...×I N n is a function tensor whose components are functions of the sampled time index n. A discrete tensor signal can also be called a tensor sequence indexed by n.
This is equivalent to f I 1 ,...,I N |I 1 ,...,I N (A) being a positive semi-definite matrix. A tensor is positive definite, A ≻ 0, if all its eigenvalues are strictly greater than zero. A positive semi-definite pseudo-diagonal tensor D has all its components non-negative. Its square root, denoted D 1/2 , is also pseudo-diagonal positive semi-definite, with elements equal to the square roots of the elements of D, such that D 1/2 * N D 1/2 = D. Similarly, if D is positive definite, its inverse D −1 is also pseudo-diagonal, with non-zero elements equal to the reciprocals of the corresponding elements of D. Based on the tensor EVD, we can also write the square root of any Hermitian positive semi-definite tensor as A 1/2 = U * N D 1/2 * N U H , and the inverse of any Hermitian positive definite tensor as A −1 = U * N D −1 * N U H . It is straightforward to see that the singular values of a tensor A ∈ C I 1 ×...×I N ×J 1 ×...×J M are the square roots of the eigenvalues of the tensor A H * N A.

The use of tensors is ubiquitous in modern communication systems, where the signals and systems involved have an inherent multi-domain structure. The physical layer model of modern communication systems invariably spans more than one domain of transmission and reception, such as space, time and frequency, to name a few. Consequently, the associated signal processing at the transmitter and receiver has to be cognizant of the multiple domains and their mutual effects on each other for efficient resource utilization. Hence the signals and systems involved are best represented using tensors.

1) TN representation of TML systems: Consider a multi-domain communication system where the input signal is an order N tensor X(t) ∈ C I 1 ×...×I N t which passes through an order M + N multi-linear channel H(t) ∈ C J 1 ×...×J M ×I 1 ×...×I N t to generate an output of order M using the contracted convolution Y(t) = H(t) • N X(t). The channel is expressed as a TML system, and its coupling with the input in both the frequency and time domains can be represented in a TN diagram as shown in Figure 7. Note that each edge of the input is connected with the common edge of the channel via a dotted line in the time domain and via a solid line in the frequency domain. Compared to a regular block diagram representation of such systems, the TN diagram has the advantage that it graphically details all the modes of the input and the channel. Thus, just by looking at the free edges of the overall TN diagram, one can determine the modes of the output.
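The EVD-based square root described above can be sketched via the matrix unfolding (a real symmetric case; the reshape-based unfolding and the sizes are assumptions):

```python
import numpy as np

I1, I2 = 2, 3
n = I1 * I2
rng = np.random.default_rng(0)

# Hermitian (here real symmetric) positive semi-definite tensor A = B^H *_2 B
B = rng.normal(size=(I1, I2, I1, I2))
A = np.einsum('klij,klmn->ijmn', B, B)

# EVD of the unfolding, then fold back: A^{1/2} = U *_2 D^{1/2} *_2 U^H
w, Um = np.linalg.eigh(A.reshape(n, n))
assert np.all(w >= -1e-10)                       # A ⪰ 0: non-negative eigenvalues
Ahalf = ((Um * np.sqrt(np.clip(w, 0, None))) @ Um.T).reshape(I1, I2, I1, I2)

# Check the defining property A^{1/2} *_2 A^{1/2} = A
assert np.allclose(np.einsum('ijkl,klmn->ijmn', Ahalf, Ahalf), A)
```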

Figure 8 illustrates the coupling of the receive filter, physical channel and transmit filter system tensors in both the time and frequency domains. Hence the nodes for H(t) and H(ω) in Figure 7 can be broken down into the component system tensors shown in Figure 8. A tensor system has multiple modes, and contractions can occur along various combinations of such modes. Hence the TN representation becomes extremely useful as opposed to regular block diagrams, since it allows one to depict the state and coupling of each mode.
2) Example of Tensor Contraction for MIMO CDMA systems: Code Division Multiple Access (CDMA) is a spread spectrum technique used in communication systems where multiple transmitting users can send information simultaneously over a single communication channel, thereby enabling multiple access. Each user uses the entire bandwidth along with a distinct pseudo-random spreading code.

Fig. 7: TN representation of TML system in time and frequency domain.

Fig. 8: TN representation of equivalent multi linear channel in time and frequency domain.

Fig. 11: BER performance for different receivers against number of users.

Fig. 12: Normalized MSE for different receivers against number of users.