Input Redundancy for Parameterized Quantum Circuits

This paper studies parameterized quantum circuits (quantum neural networks, QNNs) which are trained to estimate a given function, specifically the type of circuits proposed by Mitarai et al. (Phys. Rev. A, 2018). The input is encoded into amplitudes of states of qubits. The no-cloning principle of quantum mechanics suggests that there is an advantage in redundantly encoding the input value several times. We follow this suggestion and prove lower bounds on the number of redundant copies for two types of input encoding. We draw conclusions for the architecture design of QNNs.


Introduction
Classical circuits and artificial neural networks can have fan-out: The output of one node can be the input to several others. The no-cloning principle of quantum mechanics forbids duplicating data which is encoded only in the amplitudes of a quantum state in superposition. This applies, in particular, to "Quantum Neural Networks" (QNNs), and, even more specifically, to the input that is fed into such a QNN, if the input is encoded in the amplitudes of the input state.
The objects of study of this paper are parameterized quantum circuits whose input consists of real numbers $x$, each of which is encoded into amplitudes by applying a unitary of the form $e^{i\eta(x)H}$ at one point (no redundancy) or several points, possibly with different functions $\eta$ and different Hamiltonians $H$.
Take, as an example, the parameterized quantum circuits of Mitarai et al. [1]. Mitarai et al. demonstrate that their parameterized quantum circuit constructions can compute, as the expectation value of a fixed observable, any polynomial in a single variable $x$. In their demonstration, Mitarai et al. feed the input real number $x$ into the quantum circuit by encoding it in the amplitudes of $n$ qubits in a product state, where $n$ is the degree of the polynomial, by applying the Pauli rotation $e^{-i \arcsin(x)\sigma_X/2}$ to $n$ different qubits.
In this paper, we call this effect input redundancy. More precisely, the input $x$ to the quantum circuits of Mitarai et al. is encoded with redundancy $n$.
The no-cloning principle makes it plausible that some redundancy is needed. Schuld et al. [2] mention the possibility of replicating the input when they discuss state preparation, but they don't seem to discuss the advantage of or necessity for input redundancy. In HHL and derived algorithms, the input is encoded many times as the core algorithmic steps undergo iterations to amplify amplitudes. The HHL example also shows that input replication doesn't necessarily require that the data is encoded several times in parallel in different quantum registers (Schuld et al. consider only this possibility, through tensorial maps); it can be encoded sequentially in the same quantum register. (It should be pointed out that general state preparation procedures as in HHL and [2] cannot be studied with the tools of this paper: They apply many operations with parameters derived from a collection of inputs.) In the case of Mitarai et al.'s example, it can easily be seen, from algebraic arguments involving the quantum operations which are performed, that redundancy $n$ is best possible for the particular way of encoding the value $x$.
In the present paper, we make an attempt to study the role of the input encoding more systematically. Fig. 1 shows the schematic of quantum circuits with input $x$. The setup resembles that of a neural network layer. The $j$'th "copy" of the input is made available in the quantum circuit by, at some time, performing the unitary operation $e^{2\pi i \eta_j H_j}$ on one (or several: $H_j$ can be a multi-qubit Hamiltonian) qubit(s), where $\eta_j = \varphi(a_j x + b_j)$, for an "activation function" $\varphi$. For example, in the above-mentioned example in [1], the activation function is $\varphi := \arcsin$. Generalization of our results to several inputs is straightforward only when the activation functions in Fig. 1 have a single input.
Our results. For two activation functions, the identity ("linear" input encoding) and arcsin, we prove lower bounds on the input redundancy which are logarithmic in terms of linear-algebraic complexity measures of functions.
Our lower bounds are modest, as (a) log is not impressive, and (b) the argument of the log is an arbitrary complexity measure of the function which is to be represented. Moreover, our results are conceptually weak, as they don't apply in a meaningful way to general state preparation or even to the case when several inputs are "mixed" in some of the activation function nodes in Fig. 1.
Here's the "but": Our results give a rigorous justification of decisions for the design of parameterized quantum circuit architectures: i. Input redundancy must be present if good approximations of functions are the goal; ii. The encoding of the input (the coefficients in the affine transformation links in Fig. 1) must be variable or even "trained" along with the parameters $\theta$ of the quantum circuit in order to be able to approximate a large class of functions with the same MiNKiF architecture.
This paper is organized as follows. In the next section we review the background on MiNKiF circuits and the Fourier calculus on them. Sections 3 and 4 contain the results on linear and arcsin input encoding, respectively. We close with a discussion and directions of future work.

MiNKiF PQCs
We now describe the parameterized quantum circuits (PQCs) in more detail. Denote by $\rho \mapsto e^{-2\pi i\alpha H}\,\rho\,e^{2\pi i\alpha H}$ (1) the quantum operation of an evolution with Hamiltonian $H$. (The $2\pi$ factor is just a convenience for us and is w.l.o.g. by dilating the $\alpha$.) Following, in spirit, [1], in this paper we consider quantum circuits which apply quantum operations each of which is one of the following: (a) An operation as in (1), with a parameter $\alpha := \eta$ which will encode input, $x$ (i.e., $\eta$ depends on $x$); (b) An operation as in (1), with a parameter $\alpha := \theta$ which will be "trained"; (c) A quantum operation not defined by any parameter (its effect can, of course, depend on $\theta$, $\eta$, e.g., via dependency on measurement results).
Denote the concatenated quantum operation by $E(\eta,\theta)$. Now let $M$ be an observable, and consider its expectation value on the state which results if the parameterized quantum circuit is applied to a fixed input state $|0\rangle\langle 0|$. We denote the expectation value by $f$: $f(\eta,\theta) := \operatorname{tr}\bigl(M\,E(\eta,\theta)(|0\rangle\langle 0|)\bigr)$. The PQCs could have multiple outputs, but we don't consider that in this paper. We refer to PQCs of this type as MiNKiF PQCs, as [1] realized the fundamental property $\partial_{\theta_j} f(\theta) = \pi\bigl(f(\theta + \tfrac14 e_j) - f(\theta - \tfrac14 e_j)\bigr)$ (we suppress the $\eta$ for convenience; $e_j$ is the vector with a 1 in position $j$ and 0 otherwise), which the sufficiently lazy calculus student immediately recognizes as the equation characterizing $t \mapsto \sin(2\pi t)$.
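The derivative property can be checked numerically on the smallest possible instance. The following is a minimal sketch, assuming the 1-periodic scaling used in this paper and the shift rule in the form $\partial_\theta f(\theta) = \pi\,(f(\theta + 1/4) - f(\theta - 1/4))$; the single-qubit circuit (rotation generated by $\sigma_X/2$, observable $\sigma_Z$) and all function names are illustrative choices, not taken from [1].

```python
import math


def expectation_z(theta: float) -> float:
    """<sigma_Z> after rotating |0> by exp(-2*pi*i*theta*sigma_X/2)."""
    # exp(-i*pi*theta*sigma_X) = cos(pi*theta)*I - i*sin(pi*theta)*sigma_X
    amp0 = complex(math.cos(math.pi * theta), 0.0)
    amp1 = complex(0.0, -math.sin(math.pi * theta))
    return abs(amp0) ** 2 - abs(amp1) ** 2


def shift_rule_derivative(f, theta: float) -> float:
    """Derivative from two shifted evaluations (assumed 1-periodic scaling)."""
    return math.pi * (f(theta + 0.25) - f(theta - 0.25))


def central_difference(f, theta: float, step: float = 1e-6) -> float:
    """Plain numerical derivative, used only as a reference value."""
    return (f(theta + step) - f(theta - step)) / (2.0 * step)


if __name__ == "__main__":
    for theta in (0.0, 0.1, 0.37):
        print(theta,
              shift_rule_derivative(expectation_z, theta),
              central_difference(expectation_z, theta))
```

In this toy case the expectation value is $\cos(2\pi\theta)$, so both columns should agree up to the accuracy of the finite difference.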
The setting we consider in this paper is that
• there is a single input $x \in \mathbb{R}$;
• the parameters $\theta$ have been trained perfectly and are thus ignored.
The number $n$ and the function $\eta$ are the objects of study of this paper. In particular, we are not interested in training the parameters $\theta$. Consequently, there's no difference in the quantum operations of types (b) and (c) above, and the applicability of our results actually extends beyond MiNKiF PQCs to a slightly bigger class of quantum programs with analog encoding of a single input value.
To summarize, we study the function for arbitrary quantum operations $V_i$, and for Hamiltonians $H_j$, $j = 1, \dots, n$. (3b)

Fourier calculus on MiNKiF circuits
This paper builds on the observation of [4] that, under assumptions which are reasonable for near-term gate-based quantum computers, the function obtained as the expectation value of a parameterized quantum circuit is periodic in the parameters, with Fourier spectrum supported on the differences of the eigenvalues of the Hamiltonians carrying the parameters. More specifically, for each Hamiltonian $H_j$ in (3b), consider the subgroup of $\mathbb{R}$ generated by the differences of the eigenvalues of $H_j$. If that subgroup is dense in $\mathbb{R}$, our methods don't apply; otherwise scale the parameter $\eta_j$ and correspondingly (virtually) the Hamiltonian in such a way that the eigenvalue differences become integral: you can make the group of differences equal to $\mathbb{Z}$. After scaling, the eigenvalues of $H_j$ are in $\delta + \mathbb{Z}$, for a fixed $\delta \in [0,1[$, and it can readily be checked that then the function $f$ in (3a) is 1-periodic in $\eta_j$.
After performing the scaling for all the parameters $j = 1, \dots, n$, we find that the Fourier transform is supported on the set of differences of the eigenvalues: $\hat f(w) = 0$ unless, for all $j$, $w_j = \lambda_j - \lambda_j'$ for two eigenvalues $\lambda_j, \lambda_j'$ of the Hamiltonian $H_j$ which is controlled by the variable $\eta_j$. (We refer to [4] or [3] for the details.) Note that, while the $V_i$ in (3) can perform non-unitary quantum operations, the parameters $\eta$ can only occur as described. (For example, they cannot be changed depending on the results of measurements within the same "run" of the quantum program.) Abusing notation, we write $\mathbb{R}^n/1 := \mathbb{R}^n/\mathbb{Z}^n = (\mathbb{R}/\mathbb{Z})^n$ for the torus group in additive notation, and our $f$ is defined on $\mathbb{R}^n/1$.
To simplify the presentation, in this paper, we will restrict ourselves to the case that the Hamiltonians have only two eigenvalues, $\pm\tfrac12$, which is the case, for example, for the usual 1-qubit rotations, such as $e^{-it\sigma/2}$ with $\sigma$ a Pauli operator (i.e., a 1-qubit reflection); the only difference is that our parameters are dilated by $1/(2\pi)$ to make things 1-periodic instead of $2\pi$-periodic.
We summarize the property of the function $\eta \mapsto f(\eta)$, with the restrictions on the Hamiltonians, in the following handy remark.
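(a) $f$ is a 1-periodic, real-valued function with Fourier spectrum contained in $\mathbb{Z}_3^n = \{0,\pm1\}^n$, i.e., for all $\eta \in \mathbb{R}^n$: $f(\eta) = \sum_{w \in \{0,\pm1\}^n} \hat f(w)\, e^{2\pi i\, w\cdot\eta}$. (4)
(b) Applying Euler's identity, we obtain the expansion of $f$ as a multi-linear polynomial in sine and cosine functions, i.e., as a linear combination of products $\prod_{j=1}^n \tau_j(2\pi\eta_j)$ with each $\tau_j \in \{1, \cos, \sin\}$. (5)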
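The periodicity and the restriction of the Fourier spectrum to $\{0,\pm1\}^n$ can be observed numerically. The sketch below is only an illustration under our own assumptions: a single qubit with $n = 2$ sequential encodings (two rotations generated by $\sigma_X/2$, separated by a fixed Hadamard gate), the observable $\sigma_Z$, and a discrete Fourier transform on a $6 \times 6$ grid. None of these choices is prescribed by the text.

```python
import cmath
import math
from itertools import product

PAULI_X = ((0, 1), (1, 0))
PAULI_Z = ((1, 0), (0, -1))
HADAMARD = tuple(tuple(v / math.sqrt(2) for v in row) for row in ((1, 1), (1, -1)))


def matvec(m, v):
    """Multiply a 2x2 matrix with a length-2 vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]


def rotation(pauli, eta):
    """exp(-2*pi*i*eta*pauli/2) = cos(pi*eta)*I - i*sin(pi*eta)*pauli."""
    c, s = math.cos(math.pi * eta), math.sin(math.pi * eta)
    return [[c - 1j * s * pauli[0][0], -1j * s * pauli[0][1]],
            [-1j * s * pauli[1][0], c - 1j * s * pauli[1][1]]]


def circuit_expectation(eta1, eta2):
    """<sigma_Z> after encoding eta1, a fixed gate, then encoding eta2."""
    state = [1.0, 0.0]
    state = matvec(rotation(PAULI_X, eta1), state)
    state = matvec(HADAMARD, state)
    state = matvec(rotation(PAULI_X, eta2), state)
    z_state = matvec(PAULI_Z, state)
    return (state[0].conjugate() * z_state[0] + state[1].conjugate() * z_state[1]).real


def fourier_support(func, num_samples=6, tol=1e-9):
    """Frequencies (w1, w2) whose discrete Fourier coefficient is non-zero."""
    freqs = range(-(num_samples // 2) + 1, num_samples // 2 + 1)
    support = set()
    for w1, w2 in product(freqs, repeat=2):
        total = 0.0
        for m1, m2 in product(range(num_samples), repeat=2):
            phase = cmath.exp(-2j * cmath.pi * (w1 * m1 + w2 * m2) / num_samples)
            total += func(m1 / num_samples, m2 / num_samples) * phase
        if abs(total) / num_samples ** 2 > tol:
            support.add((w1, w2))
    return support


if __name__ == "__main__":
    print(sorted(fourier_support(circuit_expectation)))
```

For this particular circuit every frequency found lies in $\{0,\pm1\}^2$, in line with part (a) of the remark.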

Linear input encoding
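We start by discussing the case where the input parameters are affine functions of the input variable, e.g., $\eta(x) = x\cdot a + b$ for some $a, b \in \mathbb{R}^n$ (input redundancy $n$). In other words, the activation function in Fig. 1 is the identity, $\varphi = \mathrm{id}$. For $a \in \mathbb{R}^n$ let $K_a := \{\, w\cdot a \mid w \in \{0,\pm1\}^n \,\}$ (6), and call $|K_a|$ the spread of $a$. For $k \in \mathbb{R}$, consider the function $\chi_k\colon x \mapsto e^{2\pi i k x}$ (7). These functions are elements of the vector space $C^\infty([0,1])$ of smooth functions on the unit interval.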
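We say that a function $h\colon [0,1] \to \mathbb{R}$ has Fourier rank $r$ if there exists a size-$r$ set $K \subset \mathbb{R}$ such that $h$ is a linear combination, with non-zero coefficients, of the functions $\chi_k$, $k \in K$.
Theorem 1. Let $f$ be as in Remark 2.2. For any $a, b \in \mathbb{R}^n$, if $h(x) = f(x\cdot a + b)$ for all $x \in [0,1]$, then the spread of $a$ is at least the Fourier rank of $h$. More specifically, every function representable as $x \mapsto f(x\cdot a + b)$ is a linear combination of the functions $\chi_k$, $k \in K_a$.
Proof. This is a direct consequence of (4). For any such $h$ and all $x \in \mathbb{R}$, we have $h(x) = \sum_{w \in \{0,\pm1\}^n} \hat f(w)\, e^{2\pi i\, w\cdot(x a + b)} = \sum_{k \in K_a} \gamma_k\, e^{2\pi i k x}$, where we let $\gamma_k := \sum_{w\colon w\cdot a = k} \hat f(w)\, e^{2\pi i\, w\cdot b}$.
Remark 2. Our restriction to Hamiltonians with two eigenvalues leads to the definition of the spread in (6). If the set of eigenvalue distances of the Hamiltonian encoding the input $\eta_j$ is $D_j \subset \mathbb{Z}$, then the definition of the spread must be adapted so that $w_j$ ranges over the eigenvalue differences given by $D_j$ instead of over $\{0,\pm1\}$. Theorem 1 remains valid, with, in essence, the same proof. The bounds will be weaker.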
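The regrouping of Fourier coefficients behind Theorem 1 can be sketched in a few lines. The code assumes that substituting $\eta = x\cdot a + b$ into the Fourier expansion of $f$ groups the coefficients $\hat f(w)$ by the induced frequency $k = w\cdot a$ and multiplies each by the phase $e^{2\pi i\, w\cdot b}$. The example spectrum (that of $\sin(2\pi\eta_1)\sin(2\pi\eta_2)$), the vectors `A_EXAMPLE` and `B_EXAMPLE`, and all function names are hypothetical.

```python
import cmath
from fractions import Fraction


def collapse_spectrum(f_hat, a, b):
    """Map coefficients f_hat[w] to coefficients gamma[k] with k = <w, a>."""
    gamma = {}
    for w, coeff in f_hat.items():
        k = sum(wj * aj for wj, aj in zip(w, a))
        shift = float(sum(wj * bj for wj, bj in zip(w, b)))
        gamma[k] = gamma.get(k, 0) + coeff * cmath.exp(2j * cmath.pi * shift)
    return gamma


def evaluate_f(f_hat, eta):
    """Evaluate f(eta) from its Fourier coefficients."""
    return sum(c * cmath.exp(2j * cmath.pi * float(sum(wj * ej for wj, ej in zip(w, eta))))
               for w, c in f_hat.items())


def evaluate_h(gamma, x):
    """Evaluate h(x) = sum_k gamma[k] * exp(2*pi*i*k*x)."""
    return sum(g * cmath.exp(2j * cmath.pi * float(k) * x) for k, g in gamma.items())


# Hypothetical example: spectrum of sin(2*pi*eta1) * sin(2*pi*eta2).
F_HAT_EXAMPLE = {(1, 1): -0.25, (1, -1): 0.25, (-1, 1): 0.25, (-1, -1): -0.25}
A_EXAMPLE = (Fraction(1), Fraction(1))
B_EXAMPLE = (Fraction(1, 8), Fraction(0))

if __name__ == "__main__":
    gamma = collapse_spectrum(F_HAT_EXAMPLE, A_EXAMPLE, B_EXAMPLE)
    for k in sorted(gamma):
        print(k, gamma[k])
```

Exact rational arithmetic (`Fraction`) is used for the frequencies so that equal values of $w\cdot a$ are merged reliably; with floating-point entries of $a$ this merging would need a tolerance.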

Consequences
We return to discussing two-level Hamiltonians only, for simplicity.
We would like to quantify "how many" functions could be expressed for fixed $a$, $b$ and $\varphi$. As a rough upper bound, we study the dimension of the space of functions. In the case of linear input encoding, according to Theorem 1, that function space is spanned by $\chi_k$, $k \in K_a$, and the constant functions.
Corollary 3. MiNKiF circuits with linear input encoding are restricted within a space whose dimension depends on $a$. If $a \in \,]0,1]^n$, then the dimension is between $n+1$ and $\lceil 3^n/2 \rceil$, where the lower bound is attained by $a := \mathbf{1}$ and the upper bound is attained, e.g., by (a) $a$ chosen uniformly at random in $]0,1]^n$; or (b) a := 1 3 1.
Proof. The functions $\chi_k$ defined in (7) are linearly independent.³ This can be checked using standard techniques; cf. [3]. Hence, the dimension of said space is equal to the number of nonnegative elements of $K_a$.
It can be readily verified that $|K_a| \ge 2n+1$ unless $a$ has some 0-entries, and for $a := \mathbf{1}$, we have $K_a = \{-n, \dots, n\}$.
The upper bound $|K_a| \le 3^n$ is trivial, and attained if the entries of $a$ are in general position with respect to the action of $\mathbb{Z}_3$ on $\mathbb{R}$. That this is the case in the given examples (a), (b) is straightforward; cf. [3].
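The two extremes in the proof can be reproduced by brute force for small $n$. The sketch below assumes that $K_a$ is the set of all values $w\cdot a$ with $w \in \{0,\pm1\}^n$; the vector `TERNARY` with entries $3^{-j}$ is our own illustrative choice of a vector in general position and is not claimed to be example (b) of Corollary 3.

```python
from fractions import Fraction
from itertools import product


def frequency_set(a):
    """All values <w, a> for w in {-1, 0, 1}^n (assumed definition of K_a)."""
    return {sum(wj * aj for wj, aj in zip(w, a))
            for w in product((-1, 0, 1), repeat=len(a))}


def spread(a):
    """Number of distinct frequencies, |K_a|."""
    return len(frequency_set(a))


def dimension_bound(a):
    """Number of nonnegative elements of K_a."""
    return sum(1 for k in frequency_set(a) if k >= 0)


N = 4
ONES = [Fraction(1)] * N                                  # all weights equal
TERNARY = [Fraction(1, 3 ** j) for j in range(1, N + 1)]  # hypothetical choice

if __name__ == "__main__":
    print("ones:   ", spread(ONES), dimension_bound(ONES))
    print("ternary:", spread(TERNARY), dimension_bound(TERNARY))
```

Since the enumeration visits all $3^n$ sign vectors, it is only practical for small $n$; it is meant as a sanity check, not as an algorithm.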
We have arrived at the lower bound for the input redundancy.
Corollary 4. To represent a function as an expectation value of a MiNKiF circuit with linear input encoding, the input redundancy has to be at least $\log_3(r) - O(1)$, where $r$ is the Fourier rank of that function (and the constant in the $O(1)$ is absolute).
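As a back-of-the-envelope illustration of Corollary 4, the following sketch computes the smallest $n$ with $3^n \ge r$. It assumes that the spread of $a$ is at most $3^n$ and at least the Fourier rank $r$, and it ignores the $O(1)$ slack of the corollary; the function name is hypothetical.

```python
def min_redundancy_linear(fourier_rank: int) -> int:
    """Smallest n with 3**n >= fourier_rank (integer arithmetic only)."""
    if fourier_rank < 1:
        raise ValueError("Fourier rank must be a positive integer")
    n = 0
    while 3 ** n < fourier_rank:
        n += 1
    return n


if __name__ == "__main__":
    for rank in (1, 3, 4, 81, 82, 1000):
        print(rank, min_redundancy_linear(rank))
```

The loop avoids floating-point logarithms, which can be off by one for exact powers of 3.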

Arcsine activation function
We now consider the original situation of the example in [1], where the activation function is $\varphi := \arcsin$.
³ In the usual sense of vector spaces: Every finite subset is linearly independent.
if that interval is non-empty. Note, though, that this condition is not necessary. The following can be shown (cf. [3]).
Lemma 5. If a linear combination of sc-monomials of degree at most $d$ defines a function $h$ on all of $\mathbb{R}$, then $h$ is a polynomial in $x$ of degree at most $d$.
We immediately obtain this corollary.
Corollary 6. To represent a polynomial as an expectation value of a MiNKiF circuit with arcsin activation function, the input redundancy has to be at least the degree of the polynomial.
As indicated in the introduction, in the special case which is considered in [1], where the input amplitudes are stored (by rotations) in $n$ distinct qubits before any other quantum operation is performed, this can be proved by looking directly at the effect of a Pauli transfer matrix on the mixed state vector in the Pauli basis.
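The special case of $n$ separately encoded qubits is easy to simulate. The sketch below encodes $x$ by the rotation $e^{-i\arcsin(x)\sigma_X/2}$ on each of $n$ qubits, as described in the introduction, and uses the fact that expectation values of product observables on product states factorize. The observable $(-\sigma_Y)^{\otimes n}$ is our own illustrative choice to obtain the monomial $x^n$; it is not claimed to be the observable used in [1].

```python
import math


def encoded_qubit(x: float):
    """Amplitudes of exp(-i*arcsin(x)*sigma_X/2) applied to |0>."""
    half_angle = math.asin(x) / 2.0
    return (complex(math.cos(half_angle), 0.0),
            complex(0.0, -math.sin(half_angle)))


def expectation_y(state) -> float:
    """<sigma_Y> = 2 * Im(conj(a0) * a1) for the state (a0, a1)."""
    a0, a1 = state
    return 2.0 * (a0.conjugate() * a1).imag


def expectation_z(state) -> float:
    """<sigma_Z> = |a0|^2 - |a1|^2 for the state (a0, a1)."""
    a0, a1 = state
    return abs(a0) ** 2 - abs(a1) ** 2


def monomial_expectation(x: float, n: int) -> float:
    """<(-sigma_Y) x ... x (-sigma_Y)> on n identically encoded qubits."""
    value = 1.0
    for _ in range(n):
        value *= -expectation_y(encoded_qubit(x))
    return value


if __name__ == "__main__":
    x = 0.3
    print(monomial_expectation(x, 5), x ** 5)
    print(expectation_z(encoded_qubit(x)), math.sqrt(1.0 - x * x))
```

Each encoded qubit contributes one factor $x$ (from $-\sigma_Y$) or one factor $\sqrt{1 - x^2}$ (from $\sigma_Z$), which is why a degree-$n$ monomial uses $n$ copies of the input in this construction.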
In their paper, Mitarai et al. [1] hint that, due to the square-root terms, the expectation values can more easily represent a larger class of functions than polynomials. Here is the corresponding result.
Proposition 7. MiNKiF circuits with arcsin activation function are restricted within a space whose dimension depends on $a$, $b$. If $a \in \,]0,\infty[^n$, then the dimension is between $n+1$ and $3^n$, where the lower bound is attained by $a := \mathbf{1}$, $b := 0$, and the upper bound is attained, e.g., by letting $a := \mathbf{1}$ and choosing $b$ uniformly at random in
The somewhat technical proof sheds little light on the problem and we refer the reader to [3] for it.

Conclusions and outlook
Both activation functions we considered give clear evidence that input redundancy is necessary, and grows at least logarithmically with the "complexity" of the function: The complexity of a function $f$ with respect to a family $B$ of "basis functions" is the number of functions from the family which are needed to obtain $f$ as a linear combination. In our results, the function family $B$ depends on the activation function. In the case of linear input encoding (activation function "identity"), the basis functions are trigonometric functions $t \mapsto e^{2\pi i k t}$, whereas for the arcsin activation function, we obtain the basis monomials (10) already identified in [1].
More importantly, it is now clear that the weights $a$, $b$ (the coefficients in the affine transformation links in Fig. 1) have to be variable in order to ensure a reasonable amount of expressiveness in the function represented by the quantum circuit. Undoubtedly, this fact will influence future architectures for quantum neural networks, more accurately named hybrid quantum-classical layers of a neural network.
As for an outlook towards future research, we consider it a serious shortcoming in the authors' understanding of the research topic that the complexity measures (i.e., the $B$) depend on the activation functions. This dependence makes a comparison of different activation functions difficult. Hence, the comparison of activation functions, or at least the question which activation functions make any sense at all, is a goal of future research.
Moreover, obviously, the hybrid quantum-classical neural network layers suggested in this paper should be implemented and studied experimentally.

Figure 1: Schematic for the parameterized quantum circuits we consider. (Additional parameters $\theta$ not shown.)

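Abbreviating $s_j := a_j x - b_j$ and $c_j := \sqrt{1 - (a_j x - b_j)^2}$ for $j = 1, \dots, n$, Remark 2.2(b) gives us that the function $h$ computed by our hypothetical quantum circuit is a linear combination of products of the form $\prod_{j \in S} s_j \prod_{j \in C} c_j$ with disjoint index sets $S, C \subseteq \{1, \dots, n\}$.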