Tackling the Trade-Off Between Information Processing Capacity and Rate in Delay-Based Reservoir Computers

We study the role of the system response time in the computational capacity of delay-based reservoir computers. Photonic hardware implementation of these systems offers high processing speed. However, delay-based reservoir computers have a trade-off between computational capacity and processing speed due to the non-zero response time of the non-linear node. The reservoir state is obtained from the sampled output of the non-linear node. We show that the computational capacity is degraded when the sampling output rate is higher than the inverse of the system response time. We find that the computational capacity depends not only on the sampling output rate but also on the misalignment between the delay time of the non-linear node and the data injection time. We show that the capacity degradation due to the high sampling output rate can be reduced when the delay time is greater than the data injection time. We find that this mismatch gives an improvement of the performance of delay-based reservoir computers for several benchmarking tasks. Our results show that the processing speed of delay-based reservoir computers can be increased while keeping a good computational capacity by using a mismatch between delay and data injection times. It is also shown that computational capacity for high sampling output rates can be further increased by using an extra feedback line and delay times greater than the data injection time.


INTRODUCTION
Reservoir computing (RC) is a successful brain-inspired concept to process information with temporal dependencies [1,2]. RC conceptually belongs to the field of recurrent neural networks (RNN) [3]. In these systems, the input signal is non-linearly projected onto a high-dimensional state space where the task can be solved much more easily than in the original input space. The high-dimensional space is typically a network of interconnected non-linear nodes (called neurons). The ensemble of neurons is called the reservoir. RC implementations are generally composed of three layers: input, reservoir, and output (see Figure 1). The input layer feeds the input signal to the reservoir via fixed weighted connections. The input weights are often chosen randomly. These weights determine how strongly each of the inputs couples to each of the neurons. In traditional RNN the connections among the neurons are optimized to solve the task. Nevertheless, in RC, the coupling weights in the reservoir are not trained and can be chosen at random. The reservoir state is given by the combined states of all the individual nodes. Under the influence of input signals, the nodes of the reservoir remain in a transient state such that each input is injected in the presence of the response to the previous input. As a result the reservoir can retain input data for a finite amount of time (short-term memory [4]), and it can compute linear and non-linear functions of the retained information. The reservoir output is constructed through a linear combination of neural responses, with readout weights that are trained for the specific task. These weights are typically obtained by a simple linear regression. The strength of the reservoir computing scheme lies in the simplicity of its training method, where only the connections with the output are optimized.
Hardware implementations of RC are sought because they offer high processing speed [5], parallelism, and low power consumption [6] compared to digital implementations. However, traditional RC involves a large number of interconnected non-linear neurons, so the hardware implementation is very challenging. Recently, it has been shown that RC can be efficiently implemented using a single non-linear dynamical system (neuron) subject to delayed feedback (delay-based RC) [7]. This architecture emulates the dynamic complexity traditionally achieved by a network of neurons. In delay-based RC, the spatial multiplexing of the input in standard RC systems with N neurons is replaced by time-multiplexing (see Figure 1). The reservoir is composed of N sampled outputs of the non-linear node distributed along the delay line, called virtual nodes. Connections between these N virtual nodes are established through the delayed feedback when a mismatch between the delay and data injection times is introduced [8]. Delay-based RC has facilitated hardware implementation in photonic systems that have the potential to develop high-speed information processing. An overview of recent advances is given in Van der Sande et al. [9]. However, the information processing rate is limited by the non-zero response time of the system. The reservoir state is obtained from the sampled output of the non-linear node. The information processing (or data injection) time is given by T_p = Nθ, where θ is the inverse of the output sampling rate, i.e., the time interval between two virtual nodes (see Figure 1). The information processing rate T_p^−1 can be increased by decreasing the node distance (higher sampling output rate). However, when θ is less than the response time of the system T, virtual nodes are connected through the non-linear node dynamics. Network connections due to inertia lead to virtual node states with similar dependence on inputs. Then the number of independent virtual nodes decreases and the diversity of the reservoir states is reduced. As a consequence computational capacity is degraded. Then there is a trade-off between information processing capacity and rate in delay-based reservoir computers.
In this work we show, using numerical simulations, that the computational capacity is degraded when the sampling output rate is higher than the inverse of the system response time. We obtain the memory capacities for different values of θ/T and the mismatch between the delay and data injection times. Until now only two different delay-based reservoir architectures have been considered: θ < T without mismatch [7] and θ ≫ T with mismatch time θ [8]. We find that the computational capacity depends not only on the sampling output rate but also on the misalignment between the delay time of the non-linear node and the data injection time. We show that the capacity degradation due to high sampling output rate can be reduced when the delay time is greater than the data injection time. We also find that this mismatch gives an improvement of the performance of delay-based reservoir computers for several benchmarking tasks. Then, delay-based reservoir computers can achieve a high processing speed and good computational capacity using a mismatch between delay and data injection times.
We first consider a simple architecture of a single non-linear node with one feedback delay line. The linear and non-linear information processing capacities are obtained for different values of θ/T. It is found that information processing capacity is boosted for small values of θ/T if the delay of the non-linear node τ is greater than T_p. A similar performance is obtained for small and large values of θ/T for channel equalization and also for the NARMA-10 task if values of the delay time greater than T_p are used. Then the information processing rate is increased without causing system performance degradation. This is due to the increase in reservoir diversity. Another strategy to increase reservoir diversity is to use an extra feedback line. We show that memory capacity can be further increased with this architecture for small values of θ/T when the delay time is greater than the information processing time.

Delay-Based Reservoir Computers
Traditional RC implementations consist of a large number N of randomly interconnected non-linear nodes [3]. The state of the reservoir at time step n, r(n), is determined by:

$$ r(n) = f\left(\beta W r(n-1) + \gamma W_{in} u(n)\right) \qquad (1) $$

where u(n) is the sequentially injected input data and f is the reservoir activation function. The matrices W and W_in contain the (generally random) reservoir and input connection weights, respectively. The matrix W (W_in) is rescaled with a connection (input) scaling factor β (γ). The exact internal connectivity is not crucial. In fact, it has been shown that simple non-random connection topologies (e.g., a simple chain or ring) give good performance [10]. Delay-based RC is a minimal approach to information processing based on the emulation of a recurrent network via a single non-linear dynamical node subject to delayed feedback. The reservoir nodes (called virtual nodes) are the sampled outputs of the non-linear node distributed along the delay line (see Figure 1). In the time delay-based approach there is only one real non-linear node. Thus, the spatial multiplexing of the input in standard RC is replaced here by time multiplexing. The advantage of delay-based RC lies in the minimal hardware requirements. There is a price to pay for this hardware simplification: compared to an N-node standard spatially-distributed reservoir, the dynamics of the system have to run at an N-times higher speed in order to have equal input throughput.
The dynamics of a delay-based reservoir has been described as [7,11–16]:

$$ T \dot{x}(t) = -x(t) + f\left(\beta x(t-\tau) + \gamma J(t)\right) \qquad (2) $$

where T is the response time of the system, τ the delay time, β > 0 the feedback strength and γ the input scaling. The masked input J(t) is the continuous version of the discrete random mapping of the original input W_in u(n). In our approach, every time interval of the data injection/processing time T_p represents another discrete time step. This time is given by T_p = Nθ, where θ is the temporal separation between virtual nodes. Individual virtual nodes are addressed by time-multiplexing the input signal. An input mask is used to emulate the input weights of traditional RC. This mask function is a piecewise constant function, constant over an interval of θ, and periodic with period T_p. The N mask values m_i are drawn from a random uniform distribution in the interval [-1, 1]. The procedure to construct the continuous data J(t) is the following. First, the input stream u(n) undergoes a sample and hold operation to define a stream which is constant during one T_p, before it is updated. Every segment of length T_p is multiplied by the mask (see Figure 1). The masked input u(n + 1) ⊗ Mask is injected directly following u(n) ⊗ Mask. After a time T_p, each virtual node is updated.
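The masking procedure described above can be sketched in a few lines. This is only an illustration: the function name `masked_input`, the seed, and the choice to store one value of J per interval θ (rather than a continuous signal) are assumptions made here, not details taken from the simulations in this work.

```python
import random

def masked_input(u, n_nodes, seed=0):
    """Return the mask and the masked samples J, one value per interval theta.

    J[n * n_nodes + i] = m_i * u[n]: each input u(n) is held for one
    T_p = N * theta and multiplied by the periodic, piecewise-constant mask.
    """
    rng = random.Random(seed)
    # N mask values drawn uniformly from [-1, 1]
    mask = [rng.uniform(-1.0, 1.0) for _ in range(n_nodes)]
    # Sample-and-hold over T_p, then multiply each segment by the mask
    J = [m * u_n for u_n in u for m in mask]
    return mask, J

mask, J = masked_input([0.5, -1.0, 0.25], n_nodes=4)
```

Each block of `n_nodes` consecutive entries of `J` corresponds to one data injection time T_p, so the segment for u(n + 1) directly follows the segment for u(n).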
The reservoir state that corresponds to the input u(n), r(n) = [r_1(n), ..., r_N(n)], is the collection of N outputs of the dynamical system, r_i(n) = x(nT_p − (N − i)θ), where i = 1, ..., N (see Figure 1). These N points are called virtual nodes because they correspond to taps in the delay line and play the same role as the neurons in standard RC. The node responses r_i(n) are used to train the reservoir to perform a specific task. As in the standard RC [1,17], only the output weights W_out are computed to obtain the output ŷ = W_out r. A linear regression method is used to minimize the error between the output ŷ and the desired target y in the training phase. The testing is then performed using previously unseen input data of the same kind as those used for training.
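As an illustration of the readout training step, the following sketch fits the output weights by regularized least squares on a matrix of reservoir states and evaluates them on held-out rows. The small ridge term, the added bias column, and the synthetic data are assumptions introduced here for the example; the text only specifies that a linear regression is used.

```python
import numpy as np

def train_readout(R, y, ridge=1e-8):
    """Solve min ||R1 w - y||^2 + ridge ||w||^2, one reservoir state per row of R."""
    R1 = np.hstack([R, np.ones((R.shape[0], 1))])  # bias column (assumption)
    A = R1.T @ R1 + ridge * np.eye(R1.shape[1])
    return np.linalg.solve(A, R1.T @ y)

def predict(R, w):
    """Apply trained weights (including the bias weight) to new states."""
    return np.hstack([R, np.ones((R.shape[0], 1))]) @ w

# Synthetic stand-in for reservoir states and a linear target
rng = np.random.default_rng(1)
R = rng.normal(size=(200, 5))
y = R @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.7

w = train_readout(R[:150], y[:150])                 # training phase
err = np.max(np.abs(predict(R[150:], w) - y[150:]))  # test on unseen rows
```

In an actual experiment the rows of `R` would be the sampled virtual node states r(n), and the train/test split would follow the dataset sizes given in the Results section.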

Interconnection Structure of Delay-Based Reservoir Computers
In delay-based reservoir computers virtual nodes are connected through the feedback loop with nodes affected by previous inputs. Virtual node states also depend on close (in time) nodes through the inherent dynamics of the non-linear node. We can identify four time scales in the delayed feedback system with external input described by Equation (2): the response time T of the non-linear node, the delay time τ, the separation of the virtual nodes θ, and the data injection/processing time T_p. Setting the values of the different time scales creates a fixed interconnection structure. The virtual nodes can set up a network structure via the feedback loop by introducing a mismatch between T_p and τ. Interconnection between virtual nodes due to the inherent dynamics of the non-linear node is obtained if the node separation θ is smaller than the response time of the system T. Due to inertia the response of the system is not instantaneous. Therefore, the state of a virtual node depends on the states of nodes that correspond to previous taps in the delay line. However, if θ is too short, the non-linear node will not be able to follow the changes in the input signal and the response signal will be too small to measure. Typically, a value of θ = 0.2T is quoted [7,11–16,18].
When θ ≫ T the state of a given virtual node is independent of the states of the neighboring virtual nodes. Then virtual nodes are not coupled through the non-linear node dynamics. The reservoir state is only determined by the instantaneous value of the input J(t) and the delayed reservoir state. The system given by Equation (2) can then be described with a map:

$$ x(t) = f\left(\beta x(t-\tau) + \gamma J(t)\right) \qquad (3) $$

A network structure can be obtained via the feedback loop by introducing a mismatch between T_p and τ. This mismatch can be quantified in terms of the number of virtual nodes by α = (τ − Nθ)/θ. In the case of 0 ≤ α < N and θ ≫ T, the virtual node states are given by:

$$ r_i(n) = \begin{cases} f\left(\beta r_{i-\alpha}(n-1) + \gamma m_i u(n)\right) & \text{if } \alpha < i \le N \\ f\left(\beta r_{N+i-\alpha}(n-2) + \gamma m_i u(n)\right) & \text{if } i \le \alpha \end{cases} $$

The network topology depends on the value of α. When α = 1 (i.e., τ = T_p + θ) the topology is equivalent to the ring topology in standard RC systems [10]. When α < 0, a number |α| of virtual nodes are not connected through the feedback line with nodes at a previous time. When α and N have no common divisors, all virtual nodes are connected through feedback in a single ring. However, when N and α are not coprime, subnetworks are formed with a similar dependence on inputs and the reservoir diversity is reduced.
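The feedback wiring for θ ≫ T and 0 ≤ α < N can be checked with a small sketch. It follows from r_i(n) = x(nT_p − (N − i)θ) and τ = (N + α)θ that node i receives feedback from node i − α at the previous step when i > α, and from node N + i − α two steps back otherwise. The helper names `feedback_source` and `count_rings` are hypothetical, introduced only for this illustration.

```python
import math

def feedback_source(i, alpha, n_nodes):
    """For node i (1-based) at step n, return (j, lag): feedback comes from r_j(n - lag)."""
    if i > alpha:
        return i - alpha, 1
    return n_nodes + i - alpha, 2

def count_rings(alpha, n_nodes):
    """Count disjoint feedback rings by following i -> source(i) until the loop closes."""
    seen, rings = set(), 0
    for start in range(1, n_nodes + 1):
        if start in seen:
            continue
        rings += 1
        i = start
        while i not in seen:
            seen.add(i)
            i, _ = feedback_source(i, alpha, n_nodes)
    return rings

rings_prime = count_rings(1, 97)    # alpha = 1 with N = 97
rings_shared = count_rings(4, 12)   # alpha and N share the divisor 4
```

With α = 1 and N = 97 all nodes fall into one ring, whereas α = 4 and N = 12 splits the reservoir into gcd(4, 12) = 4 separate rings, which is the subnetwork formation described above.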
It is clear that the information processing rate of delay-based reservoir computers T_p^−1 depends on the node separation. Then reservoir computers with nodes connected only through the feedback line (θ ≫ T) are slower than a counterpart exploiting the virtual connections through the system dynamics (θ < T). However, as we will show in section 3.1, information processing capacity is degraded when θ < T. In this case, the computational capacity increases with the mismatch between the delay and data injection times (see section 3.1).

Computational Capacity
Delay-based reservoir computers can reconstruct functions of h previous inputs y_k(n) = y(u(n − k_1), ..., u(n − k_h)) from the state of a dynamical system using a linear estimator ŷ_k. Here k denotes the vector (k_1, ..., k_h). The estimator ŷ_k is obtained from N internal variables (node states) of the system. The suitability of a reservoir to reconstruct y_k can be quantified by using the capacity [20]:

$$ C_{y_k} = 1 - \frac{\min_{W_{out}} \left\langle \left(y_k(n) - \hat{y}_k(n)\right)^2 \right\rangle}{\left\langle y_k(n)^2 \right\rangle} \qquad (4) $$

The capacity is C_{y_k} = 1 when the reconstruction error for y_k is zero. The capacity for reconstructing a function of the inputs y, C_y, is given by the sum of C_{y_k} over all sequences of past inputs [20]:

$$ C_y = \sum_{k} C_{y_k} \qquad (5) $$

The total computational capacity C_T is the sum of C_{y_k} over all sequences of past inputs and a complete orthonormal set of functions. When y_k is a linear function of one of the past inputs, y_k(n) = u(n − k), the capacity C_y corresponds to the linear memory capacity introduced in Jaeger [4]. The capacity of the system to compute non-linear functions of the retained information is given by the non-linear memory capacity [20]. The computational capacity is given by the sum of the linear and non-linear memory capacities. The total capacity is limited by the dimension of the reservoir. As a consequence, there is a trade-off between linear and non-linear memory capacities [20]. The total computational capacity of delay-based reservoirs is given by the number of linearly independent virtual nodes. The computational power of delay-based reservoir computers is therefore hidden in the diversity of the reservoir states. In the presence of inertia (θ < T) the non-linear node dynamics couples close (in time) virtual nodes. This coupling reduces reservoir diversity, and computational capacity is degraded. The computational capacity of a delay-based reservoir depends not only on the separation between the virtual nodes but also on the misalignment between T_p and τ, given by α. When α < 0, the state of a virtual node of index i > (N − |α|), r_i(n), is a function of the virtual node state r_{i−N+|α|}(n) at the same time. Then the reservoir diversity and computational capacity are reduced. Computational capacity is also reduced if |α| and N are not coprime. In this case, the N virtual nodes form gcd(|α|, N) ring subnetworks, where gcd is the greatest common divisor. Each subnetwork has p = N/gcd(|α|, N) virtual nodes. Virtual node states belonging to different subnetworks have a similar dependence on inputs and reservoir diversity is reduced.
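A capacity of this kind can be estimated from recorded states with ordinary least squares. The sketch below assumes the common normalized-error form C = 1 − min MSE / ⟨y²⟩ with mean-removed states and target, and it evaluates the fit in-sample for brevity (a simplification; a separate test set, as used for the results in this work, avoids the small positive bias this introduces). The toy "reservoir" simply stores three delayed copies of the input and is not a delay-based system.

```python
import numpy as np

def capacity(R, y):
    """Estimate C = 1 - min_w <(y - R w)^2> / <y^2> after removing means."""
    Rc = R - R.mean(axis=0)
    yc = y - y.mean()
    w, *_ = np.linalg.lstsq(Rc, yc, rcond=None)  # optimal linear readout
    mse = np.mean((yc - Rc @ w) ** 2)
    return 1.0 - mse / np.mean(yc ** 2)

rng = np.random.default_rng(0)
u = rng.uniform(-1.0, 1.0, 2003)
# Toy state matrix: row n holds u(n), u(n-1), u(n-2)
R = np.column_stack([u[3:], u[2:-1], u[1:-2]])

c1 = capacity(R, u[2:-1])  # target u(n-1) is contained in the state
c3 = capacity(R, u[:-3])   # target u(n-3) is not
```

Summing such values over the delay k gives the linear memory capacity; here the toy state reconstructs u(n − 1) essentially perfectly and carries almost no information about u(n − 3).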

Reservoir Computers With Two Delay Lines
An architecture with several delay lines has been proposed [21,22] to increase the memory capacity of delay-based reservoir computers with virtual nodes connected only through non-linear system dynamics (θ < T and α = 0). Several delay lines are added to preserve older information. The longer the delay, the older the response that is being fed back. Even without explicitly reading the older states from the delay line, the information is re-injected into the system and its memory can be extended. We apply this approach to delay-based reservoir computers with virtual nodes that are connected through non-linear node dynamics and by the feedback line.
The dynamics of reservoir computers with two delay lines is described by:

$$ T \dot{x}(t) = -x(t) + f\left(\beta_1 x(t-\tau_1) + \beta_2 x(t-\tau_2) + \gamma J(t)\right) \qquad (6) $$

where β_i ≥ 0 is the feedback strength of the delay line i. The total feedback strength is β = β_1 + β_2. With the mismatches α_i measured in units of the node separation, as for α above, the corresponding delays are given by τ_1 = (N + α_1)θ and τ_2 = (2N + α_2)θ, where 0 ≤ α_i < N. The reservoir state is the same as in the one-delay RC, i.e., the virtual nodes correspond to taps only in the shorter (τ_1) delay line. In the case of α_1 = 0, it has been shown [23] that the best performance for the NARMA-10 task is obtained when τ_1 and τ_2 are coprime. In this case, the number of virtual nodes that are mixed together within the history of each virtual node is maximized.
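A minimal way to simulate such a system numerically is a fixed-step Euler scheme with a stored history. The sketch below assumes low-pass dynamics of the form T x'(t) = −x(t) + f(β_1 x(t − τ_1) + β_2 x(t − τ_2) + γ J(t)), uses `math.tanh` as a placeholder non-linearity, and starts from a zero history; the function name, the number of substeps, and the parameter values in the example are all illustrative assumptions rather than the settings used for the results reported here.

```python
import math

def simulate_two_delay(J, theta, T, tau1, tau2, beta1, beta2, gamma,
                       f=math.tanh, substeps=20):
    """Euler-integrate T x' = -x + f(b1 x(t - tau1) + b2 x(t - tau2) + g J(t)).

    J holds one masked input value per interval theta. Returns x sampled at
    the end of each interval, i.e., one value per virtual node.
    """
    h = theta / substeps                        # integration step
    d1, d2 = round(tau1 / h), round(tau2 / h)   # delays in steps
    hist = [0.0] * (max(d1, d2) + 1)            # zero initial history (assumption)
    samples = []
    for j in J:
        for _ in range(substeps):
            x = hist[-1]
            drive = f(beta1 * hist[-1 - d1] + beta2 * hist[-1 - d2] + gamma * j)
            hist.append(x + (h / T) * (-x + drive))
        samples.append(hist[-1])
    return samples

theta, T, N = 0.2, 1.0, 5
J = [0.5] * 50
x = simulate_two_delay(J, theta, T, tau1=(N + 1) * theta, tau2=(2 * N + 2) * theta,
                       beta1=0.4, beta2=0.4, gamma=0.1)
# Without feedback the node relaxes towards f(gamma * J) with time constant T
x_free = simulate_two_delay(J, theta, T, tau1=1.2, tau2=2.4,
                            beta1=0.0, beta2=0.0, gamma=1.0)
```

Setting `beta2=0.0` recovers a one-delay system, so the same routine can be used for both architectures. The history list grows with the simulation length, which is acceptable for a sketch but would be replaced by a ring buffer in longer runs.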
If the mismatches α_i (i = 1, 2) are zero, the virtual node states at time n depend on the reservoir state at time (n − 1) and (n − 2) via the delay line 1 and 2, respectively. In one-delay reservoirs (β_2 = 0), the number of virtual nodes whose state at time n depends on the reservoir state at time (n − 2) increases with the mismatch (see the virtual node state equation in section 2.1.1 for the case without inertia). When a second delay is added with a mismatch α_2 > 0, some virtual nodes at time n are connected with nodes at time (n − 3). The number of virtual nodes with states at time n that depend on the reservoir state at time (n − 3) increases with α_2. These connections with older states can extend the memory of the two-delay reservoir computer.

RESULTS
In this section, we show the numerical results obtained for the memory capacities and performance of a non-linear delay-based RC system. We study a delay-based reservoir computer with a single non-linear node for the one- and two-delay-line architectures. The one-delay system is governed by Equation (2) and the two-delay reservoir by Equation (6). In both cases the reservoir activation function f is given by Equation (7), where a = 2 and λ = 1. The value of f_s = 2.5 is chosen to have, when β < 1, a stable fixed point for the system defined by Equation (2) in the absence of input (γ = 0). This non-linear function is asymmetric so that the reservoir computer can reconstruct even functions of the input. Similar results are obtained for different reservoir activation functions, in particular for a sin² function, which corresponds to an optoelectronic implementation [8,11,13–15]. The number of virtual nodes used in the numerical simulations is a prime number, N = 97, to avoid the capacity degradation due to the formation of subnetworks. The remaining fixed parameters are: T = 1 and β = β_1 = 0.8 for the one-delay reservoir computer and β_1 + β_2 = β = 0.8 for the two-delay reservoir computer. The effective non-linearity of the delay-based reservoir computer can be changed with the input scaling parameter γ. In this work, we consider γ = 0.1 and γ = 1, which correspond to low-to-moderate and strong non-linearity, respectively. The total capacity of a linear reservoir computer with f(z) = z will also be analyzed.
All the results presented in this paper are the average over 5 simulation runs with different training/test sets and different masks.A total of 8,000 inputs (6,000 for training and 2,000 for testing) are used for computational capacities and the NARMA-10 task.The dataset for the channel equalization task has 10,000 points for training and 6,000 for testing.

Computational Capacity
To analyze the computational capacity of the non-linear delay-based reservoir computer, we use Equations (4) and (5) to calculate four capacities as in Duport et al. [19], namely the linear (LMC), quadratic (QMC), cubic (CMC) and cross (XMC) memory capacities. The first three correspond to functions y given by the first-, second- and third-order Legendre polynomials, respectively. In order to obtain these capacities a series of i.i.d. input samples drawn uniformly from the interval [-1, 1] is injected into the reservoir. The LMC is obtained by summing over k the capacity C_{y_k} for reconstructing y_k(n) = u(n − k). It corresponds to the linear memory capacity introduced in Jaeger [4]. The QMC and CMC are obtained by summing over k the capacity for y_k(n) = (3u²(n − k) − 1)/2 and y_k(n) = (5u³(n − k) − 3u(n − k))/2, respectively. The XMC is obtained by summing over k, k′ for k < k′ the capacities for the product of two inputs, y_{k,k′} = u(n − k) · u(n − k′). In non-linear systems, the sum C_s = LMC + QMC + CMC + XMC does not include all possible contributions to C_T, so C_s ≤ C_T, whereas for linear systems C_s = LMC = C_T. Finally, note that in some cases the main contribution to the LMC is due to the sum of C_{y_k} over a large range of values of k greater than a certain value k_c with large normalized-root-mean-square reconstruction errors, NRMSRE(k) = √(1 − C_{y_k}). This corresponds to a memory function m(k) = C_{y_k} with a long tail. In these cases a high LMC can be obtained but the reconstruction error for y_k when k > k_c is large. This low quality memory capacity leads to poor performance for tasks requiring long memory, such as the NARMA-10 task [10]. A memory capacity with good quality (quality memory capacity) can be calculated by summing only the capacities for y_k over k until they drop below a certain value q. If we consider that the error is small when NRMSRE(k) < 0.3, this corresponds to C_{y_k} > 0.91. Then we consider a value q = 0.9 to obtain the quality memory capacity C_y^{q=0.9}.
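The target functions and the quality threshold described above can be written compactly. In this sketch the relation NRMSRE = √(1 − C) is the one implied by the correspondence between NRMSRE < 0.3 and C > 0.91; the helper names and the example memory function are hypothetical values chosen only to show a long tail.

```python
import math

def legendre_target(u, degree):
    """Legendre polynomial P_degree (1 to 3) of a delayed input value u in [-1, 1]."""
    if degree == 1:
        return u
    if degree == 2:
        return (3 * u ** 2 - 1) / 2
    if degree == 3:
        return (5 * u ** 3 - 3 * u) / 2
    raise ValueError("only degrees 1 to 3 are used here")

def nrmsre(c):
    """Reconstruction error associated with a capacity value c."""
    return math.sqrt(max(0.0, 1.0 - c))

def quality_capacity(memory_function, q=0.9):
    """Sum the capacities m(k) over k until the first value below q."""
    total = 0.0
    for c in memory_function:
        if c < q:
            break
        total += c
    return total

# Made-up memory function with a long, low-quality tail
m = [0.99, 0.95, 0.92, 0.5, 0.3, 0.1]
lmc = sum(m)
lmc_q = quality_capacity(m, q=0.9)
```

For this example the plain sum counts the tail and gives a larger value than the quality memory capacity, which only keeps the first three delays.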

Memory Capacities of One-Delay Reservoir Computers
First, we simulate a delay-based reservoir computer with a single delay line. We focus on the influence of the system response time on the computational capacity for different values of the mismatch α between the data injection and delay times. Until now two values of the mismatch have been used: α = 0 with θ = 0.2T [7,12–18], and α = 1 with θ ≫ T [8,15,19]. We first consider a linear system with f(z) = z in Equation (2). As stated before, the total computational capacity of this system can be obtained from the linear memory capacity, i.e., C_T^l = LMC. Figure 2 shows the total computational capacity of the linear reservoir computer as a function of the node separation for two different values of the detuning between T_p and τ: α = 0 and α = 1. For α = 1 (Figure 2B), C_T^l increases with θ/T and the upper bound C_T = N = 97 is almost reached for θ/T = 10. Similar behaviour is obtained for detuning values 1 < α < N. Then almost all the nodes are linearly independent for θ/T = 10 and non-zero α. The quality memory C_T^{l(q=0.9)} = LMC_{q=0.9} of the linear delay-based reservoir computer also increases with θ/T, following the same behaviour as C_T^l for α = 1. However, when θ < T a total capacity C_T^l < 50 is obtained. Then a clear degradation of the capacity is observed with respect to its upper bound, given by N = 97, when the node separation is smaller than the response time of the non-linear node dynamics. In this case virtual nodes with an index difference smaller than T/θ have similar states. Then reservoir diversity is reduced and the information processing capacity is degraded. When θ/T increases the coupling between close (in time) virtual nodes decreases, and the capacity increases.

[Figure 2, partial caption: the dashed line with black points is the total quality computational capacity calculated for q = 0.9.]
In the special case of zero detuning (α = 0), the only coupling between the virtual nodes is through the system dynamics with non-zero response time. For α = 0, the total capacity of the linear delay-based reservoir computer has a maximum value C_T^l = 38 at θ/T ∼ 1.2 (see Figure 2A). In this case a clear degradation of the capacity is observed for any value of θ/T. The maximum is due to the trade-off between the fading of the coupling through the system dynamics for low sampling output rates and the very similar responses to different inputs for small θ. Furthermore, for α = 0, the quality memory capacity decreases with θ/T and the maximum C_T^{l(q=0.9)} is obtained at θ/T = 0.2. For low inertia, θ/T = 4, we obtain a normalized-root-mean-square reconstruction error NRMSRE(k) > 0.6 when k > 2. For θ/T = 1, NRMSRE(k) > 0.3 is obtained when k > 12.
We consider now a non-linear delay-based reservoir computer with an activation function given by Equation (7) and a low-to-moderate non-linearity (γ = 0.1). In this case, the capacity C_s has a behaviour as a function of θ similar to that of the total capacity of the linear case C_T^l (see Figure 3). For α = 1, C_s increases with θ/T, and a value of C_s = 93 is obtained at θ/T = 4. If all the capacities were considered for α = 1, we would have C_T ∼ N. The increase in C_s with θ/T is mainly due to the XMC and to the LMC. When θ/T < 1 a capacity C_s < 75 is obtained. However, this degradation in C_s is smaller than in the linear case. It is worth mentioning that for α = 1, C_s is greater than the total capacity of the linear case C_T^l. Then we have C_T^l < C_s ≤ C_T^nl, where C_T^nl is the total capacity of the non-linear system. This is due to the fact that non-linearity increases the number of linearly independent virtual node states, since correlations between virtual nodes are smaller for the non-linear delay-based reservoir computer. In the case without mismatch (α = 0) the capacity C_s of the non-linear reservoir computer (see Figure 3A) has a maximum as in the linear case at θ/T ∼ 1.2. The degradation of C_s is smaller than that of C_T^l in the linear case.

We have shown that the computational capacity is degraded when the sampling output rate is higher than the inverse of the system response time. However, the information processing capacity of delay-based reservoir computers depends not only on the output sampling rate (i.e., the separation between the virtual nodes) but also on the detuning between T_p and τ, i.e., α. To study this dependency, we calculate the memory capacities as a function of α for a non-linear delay-based reservoir computer with two different response times: an instantaneous response to the input, T = 0 (Figures 4C,D), and T = θ/0.2 (Figures 4A,B). This node separation θ = 0.2T is the one used in most of the reservoirs with connections through system dynamics [7,12–18]. The capacities for T = 0 correspond to a node separation much larger than T. When θ/T ≫ 1 the node response to an input reaches the steady state after a time θ. Then the reservoir state is given by Equation (2) for T = 0. As a consequence, when θ/T ≫ 1 the computational capacity tends to the value obtained for T = 0. For a mismatch α = 1 this limit is reached for θ/T > 4 (see Figure 3B). Two values, γ = 0.1 and γ = 1, corresponding to low-to-moderate and strong non-linearity, respectively, are considered. We also calculate the total capacity as a function of α for a linear reservoir computer with θ = 0.2T (Figure 4B).
The virtual states of delay-based systems with an instantaneous response to the input are given by the map of Equation (3). When N and α are coprime, we have for 0 < α < N a total capacity C_T ≈ N. Thus, increasing α in the case of T = 0 does not increase the total capacity; it only changes the relative contribution of the different capacities to C_T^nl. This is clearly shown in Figure 4D where a low-to-moderate non-linearity (γ = 0.1) is considered. Here, the non-linear memory capacities of degree greater than two are zero (i.e., the CMC), and C_s ∼ 95 for 0 < α < 90. This value is very close to the upper bound for the capacity C_T = N = 97. Since C_T is limited by N, there is a trade-off between the linear and non-linear capacities. Then the increase in the LMC with α is compensated by a decrease of the XMC in Figure 4D. In the case of strong non-linearity (γ = 1), Figure 4C shows that C_s is not close to the upper bound for the capacity C_T = N = 97. Then there is a significant contribution to C_T^nl of capacities with a non-linear degree greater than the ones considered in C_s. An increase in C_s with α is obtained. This increase is mainly due to the LMC and XMC. It only indicates that the contribution to C_T^nl of the capacities with a lower non-linear degree considered in C_s increases.

Now we analyze the capacity dependence on α when θ/T = 0.2. We consider integer values of α. Similar results are obtained when α is not an exact integer. We first consider the linear system. In this case the total capacity C_T^l is given by the LMC. As seen in Figure 2A the capacity is degraded when θ < T due to the similar evolution in time of close (in time) virtual nodes connected through non-linear node dynamics. Figure 4B shows that C_T^l increases with α. A significant increase of nearly 50% is obtained for the capacity when the mismatch is large. This is due to an increase in reservoir diversity. When the mismatch α is increased, virtual nodes are connected through feedback to nodes that are not connected through system dynamics. This improves reservoir diversity, and a larger capacity can be achieved.
In the non-linear case with θ/T = 0.2, Figures 4A,B show that regardless of the non-linearity, C_s increases with α. This increase cannot be attributed only to a change in the contribution of linear and non-linear capacities to the total capacity C_T^nl. As seen for the linear case, when θ/T = 0.2 the total capacity C_T^l increases with α due to an increase in reservoir diversity. This should also lead in the non-linear case to an increase in the total capacity C_T^nl with α. It is worth mentioning that in the case of T = θ/0.2 we obtain a similar C_s for low-to-moderate (see Figure 4B) and strong (Figure 4A) non-linearity. However, the relative contribution of the linear memory capacity is higher for low non-linearity. Finally, note that regardless of the non-linearity and T, higher order capacities such as QMC and CMC remain almost constant with α and the change of C_s is due to LMC and XMC.

Memory Capacities of Two-Delay Reservoir Computers
We have shown that the computational capacity is boosted for small values of θ/T when the delay time of the non-linear node is greater than the data injection time. This mismatch between τ and T_p allows higher processing speeds of delay-based reservoir computers without performance degradation. This is due to the increase in reservoir diversity. To further increase reservoir diversity in the case of T = θ/0.2, we explore the effect of adding an extra feedback line to the non-linear node. Figure 5 shows the C_s of the two-delay reservoir computer vs. the misalignment of the second delay when γ = 0.1. The mismatch of the first delay is fixed at α_1 = 1 (Figure 5, left) and α_1 = 73 (Figure 5, right). In both cases the maximum of C_s reached for the two-delay system is C_s ∼ 61. This value is obtained in the two cases, α_1 = 1 and α_1 = 73, for α_2 ∼ 70 when β_2 = 0.75, and only in the case of α_1 = 73 also for α_2 ∼ 82 and β_2 = β_1 = 0.4. The maximum C_s obtained for the two-delay system is slightly higher than the one reached for its one-delay counterpart. In the one-delay system the maximum capacity is C_s ∼ 57, which is obtained for α ∼ 80 (see Figure 4B). Therefore, the calculated information processing capacity for high sampling output rates can be further increased by using an extra feedback line and delay times greater than the information processing time. However, the second delay does not significantly improve the computational capacity of the one-delay system. Moreover, when the first delay mismatch is fixed near its optimal value for the one-delay system (α ∼ 80), the effect of the second delay feedback strength or misalignment is small [see Figure 5 (right)]. However, when the first delay mismatch is not close to its optimal value for the one-delay system, the maximum C_s reached for the one-delay system is outperformed by adding a second delay with a high strength (β_2 = 0.75) and a mismatch 10 < α_2 < 90 [see Figure 5 (left)].
The contributions of the individual memory capacities to C_s for the two-delay system are depicted in Figures 6 and 7 for α_1 = 1 and α_1 = 73, respectively. Figure 6 shows that the increase in C_s obtained for α_1 = 1 is mainly due to the increase in LMC and QMC. Interestingly, in the case of α_1 = 73, the same C_s ∼ 61 can be obtained with different relative contributions of the memory capacities. The case of α_2 ∼ 70 and β_2 = 0.75 yields a higher LMC and a lower XMC than in the one-delay system. The case of α_2 ∼ 82 and β_2 = 0.4 reaches C_s ∼ 61 mainly thanks to the increase in the XMC.

Delay-Based Reservoir Computer Performance
Finally, we study the effect of increasing the mismatch α on the performance of a delay-based reservoir computer for two different response times of the non-linear node dynamics: T = 0 and T = θ/0.2. Two tasks are considered: the NARMA-10 task and the equalization of a wireless communication channel. Both are benchmark tasks commonly used to assess the performance of RC [1,10].
The NARMA-10 task consists of predicting the output of an auto-regressive moving average process from the input u(t). The output y(t + 1) is given by:

$$y(t+1) = 0.3\,y(t) + 0.05\,y(t)\sum_{i=0}^{9} y(t-i) + 1.5\,u(t-9)\,u(t) + 0.1 \tag{8}$$

The input u(t) is independently and identically drawn from the uniform distribution in [0, 0.5]. Solving the NARMA-10 task requires both memory and non-linearity. Figure 8 (left) shows the normalized root-mean-square error (NRMSE) of the NARMA-10 task as a function of α for γ = 0.1. We consider a small value of γ = 0.1 because a long memory is required to obtain a good performance for the NARMA-10 task. Regardless of the response time (T = 0 or T = θ/0.2), the NRMSE decreases when the processing and delay times are mismatched (α > 0). However, for T = 0 the NRMSE is almost the same for a wide range of values of α, and a mismatch α = 1 is enough to obtain an NRMSE = 0.31, close to the absolute minimum (NRMSE = 0.28 for α = 78). When the response time of the non-linear node is larger than the node separation (T = θ/0.2), the NRMSE decreases from NRMSE ≈ 0.46 at α = 0 and α = 1 to NRMSE = 0.34 at α ∼ 72. This is due to the long memory required to obtain a good performance for the NARMA-10 task: in the case of T = θ/0.2, the required LMC is not reached until α ∼ 72 (see Figure 4B). Our results show that a similar performance can be obtained for small and large values of T/θ thanks to the mismatch α. Therefore, increasing α allows faster information processing (a higher sampling output rate) without degrading system performance.
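As an illustration of the task definition above, the following is a minimal sketch of how a NARMA-10 sequence could be generated from Equation (8) and how an NRMSE could be evaluated. The function names `narma10` and `nrmse`, the zero initial condition for the first ten outputs, and the normalization of the error by the variance of the target are assumptions made for this sketch; they are not specified in the text.

```python
import numpy as np


def narma10(n_steps, seed=0):
    """Generate input u and target y following Equation (8).

    Assumption: the first ten outputs y[0..9] are initialized to zero.
    """
    rng = np.random.default_rng(seed)
    # Input drawn i.i.d. from the uniform distribution in [0, 0.5]
    u = rng.uniform(0.0, 0.5, size=n_steps)
    y = np.zeros(n_steps)
    for t in range(9, n_steps - 1):
        # y[t-9 : t+1] holds y(t-9), ..., y(t), i.e. the sum over i = 0..9
        y[t + 1] = (0.3 * y[t]
                    + 0.05 * y[t] * np.sum(y[t - 9:t + 1])
                    + 1.5 * u[t - 9] * u[t]
                    + 0.1)
    return u, y


def nrmse(target, prediction):
    """Root-mean-square error normalized by the target standard deviation.

    Assumption: this is one common NRMSE convention; others exist.
    """
    target = np.asarray(target, dtype=float)
    prediction = np.asarray(prediction, dtype=float)
    return float(np.sqrt(np.mean((target - prediction) ** 2) / np.var(target)))


u, y = narma10(500)
```

With this normalization, a predictor that always outputs the mean of the target has an NRMSE of 1, which gives a reference point for the values quoted above.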
The equalization of a wireless communication channel consists of reconstructing the input signal s(i) from the output sequence of the channel u(i) [1]. The input to the channel is a random sequence of values s(i) taken from {−3, −1, 1, 3}. The input s(i) first goes through a linear channel, yielding:

$$q(i) = 0.08\,s(i+2) - 0.12\,s(i+1) + s(i) + 0.18\,s(i-1) - 0.1\,s(i-2) + 0.091\,s(i-3) - 0.05\,s(i-4) + 0.04\,s(i-5) + 0.03\,s(i-6) + 0.01\,s(i-7)$$

It then goes through a noisy non-linear channel:

$$u(i) = q(i) + 0.036\,q(i)^2 - 0.011\,q(i)^3 + v(i)$$

where v(i) is a Gaussian noise with zero mean, adjusted in power to give a signal-to-noise ratio (SNR) of 20 dB. The performance is measured using the symbol error rate (SER), that is, the fraction of inputs s that are misclassified. The SER for the equalization with an SNR of 20 dB is depicted as a function of α for γ = 1 in Figure 8 (right). In the case of T = 0, there is a clear improvement of the performance from α = 0 to α = 1, but the errors are almost constant when α is further increased. When T = θ/0.2, performance improves with α until a minimum SER = 0.012 is reached at α ∼ 4. This SER is similar to that obtained when T = 0. Thus, regardless of the value of T/θ, a similar performance is obtained by using the mismatch α. For comparison, a SER of 0.01 for the channel equalization task has been obtained using an optoelectronic reservoir computer [15].

It is not straightforward to predict how the processing capacity will translate into performance for specific tasks. Different tasks require computing functions with different degrees of non-linearity and memory. Information processing capacity should be complemented with those requirements to identify optimized operating conditions for the reservoir. For the channel equalization task, when T = 0 the capacities LMC and XMC increase with α, showing a very large increase from α = 0 to α = 1 (see Figure 4C). The SER also shows a clear decrease from α = 0 to α = 1, but it is almost constant when α > 1 [see Figure 8 (right)]. The capacities LMC and XMC achieved for α = 1 when T = 0 are enough to solve the channel equalization task. However, the quadratic capacity QMC is almost constant when α > 1. As a consequence, the SER is almost constant for α > 1. When taking a small node separation (θ = 0.2T), the capacities LMC and XMC increase with α (see Figure 4A). This increase in processing capacity leads to a better performance with α, and the SER decreases from 0.017 for α = 0 to a minimum error of 0.012 for α = 4. This is an improvement in performance of around 30%. However, the increase in the total capacity for α > 4 (mainly due to the LMC) does not translate into the
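The channel model and the SER metric used for the equalization task can be sketched as follows. This is a minimal illustration, not the authors' code: the names `channel`, `decide`, and `symbol_error_rate` are hypothetical, symbols outside the generated sequence are treated as zero at the edges, the noise power is set relative to the power of the noise-free channel output, and the reservoir estimate is mapped to the nearest symbol before counting errors. All of these choices are assumptions.

```python
import numpy as np

SYMBOLS = np.array([-3.0, -1.0, 1.0, 3.0])
# Coefficients of s(i+2), s(i+1), s(i), s(i-1), ..., s(i-7) in the linear channel
TAPS = np.array([0.08, -0.12, 1.0, 0.18, -0.1, 0.091, -0.05, 0.04, 0.03, 0.01])


def channel(n_symbols, snr_db=20.0, seed=0):
    """Return the symbol sequence s and the noisy channel output u."""
    rng = np.random.default_rng(seed)
    s = rng.choice(SYMBOLS, size=n_symbols)
    # np.convolve(s, TAPS)[k] = sum_j TAPS[j] * s[k - j]; since TAPS[j]
    # multiplies s(i + 2 - j), q(i) is entry i + 2 of the full convolution.
    q = np.convolve(s, TAPS)[2:2 + n_symbols]
    clean = q + 0.036 * q**2 - 0.011 * q**3
    # Assumption: SNR is defined with respect to the noise-free output power
    noise_power = np.mean(clean**2) / 10.0 ** (snr_db / 10.0)
    v = rng.normal(0.0, np.sqrt(noise_power), size=n_symbols)
    return s, clean + v


def decide(estimate):
    """Map each real-valued estimate to the nearest symbol in SYMBOLS."""
    estimate = np.asarray(estimate, dtype=float)
    idx = np.argmin(np.abs(estimate[:, None] - SYMBOLS[None, :]), axis=1)
    return SYMBOLS[idx]


def symbol_error_rate(s_true, estimate):
    """Fraction of symbols that are misclassified after the decision step."""
    return float(np.mean(decide(estimate) != np.asarray(s_true, dtype=float)))


s, u = channel(1000)
```

In a full experiment, `u` would be fed to the reservoir and the trained readout output would be passed as `estimate` to `symbol_error_rate`.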

FIGURE 1 | Schematic illustration of delay-based RC. NL stands for non-linear node. The NL can have one (β_2 = 0) or two delay lines. The points r_i(n) represent the virtual nodes separated by time intervals θ. The masked input u(n + 1) ⊗ Mask is injected directly following u(n) ⊗ Mask.

FIGURE 2 | Computational capacity of the linear delay-based RC with one delay line as a function of θ/T for (A) α = 0 and (B) α = 1. The solid line with blue circles is the total computational capacity (C l T) and the dashed line with black points is the total quality computational capacity calculated for q = 0.9.

FIGURE 3 | Memory capacities of the non-linear delay-based RC with one delay line as a function of θ/T for (A) α = 0 and (B) α = 1 when γ = 0.1. The blue stars, red circles, green crosses, and pink diamonds correspond to the LMC, QMC, CMC, and XMC, respectively. The black solid line is the C_s.

FIGURE 5 | C_s of the two-delay-based RC as a function of α_2. Left: α_1 = 1. Right: α_1 = 73. The solid black line is the value of C_s for the one-delay case with α = α_1. Red circles, green diamonds, and blue stars correspond to the C_s with two delays and a β_2 of 0.05, 0.4, and 0.75, respectively. These results are obtained for T = θ/0.2 and γ = 0.1.

FIGURE 6 | Memory capacities for the two-delay RC as a function of α_2 for a fixed α_1 = 1, T = θ/0.2, and γ = 0.1. The red circles, green diamonds, and blue stars correspond to β_2 equal to 0.05, 0.4, and 0.75, respectively. The solid black line is for β_2 = 0 and corresponds to the one-delay system with α = 1 and β = 0.8.

FIGURE 7 | Memory capacities for the two-delay-based RC as a function of α_2 for a fixed α_1 = 73, T = θ/0.2, and γ = 0.1. The red circles, green diamonds, and blue stars correspond to β_2 equal to 0.05, 0.4, and 0.75, respectively. The solid black line is for β_2 = 0 and corresponds to the one-delay case with α = 73 and β = 0.8.

FIGURE 8 | Performance of the non-linear one-delay-based RC for two tasks as a function of α. Left: NARMA-10 for γ = 0.1. Right: Equalization with SNR = 20 dB and γ = 1. The blue stars correspond to the case of T = 0 and the red circles to the case of T = θ/0.2.