# Distributed Kerr Non-linearity in a Coherent All-Optical Fiber-Ring Reservoir Computer

^{1}Applied Physics Research Group, Vrije Universiteit Brussel, Brussels, Belgium

^{2}Laboratoire d'Information Quantique, Université Libre de Bruxelles, Brussels, Belgium

We investigate, both numerically and experimentally, the usefulness of a distributed non-linearity in a passive coherent photonic reservoir computer. This computing system is based on a passive coherent optical fiber-ring cavity in which part of the non-linearities are realized by the Kerr non-linearity. Linear coherent reservoirs can solve difficult tasks but are aided by non-linear components in their input and/or output layer. Here, we compare the impact of non-linear transformations of information in the reservoir's input layer, its bulk (the fiber-ring cavity), and its readout layer. For the injection of data into the reservoir, we compare a linear input mapping to the non-linear transfer function of a Mach-Zehnder modulator. For the reservoir bulk, we quantify the impact of the optical Kerr effect. For the readout layer, we compare a linear output to a quadratic output implemented by a photodiode. We find that optical non-linearities in the reservoir itself, such as the optical Kerr non-linearity studied in the present work, enhance the task-solving capability of the reservoir. This suggests that such non-linearities will play a key role in future coherent all-optical reservoir computers.

## 1. Introduction

In this work, we discuss an efficient, i.e., high speed and low power, analog photonic computing system based on the concept of reservoir computing (RC) [1, 2]. This framework allows one to exploit the transient dynamics of a non-linear dynamical system for performing useful computations. In this neuromorphic computing scheme, a network of interconnected computational nodes (called neurons) is excited with input data. The ensemble of neurons is called the reservoir, and the interneural connections are fixed and can be chosen at random. For the coupling of the input data to the reservoir an input mask is used: a set of input weights which determines how strongly each of the inputs couples to each of the neurons. The randomness in both the input mask and internal reservoir connections ensures diversity in the neural responses. The reservoir output is constructed through a linear combination of neural responses (possibly first processed by a readout function) with a set of readout weights. The strength of the reservoir computing scheme lies in the simplicity of its training method, where only the readout weights are tuned to force the reservoir output to match a desired target. In general, a reservoir exhibits internal feedback through loops in the neural interconnections. As a result, any reservoir has memory, which means it can retain input data for a finite amount of time, and it can compute linear and non-linear functions of the retained information.

Within the field of reservoir computing two main approaches exist: in the network-based approach networks of neurons are implemented by connecting multiple discrete nodes [3], and in the delay-based approach networks of virtual neurons are created by subjecting a single node (often a non-linear dynamical device) to delayed feedback [4]. In the latter, the neurons are called virtual because they correspond to the traveling signals found in consecutive timeslots in the continuous delay-line system. On account of this time-multiplexing of neurons, the input weights are translated into a temporal input mask, which is mixed with the input data before it is injected into the reservoir. Besides ensuring diversity in the neural responses, this input mask also keeps the virtual neurons in a transient dynamic regime, which is a necessary condition for good reservoir computing performance.

Multiple opto-electronic reservoirs have been implemented, both delay-based [5–8] and network-based [9]. Several all-optical reservoirs have been realized, both network-based systems [9–13] and delay-based systems [14–16]. An overview of recent advances is given in reference [17]. We observe that in the field of optical reservoir computing, some implementations operate in an incoherent regime, while others operate in a coherent regime. Coherent reservoirs have the advantage that they can exploit the complex character of the optical field, exploit interference, and can use the natural quadratic non-linearity of photodiodes. As a drawback, coherent bulk optical reservoirs typically need to be stabilized, but this is not a problem for on-chip implementations. Here we investigate the potential advantage of having a coherent reservoir with non-linearity inside the reservoir. We show that it can increase the performance of the reservoir on certain tasks and we expect that future coherent optical reservoir computers will make use of such non-linearities.

State of the art photonic implementations target simple reservoir architectures [13], which can easily be upscaled to increase the number of computational nodes or neurons, thereby enhancing the reservoir's computational capacity. Even a linear photonic cavity can be a potent reservoir [16], provided that some non-linearity is present either in the mapping of input data to the reservoir, or in the readout of the reservoir's response. Despite advances toward all-optical RC [18], many state of the art photonic reservoir computers inherently contain some non-linearity as they are usually set up to process and produce electronic signals. This means that even if the reservoir is all-optical, the reservoir computer in its entirety is of an opto-electronic nature. Commonly used components like Mach-Zehnder modulators (MZM) and photodetectors (PD) provide means for transitioning back and forth between the electronic and optical domains, and they also, almost inevitably, introduce non-linearities which boost the opto-electronic reservoir computer's performance beyond the merits of the optical reservoir itself. When transitioning toward all-optical reservoir computers, such non-linearities can no longer be relied on, and thus the required non-linear transformation of information must originate elsewhere. One option is then to use multiple strategically placed non-linear components in the reservoir, but this can be a costly strategy when upscaling the reservoir [10].

In this paper, we study a delay-based reservoir computer, based on a passive coherent optical fiber-ring cavity following reference [16], and exploit the inherent non-linear response of the waveguiding material to build a state-of-the-art photonic reservoir. This means that the non-linearity of our photonic reservoir is not found in localized parts, but rather is distributed over the reservoir's entire extent. To correctly characterize the effects of such a distributed non-linearity, we also consider in this study all other non-linearities that may surround the reservoir. In terms of the reservoir's input mapping, we examine the system responses when receiving optical inputs (linear mapping), and when receiving electronic inputs coupled to the optical reservoir through a Mach-Zehnder modulator with a non-linear mapping. For the reservoir's readout layer, we examine both linear readouts (coherent detection) and non-linear readouts through the quadratic non-linearity of a photodiode measuring the power of the optical field. Taking these different options into account, we construct different scenarios in terms of the presence of non-linearities in the input and/or output layer of these reservoir computers. In all these scenarios we numerically benchmark the RC performance, thus quantifying the difference in performance between systems which do or do not have such a distributed non-linearity inside the reservoir. In the next sections, we present our numerical results, which reveal a broad range of optical input power levels at which these RCs benefit from the self-phase modulation experienced by the signals due to the non-linear Kerr effect induced by the waveguide material. We also show the results of our experimental measurements, which indicate how much this distributed non-linearity boosts the reservoir's capacity to perform non-linear computation. In the discussion section, we analyze the impact of these findings on the future of photonic reservoir computing.

## 2. Materials and Methods

### 2.1. Setup

Our reservoir computing simulations and experiments are based on the set of dynamical systems which are discussed in this section. The reservoir itself is implemented in the all-optical fiber-ring cavity shown in Figure 1, using standard single-mode fiber. A polarization controller is used to ensure that the input field *E*_{in} (originating from the green arrow) excites a polarization eigenmode of the fiber-ring cavity. A fiber coupler, characterized by its power transmission coefficient *T* = 50%, couples light in and out of the cavity. The fiber-ring is characterized by the roundtrip length *L* = 10 m (or roundtrip time *t*_{R}), the propagation loss α (taken here 0.18 dB km^{-1}), the fiber non-linear coefficient γ (which is set to 0 to simulate a linear reservoir, and set to γ_{Kerr} = 2.6 mrad m^{-1} W^{-1} to simulate a non-linear reservoir), and the cavity detuning δ_{0}, i.e., the difference between the roundtrip phase and the nearest resonance (multiple of 2π). This low-finesse cavity is operated off-resonance, with a maximal input power of 50 mW (17 dBm). A network of time-multiplexed virtual neurons is encoded in the cavity field envelope. The output field *E*_{out} is sent to the readout layer (through the orange arrow) where the neural responses are demultiplexed.

**Figure 1**. Schematic of the fiber-ring cavity of length *L* used to implement an optical reservoir. The green (orange) arrow indicates a connection with an input (output) layer. A polarization controller maps the input polarization onto a polarization eigenmode of the cavity. A coupler with power transmission coefficient *T* couples the input field ${E}_{in}^{(n)}(\tau )$ to the cavity field *E*^{(n)}(*z*, τ) and couples to the output field ${E}_{out}^{(n)}(\tau )$, where *n* is the roundtrip index, τ is time (with 0 < τ < *t*_{R}) and *z* is the longitudinal position in the ring cavity.

The input field *E*_{in} can originate from one of two different optoelectronic input schemes. Firstly, we consider a scenario where the input signal *u*(*n*) (with discrete time *n*) is amplitude-encoded in an optical signal *E* ~ *u*(*n*), as shown in Figure 2A. The reservoir's input mask *m*(τ) is mixed with the input signal by periodic modulation of the optical input signal using an MZM. This scheme was implemented in reference [7], but the non-linearity of the MZM was avoided through pre-compensation of the electronic input signal. Note that the discrete time *n* corresponds to the roundtrip index and, as delay-based reservoirs are typically set up to process one sample per roundtrip, *n* also corresponds to the sample index. However, we have chosen to hold each input sample over multiple roundtrips [that is, *u*(*n*) is constant over multiple values of *n*], for reasons which are explained in the Results section. Secondly, we consider a scenario where we use the MZM to modulate a CW optical pump following reference [14], as shown in Figure 2B. Here the input signal is first mixed with the input mask and then used to drive the MZM. It is known that the MZM's non-linear transfer function can affect the RC system's performance [16], but the implications for a coherent non-linear reservoir have not yet been investigated.

**Figure 2**. Schematics of input and output layers connecting to the reservoir shown in Figure 1. In the linear input scheme **(A)** the Mach-Zehnder modulator (MZM) superimposes the reservoir's input mask *m*(τ) on the optical signal *E* ~ *u*(*n*) carrying the input data. In the (possibly) non-linear input scheme **(B)** the input data is mixed with the input mask and then drives the MZM to modulate a CW optical pump. In the linear output scheme **(C)** a reference field *E*_{LO} is used to implement coherent detection, allowing a quadrature of the complex optical field to be measured. Note that coherent detection requires two such readout arms with phase-shifted reference fields in order to measure the complex output field *E*_{out}. In the non-linear output scheme **(D)** only a photodetector (PD) is used, thus only allowing the optical output power $|{E}_{out}{|}^{2}$ to be recorded.

Similarly, the output field *E*_{out} can be processed by two different optoelectronic readout schemes. Firstly, we consider a coherent detection scheme as shown in Figure 2C. Mixing the reservoir's output field with a reference field *E*_{LO} allows us to record the complex neural responses, time-multiplexed in the output field *E*_{out}. Secondly, we consider a readout scheme where a photodetector (PD) measures the optical power of the neural responses $|{E}_{out}{|}^{2}$, as shown in Figure 2D.

With high optical power levels and small neuron spacing (meaning fast modulation of the input signal), dynamical and non-linear effects other than the Kerr non-linearity may appear, such as photon-phonon interactions causing Brillouin and Raman scattering, and bandwidth limitations caused by the driving and readout equipment. In the present work we want to focus on the effects of the Kerr non-linearity. For this reason, and because of the memory limitations of the oscilloscope, we limit our reservoir to 20 neurons, with a maximal input power of 100 mW.

The current setup is not actively stabilized. We have found that the cavity detuning δ_{0} does not vary more than a few mrad over the course of any single reservoir computing experiment, where a few thousand input samples are processed. A short header, added to the injected signal, allows us to recover the detuning δ_{0} post-experiment. We effectively measure the interference between a pulse which reflects off the cavity and a pulse which completes one roundtrip through the cavity. However, we find that the precise value of δ_{0} has no significant influence on the experimental reservoir computing results.

### 2.2. Physical Model

Here we discuss the mean-field model used to describe the temporal evolution of the electric field envelope *E*^{(n)}(*z*, τ) inside the cavity, where *n* is the roundtrip index, 0 < τ < *t*_{R} is time (bound by the cavity roundtrip time *t*_{R}) and 0 < *z* < *L* is the longitudinal coordinate of the fiber-ring cavity with length *L*. The position *z* = 0 corresponds to the position of the fiber coupler. The position *z* = *L* corresponds to the same position, but after propagation through the entire fiber-ring. We will describe the evolution on a per-roundtrip basis (i.e., with varying roundtrip index *n*). With this notation *E*^{(n)}(*z*, τ) represents the cavity field envelope measured at position *z* at time τ during the *n*-th roundtrip. For each roundtrip we model propagation through the non-linear cavity to obtain *E*^{(n)}(*z* = *L*, τ) from *E*^{(n)}(*z* = 0, τ). We then express the cavity boundary conditions to obtain *E*^{(n+1)}(0, τ) from *E*^{(n)}(*L*, τ) and to obtain the field ${E}_{out}^{(n)}(\tau )$ at the output of the fiber-ring reservoir. For now we will omit τ.

Firstly, to model propagation in the fiber-ring cavity we take into account propagation loss and the non-linear Kerr effect. Since the non-linear propagation model is independent of the roundtrip index *n*, this index is omitted in the following description. The non-linear propagation equation is given by

$$\frac{\partial E}{\partial z}=i\gamma |E{|}^{2}E-\frac{\alpha }{2}E. \tag{1}$$

Here, α is the propagation loss, taken as a power attenuation coefficient, and γ is the non-linear coefficient, which is set to γ = 0 to simulate a linear reservoir, and set to γ = γ_{Kerr} to include the non-linear Kerr effect caused by the fiber waveguide. We do not include dispersion effects at the current operating point of the system, since the neuron separation is much larger than the diffusion length, hence τ can also be omitted in the non-linear propagation model. The evolution of the power |*E*(*z*)|^{2} is readily obtained by solving the corresponding propagation equation

$$|E(z){|}^{2}=|E(0){|}^{2}\,{e}^{-\alpha z}. \tag{2}$$

With ϕ_{z} the non-linear phase acquired during propagation over a distance *z*, we know that the solution of *E*(*z*) will be of the form

$$E(z)=E(0)\,{e}^{-\alpha z/2}\,{e}^{i{\phi }_{z}}. \tag{4}$$

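Since this non-linear phase depends on the power evolution given by Equation (2), an expression for ϕ_{z} is found to be

$${\phi }_{z}=\gamma {\int }_{0}^{z}|E(z'){|}^{2}\,dz'=\gamma |E(0){|}^{2}\,\frac{1-{e}^{-\alpha z}}{\alpha }.$$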

At this point, we can introduce the effective propagation distance *z*_{eff} as

$${z}_{eff}=\frac{1-{e}^{-\alpha z}}{\alpha }.$$

In general (since α ≥ 0) we have *z*_{eff} ≤ *z*. Substituting these results in Equation (4) yields the complete solution for propagation of the cavity field envelope

$$E(z)=E(0)\exp \left(i\gamma |E(0){|}^{2}{z}_{eff}-\frac{\alpha }{2}z\right).$$
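The closed-form propagation result can be checked numerically. The sketch below assumes that α is a power attenuation coefficient in m^{-1}, so that the field decays as exp(−α*z*/2); the function names, the exaggerated loss, and the 50 W test power are illustrative choices and not values from the experiment. It integrates the propagation equation with a fixed-step Runge-Kutta scheme and compares the result with the effective-length expression.

```python
import numpy as np

def alpha_from_db_per_km(loss_db_per_km):
    """Convert a power loss in dB/km to a power attenuation coefficient in 1/m."""
    return loss_db_per_km * np.log(10.0) / 10.0 / 1000.0

def effective_length(z, alpha):
    """Effective propagation distance; equals z in the lossless limit."""
    return z if alpha == 0 else (1.0 - np.exp(-alpha * z)) / alpha

def kerr_closed_form(E0, z, gamma, alpha):
    """Field after distance z: Kerr phase over z_eff, amplitude decay exp(-alpha*z/2)."""
    z_eff = effective_length(z, alpha)
    return E0 * np.exp(1j * gamma * abs(E0) ** 2 * z_eff - 0.5 * alpha * z)

def kerr_rk4(E0, z, gamma, alpha, steps=2000):
    """Integrate dE/dz = i*gamma*|E|^2*E - (alpha/2)*E with fixed-step RK4."""
    def rhs(E):
        return 1j * gamma * abs(E) ** 2 * E - 0.5 * alpha * E
    E, h = complex(E0), z / steps
    for _ in range(steps):
        k1 = rhs(E)
        k2 = rhs(E + 0.5 * h * k1)
        k3 = rhs(E + 0.5 * h * k2)
        k4 = rhs(E + h * k3)
        E += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return E

gamma, L = 2.6e-3, 10.0            # values quoted in the text (rad/(W m), m)
alpha_test = 0.05                  # 1/m, hypothetical and exaggerated so that z_eff < z is visible
E0 = np.sqrt(50.0) * np.exp(0.3j)  # hypothetical 50 W field so that the Kerr phase is sizeable

closed = kerr_closed_form(E0, L, gamma, alpha_test)
numeric = kerr_rk4(E0, L, gamma, alpha_test)
mismatch = abs(closed - numeric)
```

With the fiber loss quoted in the text, `alpha_from_db_per_km(0.18)` is of order 10^{-5} m^{-1}, so over a 10 m roundtrip the effective length is practically equal to *L*.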

Finally, we reinstate the roundtrip index *n* and the time parameter τ, which allows us to combine this non-linear propagation model with the cavity boundary conditions.

In these equations, *T* represents the power transmission coefficient of the cavity coupler, and δ_{0} represents the cavity detuning (i.e., difference between the roundtrip phase and the closest cavity resonance). Further, the input field ${E}_{in}={E}_{in}^{(n)}(\tau )$ changes with the roundtrip index *n* as new data samples can be injected into the system, and is modulated in time using the input mask to create a network of virtual neurons. The output field ${E}_{out}={E}_{out}^{(n)}(\tau )$ containing the neural responses is sent to a measurement stage.
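To make the roundtrip dynamics concrete, the sketch below iterates a coupler-plus-ring map on a stream of time slots. The coupler convention is an assumption (the authors' phase convention may differ): a lossless coupler with field coefficients *i*√*T* for the cross port and √(1−*T*) for the bar port, i.e.,

$${E}^{(n+1)}(0,\tau )=i\sqrt{T}\,{E}_{in}^{(n+1)}(\tau )+\sqrt{1-T}\,{e}^{i{\delta }_{0}}{E}^{(n)}(L,\tau ),\qquad {E}_{out}^{(n+1)}(\tau )=\sqrt{1-T}\,{E}_{in}^{(n+1)}(\tau )+i\sqrt{T}\,{e}^{i{\delta }_{0}}{E}^{(n)}(L,\tau ).$$

The function name `simulate_ring` and the slot-based discretization (roundtrip time expressed as an integer number of slots, `delay`) are likewise illustrative, and α is again taken as a power attenuation coefficient.

```python
import numpy as np

def simulate_ring(E_in, delay, T=0.5, gamma=2.6e-3, alpha=0.0, L=10.0, delta0=0.0):
    """Iterate an assumed coupler + Kerr-ring map over a stream of time slots.

    E_in  : complex input field per time slot, in units of sqrt(W)
    delay : cavity roundtrip time expressed in slots
    Returns the intracavity field just after the coupler and the output field.
    """
    E_in = np.asarray(E_in, dtype=complex)
    L_eff = L if alpha == 0 else (1.0 - np.exp(-alpha * L)) / alpha
    cross, bar = 1j * np.sqrt(T), np.sqrt(1.0 - T)
    E_cav = np.zeros_like(E_in)
    E_out = np.zeros_like(E_in)
    for j in range(len(E_in)):
        if j >= delay:
            prev = E_cav[j - delay]
            # One roundtrip: Kerr phase, detuning phase and amplitude loss.
            phase = gamma * abs(prev) ** 2 * L_eff + delta0
            fb = prev * np.exp(1j * phase - 0.5 * alpha * L)
        else:
            fb = 0.0  # cavity is empty during the first roundtrip
        E_cav[j] = cross * E_in[j] + bar * fb
        E_out[j] = bar * E_in[j] + cross * fb
    return E_cav, E_out
```

Because the assumed coupler is unitary, the power leaving it (intracavity plus output) equals the power entering it (input plus feedback) in every slot, which is a convenient sanity check for any alternative convention.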

### 2.3. Reservoir Computing

The framework of reservoir computing allows one to exploit the transient non-linear dynamics of a dynamical system to perform useful computation [1, 2]. For the purpose of reservoir computing, virtual neurons (dynamical variables, computational nodes) are time-multiplexed in τ-space of the physical system described by Equation (8), following the delay-based reservoir computing scheme originally outlined in reference [4]. As such, the input field ${E}_{in}^{(n)}(\tau )$ varies with *n* as new input samples arrive, and varies with τ to implement the input mask, which excites the neurons into a transient dynamic regime. Subsequently, the neural responses are encoded in the output field ${E}_{out}^{(n)}(\tau )$ and need to be demultiplexed from τ-space. As in references [5, 16], the length *t*_{M} of the input mask *m*(τ) is deliberately mismatched from the cavity roundtrip time *t*_{R}. Specifically, we set *t*_{M} = *t*_{R}*N*/(*N*+1), which provides interconnectivity between the *N* virtual neurons in a ring topology. The input mask *m*(τ) is a piecewise constant function, with intervals of duration θ = *t*_{M}/*N*. The signal *I*^{(n)}(τ) injected into the RC is constructed by multiplying the input series *u*(*n*) with the input mask, *I*^{(n)}(τ) = *u*(*n*)*m*(τ). When the input is coupled linearly to the reservoir then ${E}_{in}^{(n)}(\tau )\sim {I}^{(n)}(\tau )$. This would be the case when *u*(*n*) is an optical signal periodically modulated with the input mask signal *m*(τ). When an MZM with transfer function *f* is used to convert the electronic signal *I*^{(n)}(τ) to the optical domain then ${E}_{in}^{(n)}(\tau )\sim f({I}^{(n)}(\tau ))$, where *f* can be non-linear.

Note that in reference [16] the sample duration *t*_{S} is matched to the length of the input mask *t*_{M}, allowing the reservoir to process 1 input sample approximately every roundtrip, as *t*_{S} = *t*_{M} ≲ *t*_{R}. However, for reasons explained in the Results section, we will study different sample durations by holding input samples over multiple durations of the input mask, *t*_{S} = *k t*_{M} with integer *k* as illustrated in Figure 3. This inevitably slows the reservoir down, as it only processes 1 input sample approximately every *k* roundtrips. But it also provides practically straightforward means to accumulate more non-linear processing of the data inside the reservoir, which can then be measured and quantified.

**Figure 3**. Schematic of input and output timing, with *t*_{S} the sample duration, *t*_{M} the input mask duration and *t*_{R} the roundtrip time. Input samples are injected during (integer) *k* roundtrips (bars in alternating colors) and the neural responses are recorded at times {τ_{i}} (blue tick marks) during the last of those *k* roundtrips.
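The masking and timing scheme can be sketched in a few lines. The helper names below are hypothetical, the signal is discretized to one value per mask interval θ, and the demultiplexing step is idealized: it selects the last of the *k* mask periods and ignores the one-slot offset per roundtrip between *t*_{M} and *t*_{R}.

```python
import numpy as np

def mask_timing(t_R, N):
    """Mask length t_M = t_R*N/(N+1) and interval duration theta = t_M/N."""
    t_M = t_R * N / (N + 1)
    return t_M, t_M / N

def build_drive(u, mask, k):
    """I(tau) = u(n) * m(tau), with each sample held for k repetitions of the mask."""
    u = np.asarray(u, dtype=float)
    mask = np.asarray(mask, dtype=float)
    held = np.repeat(u, k * len(mask))      # sample duration t_S = k * t_M
    tiled = np.tile(mask, k * len(u))       # mask repeats every t_M
    return held * tiled

def demultiplex(signal, N, k):
    """Return one row of N neuron values per input sample (last of the k periods)."""
    frames = np.asarray(signal).reshape(-1, k, N)   # (sample, repetition, neuron)
    return frames[:, -1, :]
```

Note that `mask_timing` gives *t*_{M} + θ = *t*_{R}: each roundtrip, the mask pattern slips by exactly one neuron interval with respect to the cavity, which is what couples neighboring virtual neurons in a ring.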

Since the virtual neurons are time-multiplexed in this delay-based reservoir computer, they need to be de-multiplexed from ${E}_{out}^{(n)}(\tau )$ in the readout layer by sampling this output field at a set of times {τ_{i}} (with *i* the neuron index and 1 ≤ *i* ≤ *N* when *N* neurons are used) as shown in Figure 3. The dynamical neural responses ${x}_{i}(n)={E}_{out}^{(n)}({\tau}_{i})$ are recorded and used to train the reservoir to perform a specific task. That is, we optimize a set of readout weights *w*_{i} which are used to combine the neural readouts into a single scalar reservoir output *y*(*n*). In general the reservoir output is constructed as

$$y(n)=\sum _{i=1}^{N}{w}_{i}\,g\left({x}_{i}(n)\right),$$

where the neural responses *x*_{i}(*n*) are first parsed by an output function *g*(*x*) taking into account the operation of the readout layer and readout noise ν. In all simulations the fixed level of readout noise is matched to the experimental conditions. When the complex-valued reservoir states are directly recorded, then *g*(*x*) = *x* + ν and the readout weights *w*_{i} are complex too, such that *y* is real. If however, a PD measures the power of the neural responses, then *g*(*x*) = |*x*|^{2} + ν which is real-valued, and the readout weights will be real-valued too. Tasks are defined by the real-valued target output ŷ. Optimization of the readout weights occurs over a training set of *T*_{train} input and target samples, and is achieved through least squares regression. This procedure minimizes the mean squared error between the reservoir output *y* and target output ŷ, averaged over all samples.

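These optimized readout weights are then validated on a test set of *T*_{test} new input and target samples. A common figure of merit to quantify the reservoir's performance is the normalized mean square error (NMSE) defined as

$$\text{NMSE}=\frac{\langle {\left(y(n)-\hat{y}(n)\right)}^{2}\rangle }{\langle {\left(\hat{y}(n)-\langle \hat{y}(n)\rangle \right)}^{2}\rangle }.$$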
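A minimal sketch of the readout training and evaluation is given below. Two choices are assumptions rather than details taken from the text: the coherent readout is represented by real weights acting on both quadratures of the field (equivalent to taking the real part of a complex-weighted sum), and a constant column is appended to act as a bias term. Readout noise ν is omitted, and the function names are illustrative.

```python
import numpy as np

def features(x, readout="pd"):
    """Map complex neural responses (samples x neurons) to real regression features."""
    x = np.asarray(x)
    if readout == "pd":
        F = np.abs(x) ** 2                  # photodiode: g(x) = |x|^2
    else:
        F = np.hstack([x.real, x.imag])     # coherent detection: both quadratures
    return np.hstack([F, np.ones((len(F), 1))])   # constant column as bias (assumption)

def train_readout(F, target):
    """Least-squares readout weights minimizing the mean squared error."""
    w, *_ = np.linalg.lstsq(F, target, rcond=None)
    return w

def nmse(y, target):
    """Mean squared error normalized by the variance of the target."""
    return np.mean((y - target) ** 2) / np.var(target)
```

In use, the weights are fitted on the training rows, e.g. `w = train_readout(features(x_train), target_train)`, and the error is evaluated on unseen rows with `nmse(features(x_test) @ w, target_test)`.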

### 2.4. Balanced Mach-Zehnder Modulator Operation

Here we briefly investigate the relevant non-linearities which occur when mapping an electronic signal to an optical signal using an MZM. The operation of our balanced MZM can be described as

$${E}_{in}={E}_{0}\cos \left(\frac{\pi V}{2{V}_{\pi }}\right), \tag{12}$$

where *E*_{0} represents the incident CW pump field, *E*_{in} is the transmitted field which will be the input field to the optical reservoir, *V*_{π} determines at which voltage the zero intensity point occurs (point of no transmission), and *V* is the voltage of the applied electronic signal consisting of a bias contribution *V*_{b} and a zero-mean signal *V*_{s}, i.e., *V* = *V*_{b} + *V*_{s}. For our numerical investigation, we will set the amplitude of the signal voltage to |*V*_{s}| = *V*_{π}/2. First, we investigate the zero intensity bias point, *V*_{b} = *V*_{π}. In this case, we can approximate Equation (12) with the following Taylor expansion

$$\frac{{E}_{in}}{{E}_{0}}\approx f({V}_{s})=-\frac{\pi {V}_{s}}{2{V}_{\pi }}+\frac{1}{6}{\left(\frac{\pi {V}_{s}}{2{V}_{\pi }}\right)}^{3}. \tag{14}$$

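With (*E*_{in}/*E*_{0})_{max} representing the maximal value of $\frac{{E}_{in}}{{E}_{0}}$ with the given bias voltage *V*_{b} and signal amplitude |*V*_{s}|, the relative error *r*.*e*. of the Taylor expansion (14)

$$\mathrm{r.e.}=\frac{\underset{{V}_{s}}{\max }\left|\frac{{E}_{in}}{{E}_{0}}-f({V}_{s})\right|}{{\left(\frac{{E}_{in}}{{E}_{0}}\right)}_{max}} \tag{15}$$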

is smaller than 1%. When the cubic term ($\sim {V}_{s}^{3}$) of the approximation *f*(*V*_{s}) is omitted, this error increases to 11%. This means that at this operating point of the MZM, there is a significant non-linearity which scales with the input signal cubed.

Next, we investigate the linear intensity operating point, *V*_{b} = *V*_{π}/2. Although the MZM's transfer function at this operating point is the most linear in terms of the transmitted optical power, it is highly non-linear in terms of the transmitted optical field. In this case, we replace Equation (14) with

$$f({V}_{s})=\frac{1}{\sqrt{2}}\left[1-x-\frac{{x}^{2}}{2}+\frac{{x}^{3}}{6}+\frac{{x}^{4}}{24}\right],\quad x=\frac{\pi {V}_{s}}{2{V}_{\pi }}, \tag{16}$$

as we need all polynomial terms up to order 4 to keep the relative error defined by Equation (15) below 1%. In this case, omitting terms of orders above 1 in the approximation *f*(*V*_{s}) increases the relative error of the Taylor expansion to 26%. This means that at this operating point of the MZM there are multiple polynomial non-linearities and that the total non-linear signal distortion is stronger compared with the zero intensity bias point.
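The error figures quoted for both operating points can be reproduced numerically. The sketch below assumes the field transfer function cos(π*V*/(2*V*_{π})) and defines the relative error as the largest absolute deviation over |*V*_{s}| ≤ *V*_{π}/2 divided by the largest transmitted field; both are readings of the text, and the helper names are illustrative. Polynomial coefficients are given in the variable *x* = π*V*_{s}/(2*V*_{π}), lowest order first.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def mzm_field(V, V_pi=1.0):
    """Assumed field transfer E_in/E_0 of a balanced MZM (zero transmission at V = V_pi)."""
    return np.cos(np.pi * V / (2.0 * V_pi))

def relative_error(V_b, coeffs, V_pi=1.0, n=20001):
    """Max |exact - polynomial| over |V_s| <= V_pi/2, divided by the max exact field."""
    V_s = np.linspace(-V_pi / 2, V_pi / 2, n)
    x = np.pi * V_s / (2.0 * V_pi)
    exact = mzm_field(V_b + V_s, V_pi)
    approx = P.polyval(x, coeffs)
    return np.max(np.abs(exact - approx)) / np.max(exact)

s = 1.0 / np.sqrt(2.0)

# Zero intensity point, V_b = V_pi: E_in/E_0 = -sin(x)
zero_cubic = relative_error(1.0, [0, -1, 0, 1 / 6])    # linear + cubic terms
zero_linear = relative_error(1.0, [0, -1])             # cubic term omitted

# Linear intensity point, V_b = V_pi/2: E_in/E_0 = cos(pi/4 + x)
lin_quartic = relative_error(0.5, [s, -s, -s / 2, s / 6, s / 24])  # up to order 4
lin_cubic = relative_error(0.5, [s, -s, -s / 2, s / 6])            # up to order 3
lin_linear = relative_error(0.5, [s, -s])                          # orders above 1 omitted
```

Under these assumptions the expansions up to third order (zero intensity point) and fourth order (linear intensity point) stay below 1%, the third-order truncation at the linear intensity point slightly exceeds 1%, and the linear truncations give roughly 11% and 26%, in line with the values quoted in the text.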

Furthermore, during our experiments we have decided to operate the MZM in a linear regime. This allows for the non-linear effects inside the reservoir to be more readily measured. To this end, we tuned the MZM close to the zero intensity operating point, *V*_{b} = *V*_{π} − δ_{V} with δ_{V} ≪ *V*_{π} and reduced the signal amplitude |*V*_{s}|. The small deviation δ_{V} is used to generate a bias in the optical field injected into the reservoir.

### 2.5. Memory Capacities

To benchmark the performance of an RC, one can train it to perform one or several benchmark tasks. Alternatively, there exists a framework to quantify the system's total information processing capacity. This capacity is typically split into two main parts: the capacity of the system to retain past input samples is captured by the linear memory capacity [19], and the capacity of the system to perform non-linear computation is captured by the non-linear memory capacity [20]. It is known that the total memory capacity has an upper bound given by the number of dynamical variables in the system, which in our system is the number of neurons in the reservoir. It is also known that readout noise reduces this total memory capacity, and that there is a trade-off between linear and non-linear memory capacity, depending on the operating regime of the dynamical system. In order to measure these capacities for our reservoir computer a series of independent and identically distributed input samples *u*(*n*) drawn uniformly from the interval [−1, 1] is injected into the reservoir, with discrete time *n*. The RC is subsequently trained to reconstruct a series of linear and non-linear polynomial functions depending on past inputs *u*(*n*−*i*), looking back *i* steps in the past. Following reference [20] these functions are chosen to be Legendre polynomials *P*_{d}(*u*) (of degree *d*), because they are orthogonal over the distribution of the input samples. As an example, we can train the reservoir to reproduce the target signal ŷ(*n*), given by

The ability of the RC to reconstruct each of these functions is evaluated by comparing the reservoir's trained output *y* with the target *ŷ* for previously unseen input samples. This yields a memory capacity *C* which lies between 0 and 1 [20],

$$C=1-\frac{\langle {(\hat{y}-y)}^{2}\rangle }{\langle {\hat{y}}^{2}\rangle }, \tag{18}$$

where 〈.〉 denotes the average over all samples used for the evaluation of *C*. Due to the orthogonality of the polynomial functions over the distribution of the input samples, the capacities corresponding to different functions yield independent information and can thus be summed to quantify the total memory capacity, i.e., the total information processing capacity of the RC. The memory functions are typically grouped by their total degree, which is the sum of degrees over all constituent polynomial functions, e.g., Equation (17) has total degree 3. Summing all memory capacities corresponding to functions of identical total degree yields the total memory capacity per degree. This allows us to quantify the contributions of individual degrees to the total memory capacity of the RC, which is the sum over all degrees. As the memory capacities become small for large degrees, the total memory capacity remains bounded.

Since the reservoirs are trained and their performance is evaluated on finite data sets, we run the risk of overestimating the memory capacities *C*, whose estimator Equation (18) is plagued by a positive bias [20]. Therefore, a cutoff capacity *C*_{co} is used (*C*_{co} ≈ 0.1 for 1,000 test samples) and capacities below this cutoff are neglected (i.e., they are assumed to be 0).
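The capacity measurement can be sketched as follows. The target used in the example, *P*_{2}(*u*(*n*−1))·*P*_{1}(*u*(*n*−3)), is a hypothetical instance of total degree 3 chosen for illustration, and the "reservoir" is a toy linear shift register rather than the fiber ring; the function names, the washout length, and the bias column are assumptions as well.

```python
import numpy as np
from numpy.polynomial import legendre

def legendre_target(u, degrees_by_delay):
    """Product of Legendre polynomials P_d(u(n - i)) for each entry {delay i: degree d}."""
    y = np.ones(len(u))
    for delay, degree in degrees_by_delay.items():
        coeffs = np.zeros(degree + 1)
        coeffs[degree] = 1.0                      # selects P_degree in the Legendre series
        # np.roll wraps around, so the first `delay` samples are invalid (see washout).
        y *= legendre.legval(np.roll(u, delay), coeffs)
    return y

def capacity(states, target, n_train, washout=10, cutoff=0.1):
    """C = 1 - <(target - y)^2> / <target^2> on held-out samples, zeroed below the cutoff."""
    F = np.hstack([states, np.ones((len(states), 1))])
    train, test = slice(washout, n_train), slice(n_train, None)
    w, *_ = np.linalg.lstsq(F[train], target[train], rcond=None)
    y = F[test] @ w
    C = 1.0 - np.mean((target[test] - y) ** 2) / np.mean(target[test] ** 2)
    return C if C >= cutoff else 0.0

# Toy example: a purely linear memory of the last three inputs.
rng = np.random.default_rng(0)
u = rng.uniform(-1, 1, 3000)
states = np.column_stack([np.roll(u, i) for i in (1, 2, 3)])
C_linear = capacity(states, legendre_target(u, {1: 1}), n_train=2000)
C_cubic = capacity(states, legendre_target(u, {1: 2, 3: 1}), n_train=2000)
```

The toy memory reconstructs the degree-1 target almost perfectly, while the degree-3 target falls below the cutoff and is counted as zero, which illustrates why the cutoff is needed to suppress spurious small capacities.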

Note that the trade-off between linear and non-linear memory capacity is typically evaluated by comparing the total memory capacity of degree 1 (linear) with the total memory capacity of all higher degrees (non-linear). However, special attention is due when a PD is present in the readout layer of our RC. If a reservoir can (only) linearly retain past inputs *u*(*n*−*i*) (*i* steps in the past) then any neural response *x*(*n*) consists of a linear combination (with a bias term *b* and fading coefficients *a*_{i}) of those past inputs

$$x(n)=b+\sum _{i}{a}_{i}\,u(n-i), \tag{19}$$

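and subsequently the optical power *P*_{x} measured by the PD is given by

$${P}_{x}(n)=|x(n){|}^{2}=|b{|}^{2}+2\,\text{Re}\left({b}^{\ast }\sum _{i}{a}_{i}\,u(n-i)\right)+\sum _{i,j}{a}_{i}{a}_{j}^{\ast }\,u(n-i)\,u(n-j), \tag{20}$$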

which consists of polynomial functions of past inputs of degrees 1 and 2. Thus, in this case the total linear memory capacity of the RC is represented by the total memory capacity of degrees 1 and 2 combined. In case the bias term *b* is lacking, only memory capacities of degree 2 will be present. On the other hand, if a PD is used in the output and memory capacities of degree higher than 2 are present, then this indicates that the reservoir itself is not linear, i.e., cannot be represented by a function of the form Equation (19).
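This argument can be verified with a short numerical experiment. The sketch below builds a purely linear complex response from three past inputs (the coefficients, memory depth, and helper names are arbitrary illustrative choices), squares it as a PD would, and fits the measured power with all monomials of the past inputs up to degree 2. The fit is exact, and the degree-1 weights vanish when the bias term is absent.

```python
import numpy as np
from itertools import combinations_with_replacement

def monomials_up_to_degree_2(U):
    """Columns: constant, each past input, and every pairwise product of past inputs."""
    n_in = U.shape[1]
    cols = [np.ones(len(U))] + [U[:, i] for i in range(n_in)]
    cols += [U[:, i] * U[:, j] for i, j in combinations_with_replacement(range(n_in), 2)]
    return np.column_stack(cols)

def pd_fit(b, a, U):
    """Fit the PD power of a linear response with monomials of degree <= 2."""
    x = b + U @ a                        # linear reservoir response
    power = np.abs(x) ** 2               # photodiode measurement
    M = monomials_up_to_degree_2(U)
    w, *_ = np.linalg.lstsq(M, power, rcond=None)
    return w, np.max(np.abs(M @ w - power))

rng = np.random.default_rng(0)
depth, n = 3, 500
u = rng.uniform(-1, 1, n + depth)
# Column i-1 holds u(n - i) for i = 1..depth.
U = np.column_stack([u[depth - i: depth - i + n] for i in range(1, depth + 1)])
a = rng.normal(size=depth) + 1j * rng.normal(size=depth)

w_bias, res_bias = pd_fit(0.7 + 0.2j, a, U)     # with bias: degrees 1 and 2 present
w_nobias, res_nobias = pd_fit(0.0, a, U)        # without bias: only degree 2 remains
```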

## 3. Results

### 3.1. Numerical RC Performance: Santa Fe Time Series Prediction

For the injection of input samples to the optical reservoir, we consider two strategies as discussed in section 2.1 and in Figures 2A,B, referred to here as the linear and non-linear input regimes, respectively. The exact shape of the non-linearity in the non-linear regime depends, among other things, on the operating point (or bias voltage) of the MZM, as discussed in section 2.4. We will demonstrate this by showing results around both the linear intensity operating point and the zero intensity operating point of the MZM. For the readout of the reservoir response, we also consider two cases as discussed in section 2.1 and in Figures 2C,D, referred to here as the linear and non-linear output regimes, respectively.

We have thus identified four different scenarios based on the absence or presence of non-linearities in the input and output layer of the reservoir computer. As we will show, we have for each of these cases numerically investigated the effect of the distributed non-linear Kerr effect, present in the fiber waveguide, on RC performance. For this evaluation, we have used 100 neurons to solve the Santa Fe time series prediction task [21] and each input sample is injected during six roundtrips (*t*_{S} = *kt*_{M} with *k* = 6) for reasons which will become clear in section 3.2. Here, a pre-existing signal generated by a laser operating in a chaotic regime is injected into the reservoir. The target at each point in time is for the reservoir computer to predict the next sample. Performance is evaluated using the NMSE, where lower is better. Figure 4 has four panels corresponding to these four scenarios. Each panel shows the NMSE as a function of the average optical power per neuron inside the cavity. Dashed blue lines correspond to simulation results of linear reservoirs (i.e., with the non-linear coefficient γ set to 0), and full red lines correspond to simulation results of reservoirs with Kerr non-linear waveguides (i.e., γ set to γ_{Kerr}).

**Figure 4**. Numerical results of the fiber-ring reservoir computer on the Santa Fe time series prediction task. In all panels the prediction error (NMSE) is plotted vs. the average neuron power 〈*P*_{x}〉. Panels **(a,b)** correspond to a linear input layer, whereas **(c,d)** correspond to a non-linear input layer using the MZM's non-linear transfer function. The non-linear input regime shows results for two different operating points of the MZM with different strengths of non-linear transformation. Panels **(a,c)** correspond to a linear output layer, whereas **(b,d)** correspond to a non-linear output layer using the PD.

In Figure 4a both the input and output layers of the reservoir are strictly linear (i.e., optical input and coherent detection). It is clear that the linear reservoir (γ = 0) scores poorly, with the NMSE approaching 20%. For a wide range of optical power levels, the presence of the Kerr non-linear effect (γ = γ_{Kerr}) induced by the fiber waveguide boosts the RC performance, with an optimal NMSE just below 1%. This can be readily understood as it is well-known that for this task, some non-linearity is required in order to obtain good RC performance. Note that the average neuron power 〈*P*_{x}〉 can be used to estimate the average non-linear phase ϕ_{Kerr} the signals will acquire during the sample duration *t*_{S}, as ϕ_{Kerr} = γ_{Kerr}〈*P*_{x}〉*Lt*_{S}/*t*_{M}. We observe that without the presence of phase noise in the cavity, the boost to the RC performance due to the Kerr effect starts at very small values of the estimated non-linear phase, and breaks down when ϕ_{Kerr} ≳ 1. Switching to Figure 4b we have now introduced the square non-linearity by using a PD in the readout layer. Focusing on the results obtained with a linear reservoir, we see that the PD's non-linearity alone decreases the NMSE down from 20 to ~5% (γ = 0). Although the PD's non-linearity clearly boosts the RC performance on this task, its effect is rather restricted. The PD only generates squared terms, and linear terms if a bias is present, see section 2.5, depending on the MZM's operating point. Furthermore, this non-linearity does not affect the neural responses nor the operation of the reservoir itself, as it only applies to the readout layer. It can thus be understood that the introduction of the Kerr non-linearity inside the reservoir warrants an additional significant drop in NMSE, to below 1% (γ = γ_{Kerr}). In Figure 4c, the output layer is linear again, but now we have introduced the MZM in the input layer. 
The closed markers correspond with simulations where the MZM operates around the zero intensity operating point or the point of minimal transmission (*V*_{bias} = *V*_{π}). In terms of the optical field modulation, this is the most linear regime. It is thus no surprise that the performance of both linear and non-linear reservoirs mimics that of Figure 4a, where no non-linearity was present in the input layer. The only difference is that the error of the linear reservoir drops from 20% to about 13% (γ = 0, *V*_{bias} = *V*_{π}) because of the small residual non-linearity at this operating point of the MZM. The round markers correspond with simulations where the MZM operates around the linear intensity operating point (*V*_{bias} = *V*_{π}/2). In terms of the optical field modulation, the mapping of input samples to the optical field injected into the reservoir is more non-linear at this operating point. This is why even the linear reservoir manages to achieve errors below 4% (γ = 0, *V*_{bias} = *V*_{π}/2). Again we see that the introduction of the non-linear Kerr effect allows the NMSE to drop even further, to below 1% (γ = γ_{Kerr}). In fact, this scenario is similar to the scenario with linear input mapping and non-linear output mapping, Figure 4b. Finally, in Figure 4d, non-linearities are present in both the input mapping and readout layer. With the MZM operating around the zero intensity operating point, there is only a weak non-linearity in the input mapping and thus, as expected, both linear and non-linear reservoirs show trends which are very similar to the scenario where the input mapping is linear, Figure 4c. With the MZM operating around the linear intensity operating point (*V*_{bias} = *V*_{π}/2), however, we observe a scenario in which the RC does not seem to benefit from the presence of the Kerr non-linear effect.
It seems that with significant non-linearities present in both input and output layers of the RC the distributed non-linear effect inside the reservoir cannot further decrease the NMSE below values attained by the linear reservoir, which is below 1% (*V*_{bias} = *V*_{π}/2). In all other cases, Figures 4a–c, we find that the distributed non-linearity inside the reservoir significantly boosts RC performance, and we find that its presence is critical when no other non-linearities are available.
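The estimate ϕ_{Kerr} = γ_{Kerr}〈*P*_{x}〉*Lt*_{S}/*t*_{M} used above is simple enough to evaluate directly. The following Python sketch does so and inverts the relation to find the power at which the estimated phase reaches 1 rad, where the text reports that the performance boost breaks down. The numerical values of `GAMMA`, `LENGTH` and `TS_OVER_TM` are hypothetical placeholders chosen for illustration, not the parameters of the setup, and *L* is assumed here to denote the fiber length.

```python
def kerr_phase(gamma, mean_power, length, ts_over_tm):
    """Estimate phi_Kerr = gamma * <P_x> * L * t_S / t_M, in radians."""
    return gamma * mean_power * length * ts_over_tm


def power_for_phase(phi, gamma, length, ts_over_tm):
    """Average neuron power (W) at which the estimated phase reaches phi."""
    return phi / (gamma * length * ts_over_tm)


# Hypothetical parameters, chosen only for illustration (not the paper's values).
GAMMA = 2.0e-3    # non-linear coefficient, 1/(W m)
LENGTH = 50.0     # fiber length, m (assumed meaning of L)
TS_OVER_TM = 1.0  # sample duration in units of the input mask period

# Estimated non-linear phase for a few average neuron powers.
for p_mw in (1.0, 10.0, 100.0):
    phi = kerr_phase(GAMMA, p_mw * 1e-3, LENGTH, TS_OVER_TM)
    print(f"<P_x> = {p_mw:6.1f} mW -> phi_Kerr ~ {phi:.2e} rad")

# Power at which the estimate reaches 1 rad for these placeholder values.
p_breakdown = power_for_phase(1.0, GAMMA, LENGTH, TS_OVER_TM)
print(f"phi_Kerr reaches 1 rad near <P_x> = {p_breakdown:.1f} W")
```

Because the estimate is linear in both 〈*P*_{x}〉 and *t*_{S}/*t*_{M}, doubling the sample duration halves the power needed to reach a given non-linear phase.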

### 3.2. Experimental Verification: Linear and Non-linear Memory Capacity

In this section we compare experimental results with detailed numerical simulations. For the experimental verification of our work, we are currently limited to operating with 20 neurons, as explained in section 2.1. Therefore, we have chosen not to perform the reservoir computing experiment on the Santa Fe task. With so few neurons, tasks like the Santa Fe task become hard for the reservoir. Instead we turn to a more academic task which allows us to quantify the reservoir's memory and non-linear computational capacity in a more complete and task-independent way. We experimentally measure the linear and non-linear memory capacities considered in section 2.5. Even with so few neurons the evaluation of the memory capacities can yield meaningful results while taking up comparatively little processing time.

For these experiments, the input layer to our fiber-ring reservoir contains a balanced MZM tuned to operate in a linear regime as outlined in section 2.4. The output layer employs a PD to measure the neural responses. That is, we use the setups of Figures 2B,D but with the MZM operated as in Equation (2.4). Following reference [20], we have driven the reservoir with a series of independent and identically distributed random samples and trained the RC to reproduce different linear and non-linear polynomial functions of past input samples. The capacity of the reservoir to reconstruct these functions was then evaluated and results were grouped according to the function's polynomial degree. To keep the results readable, we only show the total capacity per degree, obtained by summing all capacities corresponding with functions of the same total polynomial degree. In Figure 5 we show the total memory capacity per degree, encoded in the height of vertically stacked and color-coded bars. The stacking makes it possible to visualize the contributions of individual degrees to the overall total memory capacity (summed over all degrees). Capacities of degree higher than 4 are not considered, as they were found not to contribute significantly to the total memory capacity of the system. For results labeled *bias off* the MZM operates at the zero-intensity point (*V*_{bias} = *V*_{π}), and moving toward the *bias on* label, we tuned the MZM's bias voltage away from this point (*V*_{bias} = *V*_{π} − δ_{V}, with δ_{V} ≪ *V*_{π}). This introduces a small bias component to the optical field injected into the reservoir, without compromising the linear operation of the MZM. The experiment was also repeated for different values of the sample duration *t*_{S} with respect to the input mask periodicity *t*_{M} (approximately equal to the cavity roundtrip *t*_{R}).
We expect the sample duration to play a very important role, since it determines how much time a piece of information spends inside the cavity, and thus how much non-linear phase can be acquired. The ratio *t*_{S}/*t*_{M} is gradually increased from *t*_{S} = 2*t*_{M} in (top row) Figures 5A–C, to *t*_{S} = 6*t*_{M} in (middle row) Figures 5D–F, and finally to *t*_{S} = 10*t*_{M} in (bottom row) Figures 5G–I. The experimental results in (left column) Figures 5A,D,G are compared with numerical results on a linear reservoir (γ = 0) in (middle column) Figures 5B,E,H, and a non-linear reservoir (γ = γ_{Kerr}) in (right column) Figures 5C,F,I.
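The capacity evaluation described above can be sketched in a few lines of Python. In the framework of reference [20], the capacity for a target function is one minus the smallest normalized mean square error that a linear readout of the neural responses can reach. The sketch below assumes inputs drawn uniformly from [-1, 1] and Legendre polynomial targets, and it replaces the fiber-ring cavity with a deliberately crude, hypothetical stand-in (`toy_states`) in which neuron *n* reports the PD reading of a biased field carrying the input from *n* steps ago. It evaluates capacities in-sample and omits the cutoff of section 2.5, so it illustrates the bookkeeping only, not the experiment.

```python
import numpy as np
from numpy.polynomial.legendre import legval


def capacity(states, target, ridge=1e-9):
    """Capacity C = 1 - min_w NMSE of a linear readout of `states` (in-sample)."""
    X = np.column_stack([states, np.ones(len(states))])  # constant column for an offset
    t = target - target.mean()                            # centre the target
    # Ridge-regularised least squares for the readout weights.
    w = np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ t)
    return 1.0 - np.mean((t - X @ w) ** 2) / np.mean(t ** 2)


def toy_states(u, n_neurons, bias):
    """Hypothetical stand-in for PD readings: column n holds (u(t - n) + bias)^2."""
    T = len(u) - n_neurons
    cols = [(u[n_neurons - n: n_neurons - n + T] + bias) ** 2
            for n in range(n_neurons)]
    return np.column_stack(cols)


def demo(bias, n_neurons=20, n_samples=20000, delay=1, seed=0):
    """Return (degree-1, degree-2) capacities for the input `delay` steps back."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(-1.0, 1.0, n_samples + n_neurons)   # i.i.d. input samples
    states = toy_states(u, n_neurons, bias)
    past = u[n_neurons - delay: n_neurons - delay + n_samples]
    # Legendre polynomial of degree d evaluated on the delayed input.
    return tuple(capacity(states, legval(past, [0] * d + [1])) for d in (1, 2))


for b in (0.0, 0.2):
    c1, c2 = demo(b)
    print(f"bias = {b:.1f}: C(degree 1) = {c1:.3f}, C(degree 2) = {c2:.3f}")
```

In this toy model a non-zero bias moves capacity from degree 2 to degree 1, which is qualitatively the trend discussed for the *bias off* and *bias on* settings. A full evaluation would sum such capacities over all delays and all polynomial products of a given total degree.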

**Figure 5**. Comparison between experimental results **(A,D,G)** and numerical models with linear (γ = 0) **(B,E,H)** and non-linear (γ = γ_{Kerr}) reservoirs **(C,F,I)**. The stacked vertical bars are color-coded to represent the total memory capacities (TMC) of degree 1 (blue), 2 (red), 3 (orange), and 4 (purple). As such, the total height represents the total overall memory capacity. A control variable of the MZM, δ_{V}, is varied to include a small bias component to the injected optical field, where *bias off* corresponds with δ_{V} = 0 and *bias on* corresponds with a small non-zero value 0 < δ_{V} ≪ *V*_{π}. The sample duration *t*_{S} is varied from 2 times **(A–C)**, to 6 times **(D–F)** and finally to 10 times **(G–I)** the input mask period *t*_{M} (≈ cavity roundtrip time *t*_{R}).

Firstly, in Figure 5A we observe that without bias to the optical input field (*V*_{bias} = *V*_{π}) the total memory capacity originates almost completely from the polynomial functions of degree 2, which means (given the presence of the PD in the readout layer) that the optical system is almost completely linear. Then, as an optical field bias is introduced we find that the total linear memory capacity of the system is now shared between degrees 1 and 2. As expected on account of the quadratic non-linearity of the PD, Equation (20), the contribution of (odd) degree 1 grows with the increasing bias. Beyond these capacities of degrees 1 and 2, we also observe a small contribution of capacities of degrees 3 and 4. We ascribe these contributions to the imperfect tuning of the MZM and thus a small residual non-linearity in the input mapping. Note that the simulations take into account the quasi-linear input mapping of the MZM, but seemingly underestimate the residual non-linearities, predicting them to be insignificant. The imperfection of the MZM tuning also leads to a small residual bias component to the optical injected field, resulting in a small non-zero capacity of degree 1. Numerical simulations of linear (γ = 0) and non-linear (γ = γ_{Kerr}) reservoirs in Figures 5B,C, respectively, show the same growth in the memory capacity of degree 1 at the expense of the memory capacity of degree 2 when the bias is changed. Note that both simulations seem to overestimate the minimal bias required to obtain a significant memory capacity of degree 1. At this sample duration (*t*_{S} = 2*t*_{M}) neither simulation indicates any significant contributions of capacities with degrees beyond 2.
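The shift of capacity from degree 2 to degree 1 can be made explicit with a one-line expansion. As a simplified illustration, suppose a single neuron carries the optical field *x* = *b* + *a u*, where *u* is a real past input sample, *a* is a complex coupling amplitude and *b* is the bias field; these symbols are introduced here for illustration only and are not the notation of Equation (20). The PD then measures

$$
|x|^{2} = |b + a\,u|^{2} = |b|^{2} + 2\,\mathrm{Re}\!\left(a\,b^{*}\right)u + |a|^{2}u^{2}.
$$

Without bias (*b* = 0) only the term of degree 2 in *u* survives, whereas a non-zero bias adds a term of degree 1 whose weight grows linearly with |*b*|.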

When increasing the sample duration (*t*_{S} = 6*t*_{M} and *t*_{S} = 10*t*_{M}), the experimental results in Figures 5D,G show a steady increase in the contributions of capacities with degrees 3 and 4. This increase is attributed to the non-linear Kerr effect, due to the larger accumulation of non-linear phase during the time each sample is presented to the reservoir. At the same time we see a decrease in the capacities of degrees 1 and 2. As explained before, due to the PD these capacities capture the reservoir's capacity to linearly retain past samples. This trade-off between linear memory capacity (here degrees 1 and 2) and non-linear computational capacity (here degrees 3 and 4) is well-documented [20]. Because we use the sample duration (*t*_{S} = *kt*_{M}≈*kt*_{R}) to control the cumulative non-linear effect inside the reservoir, we inevitably increase the mismatch between the inherent timescale of the input data (i.e., the sample duration *t*_{S}) and the inherent timescale of the reservoir (i.e., the cavity roundtrip *t*_{R}) and alter the reservoir's internal topology. When each sample is presented longer, past samples have spent more time inside the lossy cavity by the time they are accessed through the reservoir's noisy readout. Thus, on the longer timescales (*t*_{S}) at which information is now processed, it is harder for the reservoir (operating at timescale *t*_{R}) to retain past information. These aspects explain why the overall total memory capacity (summed over all degrees) decreases with increased sample duration *t*_{S}. The numerical results on both the linear reservoir (γ = 0) in Figures 5E,H and the non-linear reservoir (γ = γ_{Kerr}) in Figures 5F,I correctly predict a drop in the total linear memory capacities (degrees 1 and 2). Due to the memory capacity cutoff explained in section 2.5, small capacities are harder to quantify accurately and systematic underestimation can occur.
This explains why the small total memory capacities obtained experimentally are larger than the small total memory capacities obtained numerically. The correspondence for large total memory capacities is better as they are largely unaffected by the cutoff. But besides the drop in linear memory capacities, only the non-linear reservoir model can explain the steady increase in non-linear memory capacities (degrees 3 and 4) with longer sample durations. With increasing sample duration *t*_{S} the simulated non-linear reservoir shows the contribution of the total non-linear memory capacity (degrees 3 and 4) to the total memory capacity (all degrees) growing from 0 to 25.4%, and in the experiment this contribution starts at 6.4% and grows up to 23.6%. This sizable increase in non-linear computation capacity can be of considerable significance to the reservoir's performance on other tasks, as shown earlier. When comparing the experimental results with the non-linear reservoir model for all given sample durations *t*_{S}, the main difference is that the capacities of degree 3 seem to appear sooner (i.e., for smaller sample duration) in the experiment. This can be explained by the residual bias component to the optical injected field. Such a bias makes it easier to produce polynomial functions of odd degrees, thus explaining their earlier onset. This, in turn, follows from the quadratic nature of the Kerr non-linearity, as the reasoning previously applied to the quadratic non-linearity of the PD in Equation (20) can be generalized to memory capacities of higher degree.

## 4. Discussion

We have identified and investigated the role of non-linear transformation of information inside a photonic computing system based on a passive coherent fiber-ring reservoir. Non-linearities can occur at different places inside a reservoir computer: the input layer, the bulk and the readout layer. State-of-the-art opto-electronic RC systems often include one or several components which inevitably introduce non-linearities to the computing system. On the reservoir's input side, we have compared a linear input regime with the use of an MZM, which has a non-linear transfer function, to convert electronic data to an optical signal. On the reservoir's output side, we have compared a linear output regime with the use of a PD which measures optical power levels, which scale quadratically with the optical field strength of the neural responses. We numerically evaluated such systems using a benchmark test and found that non-linear input and/or output components are needed to obtain good RC performance when the optical reservoir itself (i.e., the core of the RC system) is a strictly linear system.

Internal to the reservoir, we investigated the effect of the optical Kerr non-linear effect on RC performance. Our numerical benchmark test showed a large band of optical powers where the presence of this distributed non-linear effect, caused by the waveguiding material of the reservoir, significantly decreased the RC's error figure. Our numerical and experimental measurements of the linear and non-linear memory capacity of this RC system showed that the accumulation of non-linear phase due to the distributed non-linear Kerr effect strongly improves the system's non-linear computational capacity. We can thus conclude that for photonic reservoir computers with non-linear input and/or output components, the presence of a distributed non-linear effect inside the optical reservoir improves the RC performance. Furthermore, the distributed non-linearity is essential for good performance in the regime where non-linearities are absent from both the input and output layer. This may be the case in an all-optical reservoir computer (i.e., with optical input and output layers). We have shown that the effect of the distributed non-linearity is strong enough to compensate for the lack of non-linear transformation of information elsewhere in the system, and that it allows one to build a computationally strong photonic computing system.

Finally, we expect a design approach including distributed non-linear effects to improve the scalability of these types of computational devices. In general, when harder tasks are considered, larger reservoirs are required. One way to increase the size of a delay-based reservoir is to implement a longer delay-line. This increase in length of the signal propagation path naturally increases the effect of distributed non-linearities as considered in this work. Similarly, increasing the size of a network-based reservoir will also lead to more and/or longer signal paths, resulting in the increased accumulation of non-linear effects, although waveguides with stronger non-linear effects may have to be considered to compensate for the shorter connection lengths in on-chip implementations. We believe that the natural increase in the strength of non-linear effects, following the increase in size of the reservoir, may diminish the need to place discrete non-linear components inside large networks used for strongly non-linear tasks. As such, both the complexity and cost of such systems would be reduced. Since the waveguiding material itself is used to induce non-linear effects, the waveguide properties (such as material and geometry) determine the optical field confinement and thus regulate the strength of non-linear interactions. Consequently it may be possible to create reservoirs where deliberate variations in the waveguide properties are used to tune the strength of the distributed non-linear effect in different regions of the system. This would allow for a trade-off between the system's linear memory capacity and its non-linear computational capacity, such that a large number of past input samples can be retained (in some parts of the system) and then non-linearly processed to solve difficult tasks (in other parts of the system).
These considerations indicate why distributed non-linear effects may play a major role in future implementations of powerful photonic reservoir computers.

## Data Availability Statement

The data used in this study for the Santa Fe prediction task [21] is one of the data sets from the “Time Series Prediction Competition” sponsored by the Santa Fe Institute, initiated by Neil Gershenfeld and Andreas Weigend in the early 90s; no licenses/restrictions apply. No further datasets were used or generated.

## Author Contributions

The idea was first conceived by GVa and finalized together with GVe and SM. JP was responsible for the physical modeling, the numerical calculations, the experimental verification, and wrote most of the manuscript. All coauthors contributed to the discussion of the results and writing of the manuscript.

## Funding

We acknowledge financial support from the Research Foundation Flanders (FWO) under grants 11C9818N, G028618N, and G029519N, the Fonds de la Recherche Scientifique (FRS-FNRS), the Hercules Foundation and the Research Council of the VUB.

## Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The reviewer VB declared a shared affiliation, with no collaboration, with the authors JP and SM to the handling editor at time of review.

## References

1. Maass W, Natschläger T, Markram H. Real-time computing without stable states: a new framework for neural computation based on perturbations. *Neural Comput.* (2002) **14**:2531–60. doi: 10.1162/089976602760407955

2. Jaeger H, Haas H. Harnessing nonlinearity: predicting chaotic systems and saving energy in wireless communication. *Science.* (2004) **304**:78–80. doi: 10.1126/science.1091277

3. Verstraeten D, Schrauwen B, D'Haene M, Stroobandt D. An experimental unification of reservoir computing methods. *Neural Netw.* (2007) **20**:391–403. doi: 10.1016/j.neunet.2007.04.003

4. Appeltant L, Soriano MC, Van der Sande G, Danckaert J, Massar S, Dambre J, et al. Information processing using a single dynamical node as complex system. *Nat Commun.* (2011) **2**:468. doi: 10.1038/ncomms1476

5. Paquot Y, Duport F, Smerieri A, Dambre J, Schrauwen B, Haelterman M, et al. Optoelectronic reservoir computing. *Sci Rep.* (2012) **2**:287. doi: 10.1038/srep00287

6. Larger L, Soriano MC, Brunner D, Appeltant L, Gutiérrez JM, Pesquera L, et al. Photonic information processing beyond Turing: an optoelectronic implementation of reservoir computing. *Opt Express.* (2012) **20**:3241–9. doi: 10.1364/OE.20.003241

7. Duport F, Smerieri A, Akrout A, Haelterman M, Massar S. Fully analogue photonic reservoir computer. *Sci Rep.* (2016) **6**:22381. doi: 10.1038/srep22381

8. Larger L, Baylón-Fuentes A, Martinenghi R, Udaltsov VS, Chembo YK, Jacquot M. High-speed photonic reservoir computing using a time-delay-based architecture: million words per second classification. *Phys Rev X.* (2017) **7**:011015. doi: 10.1103/PhysRevX.7.011015

9. Bueno J, Maktoobi S, Froehly L, Fischer I, Jacquot M, Larger L, et al. Reinforcement learning in a large-scale photonic recurrent neural network. *Optica.* (2018) **5**:756–60. doi: 10.1364/OPTICA.5.000756

10. Vandoorne K, Dambre J, Verstraeten D, Schrauwen B, Bienstman P. Parallel reservoir computing using optical amplifiers. *IEEE Trans Neural Netw.* (2011) **22**:1469–81. doi: 10.1109/TNN.2011.2161771

11. Vandoorne K, Mechet P, Van Vaerenbergh T, Fiers M, Morthier G, Verstraeten D, et al. Experimental demonstration of reservoir computing on a silicon photonics chip. *Nat Commun.* (2014) **5**:3541. doi: 10.1038/ncomms4541

12. Katumba A, Heyvaert J, Schneider B, Uvin S, Dambre J, Bienstman P. Low-loss photonic reservoir computing with multimode photonic integrated circuits. *Sci Rep.* (2018) **8**:2653. doi: 10.1038/s41598-018-21011-x

13. Harkhoe K, Van der Sande G. Dual-mode semiconductor lasers in reservoir computing. In: *Neuro-Inspired Photonic Computing*. Vol. **10689**. Strasbourg: International Society for Optics and Photonics (2018). p. 106890B.

14. Duport F, Schneider B, Smerieri A, Haelterman M, Massar S. All-optical reservoir computing. *Opt Express.* (2012) **20**:22783–95. doi: 10.1364/OE.20.022783

15. Brunner D, Soriano MC, Mirasso CR, Fischer I. Parallel photonic information processing at gigabyte per second data rates using transient states. *Nat Commun.* (2013) **4**:1364. doi: 10.1038/ncomms2368

16. Vinckier Q, Duport F, Smerieri A, Vandoorne K, Bienstman P, Haelterman M, et al. High-performance photonic reservoir computer based on a coherently driven passive cavity. *Optica.* (2015) **2**:438–46. doi: 10.1364/OPTICA.2.000438

17. Van der Sande G, Brunner D, Soriano MC. Advances in photonic reservoir computing. *Nanophotonics.* (2017) **6**:561–76. doi: 10.1515/nanoph-2016-0132

18. Bienstman P, Dambre J, Katumba A, Freiberger M, Laporte F, Lugnan A. Photonic reservoir computing: a brain-inspired approach for information processing. In: *Optical Fiber Communication Conference*. San Diego, CA: Optical Society of America (2018). p. M4F–4.

19. Jaeger H. Short term memory in echo state networks. GMD-Report 152. In: *GMD-German National Research Institute for Computer Science*. Citeseer (2002). Available online at: http://www.faculty.jacobs-university.de/hjaeger/pubs/STMEchoStatesTechRep.pdf

20. Dambre J, Verstraeten D, Schrauwen B, Massar S. Information processing capacity of dynamical systems. *Sci Rep.* (2012) **2**:514. doi: 10.1038/srep00514

Keywords: photonic, reservoir computing, passive, coherent, distributed non-linearity, Kerr, fiber-ring

Citation: Pauwels J, Verschaffelt G, Massar S and Van der Sande G (2019) Distributed Kerr Non-linearity in a Coherent All-Optical Fiber-Ring Reservoir Computer. *Front. Phys.* 7:138. doi: 10.3389/fphy.2019.00138

Received: 23 May 2019; Accepted: 06 September 2019;

Published: 03 October 2019.

Edited by:

Claudio Mirasso, Institute of Interdisciplinary Physics and Complex Systems (IFISC), Spain

Reviewed by:

Apostolos Argyris, Institute of Interdisciplinary Physics and Complex Systems (IFISC), Spain

Vasileios Basios, Free University of Brussels, Belgium

Luis Pesquera, University of Cantabria, Spain

Copyright © 2019 Pauwels, Verschaffelt, Massar and Van der Sande. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Jaël Pauwels, jael.pauwels@vub.be