Machine Learning With Observers Predicts Complex Spatiotemporal Behavior

Neofotistos, George; Mattheakis, Marios; Barmparis, Georgios D.; Hizanidis, Johanne; Tsironis, Giorgos P.; Kaxiras, Efthimios

doi:10.3389/fphy.2019.00024

ORIGINAL RESEARCH article

Front. Phys., 05 March 2019

Sec. Quantum Engineering and Technology

Volume 7 - 2019 | https://doi.org/10.3389/fphy.2019.00024

Machine Learning With Observers Predicts Complex Spatiotemporal Behavior

George Neofotistos^1,2^*

Marios Mattheakis¹

Georgios D. Barmparis²

Johanne Hizanidis²

Giorgos P. Tsironis^1,2

Efthimios Kaxiras^1,3

¹School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, United States
²Department of Physics, University of Crete, Heraklion, Greece
³Department of Physics, Harvard University, Cambridge, MA, United States

Chimeras and branching are two archetypical complex phenomena that appear in many physical systems; because of their different intrinsic dynamics, they delineate opposite non-trivial limits in the complexity of wave motion and present severe challenges in predicting chaotic and singular behavior in extended physical systems. We report on the long-term forecasting capability of Long Short-Term Memory (LSTM) and reservoir computing (RC) recurrent neural networks, when they are applied to the spatiotemporal evolution of turbulent chimeras in simulated arrays of coupled superconducting quantum interference devices (SQUIDs) or lasers, and branching in the electronic flow of two-dimensional graphene with random potential. We propose a new method in which we assign one LSTM network to each system node except for “observer” nodes which provide continual “ground truth” measurements as input; we refer to this method as “Observer LSTM” (OLSTM). We demonstrate that even a small number of observers greatly improves the data-driven (model-free) long-term forecasting capability of the LSTM networks and provide the framework for a consistent comparison between the RC and LSTM methods. We find that RC requires smaller training datasets than OLSTMs, but the latter require fewer observers. Both methods are benchmarked against Feed-Forward neural networks (FNNs), also trained to make predictions with observers (OFNNs).

Introduction

Predicting the state of complex, non-linear dynamical systems as a function of time is an important problem of great practical utility. Recent advances of artificial neural networks and machine learning (ML) methods have made possible significant applications in science, industry, and technology [1–15], with reliable prediction comprising one of the most promising areas of research. In this report, we investigate the long-term forecasting capability of two widely used ML methods, the Long Short-Term Memory (LSTM) and the reservoir-computing (RC) recurrent neural network architectures, to predict the spatiotemporal evolution of two distinct complex dynamical phenomena: (i) multi-clustered turbulent chimera states, that is collective, self-organized patterns of coexisting coherence and incoherence in coupled oscillators systems; and (ii) the onset of branching singularities in electronic flow in two dimensional disordered materials. These two phenomena represent opposite ends in complex spatiotemporal evolution with chimeras relating to dynamic self-organization and branching relating to the stochastic onset of singular motion.

LSTM networks [4] have proven to be successful in predicting sequence-involving tasks such as speech recognition [5], machine translation [6], and human dynamics [7] among others. While they have the ability to learn and reproduce long, isolated sequences, they do not seem to provide robust predictions in complex physical systems that involve chaotic behavior or to capture dependencies between multiple correlated sequences [8–14]; in many cases they are outperformed by simpler methods, such as residual multilayer perceptrons [8], while in other cases their predictions diverge [10, 14]. Recently, hybrid LSTM models have been proposed to improve forecasting in applications ranging from chaotic systems [8–10], to pedestrian trajectories [11], particle tracking in high-energy physics [12], and anomaly detection in the post-mortem time series of superconducting magnets [13]. For example, in the method of predicting human trajectories by “social LSTMs” [14], one LSTM network is assigned to each pedestrian, but in order to improve the prediction, each LSTM network is not independent of all other LSTMs, but are connected to those corresponding to nearby pedestrians' sequences by the introduction of a “social” pooling layer which allows the LSTMs of spatially proximal sequences to share their hidden-states with each other. In another approach, aiming at improving the forecasting of higher-dimensional chaotic systems, a hybrid architecture extends the global LSTM applied to the system with a mean stochastic model (MSM-LSTM) in order to ensure convergence to the invariant measure [10].

Reservoir computing (RC), first proposed by Jaeger and Haas [15], comprises a linear input layer, a recurrent non-linear reservoir network and a linear output layer. This approach has recently been applied to inference problems in chaotic systems [1, 16, 17]. Specifically, “observers” trained by RC (the reservoir observers), provide continual “ground truth” system state measurements as input to the prediction method, deducing the state of the chaotic dynamical system as a function of time from the limited number of the concurrent system state measurements. At each time step, RC estimates the desired unmeasured variables from the measured variables and predicts the evolution the physical system. Lu et al. [3] have recently demonstrated the effectiveness of the method by applying it to the Rössler system, the Lorenz system, and the spatiotemporally chaotic Kuramoto–Sivashinsky equation, carefully pointing out that the method addresses the inference of unmeasured state variables rather than their prediction.

Introducing a New Method: “Observer LSTM” (OLSTM)

LSTM networks have proven successful in learning and generalizing sequential tasks from isolated sequences, such as handwriting and speech. Inspired by this success, we first considered a model with a single LSTM network assigned to each system node, which is independent of all other LSTMs. The prediction error for this approach turned out to be very large. This is not surprising since chimera states are collective phenomena and the simplistic use of one LSTM network per node, independent of all others, does not capture well the interaction between the nodes; the sequences of different nodes are not isolated but correlated.

In order to address the independent LSTMs' limited ability to capture dependencies between multiple correlated sequences, we propose and demonstrate a new method, which we call “Observer LSTM (OLSTM),” based on the extension of the notion of “reservoir observers” to the LSTM networks. In the OLSTM method, we assign one LSTM to each (non-observer) system node but also assign “observer” status to certain system nodes (“LSTM observers,” taken at equidistant positions for simplicity) which provide continual “ground truth” measurements as input to the prediction method. We demonstrate that their presence even in small numbers (of order <10% of the total number of nodes) greatly improves the long-term data-driven forecasting capability of the LSTM networks and provides the framework for a consistent comparison between the RC and LSTM methods.

Time-series data are used to train each network while no knowledge of the underlying system equations is required. Each individual LSTM network is trained by taking as input a number of past values (denoted as N_p) for the node at hand, plus the ground-truth values provided by all observers. Thus, the OLSTM method generates a generalized sequence, which combines the N_p past values and the time-varying systemic interaction. Using the trained networks, long-term predictions are made by iteratively predicting one step forward for each node, using as input the node's previous values and the values provided by the relatively very small number of observers.

We study the networks' long term forecasting capability as a function of the number of observers and as a function of training-set size, and we compare the OLSTM performance with that of “reservoir observers” trained by RC, which utilizes a single (“global”) network for the entire system. We benchmark both methods against a standard Feed-Forward neural network (FNN) method, with the same number of observers (OFNN). We compare quantitatively the networks' performance by calculating the normalized root mean square error (RMSE) at each time step, for all system nodes, over the predicted time steps, as in [3], Equation (22), without counting observer nodes and training time steps, since they do not contribute to the prediction error.

Results and Discussion

We structure our study of predicting complex dynamics in extended physical systems with ML approaches in two parts: the first part concerns chimeras and the second concerns branching in flows. Together, these two cases capture extremes of complex spatiotemporal behavior that severely challenge any method aspiring to predict the long-term behavior. Chimera states challenge the ML methods to predict partial, self organized coherence while the stochastic onset of branching challenges the ML methods to predict stochastic yet singular events.

Predicting Turbulent Chimeras in Coupled Arrays

Chimera states are collective, self-organized patterns of coexisting coherence, and incoherence in coupled oscillator systems. Following the first discovery of chimeras for symmetrically coupled Kuramoto identical oscillators in 2002 [18], this counterintuitive symmetry breaking of partially coherent and partially incoherent behavior has received enormous attention. Many recent theoretical works have focused on the study of chimera states in a variety of physical systems such as superconducting metamaterials [19–21] quantum systems [22], and laser arrays [23, 24], to mention only a few. Chimeras have also been studied in models addressing neuron dynamics in hierarchical and modular networks [25, 26]. It has been suggested that chimera states may be related the phenomenon of unihemispheric sleep observed in mammals and birds [27], epileptic seizures [28], and blackouts in power-grids [29]. For finite systems, chimera states are known to be chaotic transients [30], can be stabilized by various newly developed control schemes [31–33]. Following the theoretical predictions, chimera states were experimentally verified for the first time in populations of coupled chemical oscillators [34] and in optical coupled map lattices realized by liquid-crystal light modulators [35] and, later on, in mechanical [36], electronic [37], and electrochemical oscillators [38]; for a recent review see [39].

Chimeras can be stationary or turbulent. Turbulent chimeras have been observed experimentally [35] and have been classified in numerical studies of large arrays of SQUIDs (Superconducting QUantum Interference Devices) and in arrays of lasers with various types of interactions [19–21, 23], among other physical systems. Their actual trajectories are highly non-linear and comprise an immense challenge to predicting their occurrence. In the following sections, we present the long-term prediction results of the ML methods applied on turbulent chimeras in simulated SQUID and semiconductor laser arrays.

SQUID metamaterials constitute a subclass of superconducting artificial media whose function relies both on the geometry and the extraordinary properties of superconductivity and the Josephson effect [40, 41]. Recent experiments on SQUID metamaterials in the superconducting state have demonstrated their wide-band tuneability, significantly reduced losses, and dynamic multistability [42, 43]. The simplest version of a SQUID consists of a superconducting ring interrupted by a Josephson junction [44]; the device is a highly non-linear resonator with a strong response to applied magnetic fields. SQUID metamaterials exhibit peculiar magnetic properties including negative diamagnetic permeability that were predicted both for the quantum [45] and the classical [46, 47] regime.

We investigate the long-term prediction capability of the ML methods under study on turbulent single-headed and double-headed chimeras (“head” stands for incoherent cluster) observed numerically in an array of N identical rf SQUIDs coupled together magnetically through dipole-dipole forces. In this system, the magnetic flux Φ_n threading the n-th SQUID loop is given by:

\begin{array}{l} Φ_{n} = Φ_{e x t} + L I_{n} + L \sum_{m \neq n} λ_{| m - n |} I_{m}, \end{array}

where the indices n and m run from 1 to N, Φ_ext is the external flux in each SQUID, $λ_{| m - n |} = \frac{M_{| m - n |}}{L}$ is the dimensionless coupling coefficient between the SQUIDs at positions m and n, M_|m−n| being their corresponding mutual inductance, and

\begin{array}{l} - I_{n} = C \frac{d^{2} Φ_{n}}{d t^{2}} + \frac{1}{R} \frac{d Φ_{n}}{d t} + I_{c} sin (2 π \frac{Φ_{n}}{Φ_{0}}) \end{array}

is the current in the n-th SQUID given by the resistively and capacitively shunted junction (RCSJ) model [48], with Φ₀ the flux quantum and I_c the critical current of the Josephson junctions. Each individual SQUID is a highly non-linear oscillator exhibiting multistability in a certain parameter regime. This is crucial for the occurrence of the chimera states when considering the coupled system. The number of possible states in a SQUID metamaterial is not merely the sum of the combinations of individual SQUID states, since their collective behavior provides many more possibilities. Depending on the choice of initial conditions, various space-time flux patterns may be obtained. “Wild” turbulent chimeras are important for testing the prediction methods because they offer non-trivial dynamical evolution (trajectories) upon which we can test the long-term data-driven (model-free) forecasting of the ML methods. Such chimeras have been generated in Hizanidis et al. [20] (see their Figure 4d for a double-headed chimera with large size of incoherent clusters and Figure 4e for a single-headed chimera, in which the largest part of the metamaterial is occupied by an incoherent cluster with varying size and position in time), where the evolution of the individual SQUID fluxes Φ_n is monitored.

We apply RC, OLSTM, and OFNN methods for long-term prediction of the dynamics of these single-headed and double-headed chimeras. Prediction snapshots for the single-headed chimera are presented in Figure 1A as insets on the actual (ground-truth) spatiotemporal evolution of the fluxes. In predicting the evolution of this chimera, 17 “observers” were used in the positions marked by the tips of the arrows. The predicted time series for the flux of SQUID #223 is being shown (Figure 1A, right vertical inset) as well as the predicted fluxes for all SQUIDs (entire metamaterial) at time step t_n = 11,000 (Figure 1A, bottom inset). Similar results for the double-headed chimera are presented in Figure 1B, also for 17 observers; the predicted time series for the flux of SQUID #163 is being shown in Figure 1B (right vertical inset) along with the predicted fluxes for all SQUIDs at time step t_n = 12,000 (Figure 1B, bottom inset). The specific SQUID nodes (#223 and #163) have been chosen in order to depict the long-term forecasting capability of the ML methods in challenging regimes that include both coherent and incoherent behavior. We emphasize that in both cases these are very long-time predictions, not just short-term prediction of just a few time steps beyond the training time; specifically, these predictions comprise more than twice (for OLSTMs and OFNNs) and three times (for RC) the training time. Nevertheless, the ML methods produce non-divergent predictions.

FIGURE 1

Figure 1. Spatiotemporal plots of single- and double-headed chimeras and predicted time series and fluxes. (A) Spatiotemporal plot of a single-headed chimera state, generated in the 1-dimensional SQUID array of Hizanidis et al. [20] Figure 4e, depicting the evolution of the fluxes (values are color-coded), for the one-headed chimera observed in a large array of 256 coupled SQUIDs by dipole-dipole moments, which has been studied numerically for a nonlocal coupling scheme. In predicting the spatiotemporal evolution of this chimera, 17 “observers” have been placed in the positions marked by the tips of the arrows. The thick horizontal black line marks the end of the RC training time, while the dotted horizontal black line marks the end of the OLSTM (and OFNN) training time. Right, vertical inset: predicted time series for the flux of SQUID #223: shaded blue color depicts the actual (ground-truth) data, red line depicts prediction by OLSTM, green line depicts prediction by RC, blue line depicts prediction by OFNN. Bottom inset: predicted fluxes for all SQUIDs (entire metamaterial) at time step t_n = 11,000: symbols are the same as in the right inset, but with pink depicting the actual (ground-truth) data. (B) same as in part (a), but for a double-headed chimera state generated in the 1-dimensional linear SQUID array of Hizanidis et al. [20] Figure 4d. Right, vertical inset: the predicted time series for the flux of SQUID #163. Bottom inset: the predicted fluxes for all SQUIDs (entire metamaterial) at time step t_n = 12,000.

In our implementation of OLSTMs we found that a large value of the number of past steps N_p is not necessary, and in fact even with N_p = 1 the results are quite satisfactory; this implies that the presence of observers more than compensates the need for a short memory to guide the predictions. Thus, our results demonstrate that the “neighborhood” of the node is important; it changes dynamically and the assistance of observers in predicting the future values is more appropriate to process the temporal variation of the “neighborhood” state. The same considerations apply to both SQUID-array and laser-array chimeras.

In laser systems, chimeras were first reported both theoretically and experimentally in a virtual space-time representation of a single laser system subject to long delayed feedback [34]; small networks of globally delay-coupled lasers have also been studied and chimera states were found for both small and large delays. In the present report, we apply the ML methods to predict the trajectories of the turbulent chimeras presented in Figure 4V of the recent work of Shena et al. [23] on multi-clustered chimeras in a large array of coupled semiconductor lasers with non-local coupling. This array is a ring of M = 200 semiconductors lasers of class B. Each node j is symmetrically coupled with the same strength to its R neighbors on either side (non-local coupling). The evolution of the slowly varying complex amplitudes $E_{j} = E_{j} e x p (i φ_{j})$ , with E_j is the amplitude and ϕ_j the phase of the electric field, and the corresponding population inversions N_j are given by equations

\begin{array}{r} \frac{d E_{j}}{d t} = (1 + i a) E_{j} N_{j} + \frac{k e^{- i 2 C_{p}}}{2 R} \sum_{l = j - R}^{j + R} E_{l}, \\ \frac{d N_{j}}{d t} = \frac{1}{T} [p - N_{j} - (1 + 2 N_{j}) {| E}_{j} |^{2}], j = 1, \dots, M, \end{array}

where all indices have to be taken modulo M. T is the ratio of the lifetime of the electrons in the excited level and that of the photons in the laser cavity. Lasers are pumped electrically with the excess pump rate p = 0.23 [23]. The linewidth enhancement factor a models the relation between the amplitude and the phase of the electrical field; a value of a = 2.5 was used, typical for semiconductor lasers. The coupling strength k and the phase C_p are the control parameters that are used to tune the collective dynamics of the system. As a measure for phase and amplitude synchronization, the characterization of the phase synchronization of the system can be calculated by means of the Kuramoto local order parameter [49]:

\begin{array}{l} Z_{j} = | \frac{1}{2 ζ} \sum_{| l - j | \leq ζ} e^{i φ_{l}} | \end{array}

Shena et al. [23] have used a spatial average with a window size of ζ = 3 elements (a Z_j value close to unity indicates that the j-th laser belongs to the coherent regime, whereas Z_j is closer to 0 in the incoherent part). This quantity can measure only the phase coherence and gives no information about the amplitude synchronization of the electric field, and for this reason the authors have used the classification scheme presented in Kemeth et al. [50] for spatial coherence, based on the calculation of the so called local curvature at each time instance, by applying the absolute value of the discrete Laplacian |DE| on the spatial data of the amplitude of the electric field:

\begin{array}{l} | D E |_{j} (t) = | E_{j + 1} (t) - 2 E_{j} (t) + E_{j - 1} (t) |, j = 1, \dots, M . \end{array}

In the synchronization regime the local curvature is close to zero while in the asynchronous regime it is finite and fluctuating.

Since our objective is to obtain “wild” turbulent chimeras in order to test the long-term predictions of the ML methods, we have chosen the turbulent chimera of Shena et al. [23] Figure 4V, generated with the choice of parameter values: R = 64, C_p = 0.4π, k = 0.225, T = 392, p = 0.23, a = 2.5. Prediction snapshots are presented in Figure 2, as insets on the actual (ground-truth) spatiotemporal evolution of the local curvature for the flux of laser #145 and for all lasers, for similar choices as in Figure 1, that is, with 17 “observers” and at time step t_n = 1,500. These are extremely long-time predictions, with time horizons almost five times the training time (for OLSTMs and OFNNs) and ten times (for RC) the training time. In this case, as in that of the SQUID system, all ML methods under study produce non-divergent predictions.

FIGURE 2

Figure 2. Spatiotemporal plot of a turbulent chimera state [as is generated in the 1-dimensional semiconductor Class B laser array of Shena et al. [23] Figure 4V], depicting the evolution of the local curvature for each laser (values are color-coded), studied numerically for a non-local coupling scheme. Seventeen “observers” have been placed in the positions marked by the tips of the arrows. The thick horizontal black line represents the size of the training dataset used for training the reservoir observers with RC, while the dotted horizontal black line represents the size of the training dataset for training the OLSTM (and OFNN) observers. Right, vertical inset: the predicted time series for laser #145: shaded blue color depicts the actual (ground-truth) data, red line depicts prediction by OLSTM, green line depicts prediction by RC, blue line depicts prediction by OFNN. Bottom inset: snapshot of the spatial profile of the predicted local curvatures for the entire array at time step t_n = 1,500: color code is as in the right inset, but with pink color depicting the actual (ground-truth) data.

In Figure 3 we present the calculated values of the normalized root mean squared error for long-term prediction (top inset) of the RC, OLSTM, and the OFNN methods, and as a function of the number of observers and the size of the training dataset for a randomly selected time step for the chimera depicted in Figure 2; similar results are obtained for all systems. The RMSE values remain, on average, close to 10⁻¹, for time horizons up to 2,000 time steps in the future. This figure also presents graphs of how the RMSE varies as a function of the number of observers (given as percentage of the number of the system nodes), and as a function of the size of the training datasets (given as percentage of the entire ground-truth time series. In all cases, low RMSE values are attained after a minimum number of observers have been included in the system, comprising about 5% of the total number of oscillators, and the size of the training data reaches at least 15% of entire dataset. These results demonstrate that, even very small numbers of observers are very important for achieving non-diverging long-term predictions. Furthermore, they show that RC achieves low levels of RMSE with smaller size of training datasets, whereas OLSTM achieves low levels of RMSE with smaller number of observers.

FIGURE 3

Figure 3. Normalized RMSE (<R>) of RC, OLSTM, and the OFNN methods calculated for each time step over all system nodes at each predicted time step (top panel) and over the predicted time steps and all system nodes (bottom 3-dimensional plots) as a function of the number of observers (given as percentage of the entire number of system nodes) and the size of the training dataset (given as percentage of the entire ground-truth time series) for the chimera depicted in Figure 2. The black dots in each of the 3-dimensional plots represent the normalized RMSE for each ML method, respectively, at a randomly picked time step, calculated overall system nodes and previous time steps, but not counting observer nodes and training time steps. Red: OLSTM, green: RC, blue: OFNN. Axes, tick marks, tick labels, and grid lines are the same in all 3-dimensional plots.

Predicting Singular Events

Wave focusing due to refractive index variation is a common occurrence in many physical systems, as, for example, in optical media where the index of refraction changes in a statistical way due to small imperfections or distributions of defects in the medium through which the wave propagates. Random spatial variability of the index leads to local focusing and defocusing of the waves and the formation of caustics (or wave “branches”) with substantially increased local wave intensity [51, 52]. Quantum particles like electrons also exhibit wave properties and as a result, electrons traveling in disordered media can form coalescing trajectories and exhibit phenomena similar to wave motion in optical random media. A case in point, where branching may have implications for technological applications, is the ultra-relativistic electronic flow in a two-dimensional (2D) random potential as manifested in graphene and other Dirac solids [52]. In these situations, branching arises in the flow of electrons through a region of inhomogeneous distribution of charge impurities in the substrate, which create a random potential for the electrons [53]. An additional bias voltage is introduced to induce the electronic propagation. Recently, it was shown that the onset of the electronic branched flow is determined by the statistical properties of the random substrate potential, which is quantified through a scaling-type relationship to capture the emergence of branches [54].

It is a formidable challenge to predict singular events like branching in wave propagation or electron flow because of the stochastic nature of the onset of such events. An important question is whether or not ML methods can dissect the stochastic nature of branching and “learn” the interactions that take place among trajectories, thereby providing an accurate detection mechanism for the caustics that mark the onset of branching. We attempt to resolve this issue with results from the RC, OLSTM, and OFNN methods, on singular branched flows in graphene with random potentials. Our results demonstrate that the ML methods we considered can adequately capture the stochastic temporal dependencies of the time series in this prototypical complex dynamical system.

Caustic event prediction in a 2D electron flow is facilitated if we consider one of the spatial dimensions (we refer to it as the “longitudinal” x-direction) as the “time-coordinate,” and therefore, map the stationary phenomenon of caustic formation onto a 1D spatio-temporal dynamical problem. In the framework of this approach, we model the motion of electrons as individual rays whose density matrix is transformed onto a vector of time series with dimension N, the number of mesh points of the remaining spatial dimension (the “transverse,” or y-direction). In the system we studied, N = 210, spanning a total length of 84 nm [54]. Our goal is to predict the onset of branching in time and its location in the electronic flow. Figure 4 presents prediction snapshots of the onset of branching in electronic flows in graphene with random potentials, depicting the intensity of the flows, using 10 different “observers” along the “time” axis of the flows at different positions on the transverse axis. These results demonstrate that the RC, OLSTM, and OFNN methods are able to predict well the branching in the electronic flows in this system, even at very long prediction times. In these methods, observers are very important to achieve long-term forecasting capability; they act as real-time sensors that help predict the future values of the flows.

FIGURE 4

Figure 4. Branching in electronic flows in graphene with random potentials (intensity values are color-coded); the graphene sheet has size 176 (vertical) × 84 nm (horizontal). The snapshots in insets depict the intensity of the flows as found in the simulation and predicted by the ML methods. In predicting the time evolution of the flows, 10 “observers” have been placed in the positions marked by the tips of the arrows, monitoring the entire “time” (vertical) axis. The thick horizontal white line marks the end of the RC training, while the dotted white line marks the end of the OLSTM (and OFNN) training. Inset outlined in pink: the actual and predicted time series for the entire system at “time coordinate” point x_n = 100 nm, corresponding to 501 time steps. The pink-shaded curve represents the actual (ground-truth) data, red line is the OLSTM prediction, green is the RC prediction and blue the OFNN prediction. Inset outlined in blue: same as for the other inset, but at “time coordinate” point x_n = 160 nm, corresponding to 801 time steps, and with blue-shaded curve depicting the actual (ground-truth) data.

Conclusions and Outlook

The issue of predicting complex spatiotemporal behavior using ML approaches is one of central importance for their potential applications in the physical sciences and beyond. Here, we attempted to address this issue by considering two distinct prototypical phenomena, viz. partially coherent chimera states and the stochastic onset of branching in 2D wave flows, as these are realized in coupled arrays of SQUIDs or lasers, and in the flow of electrons in graphene with random potentials, respectively. We find that ML approaches like LSTM and RC recurrent neural networks can perform well in predicting complex dynamics in extended physical systems when they involve “observers” that monitor the system evolution throughout its time dimension. The presence of observers is an inherent requirement of the second approach (RC), but not of the first (LSTM). Accordingly, we proposed a new method, which we call “Observer LSTM (OLSTM),” to address the limitations of single, independent LSTM networks in capturing dependencies between multiple correlated sequences. We have also considered an observer-enhanced Feed-Forward network (OFNN) and tested the long-term prediction performance of the three approaches, OLSTM, OFNN, and reservoir observers trained by RC, on the two difficult problems of turbulent chimeras and 2D branching flows. Our results quantify how the prediction error (root mean squared error, RMSE, of the predicted values) varies as a function of the number of observers and of the size of the training datasets.

We conclude that “observers” comprise sine qua non elements for robust data-driven (model-free) long-term forecasting capability of the LSTM and FNN networks, and they provide the framework for a consistent comparison between the LSTM and RC methods. Thus, observer-enhanced ML methods, like OLSTM, acting as high-level “intelligent” interpolation schemes, are capable of successfully predicting the non-linear spatiotemporal evolution of complex dynamical systems. Many issues remain to be evaluated and questions to be answered in establishing the robustness and efficiency of the observer-enhanced approaches. An example of such issues is the extent to which the presence of observers, while enhancing long-term predictability, affects the chaotic behavior of the system. A thorough investigation of such issues is the focus of on-going and future research.

Methods

The RC network architecture used in predicting turbulent chimeras comprises of 1,000 reservoir nodes, with spectral radius ρ = 1.0, average degree D = 80, scale of inputs weights σ = 1.5, bias constant ξ = 0.0, leakage rate α = 0.9, ridge regression parameter β = 0.5, and time interval Δt = 0.01. The RC network applies to the system as a whole (single network architecture). For branching, the RC network used comprises of 3,000 reservoir nodes, with spectral radius ρ = 0.9, average degree D = 50, scale of inputs weights σ = 1, bias constant ξ = −0.4, leakage rate α = 0.5, ridge regression parameter β = 0.05, and Δt = 0.05. The network applies to the system as a whole (single network architecture).

The OLSTM network architecture used in this study constitutes 400 LSTM cells with RELU activation functions (one hidden layer). A single LSTM network is applied to each (non-observer) system node (SQUID or laser oscillator or electron flow). Similarly, a single fully-connected (dense) Feed Forward network (OFNN), with RELU activation functions, is applied to each (non-observer) system node (SQUID or laser oscillator or electron flow). Optimization for LSTMs during training is performed using the Adam stochastic optimization method [55] with a learning rate of 0.001 as implemented in Keras ¹.

In case of the SQUIDs system (Figure 1), the OLSTM and OFNN models were trained with 5,900 time steps (40% of the total 14,750 time steps) and RC with 3,000 time steps (about 20% of the total number of time steps). In case of the Laser Chimeras system (Figure 2), the OLSTM and OFNN models were trained with 400 time steps (20% of the total number of time steps and RC model with 200 time steps (10% of the total number of time steps). Finally, in case of branching (Figure 4), OLSTM and OFNN were trained with 352 time steps (40% of the total number of time steps) and RC with 300 time steps (about 34%).

The data of the time series used (chimeras and electronic flows in graphene) was preprocessed as in [3] Equation (15) and smoothed by means of Matlab's 2D Gaussian Smoothing Kernel with standard deviation σ = 3.5 for chimeras and σ = 2.5 for the electronic flows in graphene. The time series data have been divided into two separate sets, the training dataset and the validation dataset. The data is stacked in batches (of size 50 data points) in order to form the training (and validation) input and output of the networks. These training batches are used to optimize the parameters of the networks (weights and biases). The training proceeds by optimizing the network weights iteratively for each batch. The training loss function is a weighted version of the root mean square error. When the network parameters have been optimized once for all training data batches, one epoch is completed. After every epoch the RMSE in the validation data set is computed. Training is stopped after 200 epochs and the OLSTM (OFNN) network with the smallest validation error is selected in order to avoid over-fitting.

Each trained OLSTM network is then used to forecast the node's state in the next time step in an iterative fashion (getting also input from the observers). All the hyper-parameters in OLSTM and OFNN models including the number of training epochs, the number of training batches, the number of neurons and the number of hidden layers are optimized so that they are leading to the smallest RMSE. Similarly, for RC we optimized the parameters with criterion the smallest value for RMSE.

As a comparison measure for the networks' performance, we use the (normalized) root mean square error calculated at each time step, for all system nodes and over the predicted time steps (unless otherwise specified).

Data Accessibility

The machine learning library Keras in python 3.6 was used for the implementation of the OLSTM and OFNN architectures (utilizing the “Metropolis” Supercomputing Facility of the Center for Quantum Complexity and Nanotechnology of the Physics Department of the University of Crete) while our code written on Matlab was used for the RC architectures.

Data Availability

The raw data supporting the conclusions of this manuscript will be made available by the authors, without undue reservation, to any qualified researcher.

Author Contributions

GN, MM, and GDB designed and performed research, analyzed data, and wrote the manuscript. JH provided data on chimeras, made contributions to the manuscript, and constructive comments. GPT and EK contributed to the design of research, interpretation of results, and to the writing of the manuscript.

Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The reviewer AZ declared a past co-authorship with one of the authors GT to the handling editor.

Acknowledgments

GPT acknowledges useful discussions with Prof. Edward Ott. GN and GPT acknowledge support by the European Commission under project NHQWAVE (MSCA-RISE 691209). JH acknowledges support by the General Secretariat for Research and Technology (GSRT) and the Hellenic Foundation for Research and Innovation (HFRI) (Code: 203). MM and EK acknowledge partial support from EFRI 2-DARE NSF Grant No. 1542807 and from ARO MURI Award No. W911NF14-0247. We used computational resources on the Odyssey cluster of the FAS Research Computing Group at Harvard University.

Footnotes

1. ^Available online at: https://keras.io (Accessed February 15, 2018).

References

1. Hinton G, Deng L, Yu D, Dahl GE, Mohamed A-R, Jaitly N, et al. Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Signal Proc Mag. (2012) 29:82–97. doi: 10.1109/MSP.2012.2205597