Fuzzy-Weighted Echo State Networks

A novel echo state network (ESN), referred to as a fuzzy-weighted echo state network (FWESN), is proposed by using the structural information of data sets to improve the performance of the classical ESN. The information is incorporated into the classical ESN via the concept of Takagi-Sugeno (TS) models/rules. We employ the fuzzy c-means clustering method to extract this information from the given data set, and it determines the antecedent part of the TS model. Then, we obtain new fuzzy rules by replacing the affine models in the consequent part of each TS rule with a classical ESN. Consequently, the output of the proposed FWESN is calculated by inferring these new fuzzy rules through a fuzzy-weighted mechanism. The corresponding reservoir consists of the sub-reservoirs of the new fuzzy rules. Furthermore, we prove that the FWESN has the echo state property if the largest spectral radius of all the internal weight matrices in the sub-reservoirs is less than one. Finally, a nonlinear dynamic system and five nonlinear time series are employed to validate the FWESN.


Summary of the Echo State Network
The recurrent network model describes how the states of the research object evolve over time and space. As problem complexity has increased and computing power has improved, various recurrent networks have been successfully applied in different fields, such as echo state networks in time series prediction (Jaeger and Haas, 2004), Boolean networks in games (Le et al., 2021; Le et al., 2020), and optimal control (Chen et al., 2019; Toyoda and Wu, 2021; Wu et al., 2021).
Echo state networks (ESNs) are a special case of recurrent neural networks (RNNs) proposed by Jaeger and Haas (2004). Unlike traditional RNNs, the recurrent layer of an ESN uses a large number of neurons, and the connection weights between neurons are randomly generated and sparse. In an ESN, the recurrent layer is called a reservoir. The input signals drive the reservoir, and the trainable output neurons combine the output of the reservoir to generate task-specific temporal patterns. This RNN paradigm is referred to as reservoir computing. Similar to ESNs, liquid state machines (Maass et al., 2002), temporal recurrent neural networks (Steil, 2006), backpropagation-decorrelation learning (Lukoševičius and Jaeger, 2009), and convolutional and deep echo state networks (Ma et al., 2021; Wang et al., 2021) are all instances of reservoir computing. ESNs differ from these in that they employ analog neurons. A key problem of traditional RNNs is the lack of an effective supervised training algorithm. This problem is largely overcome by ESNs, since only the output weights are trained. ESNs have been successfully applied in a wide range of temporal tasks (Jaeger and Haas, 2004; Holzmann and Hauser, 2010; Song and Feng, 2010; Babinec and Pospichal, 2012; Xu et al., 2019; Yang and Zhao, 2020), especially the prediction of nonlinear chaotic time series (Jaeger and Haas, 2004; Wang et al., 2021).

Summary of the Related Work and Motivation
The random and sparse connection weights between neurons in the reservoir bring much convenience for ESN applications. However, simply creating them at random is unsatisfactory for a specific modeling task (Lukoševičius and Jaeger, 2009). Recently, one of the main streams of ESN research has focused on developing a suitable reservoir to improve performance (Jaeger, 2007; Holzmann and Hauser, 2010; Song and Feng, 2010; Babinec and Pospichal, 2012; Sheng et al., 2012). Experience shows that a specific architectural variant of the standard ESN leads to better results than naive random creation. For example, a new ESN with arbitrary infinite impulse response filter neurons was proposed for the task of learning multiple attractors or signals with different time scales, and trainable delays in the synaptic connections of the output neurons were added to improve the memory capacity of ESNs (Holzmann and Hauser, 2010). Inspired by simulation results on nonlinear time series prediction, a complex ESN was proposed in which the connection process of the reservoir is determined by five growth factors (Song and Feng, 2010). A complex prediction system was created by combining local expert ESNs with different memory lengths to overcome the limitations of an ESN with a fixed memory length (Babinec and Pospichal, 2012). A hierarchical architecture of ESNs was presented for multi-scale time series; the core ingredient of each layer is an ESN, and the architecture as a whole is trained by stochastic error gradient descent (Jaeger, 2007). An improved ESN was proposed to predict noisy nonlinear time series, in which the uncertainties from internal states and outputs are simultaneously considered in accordance with industrial practice (Sheng et al., 2012).
Note that uncertain information, noise, and structural information often exist in real systems (Liu and Xue, 2012; Shen et al., 2020; Shen and Raksincharoensak, 2021a,b). Thus, extensive work has been carried out on designing a specific reservoir for a given modeling task, as mentioned previously. However, the structural information of the input/output data is ignored when the reservoir is designed or revised. In fact, for many temporal tasks and pattern recognition problems, the data sets appear in homogeneous groups, and this structural information can be exploited to facilitate the training process, so that the prediction accuracy can be further improved (Wang et al., 2007; Liu and Xue, 2012). Thus, it is necessary to consider the effects of data structure information on the ESN and then to design a suitable reservoir for a specific modeling task.

Main Idea and Contributions
This study aims to construct a new type of ESN, referred to as a fuzzy-weighted echo state network (FWESN). The FWESN is able to incorporate the structural information of data sets into the classical ESN via the TS model. Actually, the FWESN can be regarded as a certain ESN in which the output is calculated by a fuzzy-weighted mechanism and the corresponding reservoir consists of sub-reservoirs corresponding to the TS rules. Similar to the ESN, the echo state property of the FWESN is obtained when all internal weight matrices of the sub-reservoirs have spectral radii less than one.
The contributions of this article lie in the following aspects. First, the structural information of the data set is incorporated into the classical ESN to enhance its performance in applications.
Second, the structure of the FWESN is parallel, which distinguishes it from the hierarchical architecture of ESNs. The FWESN is trained efficiently by solving a linear regression problem, the same as the training algorithms of the ESN and TS model. Thus, the FWESN avoids the vanishing-gradient problem that affects hierarchical ESNs, deep feedforward neural networks, and fully trained recurrent neural networks based on gradient-descent methods.
The remainder of this article is structured as follows: preliminaries are given in Section 2. The architecture, echo state property, and training algorithm of the FWESN are discussed in Section 3. Experiments are performed by comparing the FWESN with the ESN and TS model in Section 4. Finally, conclusions are drawn in Section 5.

PRELIMINARIES
In this section, we give a brief introduction to typical ESNs and TS models. More thorough treatments can be found in Takagi and Sugeno (1985), Jaeger and Haas (2004), and Holzmann and Hauser (2010).

Echo State Networks
An ESN can be represented by state update and output equations. We formulate the ESN as shown in Figure 1.
The activation of internal units in the reservoir is updated according to the following equation:

x(n) = f(W_in u(n) + W x(n-1) + W_back y(n-1)). (1)

Here, x(n) = (x_1(n), ..., x_N(n))^T ∈ R^N is the state vector of the reservoir, u(n) = (u_1(n), ..., u_{N_in}(n))^T ∈ R^{N_in} is the input vector, y(n-1) = (y_1(n-1), ..., y_{N_out}(n-1))^T ∈ R^{N_out} is the output vector, and W_in ∈ R^{N×N_in}, W ∈ R^{N×N}, and W_back ∈ R^{N×N_out} are the input, internal connection, and feedback weight matrices, respectively. R denotes the real numbers. f(·) = (f_1, ..., f_N)^T stands for the activation function vector; for example, f_i(·) = tanh(·), i = 1, 2, ..., N. The full connection of internal units in the reservoir is shown in Figure 2. The output y(n) is expressed as

y(n) = W_out S(n), (2)

where S(n) = (u^T(n), x^T(n), y^T(n-1))^T and W_out ∈ R^{N_out×(N_in+N+N_out)} is the output weight matrix. There are several notions of stability relevant to ESNs, of which the echo state property is the most basic (Jaeger and Haas, 2004).
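The state update above can be sketched in a few lines of NumPy. All dimensions, weight ranges, and the scaling factor below are illustrative choices for the sketch, not settings from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: reservoir, input, and output dimensions.
N, N_in, N_out = 50, 2, 1

# Fixed random weights of Eq. 1; only the output weights are trained later.
W_in = rng.uniform(-0.5, 0.5, (N, N_in))
W = rng.uniform(-0.5, 0.5, (N, N))
W *= 0.9 / max(abs(np.linalg.eigvals(W)))     # keep the spectral radius below one
W_back = rng.uniform(-0.5, 0.5, (N, N_out))

def esn_update(x, u, y_prev):
    """x(n) = f(W_in u(n) + W x(n-1) + W_back y(n-1)) with f = tanh element-wise."""
    return np.tanh(W_in @ u + W @ x + W_back @ y_prev)

x = np.zeros(N)
x = esn_update(x, np.array([0.3, -0.1]), np.zeros(N_out))
```

Since tanh saturates, every state component stays in (-1, 1) regardless of the input scale.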
We write u^{+∞} = (u(k), u(k+1), ...), u^{-∞} = (..., u(k-1), u(k)), and u^h = (u(k+1), ..., u(k+h)) for some k ∈ Z to denote right-infinite, left-infinite, and finite input sequences of length h, respectively, where Z denotes the integers. The network state update operator G is defined as follows (Jaeger and Haas, 2004):

x(n+h) = G(x(n), y(n), u^h), (3)

which denotes the network state that results from an iterated application of Eq. 1 when the input sequence u^h = (u(n+1), ..., u(n+h)) is fed into the network, the network is in state x(n), and the output is y(n) at time n. For a network without output feedback, Eq. 3 simplifies to x(n+h) = G(x(n), u^h).

Definition 1: Assume that the inputs are drawn from a compact input space U, that the network states lie in a compact set A, and that the network has no output feedback connections. Let N denote the natural numbers. Then, the network has echo states if the network state x(n) is uniquely determined by any left-infinite input sequence u^{-∞}. More precisely, for every input sequence ..., u(n-1), u(n) ∈ U^{-N} and all state sequences ..., x(n-1), x(n) and ..., x'(n-1), x'(n) ∈ A^{-N} that are compatible with the input sequence under Eq. 1, it holds that x(n) = x'(n).

The condition of Definition 1 is hard to check in practice. Fortunately, an easily checked sufficient condition is given in Jaeger and Haas (2004).
Proposition 1: Assume a sigmoid network with unit output functions f_i = tanh. Let the weight matrix W satisfy σ_max < 1, where σ_max is the largest singular value of W. Then, for all inputs u ∈ U and all states x, x' ∈ A,

||G(x, u) - G(x', u)|| ≤ σ_max ||x - x'||,

which implies that the network has echo states.

Takagi-Sugeno Models
Among various fuzzy modeling schemes, the TS model (Takagi and Sugeno, 1985) has been one of the most popular modeling frameworks. A general TS model employs an affine model in the consequent part of every fuzzy rule. We formulate the TS model as shown in Figure 3.
A TS model can be represented by r fuzzy rules, each of the following form:

Rule i: If u_1(n) is M_i1 and ... and u_{N_in}(n) is M_{iN_in}, then y_i(n) = a_i0 + a_i1 u_1(n) + ... + a_{iN_in} u_{N_in}(n), (4)

where u(n) = (u_1(n), ..., u_{N_in}(n))^T ∈ R^{N_in} is the input vector of the antecedent part at time n, r is the number of rules, and M_ij are fuzzy sets. y_i(n) is the output of the ith fuzzy rule, and a_i = (a_i0, a_i1, ..., a_{iN_in}) is the vector of consequent parameters of the ith fuzzy rule.

Given an input u(n), the final output of the fuzzy system is inferred as

y(n) = Σ_{i=1}^r β_i(u(n)) y_i(n), β_i(u(n)) = Π_{j=1}^{N_in} M_ij(u_j(n)) / Σ_{k=1}^r Π_{j=1}^{N_in} M_kj(u_j(n)), (5)

where M_ij(u_j(n)) is the membership grade of u_j(n) in M_ij, i = 1, 2, ..., r, j = 1, 2, ..., N_in.
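The fuzzy-weighted inference of Eq. 5 can be sketched as follows. Gaussian membership functions are used here as one common, assumed choice for the fuzzy sets M_ij; all numbers are illustrative:

```python
import numpy as np

def gaussian_mf(u, center, width):
    """Gaussian membership grade; an assumed choice for the fuzzy sets M_ij."""
    return np.exp(-((u - center) ** 2) / (2.0 * width ** 2))

def ts_output(u, centers, widths, a0, a):
    """Fuzzy-weighted average of r affine consequents, as in Eq. 5.
    centers, widths: (r, N_in); a0: (r,); a: (r, N_in)."""
    w = np.prod(gaussian_mf(u, centers, widths), axis=1)   # rule firing strengths
    beta = w / w.sum()                                     # normalized weights beta_i
    y_rules = a0 + a @ u                                   # affine consequent outputs
    return beta @ y_rules

# Two rules over a two-dimensional input.
u = np.array([0.2, 0.7])
centers = np.array([[0.0, 0.0], [1.0, 1.0]])
widths = np.ones((2, 2))
a0 = np.array([0.0, 1.0])
a = np.array([[1.0, 0.0], [0.0, 1.0]])
y = ts_output(u, centers, widths, a0, a)
```

Because the weights β_i are normalized, the inferred output always lies between the smallest and largest rule outputs.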

FUZZY-WEIGHTED ECHO STATE NETWORKS
In this section, we propose a new framework based on the ESN and TS model, which is referred to as a fuzzy-weighted echo state network (FWESN). We further prove that an FWESN has the echo state property.

Architecture of Fuzzy-Weighted Echo State Networks
FWESNs are designed by taking advantage of TS models to improve ESN (1). The basic idea is to replace the affine model in the consequent part of each fuzzy rule (4) with ESN (1). The FWESN is formulated as shown in Figure 4 and can be represented by fuzzy rules of the following form:

Rule i: If u_1(n) is M_i1 and ... and u_{N_in}(n) is M_{iN_in}, then y_i(n) = W_out_i S_i(n), i = 1, 2, ..., r, (6)

where y_i(n) is the output of the ith fuzzy rule (6), determined by the following state update equation:

x_i(n) = f_i(W_in_i u(n) + W_i x_i(n-1) + W_back_i y(n-1)). (7)

Here, S_i(n) = (u^T(n), x_i^T(n), y^T(n-1))^T, x_i(n) ∈ R^{N_i} is the state vector of the ith sub-reservoir, and W_in_i ∈ R^{N_i×N_in}, W_i ∈ R^{N_i×N_i}, W_back_i ∈ R^{N_i×N_out}, and W_out_i ∈ R^{N_out×(N_in+N_i+N_out)} are the input, internal connection, feedback, and output weight matrices of the ith fuzzy rule (6), respectively. f_i(·) ∈ R^{N_i} is the neuron activation function vector of the ith fuzzy rule (6), applied element-wise. The output of the FWESN is then inferred by the fuzzy-weighted mechanism. From Eqs. 5, 6, it follows that

y(n) = Σ_{i=1}^r β_i(u(n)) W_out_i S_i(n). (8)
By Eq. 6, a new overall reservoir can be formulated, whose state update equation is written as

X(n) = F(W_in u(n) + W X(n-1) + W_back y(n-1)), (9)

where X(n) = (x_1^T(n), ..., x_r^T(n))^T, W_in = ((W_in_1)^T, ..., (W_in_r)^T)^T, W = diag(W_1, ..., W_r), W_back = ((W_back_1)^T, ..., (W_back_r)^T)^T, and F = (f_1^T, ..., f_r^T)^T. Additionally, the same shorthand is used for the FWESN as for the ESN. Thus, from Eqs. 3, 9, it follows that

X(n+h) = G(X(n), y(n), u^h), (10)

which denotes the network state resulting from an iterated application of Eq. 9. For an FWESN without output feedback, Eq. 10 simplifies to

X(n+h) = G(X(n), u^h). (11)

For clarity, we use (β, W_in, W, W_back, W_out) to denote an FWESN, where β = (β_1, β_2, ..., β_r)^T, and (W_in, W, W_back) to denote an untrained network.
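One step of the FWESN, updating every sub-reservoir by Eq. 7 and fuzzy-weighting the rule outputs as in Eq. 8, can be sketched as below. The sizes, weight ranges, and the fixed β values are illustrative assumptions; in the actual model β comes from the membership grades of Eq. 5:

```python
import numpy as np

rng = np.random.default_rng(1)
r, N_i, N_in, N_out = 3, 20, 2, 1   # illustrative sizes

# One independent sub-reservoir per fuzzy rule (Eqs. 6-7).
W_in_i = [rng.uniform(-0.5, 0.5, (N_i, N_in)) for _ in range(r)]
W_back_i = [rng.uniform(-0.5, 0.5, (N_i, N_out)) for _ in range(r)]
W_i = []
for _ in range(r):
    M = rng.uniform(-0.5, 0.5, (N_i, N_i))
    W_i.append(0.9 * M / max(abs(np.linalg.eigvals(M))))   # spectral radius < 1
W_out_i = [rng.uniform(-0.5, 0.5, (N_out, N_in + N_i + N_out)) for _ in range(r)]

def fwesn_step(xs, u, y_prev, beta):
    """Update each sub-reservoir (Eq. 7), then fuzzy-weight the outputs (Eq. 8)."""
    xs_new = []
    y = np.zeros(N_out)
    for i in range(r):
        x_new = np.tanh(W_in_i[i] @ u + W_i[i] @ xs[i] + W_back_i[i] @ y_prev)
        S_i = np.concatenate([u, x_new, y_prev])   # S_i(n) = (u, x_i, y(n-1))
        y += beta[i] * (W_out_i[i] @ S_i)
        xs_new.append(x_new)
    return xs_new, y

xs = [np.zeros(N_i) for _ in range(r)]
beta = np.array([0.5, 0.3, 0.2])   # assumed normalized firing strengths
xs, y = fwesn_step(xs, np.array([0.1, -0.2]), np.zeros(N_out), beta)
```

Note that the sub-reservoirs never exchange state, which is what makes the overall internal weight matrix block-diagonal.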

Discussion on Several Special Cases for Fuzzy-Weighted Echo State Networks
Case 1: From the architecture of the FWESN, the classical ESN can be regarded as a special case of the FWESN. That is, let r = 1 and

M_1j(u_j(n)) = 1 if u_j = u_j(n), and 0 otherwise, j = 1, 2, ..., N_in, (12)

in Eq. 6. Then, the final output of FWESN (8) is rewritten as

y(n) = W_out_1 S_1(n). (13)

The corresponding update Eq. 7 is expressed as

x_1(n) = f_1(W_in_1 u(n) + W_1 x_1(n-1) + W_back_1 y(n-1)), (14)

which is the same as ESN (1).

Case 2: The TS model (4) can be regarded as a special case of FWESN (6). That is, let f_i = (1, 0, ..., 0)^T in Eq. 6. It follows that x_i(n) = (1, 0, ..., 0)^T. Then, the output of the ith fuzzy rule (6) becomes

y_i(n) = W_out_i S_i(n) = a_i0 + a_i1 u_1(n) + ... + a_{iN_in} u_{N_in}(n).

It is obvious that fuzzy rule (6) has the same form as fuzzy rule (4) under the aforementioned conditions. Thus, the FWESN degenerates into the TS model (4).

Echo State Property of Fuzzy-Weighted Echo State Networks
In this section, we prove that the FWESN has the echo state property in the case of a network without output feedback. Similar to Proposition 1, we give a sufficient condition for the echo state property of the FWESN.

Proposition 2: Let U and X be two compact sets, and let ||·||_2 be the operator norm on the space of matrices induced by the 2-norm for vectors. Assume a sigmoid network (β, W_in, W, W_back, W_out) with unit output functions f_ij = tanh. Let σ(W_i) < 1 for i = 1, 2, ..., r, where σ(W_i) denotes the largest singular value of W_i and W = diag(W_1, W_2, ..., W_r). Then, for all inputs u ∈ U and states X, X' ∈ X,

||G(X, u) - G(X', u)||_2 ≤ max_i σ(W_i) ||X - X'||_2,

which implies that the FWESN has echo states.
Proof: Since W = diag(W_1, W_2, ..., W_r) and σ(W_i) < 1 for i = 1, 2, ..., r, we have

||W||_2 = max_i σ(W_i) < 1.

For two different states X(n) and X'(n) driven by the same input, by Eqs. 9, 10 and the fact that tanh is 1-Lipschitz, we have

||X(n+1) - X'(n+1)||_2 ≤ ||W(X(n) - X'(n))||_2 ≤ ||W||_2 ||X(n) - X'(n)||_2.

That is, the state update map satisfies a Lipschitz condition with constant less than one, which results in echo states for the FWESN.

Remark 1: From the proof of Proposition 2, the update Eq. 1 can be regarded as a special form of Eq. 9 under the conditions σ(W_i) < 1 for i = 1, 2, ..., r.
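The contraction argument in the proof can be checked numerically. The sketch below builds a block-diagonal W from two sub-reservoir matrices scaled (by an assumed factor 0.8) so that their largest singular values are below one, and verifies that one tanh update shrinks the distance between two states by at least that factor:

```python
import numpy as np

rng = np.random.default_rng(3)

def scale_to_sv(M, s=0.8):
    """Scale M so its largest singular value equals s < 1."""
    return s * M / np.linalg.norm(M, 2)   # ord=2 gives the largest singular value

# Block-diagonal reservoir W = diag(W1, W2) with sigma(W_i) = 0.8 < 1.
W1 = scale_to_sv(rng.standard_normal((10, 10)))
W2 = scale_to_sv(rng.standard_normal((15, 15)))
W = np.zeros((25, 25))
W[:10, :10], W[10:, 10:] = W1, W2

# tanh is 1-Lipschitz, so one update contracts state differences by ||W||_2 = 0.8.
X, X2 = rng.standard_normal(25), rng.standard_normal(25)
d_before = np.linalg.norm(X - X2)
d_after = np.linalg.norm(np.tanh(W @ X) - np.tanh(W @ X2))
```

The operator norm of a block-diagonal matrix is the maximum of the block norms, which is exactly why per-block conditions suffice for the whole reservoir.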

Training Algorithm of Fuzzy-Weighted Echo State Networks
We state the training algorithm of the FWESN based on given training input/output pairs (u(n), z(n)), n = 0, 1, 2, ..., k. First, we employ the fuzzy c-means clustering approach (Bezdek, 1981) to determine the membership grades M_ij(u_j(n)) for the ith fuzzy rule (6), i = 1, 2, ..., r. Second, we randomly generate the untrained networks (W_in_i, W_i, W_back_i) so that they satisfy the echo state property. Third, we update the network states x_i(n) by Eq. 7 and collect the concatenated input/reservoir/previous-output states (u(n), x_i(n), y(n-1)), i = 1, 2, ..., r. Fourth, we calculate W_out_i (i = 1, 2, ..., r) so that the output y(n) of FWESN (8) approximates z(n) (n = 0, 1, 2, ..., k) by minimizing the mean square error. The trained FWESN is thus obtained.
The procedure of the proposed training algorithm consists of four steps:

Step 1: Calculate β_i(u(n)) (i = 1, 2, ..., r) in Eq. 8 by the fuzzy c-means clustering approach.
Step 2: Procure the untrained networks (W_in_i, W_i, W_back_i) for i = 1, 2, ..., r.
1) Suppose the dimension of the state vector is N_i for each of the r sub-reservoirs corresponding to the r fuzzy rules (6). 2) Initialize i = 1.
3) Randomly generate an input weight matrix W_in_i, an output feedback weight matrix W_back_i, and a matrix W_0 ∈ R^{N_i×N_i}. Normalize W_0 to a matrix W_1 by letting W_1 = (1/ρ)W_0, where ρ is the spectral radius of W_0. Scale W_1 to W_2 = γW_1 (γ < 1), and set W_i = W_2.
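Sub-step 3) of Step 2 can be sketched as follows; the sparseness, scaling factor γ, and matrix size are illustrative values, not the paper's settings:

```python
import numpy as np

def make_internal_weights(N, sparseness=0.1, gamma=0.9, seed=0):
    """Random sparse W0, normalized by its spectral radius, then scaled by gamma < 1."""
    rng = np.random.default_rng(seed)
    W0 = rng.uniform(-1.0, 1.0, (N, N))
    W0[rng.random((N, N)) > sparseness] = 0.0     # keep only a sparse fraction
    rho = max(abs(np.linalg.eigvals(W0)))          # spectral radius of W0
    W1 = W0 / rho                                  # spectral radius exactly 1
    return gamma * W1                              # spectral radius gamma < 1

W = make_internal_weights(100)
rho = max(abs(np.linalg.eigvals(W)))               # equals gamma by construction
```

Dividing by the spectral radius before scaling guarantees the final spectral radius is exactly γ, independent of the random draw.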
Step 3: Sample the network training dynamics for each fuzzy rule (6).

1) Let i = 1. Initialize the state of the untrained network (W_in_i, W_i, W_back_i) arbitrarily, typically x_i(0) = 0 and y(0) = 0. 2) Drive the network (W_in_i, W_i, W_back_i) for time n = 1, 2, ..., T by presenting the teacher input u(n) and the teacher output y(n-1), and by computing x_i(n) = f_i(W_in_i u(n) + W_i x_i(n-1) + W_back_i y(n-1)). 3) For each time equal to or larger than an initial washout time T_1, collect x_i(n), u(n), and y(n-1); one obtains S_i(n) = (u^T(n), x_i^T(n), y^T(n-1))^T for T_1 ≤ n ≤ T. 4) Let i = i + 1; if i > r, go to Step 4; otherwise, go to 2).
Step 4 Calculate the output weights.

1) Collect β_i(u(n))S_i(n) as a state matrix S and the teacher outputs as a matrix Y for T_1 ≤ n ≤ T. 2) By the least-squares method, the output weight matrix is calculated as W_out = YS^T(SS^T)^{-1}.
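Step 4 can be sketched as a plain least-squares solve. The dimensions and the synthetic teacher data below are assumptions for the sketch; columns of S play the role of the collected fuzzy-weighted states:

```python
import numpy as np

rng = np.random.default_rng(2)
dim_S, T = 30, 200   # assumed concatenated-state dimension and collected steps

# Synthetic collected data: columns are time steps after the washout.
S = rng.standard_normal((dim_S, T))
W_true = rng.standard_normal((1, dim_S))   # hypothetical "true" output weights
Y = W_true @ S                             # teacher outputs generated from them

# Least-squares solution W_out = Y S^T (S S^T)^{-1}.
W_out = Y @ S.T @ np.linalg.inv(S @ S.T)
```

In practice np.linalg.lstsq (or ridge regularization) is numerically safer than forming the explicit inverse, but the closed form above matches the formula in the text.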
Remark 2: By Step 2, we obtain the untrained networks (W_in_i, W_i, W_back_i) for i = 1, 2, ..., r. Note that we limit the spectral radius of each internal weight matrix W_i (i = 1, 2, ..., r) to less than one, which guarantees that the network has the echo state property.

EXPERIMENTS
We have performed experiments to validate the FWESN. We show that the FWESN has better performance than the ESN owing to the incorporation of the structural information of data sets. The following terms are used in the experiments:

Data sets: A nonlinear dynamic system (Juang, 2002) and five nonlinear time series, i.e., Mackey-Glass, Lorenz, ESTSP08(A), ESTSP08(B), and ESTSP08(C), are used in the experiments. For the nonlinear dynamic system, y_p(k) and u(k) are the output and input, respectively. In the experiment, (u(k), y_p(k-1)) and y_p(k) are the inputs and outputs of the algorithms, respectively. The samples are generated in the same way as in Juang (2002).

Algorithms: Three algorithms, i.e., the FWESN, ESN, and TS model, are used in the experiments. Neurons with hyperbolic tangent activation functions are used for the ESN and FWESN.
Parameters: r is the number of fuzzy rules. The main parameters of the reservoir are the scale of the reservoir N, the sparseness of the reservoir SD, the spectral radius of the internal weight matrices in the reservoir SR, the input-unit scale IS, and the input-unit shift IT. In the experiments, the FWESN and ESN use the same scale N, where N = rN_i for the FWESN and N_i denotes the scale of the sub-reservoir corresponding to the ith fuzzy rule (6), i = 1, 2, ..., r. Moreover, N_1 = N_2 = ... = N_r. Additionally, SR, IS, IT, and SD are the same in all sub-reservoirs of the FWESN and in the reservoir of the ESN. Thus, since W = diag(W_1, ..., W_r), the spectral radius of W in Eq. 9 is the same as that in Eq. 1.
Finally, for the FWESN and TS model, both the parameters of the antecedent part and the total number of fuzzy rules are the same.
Performance Indices: We choose the training and test errors as the performance indices. All the errors refer to the mean square errors in the experiment.
Experimental Results: The simulation results are summarized in Table 1.
From Table 1, the FWESN achieves better performance than the ESN and TS model under the same conditions. The bold values in Table 1 highlight the minimal test errors for each data set. For example, for the nonlinear dynamic system, the training and test errors of the FWESN are 6.7014e-6 and 0.0013, respectively, which are far less than the errors of the ESN and TS model. Thus, the effectiveness of the FWESN is validated.

CONCLUSION
In this work, a novel framework combining the advantages of the ESN and TS model is proposed; as a generalization of both, it improves and extends them. Similar to the classical ESN, we prove that if the largest spectral radius of the internal weight matrices is less than one, the FWESN has the echo state property. The FWESN shows higher accuracy than the TS model and ESN. For future work, we plan to investigate the underlying theoretical problems of the FWESN, such as tighter stability conditions and the approximation capability for dynamical systems or static functions. We also plan to explore further applications, e.g., remaining useful life prediction. Additionally, we will consider hardware implementations of the FWESN, e.g., on field-programmable gate arrays (FPGAs), oriented to real-time applications. With the development of computing power and access to big data, convolutional neural networks have become very popular owing to their obvious advantages. Thus, further research will focus on deep ESNs based on the structural information of big data. We believe that better results can be obtained by combining the FWESN with deep-learning methods.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

AUTHOR CONTRIBUTIONS
ZY contributed to the architecture, property, and training algorithm of fuzzy-weighted echo state networks. YL drafted the manuscript and contributed to the experiments and conclusions. All authors agree to be accountable for the content of the work.

FUNDING
This work was financially supported by the China Postdoctoral Science Foundation (Grant No. 2020M670785).