Monitoring and Identifying Wind Turbine Generator Bearing Faults Using Deep Belief Network and EWMA Control Charts

Wind turbines are widely installed as a new source of cleaner energy production. Dynamic and random stress imposed on the generator bearing of a wind turbine may lead to overheating and failure. In this paper, a data-driven approach for condition monitoring of generator bearings using temporal temperature data is presented. Four algorithms, the support vector regression machine, neural network, extreme learning machine, and deep belief network, are applied to model the bearing behavior. Comparative analysis of the models has demonstrated that the deep belief network is the most accurate. It has been observed that bearing failure is preceded by a change in the prediction error of bearing temperature. An exponentially-weighted moving average (EWMA) control chart is deployed to trend the error. Then a binary vector containing the abnormal errors and the normal residuals is generated for classifying failures. LS-SVM based classification models are developed to classify the faulty bearings and the normal ones. The proposed approach has been validated with the data collected from 11 wind turbines.


INTRODUCTION
Wind energy is the fastest growing form of renewable energy. Continuous operations in all environmental conditions contribute to failures of wind turbine components, assemblies, and systems. The generator of a wind turbine is one of the most failure-prone assemblies due to the variable loads (Kusiak and Verma, 2012). Bearing failures account for more than 40% of the overall wind turbine generator failures leading to unexpected energy losses (Tavner et al., 2012). Hence, a solution for effective condition monitoring of generator bearings and early identification of failure symptoms is needed.
Deteriorating performance of a generator bearing manifests itself in abnormal changes of the vibration signal, torque, and bearing temperature (Yang et al., 2017; Feng et al., 2020). Vibration analysis and data-driven approaches have been applied for condition monitoring of generator bearings. The frequently used classical vibration analysis approaches include Fourier transformation (Klein et al., 2001), wavelet transform (Yan et al., 2014), Hilbert-Huang transform (Peng et al., 2005; Huang and Wu 2008), and empirical mode decomposition (EMD). Other models have also been developed. Teng et al. (2016) utilized a complex Gaussian wavelet to obtain the multi-scale enveloping spectrogram for extracting weak features. Lei et al. (2013) applied an ant colony algorithm to form an adaptive stochastic resonance method for failure detection. Peeters et al. (2018) integrated an automated spectrum editing procedure, band-pass filtering, and envelope analysis to detect bearing failures based on the vibration signal. Vibration analysis approaches are valuable in monitoring and diagnosis of generator bearing failures. However, high frequency data from multiple vibration sensors is needed to perform such analysis, and at present this data is not available from industrial turbines due to the excessive cost and data sharing practices.
Most commercial wind turbines are equipped with supervisory control and data acquisition (SCADA) systems collecting data that can be used to model the behavior of generator bearings. Kusiak and Verma (2012) applied a neural network to model bearing temperature for failure prediction and identification. Guo (2012) introduced the nonlinear state estimate technique (NSET) for temperature-based failure detection. Yang et al. (2013) applied correlation analysis and quantitative assessment based on the SCADA data. The published literature indicates that data-driven methods provide robust bearing monitoring solutions for wind turbines.
Deep learning is a recent addition to the modeling suite with promising applications in multiple domains (Ouyang et al., 2017; Sun et al., 2020a; Sun et al., 2020b; Shen et al., 2021a; He et al., 2018; Li et al., 2018). Deep learning algorithms are capable of extracting in-depth features and patterns within the training dataset (Gritsenko et al., 2017; Ouyang et al., 2019; Li et al., 2020; Shen et al., 2021b; Shen and Raksincharoensak 2021). Within the wind energy sector, deep learning has been applied to the prediction of wind speed (Hu et al., 2016), wind power (Wang et al., 2017), and wind direction (Wang et al., 2016a; Li et al., 2021a). Research on condition monitoring using deep-learning approaches has also been published. Wang et al. (2016b) developed deep autoencoders to compress the time-series SCADA dataset and identified blade breakages from the deep-learned features. Yang et al. (2018) applied stacked restricted Boltzmann machines (RBMs) to capture system-wide patterns and then performed condition monitoring with promising results. Bach-Andersen et al. (2018) selected 1-dimensional convolutional neural networks (CNN) to extract temporal features to classify failures of gearbox bearings. Overall, deep-learning algorithms support the development of higher complexity models.
In this research, a deep-learning approach is explored to monitor generator bearings. A deep belief network (DBN) integrated with back-propagation (B-P) fine-tuning and layer-wise training is developed to model normal generator bearing temperature using SCADA data. Four data-driven models predicting normal bearing temperature are constructed. Their performance is assessed with the absolute percentage error (APE), the mean absolute percentage error (MAPE), and the root mean square error (RMSE). The analysis of industrial SCADA data indicates that bearing failure is preceded by a shift in the prediction error. The exponentially weighted moving average (EWMA) control chart is applied to monitor the error shift. A temporal binary vector is generated in real-time, and a final failure classification model is developed. The benefits of the proposed approach are demonstrated with computational experiments.

RESEARCH METHODOLOGY
The use of deep-learning algorithms in prediction and condition monitoring is growing (LeCun et al., 2015). Deep learning originates from research in neural networks. Deep-learning algorithms mitigate the local optima dilemma and are powerful in extracting globally robust features from the dataset (Deng and Yu 2013; Qiu et al., 2017).

Deep Belief Network
In this research, a deep belief network (DBN) is applied to model the generator bearing temperature. Proposed by Hinton et al. (2006), the classical DBN algorithm comprises multiple layers of restricted Boltzmann machines (RBMs) and a logistic regression layer (Wang et al., 2016c).
The restricted Boltzmann machine (RBM) is a commonly used generative stochastic neural network (Hinton et al., 2006). It includes a visible layer of binary-valued neurons and a hidden layer of Boolean neurons (see Figure 1). The connection between the hidden layer and the visible layer is bidirectional and symmetrical. There are no inter-connections between neurons in the same layer.
Training a single restricted Boltzmann machine (RBM) involves the weight matrix between the two layers. The configuration of the weight matrix is based on the energy function expressed in Eq. 1 (Wang et al., 2016c). The joint distribution of a visible layer vector and the hidden layer vector is expressed in Eq. 2 (Hinton et al., 2006). The activation functions of neurons in the visible and hidden layers are presented in Eqs 3, 4 (Hinton et al., 2006).
where: $v_i$ is the state of the $i$-th neuron in the visible layer; $h_j$ is the state of the $j$-th Boolean neuron in the hidden layer; $w_{j,i}$ is the element of the weight matrix connecting visible neuron $i$ and hidden neuron $j$; $a_i$ and $b_j$ are the biases of the two layers; and sig() denotes the logistic sigmoid function. The weight matrix and the layer biases are obtained in the layer-wise unsupervised pre-training described in Section 2.2.
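For reference, the standard binary RBM formulation from the cited literature can be written as follows. This is a sketch under the assumption that Eqs 1-4 follow the usual form; the index convention ($i$ for visible neurons, $j$ for hidden neurons) and the partition function $Z$ are notational assumptions rather than symbols taken from the text.

```latex
\begin{align}
E(v, h) &= -\sum_{i} a_i v_i - \sum_{j} b_j h_j - \sum_{i}\sum_{j} h_j \, w_{j,i} \, v_i \\
P(v, h) &= \frac{e^{-E(v, h)}}{Z}, \qquad Z = \sum_{v, h} e^{-E(v, h)} \\
P(h_j = 1 \mid v) &= \mathrm{sig}\Big(b_j + \sum_{i} w_{j,i} \, v_i\Big) \\
P(v_i = 1 \mid h) &= \mathrm{sig}\Big(a_i + \sum_{j} w_{j,i} \, h_j\Big)
\end{align}
```

Because there are no connections within a layer, the conditional probabilities factorize over neurons, which is what makes the alternating sampling between the two layers inexpensive.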

Layer-wise Pre-training
A deep belief network (DBN) includes multiple layers of restricted Boltzmann machines (RBMs) (Ouyang et al., 2019). Figure 2 shows the architecture of the proposed DBN. The first RBM of the DBN model, consisting of a visible layer and a hidden layer (hidden layer 1), is pre-trained as an independent RBM, and its weight matrix is computed. The output of the first RBM becomes the input to the second RBM, which also includes two layers. The first layer (hidden layer 1) is treated as the visible layer of the second RBM while the second layer (hidden layer 2) is treated as the hidden layer. The weight matrix of the second RBM is then computed. The weight matrices between the remaining hidden layers are obtained iteratively in the same way. Training each RBM is accomplished with a stochastic gradient descent method (Hinton et al., 2006). Based on the joint distribution of the visible and hidden layers in Eq. 2, the objective function of the stochastic gradient descent method is expressed in Eq. 5 (Wang et al., 2016c).
where: a is the bias vector of the visible layer; b is the bias vector of the hidden layer; and w is the weight matrix between the two layers. The parameters of the objective function (a, b, w) are updated based on the gradients of the function expressed in Eqs 6-8. The updating rules are formulated in Eqs 9-11 (Hinton et al., 2006).
where: η is the learning rate; $\langle\cdot\rangle_{P(h|v)}$ is the expectation under the conditional distribution with respect to the original input data; $\langle\cdot\rangle_{recon}$ is the expectation under the i-step reconstructed distribution obtained by the alternating Gibbs sampling scheme. The expectation of the reconstructed distribution is computed following the rules of contrastive divergence (Hinton, 2002).
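The pre-training procedure described above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming one-step contrastive divergence (CD-1), full-batch updates, and binary inputs; the function names (`cd1_update`, `train_rbm`, `pretrain_dbn`) and the default hyperparameters are hypothetical and are not taken from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, w, a, b, eta, rng):
    """One CD-1 parameter update on a batch v0 of shape (n_samples, n_visible).

    w has shape (n_hidden, n_visible); a and b are the visible and hidden biases.
    """
    # Positive phase: hidden probabilities given the data, then a binary sample
    ph0 = sigmoid(v0 @ w.T + b)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: one alternating Gibbs step (probabilities used for the reconstruction)
    pv1 = sigmoid(h0 @ w + a)
    ph1 = sigmoid(pv1 @ w.T + b)
    n = v0.shape[0]
    # Data expectation minus reconstruction expectation, scaled by the learning rate
    w = w + eta * (ph0.T @ v0 - ph1.T @ pv1) / n
    a = a + eta * (v0 - pv1).mean(axis=0)
    b = b + eta * (ph0 - ph1).mean(axis=0)
    return w, a, b

def train_rbm(data, n_hidden, eta=0.1, epochs=50, seed=0):
    """Train a single RBM with repeated full-batch CD-1 updates."""
    rng = np.random.default_rng(seed)
    n_visible = data.shape[1]
    w = 0.01 * rng.standard_normal((n_hidden, n_visible))
    a = np.zeros(n_visible)
    b = np.zeros(n_hidden)
    for _ in range(epochs):
        w, a, b = cd1_update(data, w, a, b, eta, rng)
    return w, a, b

def pretrain_dbn(data, layer_sizes, **kwargs):
    """Greedy layer-wise pre-training: each RBM's hidden output feeds the next RBM."""
    params, x = [], data
    for n_hidden in layer_sizes:
        w, a, b = train_rbm(x, n_hidden, **kwargs)
        params.append((w, a, b))
        x = sigmoid(x @ w.T + b)  # hidden layer becomes the next visible layer
    return params
```

The supervised back-propagation fine-tuning and the output layer are omitted here; only the unsupervised stacking step is shown.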

Data-Driven Algorithms
Performance of the deep belief network (DBN) is compared with three algorithms: support vector regression machine (SVR), neural network (NN), and extreme learning machine (ELM). The SVR considered in this study uses a Gaussian kernel function (Drucker et al., 1997). The values of the model parameters (c and γ) are selected based on 10-fold cross-validation. The neural network (NN) contains two hidden layers. The sigmoid activation function is selected based on its satisfactory performance on a small portion of the training data. The extreme learning machine (ELM) algorithm (Liang et al., 2006) is utilized to model the normal bearing temperature. As a single-hidden layer feedforward network, the ELM learning model is expressed in Eqs 12, 13 (Liang et al., 2006).
where: $x_j$ represents the input parameters; $o_j$ represents the predicted output values; $f_L()$ is the non-linear function representing the ELM algorithm; $a_i$ is the weight vector connecting the $i$-th hidden node and the input nodes; $b_i$ is the threshold of the $i$-th hidden node; $\beta_i$ is the weight vector connecting the $i$-th hidden node and the output nodes; and $t_j$ is the actual output value.
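A basic batch ELM regressor using the quantities defined above can be sketched as follows. This is an illustrative sketch only: it assumes a sigmoid hidden activation and a one-shot least-squares solution for the output weights, whereas the cited work of Liang et al. (2006) describes an online sequential variant. The class name `ELMRegressor` and its defaults are hypothetical.

```python
import numpy as np

class ELMRegressor:
    """Single-hidden-layer ELM: random input weights, least-squares output weights."""

    def __init__(self, n_hidden=50, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Hidden-layer output matrix H with a sigmoid activation (assumed)
        return 1.0 / (1.0 + np.exp(-(X @ self.A + self.b)))

    def fit(self, X, t):
        # Input weights a_i and thresholds b_i are drawn at random and never trained
        self.A = self.rng.standard_normal((X.shape[1], self.n_hidden))
        self.b = self.rng.standard_normal(self.n_hidden)
        H = self._hidden(X)
        # Output weights beta solve H beta = t in the least-squares sense
        self.beta = np.linalg.pinv(H) @ t
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta
```

For example, `ELMRegressor(n_hidden=50).fit(X_train, t_train).predict(X_test)` would return predicted bearing temperatures, where `X_train`, `t_train`, and `X_test` are placeholder arrays of SCADA inputs and targets.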

Performance Evaluation Metrics
To assess prediction accuracy of the deep belief network, three performance evaluation metrics are computed: the absolute percentage error (APE, Eq. 14), the mean absolute percentage error (MAPE, Eq. 15), and the root mean square error (RMSE, Eq. 16).
where: $o_j$ is the $j$-th predicted generator bearing temperature; $t_j$ is the $j$-th actual generator bearing temperature; and $N$ denotes the number of data points.
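The three metrics can be computed as in the sketch below, which uses the textbook definitions with $t_j$ as the actual and $o_j$ as the predicted temperature. Expressing APE and MAPE in percent (rather than as fractions) is an assumption, and the function names are illustrative.

```python
import numpy as np

def ape(t, o):
    """Absolute percentage error of each prediction, in percent."""
    t, o = np.asarray(t, dtype=float), np.asarray(o, dtype=float)
    return np.abs((t - o) / t) * 100.0

def mape(t, o):
    """Mean absolute percentage error over all N data points, in percent."""
    return ape(t, o).mean()

def rmse(t, o):
    """Root mean square error, in the units of the temperature signal."""
    t, o = np.asarray(t, dtype=float), np.asarray(o, dtype=float)
    return np.sqrt(np.mean((t - o) ** 2))
```

APE is a per-sample series, which is why it is the quantity later trended on the control chart, while MAPE and RMSE summarize a whole dataset in a single number.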

Exponentially Weighted Moving Average Control Chart
An increasing bearing temperature prediction error of a data-driven model reflects deterioration of the generator bearing condition. In this research, an exponentially weighted moving average (EWMA) (Jones et al., 2001) control chart is applied to monitor the error. The weighted average of the past prediction errors reduces the noise and allows small process shifts to be detected.
To compute the upper and lower confidence limits of the EWMA control chart, the statistic $EWMA_t$ is obtained from Eq. 17 (Wang et al., 2016b). The upper and lower confidence limits can be computed from Eqs 18, 19 (Horng Shiau and Ya-Chen 2005).
where: $\mu_{APE}$ is the mean of the absolute percentage error (APE); $\sigma_{APE}$ is the standard deviation of APE; and $N$ denotes the number of samples. According to Horng Shiau and Ya-Chen (2005), the value of the parameter L is commonly set to 3 and λ is usually set to 0.2.
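A minimal sketch of the EWMA statistic and its control limits is given below. It assumes the textbook time-varying limits for individual observations and initializes the statistic at the in-control mean; the exact form of Eqs 17-19, including how $N$ enters, may differ. The function name `ewma_chart` is hypothetical.

```python
import numpy as np

def ewma_chart(x, mu, sigma, lam=0.2, L=3.0):
    """Return the EWMA statistic and its upper/lower control limits.

    x     : monitored series (here, the APE of the temperature prediction)
    mu    : in-control mean of x, estimated from healthy data
    sigma : in-control standard deviation of x
    """
    x = np.asarray(x, dtype=float)
    z = np.empty_like(x)
    prev = mu  # assumption: the recursion starts at the in-control mean
    for t, xt in enumerate(x):
        prev = lam * xt + (1.0 - lam) * prev
        z[t] = prev
    steps = np.arange(1, len(x) + 1)
    # Textbook time-varying half-width; it approaches L*sigma*sqrt(lam/(2-lam))
    width = L * sigma * np.sqrt(lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * steps)))
    return z, mu + width, mu - width
```

With λ = 0.2, each new error contributes 20% of the statistic, so a single noisy reading rarely crosses a limit while a sustained shift accumulates over a few samples.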

Binary Vectors Generated by Control Chart
The EWMA control charts use statistical thresholds to label the prediction errors (residuals) as normal or abnormal. A normal residual usually indicates that the bearing temperature is within the normal range and the wind turbine is in a healthy state. An abnormal value often indicates an abnormal change in bearing temperature and can be a warning signal of bearing failure. Hence, in this research, the normal and abnormal residuals identified by the EWMA control charts are transformed into binary vectors as described in Figure 3. According to Figure 3, the statistical thresholds classify the residuals into normal and abnormal ones, so each data point can simply be labeled as 0 (normal) or 1 (abnormal). The binary vectors can therefore be generated in real-time and utilized as the inputs for the final classification models introduced in Section 2.7.
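The labeling step can be sketched as follows. The 0/1 labels follow the description above; cutting the label stream into overlapping windows of length `k` is an assumption about how the fixed-length vectors are formed, and both function names are hypothetical.

```python
import numpy as np

def to_binary_labels(z, ucl, lcl):
    """Label each EWMA statistic 1 (abnormal) if it leaves the limits, else 0 (normal)."""
    z = np.asarray(z, dtype=float)
    return ((z > ucl) | (z < lcl)).astype(int)

def sliding_vectors(labels, k=20):
    """Stack every run of k consecutive labels into one row (assumed windowing scheme)."""
    labels = np.asarray(labels)
    if len(labels) < k:
        return np.empty((0, k), dtype=labels.dtype)
    return np.lib.stride_tricks.sliding_window_view(labels, k)
```

Each row of the resulting matrix is one candidate input for a downstream classifier, and a new row becomes available each time a new SCADA sample arrives.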

Classification Models
Using the real-time vectors generated by the EWMA control charts, the final failure classification models are constructed in this research. Here, the dimension of the input vector is set to 20, and each element represents a normal or abnormal prediction residual of the bearing temperature. In total, four state-of-the-art machine learning algorithms, namely the support vector machine (SVM), least-square support vector machine (LS-SVM), extreme learning machine (ELM), and kernel-based extreme learning machine (KELM), are selected to separate the vectors representing generator bearing failures from the vectors representing normal bearing behavior.
The SVM is a state-of-the-art supervised learning algorithm used for classification and function approximation (Cherkassky and Ma, 2004). It is based on kernel functions, which avoid the difficulty of using linear functions in a high dimensional parameter space, and its optimization problem is transformed into a dual convex quadratic programming problem.
The LS-SVM is developed based on statistical theory and is considered an improved version of the SVM (Zhu et al., 2018). Compared with the vanilla SVM, the LS-SVM replaces the inequality constraint of the SVM with an equality constraint. In addition, the squared training error replaces the slack variable, which transforms the quadratic programming problem into a system of linear equations and improves the speed and accuracy of parameter estimation. The LS-SVM is particularly well suited to small-sample learning problems.
The ELM is a feedforward neural network that contains an input layer, an output layer, and a single hidden layer. Compared with other computationally expensive and time-consuming neural networks, the ELM adopts the Moore-Penrose pseudoinverse to determine the weights and biases between the hidden layer and the output layer (Li et al., 2021b). This enables the ELM to learn faster and attain higher generalization capability than other neural networks.
The KELM applies the kernel method to the vanilla ELM. It removes the random initialization of the ELM and offers high classification accuracy, good generalization ability, and a high degree of robustness (Pandey et al., 2018; Ouyang 2021). The Gaussian kernel function is the most frequently used kernel function and is thus selected in this study.
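To illustrate how the LS-SVM described above reduces training to a linear system, the following sketch implements the standard LS-SVM classifier formulation with a Gaussian kernel. It assumes class labels coded as -1 and +1; the regularization parameter `gamma`, the kernel width `sigma`, and the function names are illustrative and not values or identifiers from the paper.

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    d2 = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))

def lssvm_fit(X, y, gamma=10.0, sigma=1.0):
    """Solve the LS-SVM linear system for the bias b and the multipliers alpha.

    y must contain labels in {-1, +1} (assumed coding).
    """
    n = len(y)
    omega = np.outer(y, y) * rbf_kernel(X, X, sigma)
    M = np.zeros((n + 1, n + 1))
    M[0, 1:] = y
    M[1:, 0] = y
    M[1:, 1:] = omega + np.eye(n) / gamma  # equality constraints with squared errors
    rhs = np.concatenate(([0.0], np.ones(n)))
    sol = np.linalg.solve(M, rhs)  # a linear solve replaces quadratic programming
    return sol[0], sol[1:]

def lssvm_predict(X_train, y_train, b, alpha, X_new, sigma=1.0):
    """Classify new rows by the sign of the kernel expansion."""
    scores = rbf_kernel(X_new, X_train, sigma) @ (alpha * y_train) + b
    return np.where(scores >= 0, 1, -1)
```

Unlike the standard SVM, every training point receives a non-zero multiplier, so the solution is not sparse; this is the usual trade-off for the simpler training step.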

COMPUTATIONAL ANALYSIS
The data used in this research has been collected from SCADA systems of a large wind farm. Data at 10-min resolution from 11 wind turbines is used to investigate failure of a generator bearing. Two bearing failure instances have been reported during the period covered by the dataset.

Dataset Description and Preprocessing
The ranges of the generator bearing temperature of the 11 turbines are provided in Table 1. The bearing failure incidents are also included in Table 1. Based on the maintenance records, Turbines B, H, I, and K have been affected by bearing failures and are not considered for modeling the normal bearing behavior discussed in Section 3.2. Rather, they are selected to test abnormal behavior of the bearing temperature.

Parameter Selection
To capture the normal behavior of a generator bearing, 33 parameters relevant to the bearing temperature have been initially considered. Using domain expertise, the number of parameters of interest was reduced to 12. Next, three algorithms, namely the wrapper with genetic search (WGS) (Kohavi and John, 1997), the boosting-tree algorithm (BTA) (Sbihi, 2007), and the relief algorithm (RA), were applied to select the most relevant parameters for predicting the generator bearing temperature. The wrapper approach uses supervised learning to perform 10-fold cross validation in selecting relevant parameters. The boosting-tree algorithm evaluates the importance of parameters by constructing a sequence of decision trees and computing the prediction residuals. The relief algorithm selects the parameter set by detecting conditional dependence between the parameters. The eight most important parameters selected by the three data-mining algorithms are listed in Table 2.

Modeling Bearing Behavior
Data from three wind turbines (i.e., Turbine C, Turbine D, Turbine E) have been merged to train the neural network, the support vector regression machine, the extreme learning machine presented in Section 2.2, and the proposed deep belief network (DBN). Data collected from Turbines A, B, F, and G are used as the validation dataset to validate the prediction performance of the proposed DBN algorithm. Data from Turbines G, J, I, and K are used as testing datasets. To design the DBN, the number of hidden neurons in each layer is set at 10% of the training data (Mitchell, 1999). The data from the remaining 2 healthy turbines (i.e., Turbine 9 and 11) are designated as test datasets to evaluate the performance of the four algorithms. Table 3 presents the prediction results produced by the four algorithms for the test and validation datasets. The mean absolute percentage error (MAPE) and the root mean square error (RMSE) produced by the DBN algorithm are the smallest, which confirms the accuracy of the DBN model. This superior performance may be attributed to the layer-wise pre-training. Figure 4 illustrates the prediction error from testing and validation produced by the DBN. The APEs of healthy wind turbines and of turbines with bearing failures demonstrate different behaviors. Hence, an emerging bearing failure is indicated by the APE of the DBN model.

Condition Monitoring
In this section, the behavior of the prediction error associated with the bearing failure is discussed. The APE was monitored for 1 week prior to the bearing failure. The upper confidence limit (UCL) and the lower confidence limit (LCL) of the exponentially-weighted moving average (EWMA) control chart are computed from Eqs 18, 19 of Section 2.4.
The monitored examples of healthy turbines and the turbines with emerging bearing failures are illustrated in Figures 5, 6. Figure 5 illustrates the EWMA charts of healthy turbines (Turbine G and J) while Figure 6 shows the wind turbines (Turbine I and K) with problematic generator bearings of the same wind farm. In Figure 5, all statistics fall within the control limits which indicates normal bearing behavior. Meanwhile, outliers in Figure 6 begin to emerge 1 week prior to the bearing failure and an early alarm is issued. According to the results presented in Figure 6, bearing failures are visible several days ahead of the occurrence. The proposed approach provides sufficient time to react and thus minimize power loss and downtime.
The outcomes of the EWMA charts are transformed into real-time binary vectors, and the bearing failure classification models are then developed to classify the actual failures. However, in the temporal domain, the optimal size of the EWMA vectors is uncertain. Hence, several experiments were performed with different sizes of the EWMA vectors (i.e., K = 10, 20, 30, 40). All algorithms introduced in Section 2.7 are tested, and the computational results are illustrated in Figure 7, with the AUC selected as the performance measure. All algorithms reach their peak classification performance at K = 20, which is therefore selected as the dimension of the input EWMA vector in this study.
As illustrated in Figure 8, the ROC curves of the four state-of-the-art algorithms are obtained with respect to the testing dataset. Among them, the LS-SVM achieves the highest area under the ROC curve (AUC) of 0.88, which demonstrates its superior performance in classifying bearing failures from the binary vectors mixed with normal and abnormal prediction residuals. The other performance metrics, including accuracy, sensitivity, and specificity, along with the 95% confidence intervals, are provided in Table 4. The LS-SVM performs best among all algorithms tested according to all evaluation metrics. Hence, using the vectors generated from the DBN and EWMA control charts, the LS-SVM is capable of classifying the majority of the bearing failures in the temporal domain.
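For readers who wish to reproduce this kind of comparison, the AUC can be computed directly from classifier scores with the rank-based (Mann-Whitney) formulation sketched below. The function name `auc_score` is hypothetical, and the sketch does not reproduce the confidence-interval computation reported in Table 4.

```python
import numpy as np

def auc_score(y_true, scores):
    """AUC as the probability that a failure vector outscores a normal vector.

    y_true : 1 for failure vectors, 0 for normal vectors
    scores : classifier outputs, higher meaning more failure-like
    """
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("both classes must be present to compute the AUC")
    diff = pos[:, None] - neg[None, :]
    # Ties between a positive and a negative score count as half a win
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (len(pos) * len(neg)))
```

Evaluating this function on held-out vectors for each candidate K would give one point per algorithm on a plot such as Figure 7.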

DISCUSSION
The condition-monitoring framework proposed in this study has provided promising results using field SCADA data. The advantages of the proposed framework can be summarized in three points. First, it uses a deep belief network as the backbone regressor, which has shown superior power in extracting temporal abnormal features from the dataset. Second, the framework is designed to run on SCADA data, which is collected by the standard data acquisition system of almost all wind farms across the globe. Hence, it can be widely implemented in practice. Third, the classification part can save a lot of labor and time. Conventional control chart-based identification of mechanical failures requires humans to detect the statistical outliers, whereas in this research the machine-learning classifiers enable the automation of this process. In sum, the framework can be widely applied in wind farms for condition monitoring tasks.
On the other hand, there are also a few shortcomings at the current stage. For example, sensor errors can be a misleading factor that causes false classification of mechanical failures. The reliability of the SCADA sensors is not considered in this framework and is a direction for future research.

CONCLUSION
In this research, a deep-learning based condition-monitoring framework to identify bearing failures was presented. Historical data collected from healthy wind turbines was utilized to develop a model predicting bearing temperature with a deep belief network. Data from both healthy wind turbines and turbines with bearing failures served as the testing dataset. Comparative analysis demonstrated that the deep belief network model was the most accurate of the compared models in predicting generator bearing temperature. An exponentially-weighted moving-average control chart was applied to capture shifts in prediction error. The binary vectors generated by the control charts led to the identification of emerging bearing failures in real-time in the temporal domain.
Computational results reported in the paper validated the accuracy of the deep-learning framework in condition monitoring of wind turbine generator bearings. In future research, analysis of high frequency vibration data may be coupled with the bearing temperature data for multi-scale condition monitoring.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusion of this article will be made available by the authors, without undue reservation.

AUTHOR CONTRIBUTIONS
HL conceptualized the study, contributed to the study methodology, and wrote the original draft. JD contributed to the study methodology, data curation and investigation. SY and PF contributed to data analysis and investigation. HL contributed to software and formal analysis. DA contributed to investigation and writing-original draft. All authors have read and agreed to the published version of the manuscript.