Cost-Sensitive LightGBM-Based Online Fault Detection Method for Wind Turbine Gearboxes

In practice, faulty samples of wind turbine (WT) gearboxes are far fewer than normal samples during operation, and most existing fault diagnosis methods for WT gearboxes focus only on improving classification accuracy while ignoring the reduction of missed alarms and of the average misclassification cost. To this end, a new framework for WT gearbox fault detection is proposed that combines Spearman rank correlation feature selection with a cost-sensitive LightGBM algorithm. In this article, features are first extracted from wind turbine supervisory control and data acquisition (SCADA) systems. Then, feature selection is performed using expert experience and the Spearman rank correlation coefficient to analyze the correlation among the big data of WT gearboxes. Moreover, the cost-sensitive LightGBM fault detection framework is established by optimizing the misclassification cost. The false alarm rate and the missed detection rate of the WT gearbox under different working conditions are finally obtained. Experiments verify that the proposed method significantly improves fault detection accuracy. Meanwhile, the proposed method consistently outperforms traditional classifiers such as AdaCost, cost-sensitive GBDT, and cost-sensitive XGBoost, achieving lower false alarm and missed detection rates. Owing to its high Matthews correlation coefficient scores and low average misclassification cost, the cost-sensitive LightGBM (CS LightGBM) method is preferred for imbalanced WT gearbox fault detection in practice.


INTRODUCTION
With the increase in installed wind turbine capacity, wind power generation brings economic benefits but also raises crucial challenges related to reliability (Qiao and Lu, 2015; Wang et al., 2019). On the one hand, wind power generation technology has developed rapidly, but wind turbine (WT) fault detection and condition monitoring technologies have not improved accordingly, so frequent WT faults cannot be repaired in time; on the other hand, WTs are often located in remote areas with rich wind resources and operate in harsh working environments for long periods, which easily causes frequent WT faults. Gearboxes often operate under tough conditions, which leads to a high fault rate and irreversible damage to the WT. Gearbox faults inevitably affect the performance of the WT (Teng et al., 2016a). Therefore, fault detection of WT gearboxes is of great significance for reducing the operation and maintenance cost of WTs and improving the power generation efficiency of the entire wind farm.
A WT is a typical complex system whose operating status is complex and changeable, which makes fault detection and condition monitoring difficult (Ra et al., 2021; Song et al., 2021). The fault mechanism is complicated, and the correlation characteristics between feature vectors differ across fault types. Data in the wind turbine SCADA system are usually high-dimensional, so it is necessary to reduce the dimensionality of the big data in the SCADA system. For instance, Amirat et al. (2018) proposed an ensemble empirical mode decomposition fault diagnosis method, in which Pearson correlation analysis was used to select the closest intrinsic mode function and to analyse the data correlation. Yang et al. (2019) adopted a convolutional neural network fault diagnosis method, in which Spearman rank correlation analysis is used to sort the relevant image layers of the convolutional neural network and comprehensively extract data features (Long et al., 2018; Long et al., 2021a). This method effectively verifies the necessity of feature selection and improves the fault detection rate. Since data in the SCADA system are interrelated, feature selection and dimensionality reduction of the big data of WTs will increase the availability of data samples and improve the accuracy of fault detection (Long et al., 2021b).
Many scholars and experts have carried out extensive and in-depth research on WT fault detection and diagnosis methods, including signal processing methods, multivariate statistical methods, and classification algorithms (Jiang et al., 2015). For example, Teng et al. (2016b) proposed a multifault detection method for WTs based on the complex wavelet transform. By analyzing the multiscale enveloping spectrogram, weak bearing fault characteristics can be detected and fault diagnosis of WTs can be realized. Because of the nonlinear and nonstationary characteristics of the gearbox, Han et al. (2020) considered the correlation between variables and used a quantitative diagnosis method for gearbox faults based on generalized canonical correlation analysis, which can effectively identify the severity of gearbox faults under various conditions. Gao et al. (2018) explained the drawbacks of the current support vector machine (SVM) algorithm and proposed a WT fault diagnosis method based on the least squares support vector machine. Zheng and Peng (2019) used an improved AdaBoost-SVM method for WT converter fault diagnosis: the wavelet transform is employed to reduce signal noise, and fault feature vectors are input into the improved AdaBoost-SVM classifier to achieve fault diagnosis. Zhang et al. (2018) proposed a data-driven WT fault detection framework that combines Random Forest (RF) and extreme gradient boosting (XGBoost). RF is used to rank the features of WTs by importance, and XGBoost trains the ensemble classifier for each specific fault. This method is robust against overfitting, and it achieves better wind turbine fault detection results than SVM when processing multidimensional data. Tang et al. (2020) adopted a WT gearbox fault detection method that combines correlation analysis and an improved LightGBM. The maximum information coefficient analysis method is adopted to select features from the big data of WTs, and the improved LightGBM, tuned by Bayesian optimization, is used for classification so as to diagnose WT gearbox faults. However, the fault diagnosis performance needs to be improved when the data are imbalanced.
At present, fault diagnosis methods for WTs are generally based on machine learning (Stetco et al., 2019), that is, existing data are used to train a fault diagnosis model, and this model is then used to realize fault diagnosis. Most current data-driven machine learning methods assume a balanced distribution, in which the numbers of normal samples and fault samples are close. However, in the real industrial field, normal samples greatly outnumber fault samples. As a result, many machine learning methods fail on imbalanced data: the majority class has a high recognition rate, while the minority class is poorly recognized. During the operation of WTs, faults occur only for short periods and the turbine is in normal condition most of the time; therefore, the fault samples are the minority class, and the normal samples are the majority class. However, traditional machine learning methods for WT fault diagnosis consider neither the data imbalance problem nor the losses caused by false alarms and missed detections. They take the Gini coefficient or the information gain rate as the optimization target, so the misclassification cost is not introduced into the evaluation function of the base classifier, and the fault detection performance is unsatisfactory.
The contributions of this article are summarized as follows:
1) The fault diagnosis method takes misclassification costs into account, and the optimization objective is to minimize the average total cost, which effectively improves the fault detection rate. The efficiency of the base classifier is improved, especially its ability in WT fault detection.
2) Since the fault samples are the minority class and the normal samples are the majority class, a WT fault detection method based on cost-sensitive LightGBM is proposed to deal with the imbalanced data distribution problem. Specifically, the cost function is introduced in the weight formula of the LightGBM algorithm to replace the information gain, so that the algorithm pays attention to the minority class in each iteration update, thereby improving the classification of imbalanced data.
3) The Spearman rank correlation method is used for WT feature selection, replacing the raw dataset with attributes ranked in order of correlation; thus, it helps to reduce both the redundancy and the dimension of the WT feature datasets and removes redundant and irrelevant information from the original feature space.
4) Experiments show that the proposed method can quickly perform fault diagnosis of WTs. Compared with other cost-sensitive ensemble algorithms, the cost-sensitive LightGBM is more suitable for highly imbalanced data and achieves more accurate fault classification. The experiments verify the effectiveness and validity of the proposed method.

RELATED WORK
Cost-sensitive learning methods aim to minimize the total misclassification cost rather than the total error, and they have attracted significant attention from researchers and scholars. Knoll et al. (1994) introduced misclassification costs to improve the classification accuracy of the decision tree. Domingos (1999) proposed MetaCost, which uses the bagging algorithm to make a classifier cost sensitive by wrapping it in a cost-minimizing procedure. Many scholars have attempted to make classifiers cost sensitive by adding a cost function to the training algorithm. Among these works, Fan et al. (1999) presented the AdaCost method, which reduces the cumulative misclassification cost more than AdaBoost. Fumera and Roli (2002) proposed a cost-sensitive SVM under the framework of the structural risk minimization induction principle by minimizing the associated risk. A large body of work on cost-sensitive learning has thus been conducted to improve classifier performance. However, how to train the LightGBM algorithm in an imbalanced fault diagnosis situation is still an open problem for real WT fault diagnosis.

BACKGROUND
Spearman Rank Correlation Method for Feature Selection
Since WT data are large and suffer from problems such as weakly correlated and redundant features, feature selection is necessary for the big data of WTs. The commonly used correlation coefficients include the Pearson linear correlation coefficient, Kendall rank correlation coefficient, Spearman rank correlation coefficient, and tail dependence coefficients (Bonett and Wright, 2000). Since the correlation analysis of WT characteristics showed nonlinear correlation between variables, and the Kendall and Spearman rank correlation coefficients have similar properties, the Spearman rank correlation coefficient is used in this work (Croux and Dehon, 2010). The Spearman rank correlation coefficient measures the monotonic relationship, linear or nonlinear, between variables. Given two discrete features x and y with M data samples, the Spearman rank correlation coefficient can be calculated by the following formula:
$$r_s = 1 - \frac{6\sum_{i=1}^{M} d_i^2}{M\left(M^2 - 1\right)} \tag{1}$$
Here, $d_i = x_i - y_i$ is the difference between the ranks of the $i$-th sample in x and y. The coefficient can also be rewritten as follows:
$$r_s = \frac{\operatorname{cov}(x, y)}{\sigma_x \sigma_y} \tag{2}$$
Here, cov represents the covariance and $\sigma$ is the standard deviation, both computed on the rank values.
The Spearman rank correlation coefficient $r_s$ ranges from −1 to 1. When $r_s = 1$, x and y are strictly positively correlated; when $r_s = -1$, x and y are strictly negatively correlated; and when $r_s = 0$, the two features show no monotonic correlation.
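As an illustration of the two forms of the coefficient, the following minimal Python sketch computes $r_s$ both from the rank differences, $1 - 6\sum d_i^2 / (M(M^2-1))$, and as the Pearson correlation of the ranks. The function names and the sample readings are hypothetical and are not taken from the SCADA data; the two forms agree only when there are no tied values.

```python
from statistics import mean

def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average 1-based position of the run
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def spearman_from_ranks(x, y):
    """Covariance form: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

def spearman_from_differences(x, y):
    """Rank-difference form: exact only when there are no tied values."""
    rx, ry = average_ranks(x), average_ranks(y)
    m = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (m * (m ** 2 - 1))

# Hypothetical readings, for illustration only.
oil_temp = [41.0, 45.5, 52.3, 60.1, 58.7]
bearing_temp = [50.2, 53.9, 71.8, 95.4, 66.0]
r_cov = spearman_from_ranks(oil_temp, bearing_temp)
r_diff = spearman_from_differences(oil_temp, bearing_temp)
```

For these five hypothetical samples the squared rank differences sum to 2, so both forms give 0.9.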
The Spearman rank correlation coefficient is used here to measure the correlation between features. If features with a high correlation coefficient are simply deleted, useful information may be lost. To reduce the redundancy between fault features while retaining the information of different features, the feature with the highest Spearman rank correlation coefficient in the raw dataset is selected, and the remaining fault features that are highly correlated with it are grouped into one feature set according to a threshold; this is repeated until every fault feature in the original dataset has been either eliminated or selected. The feature selection method is shown in Figure 1.

Cost-Sensitive Learning
Since traditional classification algorithms are not well suited to imbalanced data, cost-sensitive methods were developed (Turney, 1994). They introduce a misclassification cost into attribute splitting in place of information gain, the Gini coefficient, and other indicators, with the aim of minimizing the average total cost and improving the prediction of minority samples (Tang et al., 2019).
The misclassification costs are usually described as a cost matrix, as shown in Table 1.
In Table 1, $C_F$ is the fault class, $C_N$ is the normal class, $F(C_F, C_F)$ represents the cost of the fault class being correctly classified as the fault class, $F(C_F, C_N)$ represents the cost of the fault class being wrongly classified as the normal class, $F(C_N, C_F)$ represents the cost of the normal class being wrongly classified as the fault class, and $F(C_N, C_N)$ represents the cost of the normal class being correctly classified as the normal class.
Given the misclassification cost matrix C, let the actual class be j and the predicted class be i; if $i = j$, the prediction is correct. The best prediction for the sample x is the class that minimizes the expected total cost:
$$\hat{C}(x) = \arg\min_{C_i} \sum_{j} P(C_j \mid x)\, F(C_j, C_i)$$
Here, $P(C_j \mid x)$ is the posterior probability that sample x belongs to class $C_j$, $y_i$ is the class label of $x_i$, and $y_i = 1$ indicates the minority class, namely, fault samples. Generally, $F(C_F, C_N) > F(C_F, C_F)$ and $F(C_N, C_F) > F(C_N, C_N)$. The essence of cost-sensitive classification is that even if the sample x is more likely to belong to a certain class, x is assigned to the class that minimizes the cost.
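The minimum-expected-cost rule can be illustrated with a short Python sketch. The cost values (10 for a missed fault, 1 for a false alarm) and the posterior probabilities are hypothetical and chosen only to show that a sample can be assigned to the fault class even when the normal class is more probable.

```python
def expected_costs(posterior, cost):
    """Expected cost of predicting each class.

    posterior[j] is P(C_j | x); cost[j][i] is F(C_j, C_i), the cost of
    predicting class i when the actual class is j.
    """
    classes = list(posterior)
    return {i: sum(posterior[j] * cost[j][i] for j in classes) for i in classes}

def min_cost_class(posterior, cost):
    """Return the class with the smallest expected misclassification cost."""
    risks = expected_costs(posterior, cost)
    return min(risks, key=risks.get)

# Hypothetical cost matrix: a missed fault is ten times as costly as a false alarm.
cost = {
    "fault": {"fault": 0.0, "normal": 10.0},
    "normal": {"fault": 1.0, "normal": 0.0},
}
posterior = {"fault": 0.3, "normal": 0.7}  # hypothetical classifier output

risks = expected_costs(posterior, cost)
decision = min_cost_class(posterior, cost)
```

Here predicting the normal class has expected cost 0.3 × 10 = 3.0, whereas predicting the fault class has expected cost 0.7 × 1 = 0.7, so the sample is labeled as a fault although its fault probability is only 0.3.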

LightGBM Classifier
LightGBM is an improved gradient boosting decision tree (GBDT) framework based on the decision tree algorithm (Ke et al., 2017). Given the supervised learning dataset $X = \{(x_i, y_i)\}_{i=1}^{N}$, where x represents the sample data and y represents the class labels, the aim of the LightGBM algorithm is to find a mapping $\hat{F}(x)$ that approximates the function $F(x)$ so as to minimize the loss function $\Psi(y, F(x))$, and the objective function $Obj^{(t)}$ can be expressed as follows:
$$Obj^{(t)} = \sum_{i=1}^{N} \Psi\left(y_i, F_{t-1}(x_i) + f_t(x_i)\right) + \sum_{k} \Omega(f_k)$$
Here, $f_t$ is the tree added at iteration t, and $\Omega(f_k)$ represents the regular term.
In LightGBM, Newton's method is used to quickly approximate the objective function:
$$Obj^{(t)} \cong \sum_{i=1}^{N} \left[ g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \sum_{k} \Omega(f_k)$$
where $g_i$ and $h_i$ represent the first-order and second-order derivatives of the loss function, respectively.
The information gain in LightGBM is defined as follows:
$$Gain = \frac{1}{2}\left[ \frac{\left(\sum_{i \in I_L} g_i\right)^2}{\sum_{i \in I_L} h_i + \lambda} + \frac{\left(\sum_{i \in I_R} g_i\right)^2}{\sum_{i \in I_R} h_i + \lambda} - \frac{\left(\sum_{i \in I} g_i\right)^2}{\sum_{i \in I} h_i + \lambda} \right]$$
Here, $I_L$ and $I_R$ are the sample sets of the left and right child nodes after a split, $I = I_L \cup I_R$, and $\lambda$ is the regularization coefficient. Compared to the GBDT algorithm, the LightGBM algorithm is more efficient in processing high-dimensional big data. This is because of the exclusive feature bundling (EFB) algorithm and the gradient-based one-side sampling (GOSS) algorithm in LightGBM. The GOSS method keeps the data instances with large gradients, randomly samples the instances with small gradients, and applies a constant multiplier to the sampled small-gradient instances, so that the sample drawn from the big dataset keeps approximately the same distribution and characteristics as the raw data; this preserves the classification accuracy while improving the classification speed. In a high-dimensional space, the data are sparsely coded, and in a sparse feature space, nonzero values rarely appear at the same time. The EFB method therefore bundles such mutually exclusive features into a new feature, which decreases the number of features. Besides, the traditional gradient boosting method uses an exhaustive search to find split features and thresholds, while LightGBM uses a histogram-based method to find approximately optimal split features and thresholds, which reduces calculation time. Specifically, the values of a feature are discretized into histogram bins, and the discretized value is used as an index to accumulate statistics in the histogram. After one pass over the data, the histogram has accumulated the required statistics, and the bins of the histogram are then traversed to find the optimal split point.
Frontiers in Energy Research | www.frontiersin.org August 2021 | Volume 9 | Article 701574
The trees of XGBoost are grown by the level-wise tree growth method (Mitchell and Frank, 2017; Chen and Guestrin 2016), whereas LightGBM uses leaf-wise tree growth; however, leaf-wise splits increase complexity and may lead to overfitting, and a tree grown leaf-wise will be deeper than a level-wise tree with the same number of leaves. Figure 2 is a schematic presentation of the two tree growth methods.
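The histogram-based split search described in this section can be sketched in a few lines of Python. This is a simplified illustration rather than LightGBM's actual implementation: it uses equal-width bins, a single feature, and the common second-order split gain with an L2 regularization coefficient `lam`, all of which are assumptions made for the example.

```python
def best_histogram_split(values, grads, hessians, n_bins=4, lam=1.0):
    """Find the best split threshold of one feature from gradient histograms."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for a constant feature

    # One pass over the data: accumulate gradient statistics per bin.
    g_hist = [0.0] * n_bins
    h_hist = [0.0] * n_bins
    for v, g, h in zip(values, grads, hessians):
        b = min(int((v - lo) / width), n_bins - 1)
        g_hist[b] += g
        h_hist[b] += h

    g_total, h_total = sum(g_hist), sum(h_hist)

    def score(g, h):
        return g * g / (h + lam)

    # Traverse the bin boundaries instead of every distinct feature value.
    best_gain, best_bin = 0.0, None
    g_left = h_left = 0.0
    for b in range(n_bins - 1):
        g_left += g_hist[b]
        h_left += h_hist[b]
        gain = 0.5 * (score(g_left, h_left)
                      + score(g_total - g_left, h_total - h_left)
                      - score(g_total, h_total))
        if gain > best_gain:
            best_gain, best_bin = gain, b

    threshold = None if best_bin is None else lo + (best_bin + 1) * width
    return threshold, best_gain

# Hypothetical toy data: low values have negative gradients, high values positive.
values = [1, 2, 3, 4, 5, 6, 7, 8]
grads = [-1, -1, -1, -1, 1, 1, 1, 1]
hessians = [1] * 8
threshold, gain = best_histogram_split(values, grads, hessians)
```

With four bins only three candidate boundaries are evaluated, whereas an exhaustive search over these eight distinct values would evaluate seven; this reduction is the source of the speed-up mentioned above.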
Because WT fault samples are the minority and normal samples are the majority, the LightGBM algorithm focuses more on classifying the majority samples. Therefore, the cost function is introduced to replace the information gain in the weight formula of the algorithm, forming the cost-sensitive LightGBM algorithm (Elkan 2001). In each iteration update, the algorithm pays more attention to the minority class, which improves the identification of the minority class.

Cost-Sensitive LightGBM Algorithm
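For binary classification problems, the loss function commonly used in LightGBM is the logistic loss, expressed as follows:
$$\Psi = -\sum_{i=1}^{N} \left[ y_i \log P(x_i) + (1 - y_i) \log\left(1 - P(x_i)\right) \right]$$
where P represents the posterior probability. In the log loss function of the cost-sensitive LightGBM algorithm (Zheng and Peng, 2019), we replace $P(x_i)$ with the following:
$$P(x_i) = \frac{e^{\delta F(x_i) + \eta}}{e^{\delta F(x_i) + \eta} + e^{-\delta F(x_i) - \eta}}$$
where $\delta = \frac{F(C_F, C_N) + F(C_N, C_F)}{2}$ and $\eta = \frac{1}{2} \log \frac{F(C_N, C_F)}{F(C_F, C_N)}$, and the cost-sensitive logistic loss function can be simplified as follows:
$$\Psi_{cs} = -\sum_{i=1}^{N} \left[ y_i \log P(c_F \mid x_i) + (1 - y_i) \log P(c_N \mid x_i) \right]$$
where $P(c_F \mid x_i)$ represents the posterior probability of assigning the sample $x_i$ to the fault class and $P(c_N \mid x_i)$ represents the posterior probability of assigning the sample $x_i$ to the normal class (Mitchell and Frank, 2017). Obviously, $P(c_F \mid x_i) = 1 - P(c_N \mid x_i)$. According to Eq. 5, the objective function of CS LightGBM can be written as follows:
$$Obj^{(t)} = \sum_{i=1}^{N} \Psi_{cs}\left(y_i, F_{t-1}(x_i) + f_t(x_i)\right) + \sum_{k} \Omega(f_k)$$
where $\Psi_{cs}$ is the loss function and $\Omega$ is the regular term. According to the second-order Taylor expansion, the objective function can be rewritten as
$$Obj^{(t)} \cong \sum_{i=1}^{N} \left[ g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \sum_{k} \Omega(f_k)$$
The first-order derivative $g_i$ and the second-order derivative $h_i$ of the loss at $x_i$ are as follows:
$$g_i = 2\delta \left( P(c_F \mid x_i) - y_i \right), \qquad h_i = 4\delta^2 P(c_F \mid x_i)\, P(c_N \mid x_i)$$
Given the structure of the tree, the optimal weight $w_j^{p}$ of each leaf node j is obtained as follows:
$$w_j^{p} = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda}$$
where $I_j$ is the set of samples falling into leaf j and $\lambda$ is the regularization coefficient. The algorithm of cost-sensitive LightGBM (CS LightGBM) is given as follows.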
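To make the role of $\delta$ and $\eta$ concrete, the following Python sketch evaluates a cost-sensitive logistic loss together with its first- and second-order derivatives with respect to the raw score. It assumes the posterior takes the form $P = e^{a}/(e^{a} + e^{-a})$ with $a = \delta F(x) + \eta$; the derivatives are worked out by hand from that assumption and are not quoted from the article. The function names and cost values are hypothetical.

```python
import math

def cs_params(cost_fn, cost_fp):
    """delta and eta from F(C_F, C_N) (missed fault) and F(C_N, C_F) (false alarm)."""
    delta = (cost_fn + cost_fp) / 2.0
    eta = 0.5 * math.log(cost_fp / cost_fn)
    return delta, eta

def cs_probability(score, delta, eta):
    """Assumed posterior of the fault class: e^a / (e^a + e^-a), a = delta*score + eta."""
    a = delta * score + eta
    return 1.0 / (1.0 + math.exp(-2.0 * a))

def cs_logloss(y, score, delta, eta):
    """Cost-sensitive logistic loss of one sample (y = 1 for fault, 0 for normal)."""
    p = cs_probability(score, delta, eta)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def cs_grad_hess(y, score, delta, eta):
    """Hand-derived first and second derivatives of cs_logloss w.r.t. score."""
    p = cs_probability(score, delta, eta)
    grad = 2.0 * delta * (p - y)
    hess = 4.0 * delta ** 2 * p * (1.0 - p)
    return grad, hess

# Hypothetical costs: a missed fault costs 10, a false alarm costs 1.
delta, eta = cs_params(cost_fn=10.0, cost_fp=1.0)
grad, hess = cs_grad_hess(y=1, score=0.1, delta=delta, eta=eta)
```

A pair of functions of this kind is what a gradient boosting library expects from a user-defined objective: per-sample gradients and Hessians computed from the current raw scores. How the costs are best chosen for a given wind farm is outside the scope of this sketch.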

COST-SENSITIVE LIGHTGBM FAULT DETECTION MODEL
In order to minimize the loss caused by false alarms and missed detections due to the imbalanced data of WTs, the CS LightGBM fault detection model is established. The WT fault detection process can be divided into two parts, offline modeling and online detection. Figure 3 gives the basic framework of the cost-sensitive LightGBM algorithm and shows the complete fault detection procedure, including offline training and online detection. Specifically, the procedure goes through five phases, namely, data extraction, data preprocessing (normalization), feature selection (Spearman rank correlation), model optimization, and decision making.

EXPERIMENTAL CASE
The main structure of the WT is shown in Figure 4. The main components of the WT include a wind wheel, gearbox, generator, converter, yaw system, pitch system, and hydraulic system. Among these subsystems, the gearbox has a high fault rate, and its failure causes irreversible damage to the WT. In order to verify the effectiveness of CS LightGBM compared with other cost-sensitive ensemble learning methods in the detection of WT gearbox faults, a comparative experiment was set up. The experimental steps are given as follows:
1) Collect raw data from the SCADA system and perform data preprocessing.
2) Use the Spearman rank correlation analysis method to perform feature selection on the extracted features.
3) Divide the existing dataset into a training set, a test set, and a validation set, and establish the CS LightGBM offline model.
4) Perform online detection based on the established CS LightGBM model.
5) Evaluate the CS LightGBM fault detection method and calculate the false alarm rate, missed detection rate, and Matthews correlation coefficient.

Feature Extraction
In order to verify the performance of the gearbox fault detection model, a 1.5 MW WT in a wind farm was used as the research object. A 3-year gearbox dataset is extracted from the SCADA data. The sampling interval is 2 s. Based on the analysis of the WT gearbox mechanism and expert experience, the data from 30 min before the start of each fault to 30 min after the fault were selected as the experimental data. A part of the raw data is shown in Table 2.
We select the datasets containing gearbox oil temperature overrun fault, gearbox oil filter pressure fault, and gearbox lubrication oil level fault from the SCADA normal operating condition data and record them as Dataset 1, Dataset 2, and Dataset 3, as shown in Table 3.

Feature Selection
Dataset 1 to Dataset 3 contain three types of gearbox faults, namely the gearbox oil temperature overrun error, the gearbox oil filter pressure error, and the gearbox lubrication oil level error. For the feature selection of the WT, the fault mechanism and the correlated parameters of each fault are analyzed as shown in Table 4.
The gearbox bearing temperature is used to evaluate the health of the gearbox. When selecting the state parameters, those that have a greater impact on the gearbox bearing temperature are mainly selected. According to the Spearman rank correlation coefficient analysis method, the correlation strength between each state parameter and the gearbox bearing temperature is calculated, as shown in Table 5.
From the correlation analysis results in Table 5, it can be seen that the correlation with the gearbox bearing temperature differs widely across features. In order to avoid the influence of irrelevant and weakly related features on gearbox fault detection, the features whose correlation coefficient lies between ±0.50 and ±0.95 are selected; they are shown in bold in Table 5.
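The selection rule above amounts to keeping the features whose absolute correlation with the gearbox bearing temperature falls inside the band from 0.50 to 0.95. A minimal Python sketch follows; the feature names and coefficient values are hypothetical placeholders and do not reproduce Table 5.

```python
def select_features(correlations, low=0.50, high=0.95):
    """Keep features whose |r_s| lies within [low, high], strongest first."""
    kept = [(name, r) for name, r in correlations.items() if low <= abs(r) <= high]
    kept.sort(key=lambda item: abs(item[1]), reverse=True)
    return [name for name, _ in kept]

# Hypothetical Spearman coefficients against the gearbox bearing temperature.
correlations = {
    "gearbox_oil_temperature": 0.91,
    "generator_speed": 0.62,
    "ambient_temperature": -0.55,
    "yaw_angle": 0.08,
    "bearing_temperature_copy": 0.99,
}
selected = select_features(correlations)
```

The lower bound discards weakly related features such as the hypothetical `yaw_angle`, while the upper bound discards features that are almost duplicates of the target and would add redundancy rather than information.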

Fault Detection Performance Evaluation Criteria
The four states, namely the normal state, gearbox oil temperature overrun fault, gearbox oil filter pressure fault, and gearbox oil level fault, are marked as Q = [0, 1, 2, 3], respectively, and the dataset is divided into four parts. By combining the three types of faults with the normal state in turn, we perform WT fault detection through the CS LightGBM algorithm to obtain four sets of classification indicators. In order to measure the classification of imbalanced data, the Matthews correlation coefficient (MCC) is introduced to evaluate the fault detection model. At the same time, the false alarm rate (FAR) and missed detection rate (MDR) are used as fault detection evaluation indicators. The confusion matrix of the binary classification problem is shown in Table 6.
In this study, true positive (TP) is the number of fault samples correctly identified as faulty; false positive (FP) is the number of fault-free samples wrongly identified as faulty; true negative (TN) is the number of fault-free samples correctly identified as fault free; and false negative (FN) is the number of fault samples wrongly identified as fault free. The indicators under the binary classification are as follows:
$$FAR = \frac{FP}{FP + TN}, \qquad MDR = \frac{FN}{TP + FN}$$
$$MCC = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$
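The three indicators can be computed directly from the confusion-matrix counts. The sketch below uses the standard definitions, FAR = FP/(FP + TN), MDR = FN/(TP + FN), and the usual MCC formula, with the fault class as the positive class; the counts and the cost values in the example are hypothetical and are not results from this study. The average misclassification cost is included as one plausible definition, assuming zero cost for correct decisions.

```python
import math

def detection_metrics(tp, fp, tn, fn):
    """Return (FAR, MDR, MCC) with the fault class as the positive class."""
    far = fp / (fp + tn)          # share of normal samples flagged as faulty
    mdr = fn / (tp + fn)          # share of fault samples that were missed
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return far, mdr, mcc

def average_cost(fp, fn, total, cost_fp, cost_fn):
    """Average misclassification cost per sample (correct decisions cost 0)."""
    return (fp * cost_fp + fn * cost_fn) / total

# Hypothetical counts for an imbalanced test set of 1,000 samples.
tp, fp, tn, fn = 90, 20, 880, 10
far, mdr, mcc = detection_metrics(tp, fp, tn, fn)
avg_cost = average_cost(fp, fn, total=tp + fp + tn + fn, cost_fp=1.0, cost_fn=10.0)
```

Note that plain accuracy for these hypothetical counts would be 97%, even though one fault in ten is missed, which is why MDR and MCC are more informative on imbalanced data.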

Results and Discussion
The experimental data are extracted from a 3-year SCADA dataset of a wind farm. The experiment verifies the effectiveness of the proposed cost-sensitive LightGBM for fault detection of WT gearboxes. In order to further verify the superiority of the method, three advanced fault diagnosis methods were used for comparison, namely cost-sensitive AdaBoost (AdaCost), cost-sensitive GBDT (GBDTcost), and cost-sensitive XGBoost (XGBcost). Figures 5, 6 show the FAR and MDR, respectively, of the different algorithms on the three datasets. In order to avoid overfitting, five-fold cross-validation is used to evaluate the model. Smaller FAR and MDR values mean better performance.
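A five-fold split of an imbalanced dataset can be sketched as follows. The text does not state how the folds were formed, so the stratification by class label (which keeps the fault ratio roughly equal in every fold) and the fixed random seed are assumptions of this example, as are the label counts.

```python
import random

def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of each class round-robin across the folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds

# Hypothetical labels: 10 fault samples (1) and 90 normal samples (0).
labels = [1] * 10 + [0] * 90
folds = stratified_folds(labels, k=5)

# Each fold serves once as the held-out set; the rest form the training set.
splits = []
for held_out in range(5):
    test_idx = folds[held_out]
    train_idx = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
    splits.append((train_idx, test_idx))
```

With purely random folds, a fold could end up with very few fault samples, which would make the MDR of that fold unreliable; stratification avoids this.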
The comparison results of the proposed method and the AdaCost, GBDTcost, and XGBcost algorithms under different fault conditions are shown in Figures 5, 6, respectively. It can be seen that the cost-sensitive LightGBM method achieves lower FAR and MDR than the other three algorithms, and the XGBcost indicators are generally better than those of the AdaCost and GBDTcost methods. On dataset 2, the FAR of the CS LightGBM method is only 1.43% and the MDR is only 1.01%, which shows good fault detection performance. The traditional cost-sensitive boosting methods have high false positive and false negative rates in the fault detection process, while the false negative and false positive rates of the CS LightGBM method are lower than those of the other three methods. Figure 7 shows the MCC on the three fault datasets. The MCC remains informative when the samples are imbalanced. The closer the MCC is to 1, the better the performance of the method. It can be seen from Figure 7 that the MCC of the cost-sensitive LightGBM method on dataset 2 is as high as 99.61%, and its MCC on the remaining datasets is also higher than that of AdaCost, GBDTcost, and XGBcost.

CONCLUSION AND FUTURE WORK
WT gearboxes operate in harsh conditions for long periods, so their fault rate increases and they are extremely prone to faults. The accuracy of gearbox fault diagnosis is often affected by many factors such as harsh environments and extreme weather. In order to improve the accuracy of fault diagnosis, the shortcomings of traditional algorithms are analyzed and compared, and a fault detection method based on CS LightGBM is proposed. The innovation is mainly reflected in the following two aspects:
1) The fault characteristics of the WT gearbox are analyzed, the fault features are extracted, and the fault feature indexes are obtained by using the correlation between features to improve the fault diagnosis performance.
2) A method based on CS LightGBM is proposed, applied to the actual fault diagnosis of WTs, and compared with the traditional cost-sensitive boosting methods.
The experimental study demonstrated that the existing algorithms had a lower ability to detect wind turbine faults. Two points should be noted: the existing algorithms did not perform well only because they are not specially designed for wind turbine fault detection, and they remain highly competent in industrial fault diagnosis and other fields. The cost-sensitive LightGBM is mainly suitable for imbalanced data, and its ability in other fault diagnosis tasks remains unknown.
A single algorithm cannot detect all the faults in a WT, so combined algorithms will be a research topic in the future. The comprehensive simulation of WT fault conditions will also be a topic of our future research. This is because all units of a WT are interconnected and their features are strongly coupled, so a fault in one component affects all the remaining units. Therefore, it is necessary to establish more compound fault models to conduct a comprehensive analysis of the WT system (Iranmehr et al., 2019).

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding author.