Data-Driven Traction Substations’ Health Condition Monitoring via Power Quality Analysis

Electrified railway traction substations are an important part of the transportation system, and the health of their operating condition indirectly affects the national economy. The condition of traction substations is generally assessed through their power quality, but load nonlinearity and effects from the outside environment are the main factors limiting the accuracy of condition monitoring. To recognize the status of traction substations intelligently and govern them with fast measures, this paper proposes a data-driven approach for recognizing types of power quality problems and develops a system with intelligent governance strategies. The proposed approach consists of three steps. First, a double discrete Fourier transform (DDFT) algorithm is developed to extract valid feature vectors from power data. Then, well-known data-driven methods, such as the support vector machine (SVM), are applied to build classifiers. Finally, based on the classification results, a strategy library for power quality problems is built. Industrial data from a real traction substation in Wuhan, China, were used in the experiments. Compared with traditional methods, the proposed approach improves the classification performance for power quality problems and provides fast and effective governance in traction substations.


INTRODUCTION
As major infrastructure in transportation systems, electrified railway traction substations are a strong guarantee of national economic development and social progress. Because electric locomotives are nonlinear loads, they draw currents that are nonlinear, impulsive, and asymmetrically distributed across the three phases. Maintaining the power quality of traction substations, and thereby guaranteeing the safe and stable operation of electrified railways, currently faces serious challenges (Hu et al., 2016). The major power quality problems faced by traction substations include harmonics, voltage fluctuations, flicker, and negative-sequence current (Bitoleanu et al., 2016). These problems seriously degrade the power quality of both traction substations and the upstream power system, and in turn affect the safe and economic operation of the power system (Liu and Hu, 2017). Therefore, to ensure a stable power supply for traction substation systems and the safe operation of electrified railways, it is very important to monitor their health status and take targeted compensation measures.
Power quality problems are harmful not only to the traction power system but also to other power systems. Currently, power quality analysis in the literature is mainly based on voltage signals (Khadkikar et al., 2017), for example, voltage analysis via the Fourier transform, wavelet analysis, and so on. Given the particular characteristics of traction substations, high computation speed is required to guarantee real-time decision-making, so it is not feasible to analyze the entire signal sequence. Mathematical transformations are commonly used in power quality detection, such as the short-time Fourier transform, wavelet transform, S-transform, and bilinear time-frequency transforms (De Yong et al., 2015; Mahela et al., 2015; Shen et al., 2022), and the transformed features are then used for fast and effective analysis. These transformation methods have achieved some positive results. However, given the requirement for rapid compensation of traction power quality, they still have restrictions in practical applications, e.g., their computational complexity. On the other hand, taking timely and effective control measures based on the results of power quality analysis is also essential for traction substation systems. Power quality governance usually applies mixed compensation methods, namely the coordinated control of fixed units and dynamic compensation devices (Mikkili and Panda, 2015; Lam et al., 2017). However, power quality governance in traction substations differs from that in traditional power systems: it depends on the real-time operating status as well as on factors such as weather and geographical distribution, which makes it a typical nonstationary random process.
With intelligent scheduling requirements proposed in recent years, the major questions are when to compensate and which type of compensation to apply. An intelligent governance system for traction substations is therefore required to solve power quality problems.
To tackle these issues and solve power quality problems in electrified railway traction substations intelligently, quickly, and effectively, this paper proposes a data-driven intelligent system for substation power quality problems. It is mainly based on a double discrete Fourier transform (DDFT) with a sliding window and on intelligent classification methods. The contributions of the proposed method are summarized as follows. First, because power quality problems vary with time, only the latest period of data is used to analyze them, instead of the long sequence of historical data used for feature extraction in traditional transformation methods; a sliding window keeps the analysis real-time. Second, to limit the information lost in the sliding window, DDFT is applied for feature analysis. At the first level, DDFT extracts the basic physical characteristics of power quality, as other transformation methods do; at the second level, it also analyzes their historical variation. In this way, less information is lost while the dimension of the analyzed variables does not increase dramatically, so DDFT facilitates the rapid detection of power quality problems in traction substations. Third, advanced machine learning models are built for multi-class classification and are combined with an expert system model to establish the power quality governance system. In the power quality governance of real traction substations, the proposed system can thus analyze power quality problems rapidly, make fast decisions automatically based on the strategy library, and finally govern the traction substation's power quality problems effectively and intelligently.

FRAMEWORK OF THE PROPOSED APPROACH
As described above, to maintain the safe and stable operation of electrified railways, the main goal in addressing a traction substation's power quality problems is to build a system for rapid analysis and intelligent governance. This paper proposes an intelligent system based on DDFT and machine learning (ML) algorithms, which mainly consists of two parts: fast detection of power quality problems based on the DDFT algorithm, and intelligent control and targeted governance based on ML. The framework of the proposed approach is shown in Figure 1.
As Figure 1 shows, the intelligent governance system can be divided into two parts: a training part and a testing part. In the training part, typical signals of power quality problems are first generated from their mathematical definition models, and the frequency-domain features of these samples are extracted with the DDFT algorithm. In parallel, an expert system determines effective control measures, or governance strategies, for each class of power quality problem, and these strategies together form the strategy library. Finally, classification models are built with ML algorithms, taking the extracted features as inputs and the strategy library as the outputs, and the optimal classifier is applied to the intelligent governance of power quality problems. In the testing part, the application objects are practical electric railways, and the industrial data are usually voltage signals of the traction substation. As in the training process, the frequency-domain features of the industrial data are first extracted by DDFT. The feature vectors are then input to the optimal classifier for decision-making, and the output is the effective strategy determined by the classifier and the strategy library. Operators therefore do not need to determine which exact type of power quality problem an industrial signal belongs to; the system gives the optimal governance strategy directly. In this way, it realizes intelligent governance of the traction substation's power quality problems and keeps electric railways operating safely.
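The testing-part flow (features, then classifier, then strategy lookup) can be sketched as a small Python function. This is a minimal illustration under stated assumptions: the function name `govern`, the class labels, and the strategy strings are hypothetical placeholders rather than the paper's actual strategy library, and the feature extractor and classifier are passed in as callables (for example, a DDFT feature function and a trained classifier).

```python
from typing import Callable, Dict, Sequence

# Hypothetical strategy library: class label -> governance strategy.
# Labels and strategy texts are placeholders, not the paper's expert-system output.
STRATEGY_LIBRARY: Dict[str, str] = {
    "normal": "no action",
    "voltage_variation": "strategy for voltage rise/drop (placeholder)",
    "voltage_flicker": "strategy for flicker (placeholder)",
    "transient": "strategy for transient impact (placeholder)",
}


def govern(signal: Sequence[float],
           extract_features: Callable[[Sequence[float]], Sequence[float]],
           classify: Callable[[Sequence[float]], str],
           library: Dict[str, str] = STRATEGY_LIBRARY) -> str:
    """Map a raw voltage signal to a governance strategy.

    Chain: features -> class label -> strategy lookup, so the operator
    never has to inspect the intermediate label.
    """
    features = extract_features(signal)   # e.g. a DDFT feature vector
    label = classify(features)            # e.g. a trained ML classifier
    return library[label]


if __name__ == "__main__":
    # Toy stand-ins: peak value as the only feature, a threshold as the "classifier".
    def peak(s):
        return [max(abs(v) for v in s)]

    def rule(f):
        return "voltage_variation" if f[0] > 1.1 else "normal"

    print(govern([0.0, 1.3, -1.3], peak, rule))
```

Keeping the extractor and classifier as parameters mirrors the framework's separation between the training part, which produces them, and the testing part, which only applies them.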

Notion of Power Quality Problems
The operation of electric railways causes several power quality problems in traction substations and in the outlet substations of power systems, including voltage fluctuation, voltage unbalance, voltage harmonics, and so on (Shen et al., 2021). To quantify these problems in practice, models of some common power quality problems, including voltage rise, voltage drop, voltage flicker, and transient impact, are defined (Singh et al., 2014; Wong et al., 2014; Shi et al., 2016), as expressed in Eqs. (1)-(3).
1) Signal of voltage rise and drop (Eq. 1), where $t_1$ and $t_2$ represent the start and end times of the disturbance, respectively; $\omega$ is the angular frequency of the power-frequency carrier voltage; and $\alpha$ is the normalized voltage amplitude. When $\alpha > 1$, $f(t)$ represents a voltage rise signal; when $\alpha < 1$, $f(t)$ represents a voltage drop signal.
2) Signal of voltage flicker (Eq. 2), where $m$ is the amplitude of the amplitude-modulated (AM) wave and $\Omega$ is the angular frequency of the AM wave.
3) Signal of transient voltage (Eq. 3). The transient voltage consists of multifrequency components, such as a mutation with damped attenuation $Ae^{-\lambda t}$ and transient high-frequency components $\sum_{i=2}^{N} A_i \sin(i\omega t + \varphi_i)$, where $A_i$ and $\varphi_i$ represent the amplitude and phase deviation of the $i$th high-frequency component. All these components contribute to power quality problems.
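One common way to write these three models, consistent with the parameters defined above, is given below as an assumption; $u(\cdot)$ denotes the unit step function, and the paper's exact forms of Eqs. (1)-(3) (for example, how the transient terms are gated in time) may differ.

```latex
% (1) voltage rise (\alpha > 1) or drop (\alpha < 1) between t_1 and t_2
f(t) = \left[1 + (\alpha - 1)\left(u(t - t_1) - u(t - t_2)\right)\right]\sin(\omega t)

% (2) voltage flicker: carrier amplitude-modulated with depth m at angular frequency \Omega
f(t) = \left[1 + m\sin(\Omega t)\right]\sin(\omega t)

% (3) transient voltage: carrier plus a damped mutation and high-frequency components
f(t) = \sin(\omega t) + A e^{-\lambda t} + \sum_{i=2}^{N} A_i \sin(i\omega t + \varphi_i)
```

In (1), the bracket equals $\alpha$ inside $[t_1, t_2)$ and 1 elsewhere, so $\alpha > 1$ produces a rise and $\alpha < 1$ a drop, matching the description of Eq. (1).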

Double Discrete Fourier Transformation
According to Eqs. (1)-(3), the definitions of power quality problems mainly involve harmonics in the frequency domain and amplitude characteristics in the time domain. The conventional Fourier transform analyzes signal characteristics mainly from the frequency-domain perspective (Boashash, 2015); however, it is limited in the rapid, real-time detection of a traction substation's power quality problems. The wavelet transform can express both time-domain and frequency-domain characteristics (Shen et al., 2020), but choosing a suitable mother wavelet is not easy. After a comprehensive comparison, this paper builds on the discrete Fourier transform (DFT) and proposes the double discrete Fourier transform (DDFT) as the major tool for detecting electric power quality problems in traction substations.
To satisfy the real-time requirements of traction power quality analysis, the DDFT algorithm is developed based on the sliding-window iterative DFT (Zhan et al., 2016). DFT results reflect real-time frequency-domain information, but on their own they cannot distinguish all power quality problems, in particular those that affect the amplitude of the fundamental-frequency signal, e.g., voltage rise, voltage drop, and voltage flicker. The proposed DDFT algorithm therefore extracts the fundamental sequence $\{X(k,t)\}$ from the historical DFT results and applies the Fourier transform to this sequence a second time. The purpose is to obtain information on how the fundamental component varies, which can be used to distinguish temporary voltage rise, drop, and flicker. Figure 2 shows the procedure of DDFT. The original signal is analyzed with the Fourier transform, taking sliding windows as units, which reduces the computational cost and speeds up the calculation for real-time analysis. The fundamental sequence over the historical windows is then extracted with the same window size and processed with the DFT algorithm once again. The DDFT algorithm can therefore extract both the variation of the fundamental signal and other frequency information while maintaining real-time signal analysis, both of which are essential for power quality analysis in traction substations.
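The two DFT stages can be illustrated with a minimal NumPy sketch. It assumes a one-cycle window of $N = f_s/f_0$ samples (so bin $k = 1$ is the fundamental and $f_s/f_0$ must be an integer), tracks the fundamental amplitude with the standard recursive sliding-DFT update, and then applies a second DFT (`numpy.fft.rfft`) to the latest history of that amplitude. The function names, the sampling rate, the history length, and the 50 Hz cut-off for retained components are illustrative choices, not values taken from the paper.

```python
import numpy as np


def sliding_dft_bin(x: np.ndarray, N: int, k: int) -> np.ndarray:
    """Recursive sliding DFT of bin k over a length-N window.

    S[n] is DFT bin k of the window x[n-N+1 .. n] (samples before x[0]
    count as zero), updated in O(1) per sample:
        S[n] = exp(j*2*pi*k/N) * (S[n-1] + x[n] - x[n-N])
    """
    twiddle = np.exp(2j * np.pi * k / N)
    S = np.empty(len(x), dtype=complex)
    prev = 0j
    for n in range(len(x)):
        oldest = x[n - N] if n >= N else 0.0   # sample leaving the window
        prev = twiddle * (prev + x[n] - oldest)
        S[n] = prev
    return S


def ddft_features(x, fs, f0=50.0, history=None, f_max=50.0):
    """Two-level DFT: fundamental amplitude track, then its spectrum up to f_max."""
    x = np.asarray(x, dtype=float)
    N = int(round(fs / f0))                    # one fundamental cycle per window
    # Level 1: fundamental amplitude (bin k=1) at every window position;
    # 2|S|/N is the exact amplitude when f0 falls exactly on bin 1.
    env = 2.0 * np.abs(sliding_dft_bin(x, N, k=1)) / N
    env = env[N - 1:]                          # drop partially filled start-up windows
    if history is not None:
        env = env[-history:]                   # keep only the latest history
    # Level 2: spectrum of the fundamental-amplitude history.
    M = len(env)
    spec = np.abs(np.fft.rfft(env)) / M
    spec[1:] *= 2.0                            # single-sided amplitudes (DC stays the mean)
    freqs = np.fft.rfftfreq(M, d=1.0 / fs)
    keep = freqs <= f_max                      # DC plus low-frequency components only
    return freqs[keep], spec[keep]
```

For example, with fs = 10 kHz and a flicker signal (1 + 0.1 sin(2π·10t)) sin(2π·50t), the largest non-DC component of the second-level spectrum lies at 10 Hz, whereas a pure 50 Hz sine yields only the DC term. Updating the first-level DFT recursively keeps the per-sample cost constant, which is the property the sliding-window design relies on.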

Multi-Classes Classification Based on Machine Learning Models
For a given voltage signal $\{x_n\}$ in the traction substation, a series of transformed features is obtained through the DDFT algorithm. Determining the class to which the power quality signal belongs is necessary for selecting appropriate control strategies and power compensation devices. This paper generates typical power quality signals and extracts their characteristics by DDFT. Identifying power quality problems is thus treated as a classification task: a classifier is chosen and trained on the collected feature samples, after which it recognizes power quality problems. Accordingly, this paper applies ML-based algorithms to build the classifier.

EXPERIMENTS AND DISCUSSION
Training Part
1) Synthetic power quality signals.
To train an optimal mechanism for power quality governance in a traction substation, the first task is to obtain the training dataset. In this paper, the training data are generated from the definitions of typical power quality problems: following Eqs. (1)-(3), 10 sets of power quality signals with different parameters are constructed for each mathematical definition.
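The signal generation step can be sketched in NumPy as follows, implementing voltage rise/drop, flicker, and transient models with the parameters named for Eqs. (1)-(3). Everything here is an illustrative assumption: the model forms (including unit-step gating of the rise/drop interval), the sampling rate, the record length, and all parameter ranges are hypothetical choices rather than the paper's settings, and rise and drop are generated as separate classes from the same definition.

```python
import numpy as np

F0 = 50.0                      # power frequency (Hz)
OMEGA = 2 * np.pi * F0         # angular frequency of the carrier


def step(t):
    """Unit step u(t)."""
    return (t >= 0).astype(float)


def rise_drop(t, alpha, t1, t2):
    """Voltage rise (alpha > 1) or drop (alpha < 1) between t1 and t2."""
    gate = step(t - t1) - step(t - t2)
    return (1 + (alpha - 1) * gate) * np.sin(OMEGA * t)


def flicker(t, m, Omega):
    """Carrier amplitude-modulated with depth m at angular frequency Omega."""
    return (1 + m * np.sin(Omega * t)) * np.sin(OMEGA * t)


def transient(t, A, lam, amps, phases):
    """Carrier + damped term A*exp(-lam*t) + harmonics i = 2, 3, ..."""
    harmonics = sum(a * np.sin(i * OMEGA * t + p)
                    for i, (a, p) in enumerate(zip(amps, phases), start=2))
    return np.sin(OMEGA * t) + A * np.exp(-lam * t) + harmonics


def make_dataset(n_per_class=10, fs=10_000, duration=0.2, seed=0):
    """Return (signals, labels) with n_per_class random instances per class."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(fs * duration)) / fs
    signals, labels = [], []
    for _ in range(n_per_class):
        t1 = rng.uniform(0.02, 0.08)
        t2 = t1 + rng.uniform(0.04, 0.1)
        signals.append(rise_drop(t, rng.uniform(1.1, 1.8), t1, t2))
        labels.append("rise")
        signals.append(rise_drop(t, rng.uniform(0.1, 0.9), t1, t2))
        labels.append("drop")
        signals.append(flicker(t, rng.uniform(0.05, 0.2),
                               2 * np.pi * rng.uniform(5, 20)))
        labels.append("flicker")
        n_h = rng.integers(2, 6)               # number of high-frequency terms
        signals.append(transient(t, rng.uniform(0.1, 0.5), rng.uniform(20, 100),
                                 rng.uniform(0.02, 0.1, n_h),
                                 rng.uniform(0, 2 * np.pi, n_h)))
        labels.append("transient")
    return np.array(signals), labels
```

Fixing the random seed makes the synthetic training set reproducible, which is convenient when comparing feature extractors such as DFT and DDFT on identical inputs.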
2) Feature extraction. Determining the class of each power quality signal by direct observation is subjective and impractical, and it does not work at all for the actual operation of electric railways. To discriminate different types of power quality problems quantitatively, the DDFT algorithm is used to extract voltage-signal features for fast analysis instead of using the original signal sequences. Because high-frequency harmonics generally have low amplitudes, only the DC component and the 5-50 Hz harmonic components are taken as the main feature variables, which reduces the dimension and the computational cost. Table 1 lists the feature values of four typical power quality signals.
Table 1 presents the feature values of four typical power quality signals, together with the features extracted from the corresponding original signals. Here, X0 represents the DC component, and X1-X5 represent the 5-50 Hz harmonic components. Voltage rise and drop signals differ from the other signals mainly in the DC component and the 5 Hz harmonic component, whereas voltage flicker signals mainly affect the 10 Hz harmonic component. A more thorough analysis of these features requires a nonlinear model.
3) Establishing classifiers and the strategy library.
To build a suitable classification model for analyzing power quality problems, the DDFT algorithm is first applied to extract feature vectors $\mathbf{x}_i$. Taking the feature vectors $\mathbf{x}_i = [X(0,t), X(1,t), \ldots, X(N-1,t)]$ as inputs and the class labels of power quality problems as outputs, different ML classification models are established, such as the support vector machine (SVM), extreme learning machine (ELM), and neural network (NN) (Ouyang, 2021; Ouyang Tang et al., 2021). Finally, the optimal ML model is selected. The essence of the proposed method is to construct classifiers with ML algorithms and to use the constructed classifier to distinguish power quality problems automatically.
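Of the candidate models, the ELM is the one finally selected in the comparison of the experiments. As a hedged illustration of what an ELM classifier involves, the sketch below implements the basic idea in NumPy: a randomly initialized hidden layer that is never trained, followed by output weights solved in closed form by regularized least squares. The class name, the tanh activation, the hidden-layer size, and the ridge term are assumptions, not the configuration used in the paper. SVM and NN counterparts are available off the shelf, e.g. `sklearn.svm.SVC` and `sklearn.neural_network.MLPClassifier` in scikit-learn.

```python
import numpy as np


class ELMClassifier:
    """Basic extreme learning machine for multi-class classification."""

    def __init__(self, n_hidden: int = 50, reg: float = 1e-3, seed: int = 0):
        self.n_hidden = n_hidden          # number of random hidden neurons
        self.reg = reg                    # ridge term for the least-squares solve
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X: np.ndarray) -> np.ndarray:
        # Hidden-layer output H = tanh(X W + b); W and b stay fixed after fit().
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        self.classes_, y_idx = np.unique(y, return_inverse=True)
        T = np.eye(len(self.classes_))[y_idx]          # one-hot targets
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        # Output weights: beta = (H^T H + reg*I)^{-1} H^T T
        A = H.T @ H + self.reg * np.eye(self.n_hidden)
        self.beta = np.linalg.solve(A, H.T @ T)
        return self

    def predict(self, X):
        scores = self._hidden(np.asarray(X, dtype=float)) @ self.beta
        return self.classes_[np.argmax(scores, axis=1)]
```

Because only the output weights are learned, training reduces to one linear solve, which is one reason an ELM can be attractive when classifiers must be retrained quickly.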

Testing and Validation Part
As described above, typical power quality signals are generated as training data and used to build classification models, while the governance strategies given by the expert system form the strategy library. The intelligent governance system for traction substations' power quality problems is then completed by combining the classifiers and the strategy library. To validate the proposed system, this paper discusses the effectiveness of the DDFT algorithm in extracting features of power quality problems, the performance of ML models in classifying different power quality problems, and the effectiveness and feasibility of the intelligent governance system in traction substations. To evaluate the proposed approach quantitatively, the confusion matrix (Xiong et al., 2017) is used, and four commonly used indicators, Recall (R), Precision (P), Accuracy (Acc), and Error Rate (ER), are defined in Eq. (4) (Ohsaki et al., 2017).
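Assuming the standard confusion-matrix definitions for the indicators in Eq. (4), namely per-class R = TP/(TP + FN) and P = TP/(TP + FP), overall Acc = correct predictions / total, and ER = 1 - Acc, the indicators can be computed as in the sketch below. The function names are hypothetical, and the paper may aggregate per-class values differently (for example, by macro-averaging).

```python
import numpy as np


def confusion_matrix(y_true, y_pred, classes):
    """Rows: true class, columns: predicted class."""
    idx = {c: i for i, c in enumerate(classes)}
    C = np.zeros((len(classes), len(classes)), dtype=int)
    for t, p in zip(y_true, y_pred):
        C[idx[t], idx[p]] += 1
    return C


def indicators(C):
    """Per-class recall and precision, overall accuracy and error rate."""
    C = np.asarray(C, dtype=float)
    tp = np.diag(C)
    with np.errstate(invalid="ignore", divide="ignore"):
        recall = tp / C.sum(axis=1)       # TP / (TP + FN) for each class
        precision = tp / C.sum(axis=0)    # TP / (TP + FP) for each class
    acc = tp.sum() / C.sum()              # fraction of correct predictions
    return recall, precision, acc, 1.0 - acc
```

For instance, true labels ["a", "a", "b", "b"] predicted as ["a", "b", "b", "b"] give recall [0.5, 1.0], precision [1.0, 0.667], accuracy 0.75, and error rate 0.25.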
This paper takes an actual traction substation in Wuhan, China, as an example, from which a total of 1,000 voltage signals were collected as the testing dataset. The signals have a frequency of 50 Hz, and each signal record lasts 50 ms. First, the necessary data preprocessing is carried out and the dataset is normalized. Then, the power quality problems of the collected signals are tested with the intelligent governance system. According to the records of the traction substation operators, the dataset contains 43 voltage variation signals, 16 voltage flicker signals, 19 voltage transient signals, and 922 other signals. Most of the other signals are normal, and a few are disturbances not defined in this paper. Two typical signals collected from actual traction substations are shown in Figure 3.
Figure 3 shows the voltage signals of two typical power quality problems: Figure 3A shows a voltage flicker signal, and Figure 3B shows the impact of a transient voltage. It is difficult for operators to determine which power quality problem a collected signal belongs to, and the signals change too quickly to be captured manually. This is why an intelligent system is needed to govern power quality problems in traction substations.
To evaluate the classification performance of the proposed approach, 120 sets of actual signals are used for validation. Because normal signals account for more than 90% of the testing dataset, only 42 sets of normal signals are selected and combined with the other typical signals (43 + 16 + 19 = 78 disturbance signals) to form the validation dataset. As described in Figure 1, the validation data are processed by DDFT to extract feature vectors, which are then input into the trained classifier. Based on the classification results, suitable governance strategies are determined from the strategy library to control the harmful effects of power quality problems. In other words, the performance of the intelligent governance system depends directly on how well it classifies the power quality problems in traction substation voltage signals. A statistical analysis is therefore carried out, and the values of the four indicators defined in Eq. (4) are given in Table 2.
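The composition of the validation set can be made explicit with a short sketch: keep every disturbance record and randomly subsample the normal records. The function name, the label strings, and uniform random sampling are assumptions; how the 42 normal signals were actually chosen is not specified.

```python
import random
from typing import Iterable, List, Sequence


def build_validation_indices(labels: Sequence[str],
                             disturbance_labels: Iterable[str],
                             normal_label: str = "normal",
                             n_normal: int = 42,
                             seed: int = 0) -> List[int]:
    """Return sorted indices: all disturbance records plus n_normal normal ones.

    Records whose label is neither a disturbance nor normal_label (e.g.
    undefined signals) are left out.
    """
    wanted = set(disturbance_labels)
    rng = random.Random(seed)                      # reproducible subsampling
    disturbed = [i for i, y in enumerate(labels) if y in wanted]
    normal = [i for i, y in enumerate(labels) if y == normal_label]
    return sorted(disturbed + rng.sample(normal, n_normal))
```

With the reported counts of 43 variation, 16 flicker, and 19 transient records, this selection yields 78 + 42 = 120 validation records.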
Table 2 presents the four indicators. For comparison, the three ML models using feature vectors extracted by DFT are also analyzed. Table 2 shows that all these methods reach about 85% on the Recall, Precision, and Accuracy indicators, and that the ELM-based classifier performs best in classifying power quality problems; ELM is therefore selected to construct the final intelligent system. Moreover, the feature vectors from DDFT yield better performance than those from DFT, which validates that the proposed approach improves the classification of power quality problems. For the governance of real-world traction substations, the proposed system combines this approach with the control strategy library, so it not only identifies power quality problems quickly and accurately but also makes fast decisions to govern them by taking the corresponding measures from the strategy library. In this way, the proposed system can realize intelligent power quality governance in traction substations and guarantee the safe operation of electrified railways.

CONCLUSION
In this paper, an intelligent data-driven system based on DDFT is proposed to govern the power quality problems of traction substations. First, definitions of several typical power quality problems are given, and synthetic voltage signals are generated from these definitions. Using the proposed DDFT algorithm to extract feature vectors of these signals shows that DDFT has advantages in extracting features that distinguish different power quality problems. With the extracted feature vectors as inputs, three ML-based classifiers are built to discriminate three types of power quality problems, and ELM is finally selected. Combined with the control strategy library from an expert system, this completes the intelligent governance of power quality problems in traction substations. Industrial data from an actual traction substation were tested with both the proposed approach and the traditional approach, and the numerical results validate that the proposed approach improves the classification of power quality problems. The proposed system can thus realize intelligent and fast governance of power quality problems in traction substations and guarantee the safe operation of electrified railways.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding author.