Capturing Spatial Influence in Wind Prediction With a Graph Convolutional Neural Network

Wind power now plays a significant role in power systems, so improving prediction accuracy is necessary to make better use of wind resources. Existing neural network methods, such as the recurrent neural network (RNN), have been widely used in wind prediction; however, RNN models consider only the temporal dynamics and ignore spatial correlation. In this work, we combine the graph convolutional network (GCN) with the gated recurrent unit (GRU) to predict simulated and real wind speed and wind power data sets. The GCN improves the prediction results in all wind speed experiments, which shows its ability to capture spatial dependence and improve prediction accuracy. The GCN does not perform well in short-term wind power prediction because wind power data changes less smoothly owing to the limits of turbine operation, but the results of long-term prediction still demonstrate the benefit of the GCN.


INTRODUCTION
To deal with the crises of fuel energy and environmental pollution, wind energy is being widely developed all over the world, and wind power is starting to play a significant role in power systems (Liu and Guan, 2004). However, the large amount of wind power generation also brings great challenges to the stable operation of power systems because of the variable and stochastic nature of wind (Defferrard et al., 2016). For example, wind power fluctuation caused a challenge event reported by the Electric Reliability Council of Texas (ERCOT). To mitigate such harmful impacts, system operators require accurate wind prediction (Zhang et al., 2016).
Wind prediction, including wind speed and wind power prediction, uses a large amount of historical wind data to train a fitting model between inputs and outputs. Wind speed is the most relevant factor in determining wind power. In particular, the wind power curves for different types of wind turbines roughly have the same shape (Pinson, 2013); hence, one approach to modeling wind power is to model the wind speed and associate it to the wind power using wind power curves suitable for a given wind farm (Stohl et al., 1995).
Wind speed can be modeled as a spatiotemporal process as it evolves randomly in time and space. However, wind forecasting has always been a challenge due to its complex spatial and temporal dependences.
Existing wind speed forecasting models can be categorized as physical, statistical, and machine learning models. Physical models use mathematical expressions to model the highly complex and nonlinear dynamics of atmospheric flow to produce numerical wind predictions (Yan and Ouyang, 2019). Gneiting et al. (2006) applied a second-order model that uses the spatiotemporal covariance as a basis for estimation in Irish wind prediction. This model is suitable when the underlying process can be modeled as a Gaussian spatiotemporal field and is one basic type of spatiotemporal process used in the modeling of wind. Statistical methods rely on relevant historical data to predict future wind generation, traditionally using models such as the history average model (HA) (Liu and Guan, 2004) or the autoregressive integrated moving average model (ARIMA). In Dowell et al. (2014), a multichannel adaptive filter is used to forecast wind speed and direction using spatial correlations at multiple sites. He et al. (2014) present Markov chain-based statistical methods that leverage a graphical spatiotemporal learning-based model and statistical characteristics of aggregated wind generation to forecast wind generation output. Among machine learning methods, the recurrent neural network (RNN) is the most widely used neural network model for processing time series data. The long short-term memory (LSTM) model (Hochreiter and Schmidhuber, 1997) and the gated recurrent unit (GRU) model are variants of the RNN that mitigate the vanishing and exploding gradients that can occur in a plain RNN. The basic principles of the LSTM and GRU are roughly the same: both use a gating mechanism to retain as much long-term information as possible, and they are equally effective for various tasks (Chung et al., 2014). However, owing to its more complex structure, the LSTM takes longer to train, whereas the GRU has a relatively simple structure, fewer parameters, and faster training.
However, both LSTM and GRU models consider only the temporal dynamics and ignore the spatial dependence, even though the change of wind conditions is also constrained by the surrounding environment. Intuitively, wind flows in nearby regions affect each other. Because there is a strong correlation between wind sites located in the same vicinity, information collected from neighboring sites helps improve the prediction accuracy at target sites. Such spatial information can be handled effectively by the convolutional neural network (CNN), which has shown a powerful ability to extract spatial structural information (LeCun et al., 1998). A CNN uses filters to find relationships between neighboring inputs, which can make it easier for the network to converge on the correct solution.
The traditional CNN can obtain local spatial features in Euclidean space, but the relation network between wind farms takes the form of a graph rather than a two-dimensional grid, which means the CNN model cannot deal with the complex topological structure of the wind relation network. The graph neural network (GNN), by contrast, is good at handling data with arbitrary graph structure. To model the spatial correlations in graph-structured data, a graph CNN has been developed to capture the spatial dependence, and a similar approach has been used in traffic prediction with good performance.
Some work has applied GNNs to wind prediction, such as that of Mei Yu and of Mahdi Khodayar (Khodayar and Wang, 2018), but all of it was based on NREL data sets, which are simulated and do not provide real wind speed or power data. These studies also reported results from a single experiment only, whereas a neural network model's performance differs slightly from run to run because of the stochastic nature of its training.
In this work, to demonstrate that the GNN model can use spatial correlations to improve accuracy in real wind prediction, we combine the graph convolutional network (GCN) with the GRU model and run predictions on five different wind data sets, including three wind speed data sets and two wind power data sets. Moreover, the GCN-GRU model is validated on wind speed prediction at both 10 m and 100 m height. Furthermore, the final results are based on a collection of 10 experiments for each data set. The forecasting results remain stable across prediction horizons, which indicates that the GCN-GRU model can be used not only for short-term prediction but also for long-term wind prediction tasks.

REVIEW OF THE LITERATURE
Recently, given the RNN's strong ability to process time series problems, improved LSTM and GRU models have been applied in wind prediction. Zhewen Niu proposed a novel sequence-to-sequence model using the attention-based gated recurrent unit (AGRU) that improves the accuracy of the forecasting process (Niu et al., 2020). Adam Kisvari proposed a data-driven approach to wind power forecasting that integrates data preprocessing, resampling, anomaly detection and treatment, feature engineering, and hyperparameter tuning based on gated recurrent deep learning models, and critically compared it with the LSTM algorithm (Kisvari et al., 2021). Ruiguo Yu proposed an improved LSTM-enhanced forget-gate network model, abbreviated as LSTM-EFG, for forecasting wind power (Yu et al., 2019). Xiaohui Yuan's simulation results showed that the PIWP obtained by the Beta-PSO-LSTM model has higher reliability and narrower interval bandwidth, which can provide decision support for the safe and stable operation of power systems (Yuan et al., 2019). Yao Liu proposed a short-term wind power forecasting method based on the discrete wavelet transform and LSTM networks (DWT_LSTM). The discrete wavelet transform is introduced to decompose the nonstationary wind power time series into several components that are more stationary and easier to predict.
Some work also applies GNNs in wind prediction. Mei Yu proposed the superposition graph neural network (SGNN) for feature extraction, which can maximize the use of spatial and temporal features for prediction. On the four offshore wind farms used in the experiments, the mean square error of the method is reduced by 9.80%-22.53% compared with current advanced methods, and the prediction stability of the method is also greatly improved. Mahdi Khodayar proposed GCDLA to capture spatial wind features as well as deep temporal features of the wind data at each wind site. Simulation results show the advantages of capturing deep spatial and temporal interval features in the proposed framework compared with state-of-the-art deep learning models as well as shallow architectures in the recent literature (Khodayar and Wang, 2018). However, Mei Yu's work concerns offshore wind, which is more stable than wind on land, where the surrounding terrain has a great influence. Mahdi Khodayar's work used the mutual information matrix to process the graph model. Furthermore, both studies worked on simulated data sets, and their results were based on a single experiment only, so it is hard to rule out that the improvement of prediction results by the GNN was coincidental.

PROBLEM DEFINITION
In this paper, the goal of wind forecasting is to predict the wind speed or power in a certain period of time based on the historical wind information of several sites in a specific area. In our experiments, the wind information can be wind speed or wind power data.
Wind Network G
As shown in Figure 1, the wind at the target site may be influenced by the wind flows from its surrounding sites. We describe the wind sites in a specific area as a wind network, although this network is not directly observable. We use an undirected graph $G = (V, E)$ to describe the topological structure of the wind network. We treat each wind site as a node, where $V = \{v_1, v_2, \ldots, v_N\}$ is the set of nodes, $N$ is the number of nodes, and $E$ is the set of edges between wind sites. The adjacency matrix $A \in \mathbb{R}^{N \times N}$ represents the connections among wind sites. In this work, $A$ is obtained by normalizing a distance matrix, which is calculated from the distance between each pair of wind sites in the wind network. Note that distance is calculated in an "as the crow flies" sense and does not take the intervening topography into account.
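As a minimal sketch of how such a distance-based adjacency matrix might be built, the following assumes that site locations are given as latitude/longitude pairs in degrees, that the straight-line distance is the haversine great-circle distance, and that the normalization is a Gaussian kernel scaled by the mean pairwise distance. The text does not specify the normalization, so the kernel, the function names, and the example coordinates are all illustrative assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle ("as the crow flies") distance between two points, in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance_adjacency(coords):
    """Build a symmetric N x N adjacency matrix from (lat, lon) pairs.

    Assumption: weight = exp(-(d / sigma)^2), with sigma set to the mean
    pairwise distance. The diagonal is left at zero; self-connections can be
    added later when the matrix is renormalized for the GCN.
    """
    n = len(coords)
    dist = [[haversine_km(*coords[i], *coords[j]) for j in range(n)] for i in range(n)]
    off_diag = [dist[i][j] for i in range(n) for j in range(n) if i != j]
    sigma = sum(off_diag) / len(off_diag)
    return [
        [0.0 if i == j else math.exp(-((dist[i][j] / sigma) ** 2)) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical site coordinates (approximately Calgary, Edmonton, Lethbridge).
sites = [(51.05, -114.07), (53.55, -113.49), (49.69, -112.84)]
A = distance_adjacency(sites)
```

With this choice, closer sites receive weights nearer to 1 and distant sites weights nearer to 0, so the graph stays fully connected while still favoring neighbors.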

Feature Matrix $X^{N \times P}$
We regard the wind speed or power as the attribute feature of the nodes in the network, expressed as $X^{N \times P}$, where $P$ is the number of node attribute features (the length of the historical time series) and $X_t \in \mathbb{R}^{N \times i}$ represents the speed or power at each site at time $i$. Again, the node attribute feature can be any wind information, such as wind speed, wind direction, or wind power.

MODEL
To evaluate the GCN's ability to capture the spatial features in wind prediction, we combine the GCN model with the GRU model and then compare the performance of the GCN-GRU model with the GRU model through several evaluation metrics. The flow chart of the GCN-GRU model is shown in Figure 2.

Spatial Model
The target site is influenced by a complex wind network formed by its surrounding sites, and in this work, we use the GCN to capture spatial correlations. Existing GCNs can be divided into two main categories: spatial- and spectral-based GCNs. Spatial-based GCNs define the convolutional operation directly on spatially close nodes (the first- or second-order neighbors), using different weight matrices for nodes with different degrees (Duvenaud et al., 2015). Because the wind farms are all connected when we treat them in the graph domain, we choose the spectral-based GCN to capture the effect of surrounding wind farms. A spectral network was proposed in Bruna et al. (2013). It transforms the samples into the Fourier domain through the Fourier transform to perform the convolution, and then transforms them back to the graph domain through the inverse Fourier transform. Specifically, the graph convolutional operation on a sample $X \in \mathbb{R}^N$ can be defined as

$$g_W \ast X = U g_W U^T X, \tag{1}$$

where $\ast$ is the graph convolutional operation, $U$ is the matrix of eigenvectors of the normalized graph Laplacian $L = I_N - D^{-1/2} A D^{-1/2} = U \Lambda U^T$ with $\Lambda$ the diagonal matrix of its eigenvalues, $U^T X$ is the graph Fourier transform of $X$, and $g_W = \mathrm{diag}(U^T g)$ is the filter parameterized by $W \in \mathbb{R}^N$. Because this operation is computationally expensive, the Chebyshev spectral CNN (ChebNet) proposed in Defferrard et al. (2016) approximates $g_W$ by a truncated expansion in terms of Chebyshev polynomials $T_k$ up to the $K$th order:

$$g_W \ast X \approx \sum_{k=0}^{K} W_k T_k(\tilde{L}) X, \qquad \tilde{L} = \frac{2}{\lambda_{max}} L - I_N, \tag{2}$$

where $W \in \mathbb{R}^{K+1}$ is the vector of Chebyshev coefficients and $\lambda_{max}$ is the largest eigenvalue of $L$. Because the operation is a $K$th-order polynomial in the Laplacian, it is $K$-localized.

Moreover, if we set $K = 1$ and $\lambda_{max} = 2$, we obtain a function that is linear in the graph Laplacian spectrum:

$$g_W \ast X \approx W_0 X + W_1 (L - I_N) X = W_0 X - W_1 D^{-1/2} A D^{-1/2} X. \tag{3}$$

Furthermore, to restrain the number of parameters and alleviate overfitting, we assume that $W = W_0 = -W_1$ (Berg et al., 2017), which gives $g_W \ast X \approx W (I_N + D^{-1/2} A D^{-1/2}) X$, and then set $\hat{A} = A + I_N$ as a renormalization to avoid vanishing gradient problems (Kipf and Welling, 2016), which leads to the following formula:

$$g_W \ast X \approx W \tilde{D}^{-1/2} \hat{A} \tilde{D}^{-1/2} X, \tag{4}$$

where $\tilde{D}$ denotes the diagonal matrix of node degrees with $\tilde{D}_{ii} = \sum_{j=1}^{N} \hat{a}_{ij}$. In summary, we use a two-layer GCN model to learn spatial features from wind data, which can be expressed as

$$f(X_t, A) = \sigma\left(\tilde{A}\, \mathrm{Relu}\left(\tilde{A} X_t W_0\right) W_1\right), \tag{5}$$

where $X_t$ is the feature matrix at time $t$, $A$ is the adjacency matrix calculated from the distance between wind sites, $\tilde{A} = \tilde{D}^{-1/2} \hat{A} \tilde{D}^{-1/2}$ denotes the preprocessing step in Eq. 4, $\hat{A} = A + I_N$ is the adjacency matrix with self-connections, and $\tilde{D}$ is its degree matrix. $W_0$ and $W_1$ are the weight matrices of the first and second GCN layers, and $\sigma(\cdot)$ and $\mathrm{Relu}(\cdot)$ are the activation functions.
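The two-layer propagation described above can be sketched in NumPy as follows. This is an illustrative sketch rather than the authors' implementation: the outer activation $\sigma$ is assumed here to be the logistic sigmoid, and the layer sizes, random weights, and function names are hypothetical.

```python
import numpy as np


def normalize_adjacency(A):
    """Return D^{-1/2} (A + I) D^{-1/2}, the renormalized adjacency matrix."""
    A_hat = A + np.eye(A.shape[0])        # add self-connections
    deg = A_hat.sum(axis=1)               # node degrees of A_hat
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def gcn_two_layer(X, A, W0, W1):
    """Two-layer GCN: sigma(A_norm @ ReLU(A_norm @ X @ W0) @ W1).

    Assumption: sigma is the logistic sigmoid; the text does not name it.
    """
    A_norm = normalize_adjacency(A)
    hidden = np.maximum(A_norm @ X @ W0, 0.0)   # first layer with ReLU
    return sigmoid(A_norm @ hidden @ W1)        # second layer


# Hypothetical sizes: 4 sites, 6 historical steps, 8 hidden units, 3 outputs.
rng = np.random.default_rng(0)
N, P, hidden_dim, out_dim = 4, 6, 8, 3
A = rng.random((N, N))
A = (A + A.T) / 2.0                 # make the graph undirected
np.fill_diagonal(A, 0.0)
X = rng.random((N, P))
W0 = rng.normal(scale=0.1, size=(P, hidden_dim))
W1 = rng.normal(scale=0.1, size=(hidden_dim, out_dim))
out = gcn_two_layer(X, A, W0, W1)
print(out.shape)  # (4, 3)
```

Each row of the output mixes information from every connected site, weighted by the normalized adjacency, which is the mechanism by which neighboring sites inform the target site.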

Temporal Model
After spatial processing by the GCN model, we obtain a new feature matrix $X'_t$, which provides new time series data for temporal prediction. In this work, as our key goal is to evaluate the GCN's performance in wind prediction, a simpler and faster model for temporal forecasting is needed. As the GRU model has a relatively simple structure and fewer parameters than the LSTM, we chose the GRU model to obtain temporal dependence from the wind data. As shown in Figure 3, $h_{t-1}$ denotes the hidden state at time $t-1$; $X'_t$ denotes the wind information at time $t$; $r_t$ is the reset gate, which controls the degree to which the status information of the previous moment is ignored; $u_t$ is the update gate, which controls the degree to which the status information of the previous time is brought into the current status; $c_t$ is the memory content stored at time $t$; and $h_t$ is the output state at time $t$. The GRU obtains the wind status at time $t$ by taking the hidden status at time $t-1$ and the current wind information as inputs. While capturing the wind information at the current moment, the model still retains the changing trend of historical wind information and has the ability to capture temporal dependence.

$$\begin{aligned}
u_t &= \sigma\left(W_u [X'_t, h_{t-1}] + b_u\right) \\
r_t &= \sigma\left(W_r [X'_t, h_{t-1}] + b_r\right) \\
c_t &= \tanh\left(W_c [X'_t, (r_t \odot h_{t-1})] + b_c\right) \\
h_t &= u_t \odot h_{t-1} + (1 - u_t) \odot c_t
\end{aligned} \tag{6}$$

In these equations, $X'_t$ is the new wind information processed by the GCN, $\odot$ denotes element-wise multiplication, and $W$ and $b$ represent the weights and biases learned in the training process.
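A single GRU update with the gates described above can be sketched as follows. This is a generic GRU cell written under the convention that the update gate $u_t$ weights the previous hidden state, as in the final equation for $h_t$; the parameter names, sizes, and random initialization are hypothetical and not taken from the paper.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_gru_params(input_dim, hidden_dim, rng):
    """Hypothetical initialization: small random weights, zero biases."""
    shape = (hidden_dim, input_dim + hidden_dim)
    params = {}
    for gate in ("u", "r", "c"):
        params["W_" + gate] = rng.normal(scale=0.1, size=shape)
        params["b_" + gate] = np.zeros(hidden_dim)
    return params


def gru_step(x_t, h_prev, params):
    """One GRU update: x_t is the GCN-processed input, h_prev the last state."""
    xh = np.concatenate([x_t, h_prev])
    u_t = sigmoid(params["W_u"] @ xh + params["b_u"])    # update gate
    r_t = sigmoid(params["W_r"] @ xh + params["b_r"])    # reset gate
    xrh = np.concatenate([x_t, r_t * h_prev])
    c_t = np.tanh(params["W_c"] @ xrh + params["b_c"])   # candidate memory
    return u_t * h_prev + (1.0 - u_t) * c_t              # new hidden state


# Run a short hypothetical sequence of 5 steps with 3 input features.
rng = np.random.default_rng(0)
params = init_gru_params(input_dim=3, hidden_dim=4, rng=rng)
h = np.zeros(4)
for x_t in rng.random((5, 3)):
    h = gru_step(x_t, h, params)
```

When $u_t$ is close to 1 the cell keeps its previous state almost unchanged, which is how the model retains the historical trend while still admitting new information.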

Loss Function
In the training step, the goal of the GCN-GRU model is to learn well-fitted parameters by minimizing the error between the real wind data and the predicted wind value. We use $Y_t$ and $\hat{Y}_t$ to denote the real and the predicted wind speed or power data, respectively. The loss function is shown in Eq. 7. The first term minimizes the error between the real wind data and the prediction. The second term $L_{reg}$ is an L2 regularization term that helps to avoid overfitting, and $\lambda$ is a hyperparameter:

$$\mathrm{loss} = \left\lVert Y_t - \hat{Y}_t \right\rVert + \lambda L_{reg}. \tag{7}$$
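The two-term loss described above can be sketched as a small function. The sketch assumes that the data term is the (Frobenius) norm of the prediction error and that $L_{reg}$ is the sum of squared weights; some frameworks define the L2 term with an extra factor of one half, so the exact scaling here is an assumption, as are the function and argument names.

```python
import numpy as np


def gcn_gru_loss(y_true, y_pred, weights, lam):
    """Prediction error plus lam times an L2 penalty on the weight matrices.

    Assumptions: the error term is the Frobenius norm of (y_true - y_pred),
    and L_reg is the plain sum of squared weights.
    """
    data_term = np.linalg.norm(y_true - y_pred)
    l2_term = sum(np.sum(w ** 2) for w in weights)
    return data_term + lam * l2_term


# Small hand-checkable example with made-up numbers.
y_true = np.array([[3.0, 0.0], [0.0, 4.0]])
y_pred = np.zeros((2, 2))
weights = [np.ones((2, 2))]
print(gcn_gru_loss(y_true, y_pred, weights, lam=0.5))  # 7.0
```

Here the error norm is 5 and the weight penalty is 4, so with $\lambda = 0.5$ the loss is $5 + 0.5 \times 4 = 7$.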

EXPERIMENTS
In this section, we evaluate the prediction performance of the GCN-GRU model on five data sets.

Data Description
(1) Two Canadian climate wind speed data sets. One of them includes nine stations, and the other includes 19 stations. These real wind speed data are all recorded at 10 m height, and the stations are all located in Alberta. We chose the wind speed data from the historical climate data set, which can be downloaded from the Canada Environment and natural resources website (https://climate.weather.gc.ca/historical_data/search_historic_data_e.html).
(2) Two Eastern wind integration data sets. One includes wind speed data from 10 sites, and the other contains the corresponding wind power data at the same places. The wind speed data was downloaded from the National Renewable Energy Laboratory (NREL) website (https://www.nrel.gov/grid/wind-integration-data.html), and the corresponding wind power data set was computed by the Renewable Energy Potential (reV) Model (Maclaurin et al., 2019) for a given set of latitude and longitude coordinates. Unlike the Canada climate wind speed data, the NREL wind speed data are simulated by predictive models at 100 m height.
(3) One NRGStream wind power data set. We chose seven wind farms listed by the Alberta Electric System Operator (AESO) and use the hourly wind power data from 2018 to 2019.
In the experiments, the input data was normalized to the interval [0,1]. In addition, 80% of the data was used as the training set and the remaining 20% was used as the testing set. We predicted the wind speed of the next 1, 2, and 3 h. The detailed information of these data sets can be found in Supplementary Materials.
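The preprocessing described above might look like the following sketch. Several details are assumptions rather than statements from the paper: min-max scaling uses one global minimum and maximum over all sites, the 80/20 split is chronological, and the input window length `seq_len` and the synthetic data are hypothetical.

```python
import numpy as np


def min_max_scale(data):
    """Scale all values to [0, 1] using the global minimum and maximum."""
    lo, hi = data.min(), data.max()
    return (data - lo) / (hi - lo), lo, hi


def chronological_split(series, train_ratio=0.8):
    """Keep the first 80% of time steps for training, the rest for testing."""
    cut = int(series.shape[0] * train_ratio)
    return series[:cut], series[cut:]


def make_windows(series, seq_len, horizon):
    """Turn a (T, N) series into inputs (S, seq_len, N) and targets (S, horizon, N)."""
    inputs, targets = [], []
    for start in range(series.shape[0] - seq_len - horizon + 1):
        inputs.append(series[start:start + seq_len])
        targets.append(series[start + seq_len:start + seq_len + horizon])
    return np.stack(inputs), np.stack(targets)


# Synthetic stand-in data: 100 hourly steps at 9 sites (not real measurements).
rng = np.random.default_rng(0)
raw = rng.random((100, 9)) * 20.0
scaled, lo, hi = min_max_scale(raw)
train, test = chronological_split(scaled)
X_train, Y_train = make_windows(train, seq_len=12, horizon=3)
print(X_train.shape, Y_train.shape)  # (66, 12, 9) (66, 3, 9)
```

A horizon of 3 yields targets for the next 1, 2, and 3 h in one pass; predictions can be mapped back to physical units with `y * (hi - lo) + lo`.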

Evaluation Metrics
To evaluate the performance of the GCN model, we use three metrics of the difference between the real wind data $Y_t$ and the prediction $\hat{Y}_t$: (1) root mean squared error (RMSE), (2) mean absolute error (MAE), and (3) accuracy.
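The three metrics can be sketched as follows. RMSE and MAE follow their standard definitions. The accuracy formula is not given in the text; the version below, one minus the ratio of the Frobenius norm of the error to the Frobenius norm of the real data, is an assumption borrowed from common practice in spatiotemporal forecasting and may differ from the definition the authors used.

```python
import numpy as np


def rmse(y, y_hat):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


def mae(y, y_hat):
    """Mean absolute error."""
    return float(np.mean(np.abs(y - y_hat)))


def accuracy(y, y_hat):
    """Assumed definition: 1 - ||y - y_hat||_F / ||y||_F."""
    return float(1.0 - np.linalg.norm(y - y_hat) / np.linalg.norm(y))


# Hand-checkable example with made-up numbers.
y = np.array([[3.0, 4.0]])
y_hat = np.array([[3.0, 2.0]])
print(rmse(y, y_hat), mae(y, y_hat), accuracy(y, y_hat))
```

Lower RMSE and MAE indicate better predictions, whereas higher accuracy is better, with 1 corresponding to a perfect prediction under this definition.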

Parameter Settings
Our goal is to see whether the GCN can capture the environment information and improve the prediction result.
As the GCN-GRU and GRU share the same GRU model, we use the same parameters for both, such as the number of GRU units, the learning rate, the number of training epochs, and the batch size. For the input layer, the training data set (80% of the overall data) is taken as input in the training process, and the remaining data is used as input in the testing process. The GCN-GRU model is trained using the Adam optimizer.

Experiment Results
FIGURE 4 | Comparison of three wind speed prediction experiments between GCN-GRU and GRU; the light green areas are where GCN-GRU's prediction curve is closer to the real curve, namely, where GCN-GRU performs better than GRU in one-step wind speed prediction.
FIGURE 5 | Comparison of two wind power prediction experiments between GCN-GRU and GRU; the gray areas are where GRU's prediction curve is closer to the real curve, namely, where GRU performs better than GCN-GRU in one-step wind power prediction.
Frontiers in Environmental Science | www.frontiersin.org | March 2022 | Volume 10 | Article 836050
In this part, we compare the performance of the GCN-GRU and GRU models and also show the prediction results of some other baseline methods, including (1) HA (Pinson, 2013), which uses the average wind information in the historical periods as the prediction; (2) ARIMA (Stohl et al., 1995), which fits the observed time series to a parametric model to predict future wind data; and (3) the support vector regression model (SVR) (Smola and Schölkopf, 2004), which uses historical data to train the model, learns the relationship between the input and output, and then predicts the future wind data. For SVR, we use the linear kernel, and the penalty term is 0.001. The GCN-GRU model is also based on the GRU, and our experiments show that the GRU's prediction is not fully stable: the result differs slightly from run to run because neural network optimization based on stochastic gradient descent follows a different path toward a minimum each time. To show that the prediction improvement by the GCN is not incidental, we repeat each experiment 10 times to calculate the standard error and show the range of the prediction results. As prediction accuracy differs between stations, Tables 1-3 give the results for the specific station whose prediction accuracy is best.
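The repeated-run summary described above can be computed with a few lines of standard-library Python. The sketch assumes the standard error is the sample standard deviation divided by the square root of the number of runs; the RMSE values below are made-up placeholders, not results from the paper.

```python
import math
import statistics


def summarize_runs(values):
    """Return the mean and standard error of a metric over repeated runs."""
    mean = statistics.mean(values)
    std_err = statistics.stdev(values) / math.sqrt(len(values))
    return mean, std_err


# Placeholder RMSE values from 10 hypothetical runs (illustrative only).
rmse_runs = [2.31, 2.28, 2.35, 2.30, 2.27, 2.33, 2.29, 2.32, 2.30, 2.26]
mean_rmse, se_rmse = summarize_runs(rmse_runs)
print(f"RMSE = {mean_rmse:.4f} +/- {se_rmse:.4f}")
```

Reporting the mean together with its standard error makes it possible to judge whether a gap between two models is larger than the run-to-run variation.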
The neural network models, including the GCN-GRU and GRU models, achieve better prediction precision than the other baselines, such as ARIMA, SVR, and HA. For example, for the 1-h wind speed prediction on the data set of "Canada climate wind speed experiment-nine stations," the RMSEs of the GCN-GRU and GRU models are approximately 5.4% and 4.1% lower than that of the ARIMA model, and their accuracies are 1.4% and 1.1% higher. Compared with the SVR model, the RMSEs of the GCN-GRU and GRU models are reduced by 3.4% and 2.1%, and their accuracies are approximately 2.2% and 1.8% higher. Compared with the HA model, the RMSEs of the GCN-GRU and GRU models are reduced by approximately 30.7% and 29.8%, and their accuracies are approximately 14.2% and 13.9% higher. Next, we focus on the comparison between the GCN-GRU and GRU results.
Tables 1 and 2 show the prediction results of the GCN-GRU model, the GRU model, and the other baseline methods for the Canada climate wind speed data sets, the NREL wind speed data, and the NREL wind power data. Comparing GCN-GRU and GRU across these four experiments, all the GCN-GRU results are better than those of the GRU model. This shows that the GCN can capture some geographical information and improve the prediction performance.
In the nine-station Canada climate wind speed prediction, compared with GRU, there is a 0.0831 improvement in RMSE and an approximately 0.3% improvement in accuracy. In the 18-station Canada climate wind speed prediction, there is a 0.3114 improvement in RMSE and an approximately 1.8% improvement in accuracy. Although both experiments predict wind at 10 m height, more wind stations provide more information about the surroundings, which is why the 18-station experiment's result is much better than the nine-station result.
Comparing the NREL wind speed experiment with the Canada climate wind speed experiments, we can see that the accuracy is much higher for all models. This is mainly because the NREL data describe the wind at 100 m height, where the wind changes more steadily than the Canada climate wind at 10 m height, which makes it easier for the models to predict. From these four experiments, we can see that the GCN model can improve the prediction accuracy.
However, the prediction accuracy of all these models is lower in NREL wind power prediction than in NREL wind speed prediction. Moreover, the GCN-GRU model achieves almost the same prediction accuracy as the GRU model in 1-h prediction. The main reason for these unsatisfactory results is that the NREL data sets are simulated by mathematical models rather than recorded from real measurements. Thus, we conduct another experiment based on the NRGStream data set, which records the real wind power from wind farms in Alberta, Canada.
According to the NRGStream wind power experiment, the GCN-GRU model performs even worse in 1-h prediction. In both the simulated and real wind power experiments, the GCN does not perform as well as in wind speed prediction. This is mainly because wind power data is not as smooth as wind speed data owing to the operational intervention of wind farms. For example, when the wind speed exceeds the limit of the turbines, the wind power stays at its maximum value rather than continuing to increase with the wind speed. If the wind speed is below the speed that the turbines need to operate, the wind power is exactly zero rather than a small value that tracks the wind speed. Moreover, some wind farms may shut down wind turbines to protect them when the wind is extremely strong, while others nearby keep running. These characteristics of wind power data lead to the lower prediction accuracy.
In addition, to show that the improvement of prediction results by the GCN is not coincidental, we also calculate the average prediction results over all wind stations in the wind speed and power forecasting experiments. From Tables 4-6, we can see that the GCN not only improves the prediction result for a specific station but also enhances prediction accuracy across all wind stations, although it still has a drawback when predicting wind power directly.

Comparison of the Prediction Between GCN-GRU and GRU
Figures 4 and 5 compare the predictions of GCN-GRU and GRU in wind speed and wind power prediction. In Figure 4, we can see that the wind speed changes smoothly even though the curve fluctuates strongly. In contrast, the interventions of wind turbines are obvious in both the NREL and NRGStream wind power data: the wind power curves contain many horizontal segments, and these interventions may explain the poor performance of the GCN in wind power prediction.

CONCLUSION
In this paper, a GCN model is proposed to improve the precision of wind prediction; we evaluate the GCN-GRU model not only on wind speed prediction at 10 and 100 m height but also on wind power prediction. To show the performance of the GCN-GRU model, the final results are based on a collection of 10 experiments for each data set; the forecasting results are quite stable across prediction steps. Although the GCN-GRU model does not outperform the GRU model in short-term NRGStream wind power prediction, the results on five different wind data sets demonstrate the GCN's ability to capture spatial influence and improve the prediction results. Finally, our study offers new findings and may add considerable value to the energy management field; our finding confirms that the GCN model has a positive influence on wind prediction. Moreover, the failure of the GCN in short-term wind power prediction suggests that, to achieve higher accuracy in wind power prediction, it is better to combine the wind speed prediction with the wind power data.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.