A Prediction Model of Significant Wave Height in the South China Sea Based on Attention Mechanism

Significant wave height (SWH) prediction plays an important role in marine engineering fields such as fishery, exploration, power generation, and ocean transportation. Traditional SWH prediction methods based on numerical models struggle to achieve high accuracy. In addition, current SWH prediction methods are largely limited to single-point prediction and do not consider regional prediction. This paper proposes CBA-Net, a deep neural network model for regional SWH prediction based on the attention mechanism. Wind and wave height from the ERA5 data set in the South China Sea from 2011 to 2018 were used as input features to train the model, and its SWH prediction performance was evaluated at 1 h, 12 h, and 24 h. The results show that a convolutional neural network (CNN) alone cannot accurately predict SWH. After adding the Bi-LSTM layer and the attention mechanism, the prediction of SWH is greatly improved. In the 1 h SWH prediction using CBA-Net, SARMSE, SAMAPE, and SACC are 0.299, 0.136, and 0.971, respectively. Compared with the CNN + Bi-LSTM method without the attention mechanism, SARMSE and SAMAPE are reduced by 43.4% and 48.7%, respectively, while SACC is increased by 5%. In the 12 h SWH prediction, SARMSE, SAMAPE, and SACC of CBA-Net are 0.379, 0.177, and 0.954, respectively. In the 24 h SWH prediction, they are 0.500, 0.236, and 0.912, respectively. Although performance at 24 h is slightly lower than at 12 h as the prediction time increases, the prediction error remains small and CBA-Net still outperforms the other methods.


INTRODUCTION
Wave disasters are the most common marine disasters in the world. When huge waves reach the coast, they cause huge losses to people's lives and property (Hsiao et al., 2020; Gao et al., 2021). Accurate significant wave height (SWH) prediction can therefore effectively improve the safety of marine activities and the efficiency of marine operations and reduce the occurrence of marine accidents, and it is of great significance in marine engineering fields such as fishery, exploration, power generation, and marine transportation (Young and Ribal, 2019; Fan et al., 2020; Zhang et al., 2021).
Because of the importance and application value of SWH prediction, prediction methods have been continuously developed in recent decades. Numerical and statistical models (Méndez et al., 2008; Vanem, 2016; Wu et al., 2019; Emmanouil et al., 2020; Wu, 2021; Wu and Qiao, 2022) have been widely used in global sea state prediction. Common numerical models include WAM (Group, 1988; Umesh and Swain, 2018; Swain et al., 2019), WAVEWATCH (Kazeminezhad and Siadatmousavi, 2017; Liu et al., 2019; Li et al., 2020), and SWAN (Akpınar et al., 2017; Liang et al., 2019; Lin et al., 2019). Both numerical and statistical methods try to predict SWH by approximating the underlying mathematical relationships. However, because the physical processes and mechanisms of ocean waves are strongly nonlinear, especially in extreme cases (e.g., typhoons), such methods may fail to achieve high prediction accuracy and need to be improved (Huang and Dong, 2021). In addition, numerical models require expensive meteorological and oceanographic data and a large amount of computation, and their long running time is an important bottleneck restricting the development of rapid and accurate SWH prediction (Zhou et al., 2021).
With the rapid development of artificial intelligence (AI), SWH prediction methods based on deep learning have been highly valued by researchers in recent years because of their fast calculation speed, low computational cost, and strong nonlinear learning ability. A deep learning method only needs to know which factors are related to the target physical quantity; it then establishes an input-output prediction model and predicts the SWH for a period in the future. Panchang and Londhe (2006) used Artificial Neural Networks (ANN) based on existing wave data sets to predict the wave heights at six geographically separated buoy positions and found that this method predicts well over short-term horizons. Berbić et al. (2017) used ANN and Support Vector Machines (SVM) to predict significant wave heights between 0.5 and 5.5 h; their experiments verified that ANN and SVM are better than numerical models in this interval. However, these methods can only be applied to relatively short-term forecasts under normal conditions, and their forecasts under extreme conditions are not ideal. In addition, as the number of inputs and the complexity increase, the accuracy of the ANN may drop sharply because the model cannot extract enough features (Ni and Ma, 2020).
Recently, because of the limitations of ANN in SWH prediction, the recurrent neural network (RNN) (Zaremba et al., 2014) has gradually become a more popular SWH prediction model. Mandal and Prabaharan (2006) introduced an RNN with the rprop update algorithm and applied it to SWH prediction. Sadeghifar et al. (2017) used RNN to predict SWH at 3 h, 6 h, 12 h, and 24 h and obtained correlation coefficients of 0.96, 0.90, 0.87, and 0.73, respectively. Miky et al. (2021) integrated a neural network-based nonlinear autoregressive network and an RNN for SWH prediction. These experimental results show that RNN gives better SWH predictions than previous ANN methods. However, RNN training faces a major problem, the long-term dependence problem: as the network structure deepens, the model loses the ability to learn earlier information.
In response to this problem, researchers designed a variant of RNN, namely Long Short-Term Memory (LSTM) (Hochreiter and Schmidhuber, 1997). Compared with RNN, it uses a recurrent structure with two gated units. It can effectively handle long-term dependence and mitigate vanishing or exploding gradients, thereby significantly improving the accuracy of SWH prediction. An LSTM neural network has been used to establish a wave height prediction model at three stations in the Bohai Sea. The model uses sea surface wind and wave height as training samples, and its prediction performance was evaluated with an error analysis. For SWH in the range of 3 to 5 m, the prediction accuracy of the LSTM model was found to be notably high. Zhang et al. (2021) proposed the Numerical Long Short-Term Memory method. This method takes as input the measured wave height at the current moment combined with the wave height from the simulated nearshore wave prediction, and generates the corrected numerical prediction as output. Experimental results show that this method effectively improves the SWH prediction accuracy for the Bohai Sea and Wheat Island. Raj and Brown (2021) developed and applied a high-precision bidirectional long short-term memory (Bi-LSTM) algorithm to predict SWH and conducted an overall analysis and evaluation of wave characteristics at two coastal locations in Queensland.
However, the application of AI methods to SWH prediction is currently still mainly limited to single points rather than regions, and two issues remain. First, SWH series are usually a mixture of short-term and long-term dependencies, and a successful SWH prediction model should capture both to make accurate predictions. Long-term dependence reflects the differences between seasons, and short-term dependence reflects wave height fluctuations caused by wind direction changes over a short time. If these two dependencies are not considered, accurate SWH predictions cannot be made. Second, the situation at each site is different; if only the predictive performance of a single point is considered without assessing the overall area, the generalization of the model is often relatively poor. Addressing these limitations of existing methods in SWH prediction is the focus of this work. This paper proposes a deep learning model for SWH prediction of regional multivariate time series: a convolutional bidirectional long short-term memory network based on the attention mechanism (CBA-Net). As shown in Figure 1, it uses a convolutional layer to find local dependencies between multi-dimensional input variables, uses a Bi-LSTM layer to capture complex long-term dependencies, and finally combines the attention mechanism with the nonlinear neural network part to make the model more robust. To better demonstrate the effectiveness of the various components of the model, we carried out an ablation study in which we remove each component one at a time from the CBA-Net framework.
The remainder of this paper is structured as follows. Section 2 describes the proposed CBA-Net. Section 3 introduces the experimental design details, such as the experimental data set, metrics, and parameter settings. Section 4 discusses and analyzes the results of SWH prediction. Finally, Section 5 summarizes our findings.

PROPOSED METHOD
In this section, we introduce the details of the various components of the proposed CBA-Net architecture.

Convolutional Neural Network Module
Traditional neural network layers are fully connected. As the number of network layers increases, this connection method may require an astronomical number of parameters. A convolutional neural network (CNN) has fewer learnable parameters than a fully connected network, which contributes to trainability; in addition, CNN shows excellent performance in extracting local and translation-invariant features (LeCun and Bengio, 1995; Goodfellow et al., 2016).
The first layer of CBA-Net is a CNN without pooling, whose purpose is to extract the local dependencies between variables in the time dimension. This function is mainly accomplished by the filter in the convolution layer. CNN regards the filter as a scanner with a specified window size, and extracts feature information by repeatedly scanning the input time series data from left to right and from top to bottom. The convolution calculation process is shown in Figure 2.
In this paper, the convolutional layer we built is composed of a filter with a depth of 48 and a width of 3. The k-th filter sweeps the input time series matrix X and produces the corresponding calculation results. The calculation formula is as follows,
$$z_k = \mathrm{RELU}(W_k * X + b_k)$$
where * denotes the convolution operation, the output $z_k$ is a vector, the RELU function is $\mathrm{RELU}(x) = \max(0, x)$, $W_k$ is the weight matrix, and $b_k$ is the bias.
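The filter sweep described above can be illustrated with a minimal NumPy sketch. It reads "a depth of 48 and a width of 3" as 48 filters that each span 3 time steps, which is an interpretation rather than something the text states explicitly. The function name `conv1d_relu`, the 24-step window, and the random weights are illustrative assumptions.

```python
import numpy as np

def conv1d_relu(X, W, b):
    """Valid 1-D convolution over the time axis followed by ReLU.

    X: (T, d) input series with T time steps and d variables.
    W: (K, width, d) bank of K filters; b: (K,) biases.
    Returns Z with shape (T - width + 1, K); column k plays the role of z_k.
    """
    T, _ = X.shape
    K, width, _ = W.shape
    Z = np.empty((T - width + 1, K))
    for t in range(T - width + 1):
        window = X[t:t + width]  # (width, d) slice under the filter
        # Sum of element-wise products of each filter with the window, plus bias.
        Z[t] = np.tensordot(W, window, axes=([1, 2], [0, 1])) + b
    return np.maximum(Z, 0.0)  # RELU(x) = max(0, x)

rng = np.random.default_rng(0)
X = rng.normal(size=(24, 3))            # assumed: 24 hourly steps of (U10, V10, SWH)
W = rng.normal(size=(48, 3, 3)) * 0.1   # assumed: 48 filters of width 3
b = np.zeros(48)
Z = conv1d_relu(X, W, b)
print(Z.shape)  # (22, 48)
```

As in most deep learning frameworks, the sliding product here is a cross-correlation (the filter is not flipped); with learned weights the distinction does not matter.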

Bi-Directional Long Short-Term Memory Module
The output of the convolutional layer is input to the Bi-LSTM module in Figure 1. Bi-LSTM is a combination of a forward LSTM and a backward LSTM. As shown in Figure 3, LSTM uses two gates to control the content of the cell state c: one is the forget gate, which determines how much of the cell state $c_{t-1}$ from the previous moment is retained in the current cell state $c_t$; the other is the input gate, which determines how much of the network input $z_t$ at the current moment is saved in the cell state $c_t$. In addition, the LSTM uses an output gate to control how much of the cell state $c_t$ is passed to the current output value $h_t$ of the LSTM. This module uses the tanh function as the activation function, and the information state transfer formulas of the unit at time t in LSTM are as follows,
$$f_t = \sigma(w_f [h_{t-1}, z_t])$$
$$i_t = \sigma(w_i [h_{t-1}, z_t])$$
$$g_t = \tanh(w_g [h_{t-1}, z_t])$$
$$o_t = \sigma(w_o [h_{t-1}, z_t])$$
$$c_t = f_t \cdot c_{t-1} + i_t \cdot g_t$$
$$h_t = o_t \cdot \tanh(c_t)$$
where $f_t$ represents the processing formula of the forget gate, $i_t$ represents the processing formula of the input gate, $g_t$ represents the new state candidate vector, $o_t$ represents the processing formula of the output gate, $w$ represents the given weight coefficients, $\sigma$ represents the sigmoid function, and $\cdot$ represents the element-wise product.
FIGURE 2 | The basic process of convolution calculation. The blue part and the red part are multiplied element by element to obtain a set of green local feature values. The fixed-size blue region moves from left to right and from top to bottom in turn, and is multiplied element by element with the red part to obtain all calculation results.
Using the LSTM model can better capture long-term dependencies because LSTM can learn through training what information to remember and what to forget. However, a model built only with LSTM has a problem: it cannot encode information from back to front. Therefore, as shown in Figure 4, Bi-LSTM is used in this work to better capture the two-way dependency.
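The gate description above can be sketched as a single LSTM step plus a forward and a backward pass whose outputs are concatenated. This is a minimal illustration using the standard LSTM update, not the authors' implementation: bias terms are omitted for brevity, and the hidden size, weight scale, and function names are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(z_t, h_prev, c_prev, w):
    """One LSTM step; w maps a gate name to a (hidden, hidden + input) matrix."""
    v = np.concatenate([h_prev, z_t])
    f_t = sigmoid(w["f"] @ v)        # forget gate: how much of c_{t-1} to keep
    i_t = sigmoid(w["i"] @ v)        # input gate: how much of the candidate to store
    g_t = np.tanh(w["g"] @ v)        # new state candidate vector
    o_t = sigmoid(w["o"] @ v)        # output gate: how much of c_t reaches h_t
    c_t = f_t * c_prev + i_t * g_t   # element-wise products
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t

def run_lstm(Z, w, hidden):
    """Run the LSTM over the rows of Z in order and stack the hidden states."""
    h, c = np.zeros(hidden), np.zeros(hidden)
    states = []
    for z_t in Z:
        h, c = lstm_step(z_t, h, c, w)
        states.append(h)
    return np.stack(states)

def bi_lstm(Z, w_fwd, w_bwd, hidden):
    """Concatenate a forward pass with a backward pass re-aligned to time order."""
    fwd = run_lstm(Z, w_fwd, hidden)
    bwd = run_lstm(Z[::-1], w_bwd, hidden)[::-1]
    return np.concatenate([fwd, bwd], axis=1)

rng = np.random.default_rng(0)
T, d, hidden = 10, 4, 5  # assumed sizes, for illustration only
Z = rng.normal(size=(T, d))
w_fwd = {k: rng.normal(size=(hidden, hidden + d)) * 0.5 for k in "figo"}
w_bwd = {k: rng.normal(size=(hidden, hidden + d)) * 0.5 for k in "figo"}
H = bi_lstm(Z, w_fwd, w_bwd, hidden)
print(H.shape)  # (10, 10)
```

The backward half of each row depends on later inputs, which is exactly the information a single forward LSTM cannot encode.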
At this stage, for the given Bi-LSTM input $Z = (z_1, \ldots, z_T)$, where T is the length of the input time series, the model needs to continuously predict SWH from the input time series data, that is

Attention Module
The attention mechanism originated from the study of human vision (Yang, 2020; Guo et al., 2021). In cognitive science, because of the bottleneck of information processing, humans selectively focus on part of the available information while ignoring other visible information. The attention mechanism has two main aspects: deciding which part of the input needs attention, and allocating limited information processing resources to the important part. Models without an attention mechanism tend to lose a lot of detailed information when the input data is relatively large-scale. This is the main reason for introducing an attention mechanism in this work.
In this module, multiple dimensions are used to predict one-dimensional data. To fully extract data features and improve the accuracy of SWH prediction, we use the attention mechanism to determine which dimensions play a key role in predicting the target dimension.
At time t, the predicted output $y_t$ is obtained using the trainable parameters $W_0$ and $b_0$, the hidden state $s_t$ of the LSTM at time t, and a context vector $p_t$. The context vector is calculated as the weighted sum of the hidden states $H = (h_1, h_2, \ldots, h_I)$ from the previous stage,
$$p_t = \sum_{i=1}^{I} a_{ti} h_i$$
where $a_{ti}$ is called the attention weight and is calculated as
$$a_{ti} = \frac{\exp(t_{ti})}{\sum_{j=1}^{I} \exp(t_{tj})}$$
The score $t_{ti}$ is calculated as
$$t_{ti} = v^{T} \tanh(W s_{t-1} + V h_i + d)$$
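To make the attention step concrete, the following sketch computes the scores from the additive form v^T tanh(W s_{t-1} + V h_i + d) given in the text, turns them into weights, and forms the context vector as a weighted sum of the hidden states. Normalizing the scores with a softmax is the usual choice and is assumed here; all dimensions and the function name are illustrative.

```python
import numpy as np

def attention_context(s_prev, H, W, V, v, d):
    """Additive attention over the hidden states.

    s_prev: (ds,) previous decoder state s_{t-1}.
    H: (I, dh) hidden states h_1..h_I from the previous stage.
    W: (k, ds), V: (k, dh), v: (k,), d: (k,) trainable parameters.
    Returns the attention weights a (I,) and the context vector p (dh,).
    """
    scores = np.tanh(s_prev @ W.T + H @ V.T + d) @ v  # one score per h_i
    scores = scores - scores.max()                    # for numerical stability
    a = np.exp(scores) / np.exp(scores).sum()         # assumed softmax normalization
    p = a @ H                                         # weighted sum of hidden states
    return a, p

rng = np.random.default_rng(0)
I, dh, ds, k = 6, 4, 3, 5  # assumed sizes, for illustration only
H = rng.normal(size=(I, dh))
s_prev = rng.normal(size=ds)
W = rng.normal(size=(k, ds))
V = rng.normal(size=(k, dh))
v = rng.normal(size=k)
d = np.zeros(k)
a, p = attention_context(s_prev, H, W, V, v, d)
print(a.shape, p.shape)  # (6,) (4,)
```

Because the weights are non-negative and sum to one, the context vector is a convex combination of the hidden states, so larger weights mark the inputs the prediction relies on most.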

EVALUATION
In this section, we will explain the specific details of the experiment. In order to better understand the experimental process, Figure 5 shows the flow chart of the overall experiment.

Dataset
ERA5 is the fifth generation ECMWF reanalysis of the global climate and weather for the past 4 to 7 decades. We select (0°~25°N, 105°~124.75°E) as the study area. This area is dominated by wind and waves and is greatly affected by the monsoon. The temporal resolution of the data is hourly, and the spatial resolution is 0.5° × 0.5°.
For the prediction of SWH, we use the data from 2011 to 2018 to generate the corresponding training set and the last 720 hours of data in 2020 as the test set. To ensure the relative independence of training and test data sets, the test data is excluded from model training.
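One way to turn an hourly series into supervised samples for a given lead time (1 h, 12 h, or 24 h) is a sliding window, sketched below for a single grid point. The history length, the column order with SWH last, and the function name `make_windows` are assumptions for illustration; the text does not specify how the samples were built.

```python
import numpy as np

def make_windows(series, history, lead):
    """Build (input window, target) pairs from an hourly series.

    series: (N, d) array of hourly features, with SWH assumed in the last column.
    history: number of past hours in each input window (assumed value).
    lead: forecast lead time in hours, e.g. 1, 12, or 24.
    Returns X with shape (M, history, d) and y with shape (M,), where y[i] is
    the SWH `lead` hours after the last step of X[i].
    """
    N = series.shape[0]
    M = N - history - lead + 1
    X = np.stack([series[i:i + history] for i in range(M)])
    first_target = history - 1 + lead
    y = series[first_target:first_target + M, -1]
    return X, y

series = np.arange(30, dtype=float).reshape(10, 3)  # toy data: 10 hours, 3 features
X, y = make_windows(series, history=4, lead=2)
print(X.shape, y.shape)  # (5, 4, 3) (5,)
```

For regional prediction the same windowing would be applied across every grid point in the study area.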

Metrics
To evaluate the performance of the model, we use three metrics: Spatial Average Root Mean Square Error (SARMSE), Spatial Average Mean Absolute Percent Error (SAMAPE), and Spatial Average Correlation Coefficient (SACC). In the formulas for these metrics, m is the number of stations in the entire study area, n is the total number of test samples, and $y_i$ and $\hat{y}_i$ are the true and predicted values, respectively. Note that the lower the SARMSE and SAMAPE values, the better the consistency between the measurement and the prediction, whereas the higher the SACC value, the more accurate the prediction.
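The three metrics can be sketched as follows, under the assumption that each one is computed per station over the n test samples and then averaged over the m stations. This is one plausible reading of "spatial average"; the exact formulas may differ, and the function name and toy data are illustrative.

```python
import numpy as np

def spatial_metrics(y_true, y_pred):
    """Return (SARMSE, SAMAPE, SACC) for arrays of shape (n, m).

    n is the number of test samples and m the number of stations.
    Each metric is computed per station (axis 0), then averaged over stations.
    """
    err = y_pred - y_true
    rmse = np.sqrt((err ** 2).mean(axis=0))
    mape = np.abs(err / y_true).mean(axis=0)   # assumes y_true has no zeros
    yt = y_true - y_true.mean(axis=0)
    yp = y_pred - y_pred.mean(axis=0)
    cc = (yt * yp).sum(axis=0) / np.sqrt((yt ** 2).sum(axis=0) * (yp ** 2).sum(axis=0))
    return rmse.mean(), mape.mean(), cc.mean()

y_true = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])  # toy data: n = 3, m = 2
y_pred = 1.1 * y_true                                    # uniform 10 % overestimate
sarmse, samape, sacc = spatial_metrics(y_true, y_pred)
print(round(samape, 3), round(sacc, 3))  # 0.1 1.0
```

In the toy example the prediction is perfectly correlated with the truth yet biased, which shows why a correlation metric is reported alongside the two error metrics.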

Experimental Details
We use an Intel Gold 6330 processor and an Nvidia GeForce RTX 3090 graphics card for the experiments on Ubuntu 20.04. The methods mentioned in the experiment are all implemented with TensorFlow 2.x in the Python environment.
To prevent overfitting, we add a dropout layer after the convolutional layer and the Bi-LSTM layer and set the dropout rate to 0.3. In addition, the model uses the Adam algorithm (Kingma and Ba, 2014) to optimize the model parameters. The Adam algorithm improves the quality and speed of optimization by obtaining an adaptive learning rate for each parameter.
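The two training ingredients mentioned above can be illustrated in NumPy: inverted dropout with rate 0.3, and one Adam update following Kingma and Ba (2014). This is a sketch rather than the TensorFlow code used in the experiments; the learning rate, the toy quadratic loss, and the function names are assumptions.

```python
import numpy as np

def dropout(x, rate, rng):
    """Inverted dropout: zero each entry with probability `rate`, rescale the rest."""
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based step counter."""
    m = beta1 * m + (1 - beta1) * grad        # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)              # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)  # per-parameter step size
    return theta, m, v

# Toy problem (assumed): minimize the sum of squares of theta.
theta = np.array([1.0, -2.0])
m, v = np.zeros_like(theta), np.zeros_like(theta)
loss_start = float((theta ** 2).sum())
for t in range(1, 201):
    grad = 2.0 * theta
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.05)
loss_end = float((theta ** 2).sum())
print(loss_end < loss_start)  # True
```

Dividing by the square root of the second-moment estimate is what gives each parameter its own effective learning rate: on the first step every parameter moves by roughly `lr` regardless of the size of its gradient.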

RESULTS
We conducted multiple sets of experiments to verify the performance of the model in predicting SWH and analyzed the experimental results. To prove the effectiveness of each component of our model, we conducted a careful ablation study in which modules are added sequentially. Table 1 lists the one-hour prediction results of the different models after training, validation, and testing. The optimal results are marked in bold in the table.

One-Hour SWH Prediction
It can be seen from the results that the error of using CNN alone to predict SWH is too large for accurate SWH prediction. However, after adding Bi-LSTM to the CNN, the effect is significantly improved, and the correlation of the prediction reaches 0.925. The error is also greatly reduced: SARMSE falls from 1.039 to 0.528, a reduction of 49.2%. SAMAPE is also maintained at a low level. This indicates that predicting SWH with a CNN that only considers local dependencies is unreliable, whereas applying Bi-LSTM captures long-term dependencies and greatly improves the accuracy of SWH prediction. After the attention mechanism is further introduced, SARMSE is reduced to 0.299 and SAMAPE is reduced to 0.136, a reduction of 48.7%, so the error reaches a very low level. At the same time, the correlation of the prediction reaches 0.971, and the reliability of the model's SWH prediction is greatly improved.
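The percentage changes quoted for the 1 h results follow directly from the reported metric values, as the short check below shows. The helper name `relative_change` is illustrative.

```python
def relative_change(before, after):
    """Percentage change from `before` to `after` (negative means a reduction)."""
    return (after - before) / before * 100.0

# SARMSE: CNN -> CNN + Bi-LSTM
print(round(relative_change(1.039, 0.528), 1))  # -49.2
# SARMSE: CNN + Bi-LSTM -> CBA-Net
print(round(relative_change(0.528, 0.299), 1))  # -43.4
# SACC: CNN + Bi-LSTM -> CBA-Net
print(round(relative_change(0.925, 0.971), 1))  # 5.0
```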
From the experimental results in Table 1, it can be seen that CBA-Net maintains better prediction performance when predicting SWH 1 h ahead. To display the SWH prediction results more intuitively, Figure 6 shows the 1 h SWH prediction results of the different methods. Because CNN only pays attention to local dependencies, it easily falls into a local minimum, which leads to under-fitting of the prediction model. After adding the Bi-LSTM module to the CNN, the predictive ability of the model is improved. Although the resulting error is still relatively large, the prediction has a similar change trend to the real data. After the attention mechanism is further introduced, the predictive ability of the model improves again. Only some areas have an error of about 0.4 m, and the overall error is maintained at a relatively low level. Table 2 lists the twelve-hour prediction results of the different algorithms after training, validation, and testing. The optimal results are marked in bold in the table.

Twelve-Hour SWH Prediction
It can be seen from the table that CNN's SWH prediction performance indicators deteriorate further. It can be concluded that a single CNN model is not suitable for time series SWH prediction. As the SWH prediction period increases, the correlation between data decreases, but Bi-LSTM can still extract the dependencies between data through its bi-directional design. After introducing Bi-LSTM, the error is only slightly higher than the 1 h prediction result under the same conditions, indicating that applying Bi-LSTM is meaningful. After the attention mechanism is further introduced, compared with the 1 h SWH prediction results under the same conditions, the correlation decreases from 0.971 to 0.954. Along with the reduced correlation, both SARMSE and SAMAPE increase slightly, from 0.299 to 0.379 and from 0.136 to 0.177, respectively; however, the error remains within an acceptable range.
Compared with the prediction performance of the 1 h SWH model under the same conditions, the 12 h SWH prediction indices are slightly worse. The likely reason is that the forecast period is longer. Figure 7 shows the prediction results of the different methods for 12 h SWH. Although the forecast period increases, the results of CBA-Net are in good agreement with the original data, indicating that the method proposed in this paper has strong generalization ability and long-term prediction ability. In a small part of the area, the 12 h prediction results have a slightly larger error of about 0.5 m, but the overall prediction is consistent with the actual data, which shows that CBA-Net is feasible for 12 h SWH prediction. Table 3 lists the one-day prediction results of the different algorithms after training, validation, and testing. The optimal results are marked in bold in the table.

Longer-Term SWH Prediction
As the complexity of marine engineering increases, so does the demand for long-term SWH forecasts. It can be seen that CBA-Net has always been better than the other methods. Although increasing the prediction time interval reduces the correlation coefficient and the accuracy of the prediction, this drawback can be well alleviated by adding a Bi-LSTM layer and an attention mechanism. Figure 8 shows the prediction results of the different methods for 24 h SWH. The result confirms once again that a single CNN is not suitable for time series SWH prediction. CBA-Net's 24 h SWH prediction has large errors in only a small part of the area, where they remain at an acceptable level, and the overall prediction effect is better. This also shows that CBA-Net's 24 h SWH prediction is feasible.

CONCLUSION
The model is trained to predict SWH with U10, V10, and SWH of the ERA5 dataset as input. The model first uses the convolutional layer to find the local dependencies between the multi-dimensional input variables, then uses the Bi-LSTM layer to capture the complex long-term dependencies, and finally combines the attention mechanism with the nonlinear neural network part to make the model more robust. To prove the effectiveness of the method proposed in this paper, we used three different methods to predict the SWH in the South China Sea. We use the 2011-2018 ERA5 data set to train the model, and use the three indicators SARMSE, SAMAPE, and SACC to evaluate the accuracy and stability of the prediction results. The results show that the CBA-Net method obtains more accurate results in the 1 h, 12 h, and 24 h predictions. The ablation study also shows that each component of the proposed method is effective. The SWH prediction technology based on CBA-Net can therefore make full use of the important information in sea wind and significant wave height to establish a prediction model and support operational applications.
For future research, there are several promising directions to extend this work. Because of the complexity of the actual marine environment, extending the CBA-Net method to all global domains is a challenging task. In addition, the choice of input features directly affects the prediction results; features such as wind speed, water depth, and terrain need to be considered and added to the training of the model. Such a general deep learning SWH prediction model deserves more attention in future work.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: https://cds.climate.copernicus.eu/.