Short-Term Daily Prediction of Sea Ice Concentration Based on Deep Learning of Gradient Loss Function

The navigability potential of the Northeast Passage has gradually emerged with the melting of Arctic sea ice. For navigation safety in the Arctic, reliable daily sea ice concentration (SIC) predictions are required. Motivated by the mature application of deep learning to short-term prediction in other fields (e.g., atmosphere, ocean, and hurricanes), a new model is proposed for daily SIC prediction that selects multiple factors, adopts a gradient loss function (Grad-loss), and incorporates an improved predictive recurrent neural network (PredRNN++). Three control experiments are designed to test the impact of these three improvements on model performance using multiple indicators. Results show that the proposed model, which takes physical processes and local SIC variation into consideration, has the best prediction skill in our experiments and can continuously predict daily SIC for up to 9 days.


INTRODUCTION
Arctic sea ice has gradually melted in recent decades due to global climate change (Guemas et al., 2016). On the one hand, the melting of sea ice affects the global climate by decreasing the albedo of the sea surface, so that the surface absorbs more radiant heat, which leads to further melting of sea ice (Screen and Simmonds, 2010; Francis and Vavrus, 2015). On the other hand, the melting of Arctic sea ice also presents important influences and opportunities for the global transportation industry. The summer thaw in the Northeast Passage makes navigation possible (Stroeve et al., 2012) and has significantly impacted the global transportation industry. Compared with the traditional passage, the Northeast Passage reduces the geographical distance by 40%, greatly reducing time and transportation costs (Chen et al., 2020; Tseng et al., 2021).
However, opening the Northeast Passage first requires that navigation safety be ensured by a complete maritime navigation system in the Arctic. The greatest difficulty of Arctic navigation compared with that of other seas lies in sea ice prediction, which remains very difficult due to observational data limitations and the complicated factors influencing sea ice. Of the many characteristics of sea ice, sea ice concentration (SIC) has the greatest impact on navigation (Similä and Lensu, 2018). Therefore, this study focusses on SIC predictions.
Current sea ice prediction primarily relies on two types of methods: model simulation and statistical prediction. Model simulation is based on the factors that affect sea ice changes and characterizes the effect of each factor using known physical laws. Current mainstream models [e.g., the Los Alamos sea ice model (CICE) and the Louvain-la-Neuve sea ice model (LIM)] predict sea ice based on temperature, salinity, melt ponds, and sea ice ridging and rafting (Vancoppenolle et al., 2009; Hunke et al., 2013). Model simulation provides assimilation results with a stable error and strong interpretability, because the influence of each factor on sea ice change is explicit. However, there is room for improvement. For example, the redistribution mechanism of small-scale sea ice from thinner to thicker ice is not well described (Hunke et al., 2011). Additionally, sea ice models should be better coupled with atmospheric and oceanic models. Furthermore, the numerous factors affecting sea ice change and their complex influencing rules make simulation difficult where those rules are unclear. For example, current models do not adequately describe the influence of cloud cover, humidity transport, and temperature changes, which have important effects on local sea ice, due to the lack of quantitative influencing rules (Cox et al., 2016; Lee et al., 2017; Mudryk et al., 2018).
Statistical prediction differs from model simulation. It uses statistical methods to explore regularities in the data and to fit the relationship between the independent and dependent variables. Statistical methods therefore do not require explicit influencing rules as input, which is why statistical prediction may compensate for the deficiencies of model simulation where the laws of change are unknown. Traditional statistical methods, such as the vector autoregressive model and the vector Markov model, can perform multi-factor weekly SIC prediction based on multiple sea ice, ocean, and atmosphere factors (Wang et al., 2019). However, traditional methods are limited by the models themselves: the spatial resolution of their predictions is not high, and they describe only relatively simple laws of change. Accurate daily predictions of high-frequency changes are difficult to obtain using traditional methods. Therefore, deep learning techniques have been increasingly favored due to their strong capability to portray more complex non-linear laws. Deep learning methods currently applied to SIC prediction primarily include long short-term memory networks (LSTM) (Chi and Kim, 2017), convolutional neural networks (CNNs) (Wang et al., 2017; Jun Kim et al., 2020), and convolutional long short-term memory networks (ConvLSTM) (Liu et al., 2021). A single-factor monthly prediction model of SIC was established based on LSTM (Chi and Kim, 2017). The model has the time series processing capability of LSTM and makes improved predictions based on SIC data at multiple historical moments. However, LSTM only performs grid-by-grid calculations and cannot simultaneously process spatial correlation and time dimension information. In contrast, the multi-factor monthly SIC prediction model based on CNNs can handle spatial correlation information (Jun Kim et al., 2020). However, CNNs cannot process time dimension information based on previous-moment information, which leads to "nearsightedness."
Therefore, in order to simultaneously process spatio-temporal information, ConvLSTM was applied to daily SIC prediction (Liu et al., 2021). The study overcame the shortcomings of LSTM and CNNs by establishing a single-factor daily prediction ConvLSTM model. However, the model is based only on the SIC change law and does not consider other factors affecting sea ice change. Moreover, the model can only predict SIC 1 day ahead; longer predictions can only be obtained through iterative prediction.
Further, in the above-mentioned deep learning models, the mean square error loss (MSE-loss) or mean absolute error loss (MAE-loss) functions are used during model training and only consider the size of the error. However, the change trend within the SIC space is also essential for SIC prediction.
Therefore, the purposes of this study are as follows: to introduce multiple factors affecting SIC changes to perform SIC multi-factor prediction, to introduce improved predictive recurrent neural network (PredRNN++) to achieve daily prediction for multiple consecutive days, and to propose a new gradient loss function (Grad-loss) that introduces the influence of the local SIC change trend during model training to further improve model prediction.
The remainder of this paper is structured as follows: section "Data" introduces data sources, factor screening, and data pre-processing; section "Methods" introduces the methods and design of the comparative experiments; section "Results and Discussion" introduces and discusses the results of the comparative experiments; section "Conclusion" summarizes the study.

DATA

ERA5 Reanalysis
ERA5 is the fifth generation European Center for Medium-Range Weather Forecasts (ECMWF) reanalysis for global climate and weather for the past four to seven decades (Mahmoodi et al., 2019;Hersbach et al., 2020;Muhammed Naseef and Sanil Kumar, 2020). Data are currently available from 1950 and split into stored climate data entries from 1950-1978 (preliminary back extension) and from 1979 onward (final release plus timely updates) (Hersbach et al., 2018).
Compared with ERA-interim, ERA5 reanalysis adds more satellite observations during the assimilation process. This improves the resilience of the system to discontinuities resulting from outages of any single instrument. At the same time, it can also reduce analysis errors through the effect of averaging independent errors in the observations. The data observed by these satellites are bias corrected using VarBC (Auligné et al., 2007). The bias correction model for channels assimilated from these sensors employs a constant term, a scan-angle-dependent correction (based on a third-order polynomial in scan angle), and four airmass predictors (Hersbach et al., 2020). After the bias correction, the satellite observations are assimilated into models to obtain the variables in ERA5 reanalysis. For example, measurements from the microwave sounders MSU, AMSU-A, and ATMS are assimilated as brightness temperatures, providing information on temperature throughout the troposphere and stratosphere (Bormann et al., 2013).
Based on satellite observations assimilated into models, ERA5 reanalysis can provide researchers with the most precise hourly reanalysis data and contains a variety of meteorological and marine elements. ERA5 bias has been assessed for specific variables and diverse spatio-temporal domains. The accuracy of each element in the ERA5 analysis is generally better than that of ERA-interim. For example, the accuracies of upper-air temperature, surface pressure, and 10-m zonal wind are increased by approximately 0.4 K, 0.2 hPa, and 0.5 m/s, respectively (Hersbach et al., 2020).
The temporal coverage of this dataset is from 1979 to present, and the spatial coverage is global. Data has been regridded to a regular lat-lon grid of 0.25 degrees and provided hourly for the reanalysis. The data used in this study can be downloaded from the website [ERA5 hourly data on single levels from 1979 to present (copernicus.eu)].

Factor Screening and Data Preprocessing
The selection of predictors is a fundamental link in deep learning. Sea level pressure, temperature, and wind speed are typical factors considered in previous work on sea ice concentration (SIC) prediction (Vancoppenolle et al., 2009; Hunke et al., 2013; Ballinger and Sheridan, 2015). In addition, based on the dynamic and thermal processes of sea ice, existing studies have shown that factors such as cloud cover, radiation, and vertical flux also have an impact on sea ice changes (Gu et al., 2018; Perovich et al., 2007; Comiso et al., 2017). Therefore, various data are extracted from ERA5 for SIC prediction, including sea surface temperature (SST), vertical integral of heat flux (VHF), vertical integral of divergence of moisture flux (VMF), mean sea level pressure (MSL), 10-m wind component (WIND), 2-m temperature (T2M), albedo for direct radiation (AL), low cloud cover (LCC), skin temperature (SKT), and SIC.
Factor screening is performed on all 10 factors to reduce the amount of model calculation and data dimensions. Five meteorological and oceanic factors (i.e., SST, MSL, T2M, SKT, and SIC) are selected by calculating the time lag correlation coefficients. The time-lagged correlation is the calculation of the correlation between the value of each factor at time t-1 and the SIC at time t. Figure 1 shows the time-lag correlation coefficient.
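The time-lagged correlation described above can be sketched as follows. This is a minimal illustration that assumes a Pearson correlation on a single time series; the function names and the toy values are hypothetical and are not taken from the study, which works on gridded ERA5 fields.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def lag1_correlation(factor, sic):
    """Correlate the factor at time t-1 with the SIC at time t."""
    return pearson(factor[:-1], sic[1:])


# Hypothetical daily values at one grid point (illustration only).
sst_series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
sic_series = [90.0, 88.0, 85.0, 80.0, 76.0, 70.0]
r = lag1_correlation(sst_series, sic_series)  # strongly negative here
```

Factors would then be ranked by the absolute value of this coefficient, so that strongly negative correlations count as much as strongly positive ones.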
As shown in Figure 1, the stronger the intensity of the warmer color, the greater the positive time lag correlation between this factor and SIC. The stronger the intensity of the cooler color, the greater the negative time lag correlation. Therefore, according to the absolute value of the time lag correlation, we nondimensionalize the data of the five above-mentioned factors and normalize each one to between 0 and 100 to train the model. Because SST and SIC have no values over land and NaN values cannot be used in model training, the land NaN values in SST are set to -100 to clearly distinguish them from sea surface temperatures, and the land NaN values in SIC are set to 0 because no sea ice is present.
Data from January 2010 to December 2019 are used as the training set for the SIC prediction model, and data from January to December 2020 are used as test values of the model prediction effect. The longitude and latitude range from 120° to 180°E and 66.75° to 83°N, respectively.
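The normalization and land-fill steps can be sketched as follows. The sketch assumes min-max scaling over the valid (ocean) values and assumes that land values are filled after scaling; the text does not state the scaling method or the order of operations, so both are assumptions, and the helper names and sample values are hypothetical.

```python
import math


def scale_0_100(values):
    """Min-max scale the non-NaN values to [0, 100]; NaN entries stay NaN.

    Assumes the valid values are not all identical.
    """
    valid = [v for v in values if not math.isnan(v)]
    lo, hi = min(valid), max(valid)
    span = hi - lo
    return [v if math.isnan(v) else 100.0 * (v - lo) / span for v in values]


def fill_land(values, fill_value):
    """Replace NaN (land) entries with a fixed fill value."""
    return [fill_value if math.isnan(v) else v for v in values]


nan = float("nan")
sst_raw = [271.5, nan, 273.0, 276.0]  # hypothetical SST in kelvin, NaN = land
sic_raw = [1.0, nan, 0.5, 0.0]        # hypothetical SIC fraction, NaN = land

sst = fill_land(scale_0_100(sst_raw), -100.0)  # land SST set to -100
sic = fill_land(scale_0_100(sic_raw), 0.0)     # land SIC set to 0
```

With this order, the -100 fill value lies well outside the 0 to 100 range of the scaled ocean values, which keeps land clearly separable from valid SST.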

METHODS
This study was based on Grad-loss and applied ConvLSTM and PredRNN++ to 10 consecutive days of short-term daily SIC prediction. In addition to comparing the ConvLSTM and PredRNN++ models, this study also compared the prediction effects of different loss functions.

Convolutional Long Short-Term Memory Network
The convolutional long short-term memory network (ConvLSTM) is a combination of CNNs and LSTM. CNNs are neural networks that include convolution calculations and are suitable for processing two-dimensional spatial field information. CNNs overcome the limitations of grid-by-grid calculations and are mostly used in image recognition and classification (LeCun et al., 1998; Ren et al., 2017; Yoo et al., 2019). LSTM is a type of recurrent neural network (RNN) that processes time-series information. LSTM solves the long-term dependency problem of RNNs and has recently been used in speech recognition and text translation (Sundermeyer et al., 2015; Wang et al., 2020). ConvLSTM adds the convolution operation of CNNs to LSTM and processes both temporal and spatial field information (Shi et al., 2015). Therefore, ConvLSTM is more suitable for image processing applications (Hu et al., 2020).

Improved Predictive Recurrent Neural Network
The improved predictive recurrent neural network (PredRNN++) is a deep learning network based on ConvLSTM. Its principal improvements are replacing the ConvLSTM neural unit with the causal LSTM and proposing a new neural unit, the Gradient Highway Unit (GHU). The causal LSTM calculates spatial and temporal information separately to better capture short-term dynamic changes. The GHU connects neural units at different moments and directly transmits characteristic historical information to subsequent moments. This makes it easier to retain the gradient during backpropagation and partially solves the vanishing gradient problem (Wang et al., 2018). PredRNN++ retains the advantages of ConvLSTM by processing time series and performing convolution operations on spatial fields. In addition, PredRNN++ is more suitable than ConvLSTM for predictions that rely on long-term historical sequences (Bonnet et al., 2020).

Gradient Loss Function
Most loss functions currently used in deep learning methods are the mean square error loss function (MSE-loss) or the mean absolute error loss function (MAE-loss). The calculation formulas of MSE-loss and MAE-loss are as follows:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 \qquad (1)$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right| \qquad (2)$$

where $y_i$ and $\hat{y}_i$ are the real and predicted values at grid point $i$, and $n$ is the number of grid points. MAE-loss is more robust than MSE-loss in the face of outliers, because MSE squares the error and gives outliers more weight. However, both MSE and MAE compare the real value with the predicted value grid by grid. Therefore, during network training, the MSE and MAE loss functions can only compare the difference between the real SIC and the predicted SIC at the same grid point.
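The difference in outlier sensitivity between the two losses can be seen in a small numerical sketch. The values below are hypothetical and chosen only for illustration.

```python
def mse(y_true, y_pred):
    """Mean square error over paired grid-point values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error over paired grid-point values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


truth = [50.0, 60.0, 70.0, 80.0]
uniform_error = [52.0, 58.0, 72.0, 78.0]  # an error of 2 at every point
one_outlier = [50.0, 60.0, 70.0, 60.0]    # a single error of 20

mae_uniform, mse_uniform = mae(truth, uniform_error), mse(truth, uniform_error)
mae_outlier, mse_outlier = mae(truth, one_outlier), mse(truth, one_outlier)
```

Here the single outlier raises the MAE from 2 to 5 (a factor of 2.5) but raises the MSE from 4 to 100 (a factor of 25), which is the extra weight on outliers described above.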
In fact, different oceanic regions respond differently to environmental factors. Thus, local sea ice changes also have an important impact on the prediction effect. Therefore, this study adds a gradient term to MAE to reflect local sea ice changes and proposes Grad-loss. We compare the prediction effect when using MAE-loss and Grad-loss to explore the improvement brought by Grad-loss. In the calculation formula for Grad-loss, $\mathrm{Grad}_{lat}$ and $\mathrm{Grad}_{lon}$ represent the gradients in the latitude and longitude directions, respectively, and std represents the standard deviation. To ensure that the dimensions of MAE and Grad are the same, the std of the SIC and the std of the gradient are removed from the formula.
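As an illustration of how a gradient term can enter a loss, the sketch below assumes one plausible form: the MAE of the SIC values divided by the std of the real SIC, plus the MAE of the latitude and longitude gradients divided by the std of the real gradients. This form, the forward-difference gradient, and all function names are assumptions made for illustration; they are not necessarily the formula used in this study.

```python
import statistics


def flatten(grid):
    return [v for row in grid for v in row]


def mae(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def grad_lat(grid):
    """Forward difference between adjacent rows (latitude direction)."""
    return [[grid[i + 1][j] - grid[i][j] for j in range(len(grid[0]))]
            for i in range(len(grid) - 1)]


def grad_lon(grid):
    """Forward difference between adjacent columns (longitude direction)."""
    return [[row[j + 1] - row[j] for j in range(len(row) - 1)] for row in grid]


def grad_loss(y_true, y_pred, eps=1e-8):
    """ASSUMED form: std-normalized value MAE plus std-normalized gradient MAE."""
    true_vals, pred_vals = flatten(y_true), flatten(y_pred)
    value_term = mae(true_vals, pred_vals) / (statistics.pstdev(true_vals) + eps)
    true_grads = flatten(grad_lat(y_true)) + flatten(grad_lon(y_true))
    pred_grads = flatten(grad_lat(y_pred)) + flatten(grad_lon(y_pred))
    grad_term = mae(true_grads, pred_grads) / (statistics.pstdev(true_grads) + eps)
    return value_term + grad_term


# Hypothetical 3 x 3 SIC fields (rows = latitude, columns = longitude).
truth = [[90, 70, 50], [80, 60, 40], [70, 50, 30]]
shifted = [[95, 75, 55], [85, 65, 45], [75, 55, 35]]       # +5 everywhere
checkerboard = [[95, 65, 55], [75, 65, 35], [75, 45, 35]]  # +5 / -5 alternating
```

Both predictions have the same MAE of 5, but the checkerboard prediction distorts the local gradients and therefore receives the larger loss, whereas a grid-by-grid MAE cannot tell the two apart.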

Research Flow
This experiment builds ConvLSTM and PredRNN++ based on the TensorFlow deep learning environment (Abadi et al., 1983) and applies them to short-term SIC prediction.
Existing research on the sensitivity of network hyperparameters shows that the prediction effect is better when the number of network layers is 3 (Wang et al., 2017). Three layers strike a balance between modeling capability and data amount: a shallower model would lead to inadequate fitting, and a deeper model cannot be fed effectively with the available data. In the training phase, the training effect is better when the learning rate is set to 0.001 (Jun Kim et al., 2020). In this way, the loss can be reduced quickly, and the possibility of falling into a local optimum can be reduced. The kernel sizes of layers 1-3 are set to 5, 3, and 3, respectively. That is, the faster changes are extracted first, and then the subtle changes are learned (Jun Kim et al., 2020). The number of filters in a 3-layer network usually decreases from 128 to 64. That is, the model extracts more changes in the front layers and then gradually extracts the laws related to sea ice change in the later layers (Liu et al., 2021).
Consequently, three-layer networks are built for ConvLSTM and PredRNN++. The filters in the three layers of each network are 128, 128, and 64, with kernel sizes of 5, 3, and 3, respectively. The input data are 4D (time series × height × width × factors). The batch size of the input data is 32. The activation function is a rectified linear unit (ReLU). The optimizer is Adam, which combines the advantages of the AdaGrad and RMSProp optimization algorithms, has high computational efficiency, and requires only a small amount of memory (Kingma and Ba, 2015). Table 1 presents the specific hyper-parameters.
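The hyper-parameters listed above can be collected in a small configuration object, sketched below. The class and field names are hypothetical and do not correspond to any TensorFlow API; the grid size of 66 × 241 is an assumption derived from a 0.25° grid over 66.75° to 83°N and 120° to 180°E with both endpoints included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NetworkConfig:
    """Hyper-parameters shared by the ConvLSTM and PredRNN++ networks."""
    filters: tuple = (128, 128, 64)   # filters in layers 1-3
    kernel_sizes: tuple = (5, 3, 3)   # kernel sizes in layers 1-3
    batch_size: int = 32
    learning_rate: float = 0.001
    activation: str = "relu"
    optimizer: str = "adam"
    input_days: int = 10              # length of the input sequence
    n_factors: int = 5                # 1 for the single-factor models

    def input_shape(self, height, width):
        """Shape of one sample: time series x height x width x factors."""
        return (self.input_days, height, width, self.n_factors)


config = NetworkConfig()
sample_shape = config.input_shape(height=66, width=241)  # assumed grid size
```

Keeping the settings in one immutable object makes it straightforward to build both networks from identical hyper-parameters, so that differences in skill can be attributed to the network, the inputs, or the loss.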
In addition to the two networks, two types of input data and two types of loss functions are considered. The input data can be the SIC itself (namely the self-regression model) or the multiple factors listed in section "Factor Screening and Data Preprocessing." These two cases are hereinafter called single-factor and multi-factor. In terms of the loss function, a novel loss based on the horizontal gradient (Grad-loss) is proposed to improve on the conventional MAE-loss. Eight models are thus constructed for further comparison, as shown in Table 2.
The lengths of the input and output sequences can be an important factor in deep learning. Performance is usually better when the input and output sequences have comparable lengths, but can degrade as the input sequence becomes longer (Wang et al., 2018; Bonnet et al., 2020; Liu et al., 2021).
The performance of each model is evaluated by a running test of 10-day lagged predictions in 2020. To be more specific, the prediction for January 11-20 is made from the data of January 1-10, followed by the prediction for January 12-21 from January 2-11, and so on until the prediction for December 22-31. In this way, 347 seq-to-seq predictions are made in the year 2020. Figure 2 illustrates this process.
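The eight models are the full combination of two networks, two input types, and two loss functions, which can be enumerated as in the sketch below. The dictionary keys are hypothetical labels; the actual model names are those given in Table 2.

```python
from itertools import product

networks = ("ConvLSTM", "PredRNN++")
input_types = ("single-factor", "multi-factor")
loss_functions = ("MAE-loss", "Grad-loss")

# One entry per model: 2 networks x 2 input types x 2 loss functions.
models = [
    {"network": net, "inputs": inp, "loss": loss}
    for net, inp, loss in product(networks, input_types, loss_functions)
]
```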
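The count of 347 follows from sliding a 20-day window (10 input days plus 10 target days) one day at a time through the 366 days of 2020. A minimal sketch, with a hypothetical helper name:

```python
from datetime import date, timedelta


def running_windows(first_day, last_day, n_in=10, n_out=10):
    """Return ((input_start, input_end), (target_start, target_end)) pairs."""
    windows = []
    start = first_day
    while start + timedelta(days=n_in + n_out - 1) <= last_day:
        input_days = (start, start + timedelta(days=n_in - 1))
        target_days = (start + timedelta(days=n_in),
                       start + timedelta(days=n_in + n_out - 1))
        windows.append((input_days, target_days))
        start += timedelta(days=1)
    return windows


windows = running_windows(date(2020, 1, 1), date(2020, 12, 31))
n_predictions = len(windows)  # 366 - 20 + 1 = 347
```

The first window uses January 1-10 to predict January 11-20, and the last uses December 12-21 to predict December 22-31.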
Based on a variety of properties of the human visual system, the spatial structure similarity (SSIM) considers the brightness, contrast, and structure information of the image. In deep learning, it is usually used to assess the quality of image prediction (Wang et al., 2004). Hence we have evaluated the SSIM, MAE, and the difference (DIFF) between the predicted value and the test value to compare the prediction effects of each model. The SSIM compares the spatial structure difference between the predicted value and the test value across the entire oceanic region; MAE compares the error of the two values in the overall oceanic region; the DIFF intuitively displays the difference of the error and the spatial distribution between the predicted and test values.
DIFF is the difference between the predicted value and the test value, and SSIM is calculated as follows:

$$\mathrm{SSIM}(y,\hat{y}) = \frac{\left(2u_y u_{\hat{y}} + c_1\right)\left(2\sigma_{y\hat{y}} + c_2\right)}{\left(u_y^2 + u_{\hat{y}}^2 + c_1\right)\left(\sigma_y^2 + \sigma_{\hat{y}}^2 + c_2\right)} \qquad (5)$$

where $u$ represents the average value, $\sigma_y^2$ and $\sigma_{\hat{y}}^2$ represent the variances, and $\sigma_{y\hat{y}}$ represents the covariance. $c_1$ and $c_2$ are constants, with $c_1 = (0.01L)^2$ and $c_2 = (0.03L)^2$. $L$ is set to 100 in this study to calculate SIC.

Figure 3 shows the oceanic region and control experiments designed in this study. Factor screening and pre-processing are adopted to generate the input datasets. Eight models are trained on the training period (see Table 2). The test sets are put into the eight models to obtain their predictions. Three control experiments are designed to test the performance of the proposed multi-factor short-term daily SIC prediction model with its three improvements, i.e., adopting multiple factors, the gradient loss function, and the advanced deep learning algorithm. Firstly, the impact of adopting multiple factors is tested by comparing the performance of the single-factor and multi-factor models for both ConvLSTM and PredRNN++. Secondly, the impact of adopting the gradient loss function is tested. Finally, the impact of adopting PredRNN++ is tested.

[Figure 3 panel labels. Contrast 1: the impact of adopting multiple factors for the model. Contrast 2: the impact of adopting Grad-loss for the model. Contrast 3: the impact of adopting PredRNN++ for the model.]
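The SSIM of Eq. 5 can be sketched as a single statistic over the whole field, using the constants stated in the text with L = 100. Computing it over the entire region without a sliding window is an assumption based on the description of the SSIM as a comparison across the entire oceanic region; common image-quality implementations instead average the statistic over local windows. The function name and sample values are hypothetical.

```python
def ssim(y_true, y_pred, data_range=100.0):
    """Whole-field SSIM of two equal-length sequences of SIC values."""
    n = len(y_true)
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    var_t = sum((t - mean_t) ** 2 for t in y_true) / n
    var_p = sum((p - mean_p) ** 2 for p in y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred)) / n
    numerator = (2 * mean_t * mean_p + c1) * (2 * cov + c2)
    denominator = (mean_t ** 2 + mean_p ** 2 + c1) * (var_t + var_p + c2)
    return numerator / denominator


# Hypothetical flattened SIC fields in percent (illustration only).
observed = [0.0, 20.0, 50.0, 80.0, 100.0]
predicted = [5.0, 25.0, 45.0, 85.0, 90.0]
score = ssim(observed, predicted)
```

A value of 1 indicates identical fields, and lower values indicate a larger difference in mean, variance, or spatial structure.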

RESULTS AND DISCUSSION

Predictability
Although the above-mentioned models are trained for 10-day SIC prediction, their effectiveness is unknown. Therefore, to test the predictability, we compare each model's average MAE on each of the 10 prediction days with the monthly average MAE. Because the Northeast Passage is primarily navigable in autumn (September to November), we consider October as a comparison example. The monthly average SIC in October is subtracted from the daily SIC, and the monthly average MAE is obtained using formula 2. Figure 4 shows that among the eight models, the model with the best prediction effect remains valid for 9 days. That is to say, the prediction error on the 9th day is smaller than the monthly average change. Therefore, the follow-up comparison of the SSIM and MAE of each model is conducted based on the longest predictable time (9 days).
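The validity criterion can be sketched as follows: a lead day counts as predictable while its MAE stays below the MAE of the daily values about their monthly mean. The function names and all numbers are hypothetical and are not results from this study.

```python
def monthly_baseline_mae(daily_values):
    """MAE of daily values about their own monthly mean."""
    mean = sum(daily_values) / len(daily_values)
    return sum(abs(v - mean) for v in daily_values) / len(daily_values)


def predictable_days(mae_by_lead_day, baseline_mae):
    """Number of consecutive lead days, from day 1, with MAE below the baseline."""
    days = 0
    for err in mae_by_lead_day:
        if err >= baseline_mae:
            break
        days += 1
    return days


# Toy example: model MAE for lead days 1-5 against a baseline of 5.0.
toy_model_mae = [1.0, 2.0, 3.0, 7.0, 4.0]
horizon = predictable_days(toy_model_mae, baseline_mae=5.0)
```

In the toy example the horizon is 3 days: the fourth lead day exceeds the baseline, so later days are not counted even if their error drops again.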

Single Factor or Multiple Factors
This experiment uses the five factors selected in section "Factor Screening and Data Preprocessing" to verify whether multi-factor prediction performs better than single-factor prediction. As shown in Figure 2, each model makes 347 10-day running predictions. Because the validity cannot surpass 9 days (Figure 4), the SSIM and MAE of the 347 consecutive 9-day predictions of each model are averaged over the 9 days to compare the prediction effects. Figures 5, 6 show the results.
In Figure 5, a positive value of DIFF indicates better performance of the multi-factor model. The multi-factor model is sometimes less effective in summer, but more often than not it has better spatial structure similarity. In Figure 6, a negative value of DIFF indicates better performance of the multi-factor model. In most cases, the error of the multi-factor model is smaller. In spring and summer, although there are fluctuations, the improvement is more obvious. Therefore, in general, the prediction effect of the multi-factor model is better than that of the single-factor model. The multi-factor improvement is more obvious for PredRNN++.
The advantage of the multi-factor prediction model over the single-factor model is primarily reflected in the seasons when the sea ice changes greatly. In summer, melt ponds change the albedo of the sea surface and decrease the accuracy of summer SIC observations (Cavalieri et al., 1990; Perovich et al., 2007; Mäkynen et al., 2014). The multi-factor model is therefore more accurate because it considers the factors affecting sea ice and thereby weakens the influence of melt ponds (Liu et al., 2021).

Mean Absolute Error-Loss or Gradient Loss Function
To test whether using Grad-loss can improve model prediction, we have divided the ConvLSTM and PredRNN++ multi-factor prediction model into two types: MAE-loss and Grad-loss. The SSIM and MAE of the prediction results are averaged for nine consecutive days. Figures 7, 8 show the results. Figures 7, 8 show that Grad-loss further improves the accuracy of the SIC prediction model based on multiple factors for ConvLSTM and PredRNN++. Grad-loss adds the local change of SIC to the loss function by considering the gradient between adjacent grid points. Thus, the Grad-loss-based SIC prediction model considering local SIC changes produces more accurate predictions than the MAE-loss-based model. In addition, Grad-loss significantly improves year-round prediction accuracy (rather than only in the summer).

Convolutional Long Short-Term Memory Network or Improved Predictive Recurrent Neural Network
This experiment divides the models into two categories according to network type to test the applicability of ConvLSTM and PredRNN++ for short-term SIC prediction. In addition to taking the 9-day average of the SSIM and MAE, this study also averages the 347 prediction results for each of the nine consecutive days. The average effects of the 347 predictions and of the consecutive 9-day predictions are compared. Figures 9, 10 compare the model prediction results. Figures 9, 10 show that the PredRNN++ multi-factor prediction model based on Grad-loss has a relatively high SSIM and low MAE. The advantage of PredRNN++ over ConvLSTM for the 347 predictions is primarily reflected in seasons with significant sea ice changes. PredRNN++ produces the best 9-day prediction results.
In addition to comparing the SSIM and MAE of the entire oceanic region, this study also compares the size and distribution of the DIFF of the two types of models in the spatial field. Because Arctic sea ice changes greatly in summer (June to August), this study calculates the DIFF between the predicted and test values in July 2020. Owing to the different predictability of different models, this study only compares the first-day prediction results. Figure 11 shows the 15 July prediction results. Figure 11 shows that the DIFF of the monthly average result is the largest, indicating significant SIC variability in the current month. Overall, the DIFF of the PredRNN++ multi-factor prediction model based on Grad-loss is the smallest. In addition, the DIFF spatial distribution of the PredRNN++ model is relatively uniform, with no large-scale oceanic region deviations. Except for large deviations near islands, deviations in most oceanic regions are below 10%. By comparison, ConvLSTM has oceanic regions with large deviations and a relatively large overall deviation.
The SSIM, MAE, and DIFF indicators show that the PredRNN++ multi-factor prediction model based on Grad-loss has the best prediction effect. The prediction effect of the Grad-loss-based ConvLSTM multi-factor model is similar to that of PredRNN++ in winter but inferior during periods of significant SIC changes (summer and autumn). Summer and autumn are the navigable seasons of the Northeast Passage, and the East Siberian Sea is an essential sea of the Northeast Passage (Kunkel, 2004). Thus, the requirements for sea ice prediction are higher and the PredRNN++ multi-factor prediction model based on Grad-loss is the best choice. Models trained with PredRNN++ have better prediction results than those trained with ConvLSTM primarily due to the improved causal LSTM in PredRNN++. The causal LSTM can separately calculate the temporal and spatial changes of the unit by adding more non-linear calculations at the same network depth. Thus, it can better reflect SIC variation in summer and autumn.
However, the additional non-linear calculations in PredRNN++ might not significantly improve the results in winter, when SIC changes little.
The PredRNN++ multi-factor prediction model based on Grad-loss had the highest prediction skill but still had some limitations. First, the SST and SIC land missing values were set to -100 and 0, respectively, causing the training data to deviate from the true physical meaning. In the future, an attention mechanism could possibly be introduced to concentrate the weight of the convolution calculation on the sea ice region and to weaken the influence of the missing values. Second, PredRNN++ made longer predictions and achieved higher accuracy but sharply increased network complexity and computational overhead. The computational cost of PredRNN++ was more than twice that of ConvLSTM. Follow-up work might be able to simplify the PredRNN++ network structure to reduce the computational cost while ensuring the SIC prediction effect.
In addition, the predictability of the model for Arctic navigation had an important influence on trajectory planning results. Longer model predictions could produce more extensive area predictions with the trajectory planning algorithm and reduce the likelihood of falling into a local optimum (Koenig and Likhachev, 2005). The MAE of continuous 9-day prediction showed that the Grad-loss-based PredRNN++ multi-factor prediction model had the slowest error increase and the smallest error range among the models. Therefore, the PredRNN++ multi-factor prediction model based on Grad-loss was more suitable for trajectory planning in Arctic waters.

CONCLUSION
Numerical model simulation is based on known laws and is computationally time-consuming. This study proposes a new prediction model that adopts multiple factors, a gradient loss function, and an advanced deep learning algorithm called PredRNN++, and that can continuously predict daily SIC for up to 9 days. Factor screening is adopted to select 5 of the 10 factors that affect SIC changes for prediction. Three control experiments are designed to test the impact of these three improvements on model performance.
Additionally, the SSIM, MAE, and DIFF indicators are used to evaluate the prediction effect of each model, and the following conclusions are drawn:
1. The physical interpretation of the prediction models and the short-term SIC prediction effect are improved by introducing multiple factors related to SIC changes.
2. The Grad-loss function further improves the accuracy of daily SIC prediction of the multi-factor model by adding extra information on local SIC variation.
3. The PredRNN++ multi-factor prediction model based on the Grad-loss function has the best spatial structure similarity, the lowest error, and the best predictability. Additionally, the PredRNN++ model has the smallest deviation between the predicted value and the test value and contains no large-scale oceanic region deviations.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding author.