Using a deep-learning approach to infer and forecast the Indonesian Throughflow transport from sea surface height

Xin, Linchao; Hu, Shijian; Wang, Fan; Xie, Wenhong; Hu, Dunxin; Dong, Changming

doi:10.3389/fmars.2023.1079286

ORIGINAL RESEARCH article

Front. Mar. Sci., 26 January 2023

Sec. Physical Oceanography

Volume 10 - 2023 | https://doi.org/10.3389/fmars.2023.1079286

This article is part of the Research Topic: Multi-Scale Ocean Dynamical Processes and Their Climatic, Ecological and Sedimentological Effects in the Eastern Indian Ocean

Linchao Xin¹,²,³

Shijian Hu¹,²,³*

Fan Wang¹,²,³

Wenhong Xie⁴

Dunxin Hu¹,²,³

Changming Dong⁴

¹CAS Key Laboratory of Ocean Circulation and Waves, Institute of Oceanology, Chinese Academy of Sciences, Qingdao, China
²Pilot National Laboratory for Marine Science and Technology (Qingdao), Qingdao, China
³College of Marine Science, University of Chinese Academy of Sciences, Qingdao, China
⁴School of Marine Sciences, Nanjing University of Information Science and Technology, Nanjing, China

The Indonesian Throughflow (ITF) connects the tropical Pacific and Indian Oceans and is critical to the regional and global climate systems. Previous research indicates that the Indo-Pacific pressure gradient is a major driver of the ITF, implying that ITF transport might be forecast from the sea surface height (SSH) of the Indo-Pacific Ocean. Here we used a deep-learning approach with a convolutional neural network (CNN) model to reproduce ITF transport. The CNN model was trained with a random selection of the Coupled Model Intercomparison Project Phase 6 (CMIP6) simulations and verified with the remaining CMIP6 simulations. A test of the training results showed that the CNN model with SSH is able to reproduce approximately 90% of the total variance of ITF transport. The CNN model trained with CMIP6 was then transferred to the Simple Ocean Data Assimilation (SODA) dataset, and this transferred model reproduced approximately 80% of the total variance of ITF transport in SODA. A time series of ITF transport, verified by Monitoring the ITF (MITF) and International Nusantara Stratification and Transport (INSTANT) measurements of ITF, was then produced by the model using satellite observations from 1993 to 2021. We found that the CNN model can make a valid prediction with a lead time of 7 months, implying that the ITF transport can be predicted using the deep-learning approach with SSH data.

1 Introduction

The Indonesian seas exchange water actively with neighboring oceans through multiple channels, and the Indonesian Throughflow (ITF) passes through these channels (Wyrtki, 1961; Gordon, 1986; Sprintall et al., 2019). As the only oceanic passage between the two oceans in the tropics, the ITF links the Pacific low-latitude western boundary current and the Indian Ocean circulation system and hence plays an important role in the Indo-Pacific Ocean circulation system (Wyrtki, 1987; Hu et al., 2015; Sprintall et al., 2019; Phillips et al., 2021). In the context of global warming, ocean circulations, including the ITF, are expected to change significantly (e.g., Sen Gupta et al., 2016; Hu et al., 2020; Ma et al., 2020; Hu et al., 2021; Santoso et al., 2022; Shilimkar et al., 2022). Changes in the ITF may cause fluctuations in the rate of Indo-Pacific exchange and have an impact on regional and global climates (Gordon, 1986; Sprintall et al., 2014; Feng et al., 2015; Lee et al., 2015; Liu et al., 2016; Hu and Sprintall, 2017; Feng et al., 2018; Li et al., 2018; Hu et al., 2019).

The Indonesian seas have complex topographies and ocean dynamic processes (e.g., Wijffels and Meyers, 2004; Gordon, 2005; Hu and Sprintall, 2016; Wei et al., 2019; Sun and Thompson, 2020; Xu et al., 2021). Several notable observational experiments have been conducted in this region, such as the Indonesian-US Arlindo program (Gordon et al., 1999), the International Nusantara Stratification and Transport (INSTANT) program (Sprintall et al., 2004; Sprintall et al., 2009; van Aken et al., 2009; Gordon et al., 2010), Monitoring the ITF (MITF; Susanto et al., 2012; Gordon et al., 2019), and the expendable bathythermograph (XBT) deployments along the IX1 section (Meyers et al., 1995), as well as the Northwestern Pacific Ocean Circulation and Climate Experiment (NPOCE; Hu et al., 2011). These observations are crucial for recovering the characteristics and underlying dynamics of the ITF. However, the lack of long-term and continuous ITF time series makes it difficult to gain a deeper understanding (Sprintall et al., 2019).

Previous studies suggested finding a proxy of ITF transport in addition to direct observations (e.g., Sprintall and Révelard, 2014; Susanto and Song, 2015; Hu and Sprintall, 2016). Wyrtki (1961) proposed that the large-scale pressure gradient between the Pacific Ocean and the Indian Ocean is the driver of the ITF on annual and longer time scales, and that wind field changes in the Pacific and Indian oceans affect ITF transport. Susanto et al. (2007) suggested an ITF proxy using SSH anomalies from T/P altimeters and thermocline depth anomalies along the Lombok Strait. Using numerical simulations, Shinoda et al. (2012) found that sea-level differences between the eastern Indian Ocean and the western Pacific were highly correlated with ITF transport. Sprintall and Révelard (2014) used remotely sensed altimeter data to develop proxy time series of ITF transport, focusing on the three outflow passages of Lombok, Ombai, and Timor. Susanto and Song (2015) developed an ITF transport proxy from satellite altimetry and gravimetry ocean bottom pressure (OBP) data, which they validated with measurements in the Makassar Strait. Hu and Sprintall (2016) proposed an ITF transport proxy on the basis of steric height from hydrographic data to separate the salinity effect on ITF transport.

Previous research indicates a close connection between ITF transport and SSH in the Indo-Pacific Ocean. Nevertheless, the proxy of ITF derived from SSH using conventional methods, such as linear regression, is typically based on the difference of SSH between two regions within the Indo-Pacific Ocean and hence ignores SSH signals of certain regions, resulting in significant inconsistency between the proxy and observations and a lack of ability to predict ITF transport. By contrast, approaches based on machine learning may be more promising for developing a better ITF proxy and predicting ITF variability. Li et al. (2018) used a backpropagation (BP) neural network to create a multidecadal time series of 0–300 m Makassar Throughflow.

Deep learning is a more powerful tool for extracting critical information from large amounts of image data than simpler machine-learning methods, such as the BP neural network. Deep learning is capable of optimizing a non-linear function from a large amount of training data. Theoretically, deep neural networks can approximate non-linear mappings of any complexity (Cybenko, 1989; Hornik, 1991), and deep learning has been widely used in oceanography, e.g., automatic detection and prediction of mesoscale eddies (Zeng et al., 2015; Xu et al., 2019), prediction of El Niño–Southern Oscillation, studies of climate model parameter sensitivity, and parameterization of unresolved atmospheric processes (Ham et al., 2019; Esteves et al., 2019; Anderson and Lucas, 2018). Bolton and Zanna (2019) demonstrated the powerful potential of deep learning for estimating ocean currents using satellite observations. Deep learning was shown to accurately predict subsurface ocean currents by George and Manucharyan (2021) using synthetic data generated from a simplified ocean turbulence model.

The goal of this study is to create a proxy for ITF transport using deep learning and SSH data. The remainder of the paper is organized as follows: the Data and Methods section introduces the data and the convolutional neural network (CNN) processing methods and architecture, and the Results section evaluates the performance of the CNN model with different data in estimating ITF transport from SSH. The final section contains a summary and a discussion.

2 Data and methods

2.1 Data

The data we used for training came from 36 climate models that took part in the Coupled Model Intercomparison Project Phase 6 (CMIP6; see details in Tables 1 and 2). These models simulate the historical climate since 1850 and are driven by a variety of observation-based, time-varying external forcings.

Table 1. The specifics of the data used for this study.

Table 2. Details of the CMIP6 models used in this study.

The data for transfer learning and testing come from a reanalysis dataset, the University of Maryland’s Simple Ocean Data Assimilation (SODA 2.2.4; details in Table 1). The SODA dataset assimilates ocean station data, mooring temperature and salinity time series, various types of surface temperature and salinity observations, and nighttime infrared satellite SST data. The output is mapped onto a uniform 0.5°×0.5° horizontal grid with 40 vertical levels in the form of monthly averages.

The observational data for comparison and verification of the model results come from the INSTANT program, the MITF program, Ocean Surface Current Analysis Real-time (OSCAR; Lagerloef et al., 2002), and the Archiving, Validation, and Interpretation of Satellite Oceanographic data (AVISO; see details in Table 1). The INSTANT moorings were deployed simultaneously to measure the ITF from the Pacific inflow at Makassar Strait and Lifamatola Passage to the Indian Ocean export channels of Timor, Ombai, and Lombok, from 2004 to 2006 (Sprintall et al., 2004). The mooring array was designed to measure the ITF’s velocity, temperature, and salinity profiles. In this study, we used the sum of volume transports at the three outflow straits, i.e., the Lombok Strait, the Timor Passage, and the Ombai Strait, as the ITF transport.

The MITF moorings were deployed to measure the ITF at the Pacific inflow in the Makassar Strait; the mooring data there span more than 13 years (Susanto et al., 2012; Gordon et al., 2019). The XBT survey along the IX1 section between Fremantle, Western Australia and the Sunda Strait, Indonesia has been operating for more than 30 years. A time series of the geostrophic transport of the ITF can be obtained from the IX1 temperature data (Liu et al., 2015). OSCAR is an experimental processing system and data center that provides surface velocity fields in the tropical Pacific Ocean. Surface currents from OSCAR were calculated from satellite altimeter and vector wind data using methods developed during the TOPEX/Poseidon altimeter research mission (Bonjean and Lagerloef, 2002). The sea surface height data are a multisource altimeter sea surface height fusion product provided by AVISO, with a spatial resolution of 0.25°×0.25° and a temporal resolution of 1 month. The product primarily merges satellite data from several altimeters, including TOPEX/Poseidon, Jason-1, and ERS/Envisat (AVISO, 2020).

To facilitate deep learning, the SSH data from SODA and CMIP6 were linearly interpolated. Given that the ITF is controlled by a large-scale gradient over the Indo-Pacific Ocean, we used SSH from a broad region 30°E–286°E and 44°S–44°N (Figure 1). In SODA and CMIP6 models, ITF transport was defined as the volume transport across the section at 113.5°E (8.5°S–22.5°S). The CMIP6 data were divided into three sets: a train set, a verification set, and a test set. Figure 2 presents a schematic diagram of the training and operation of the CNN model with the above datasets.
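The transport definition above can be illustrated with a minimal sketch that integrates zonal velocity over a meridional section. The function name, the grid layout, the use of `None` for land cells, and the sign convention (negative values are westward) are assumptions made for illustration; the paper does not describe how the section integral was implemented.

```python
import math

EARTH_RADIUS_M = 6.371e6  # mean Earth radius in metres


def section_transport_sv(u, dlat_deg, layer_thickness_m):
    """Integrate zonal velocity over a meridional section.

    u[k][j] is the zonal velocity (m/s) at depth level k and latitude
    index j; land or missing cells are given as None (an assumption).
    Returns the volume transport in Sverdrups (1 Sv = 1e6 m^3/s).
    """
    # Meridional width of one grid cell; along a meridian this width
    # does not depend on latitude.
    dy = math.radians(dlat_deg) * EARTH_RADIUS_M
    total = 0.0
    for k, row in enumerate(u):
        for value in row:
            if value is not None:
                total += value * dy * layer_thickness_m[k]
    return total / 1.0e6


# Toy section (made-up numbers): 2 depth levels, 3 cells of 0.5 degrees.
u_toy = [[-0.10, -0.20, None],
         [-0.05, -0.05, -0.05]]
transport = section_transport_sv(u_toy, 0.5, [100.0, 200.0])
```

With these toy values the result is negative, i.e., a net westward transport under the assumed sign convention.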

Figure 1. Annual mean sea surface currents (vectors, OSCAR) and sea surface height (color, AVISO) in 2018.

Figure 2. A schematic diagram displaying the training and operation of a CNN model with various datasets.

The train set contains model data from 1850 to 1974, the verification set contains data from 1974 to 1994, and the test set contains data from 1994 to 2014. SODA data from 1871 to 1974 was used for transfer learning, while SODA data from 1980 to 2010 was used for testing. ITF transport was standardized before being incorporated into the deep-learning model.

The standardized z-score method is based on the following equation:

\begin{equation}
Z = \frac{X - \mu}{\sigma},
\tag{1}
\end{equation}

where μ is the mean of the train data, σ is the standard deviation of the train data, X is the ITF transport, and Z is the standardized ITF transport. The z-score method can be applied to any numerical data and removes the dependence on the magnitude of the data.
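As a minimal sketch of Equation 1, the statistics can be fitted on the train data only and then reused for other data. The function names and the transport values are made up for illustration, and the use of the population standard deviation is an assumption; the paper does not say which estimator was used.

```python
import statistics


def fit_zscore(train_values):
    """Return the mean and standard deviation of the train data."""
    mu = statistics.fmean(train_values)
    sigma = statistics.pstdev(train_values)  # population std (assumption)
    return mu, sigma


def standardize(values, mu, sigma):
    """Apply Z = (X - mu) / sigma element by element."""
    return [(x - mu) / sigma for x in values]


def destandardize(z_values, mu, sigma):
    """Invert the z-score to recover transport in physical units."""
    return [z * sigma + mu for z in z_values]


# Made-up transports in Sv, for illustration only.
train_itf = [-12.0, -15.0, -18.0, -15.0]
test_itf = [-21.0, -15.0]

mu, sigma = fit_zscore(train_itf)
z_test = standardize(test_itf, mu, sigma)
recovered = destandardize(z_test, mu, sigma)
```

Reusing the train-set μ and σ for the test data keeps information from the test period out of the preprocessing step.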

2.2 Methods

Figure 3 shows the CNN architecture used in this study, which consists of four convolutional layers and four pooling layers. The input layer takes the SSH fields from time t−2 months to time t (in months) over 30°E–286°E and 44°S–44°N. Each convolutional layer applies 4×4 convolutional filters, and a predefined non-linear activation function and batch normalization are applied to its output. After the four convolutions, the features are flattened into a one-dimensional vector and passed to a two-layer fully connected neural network to predict ITF transport. ReLU (Pedamonti, 2018) was used as the non-linear activation function. The role of the activation function is to add non-linear properties to the network, allowing it to learn highly complex mappings.
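The input described above, SSH from month t−2 to month t, can be assembled as in the following minimal sketch. The function name `make_samples` and the `lead` argument are hypothetical; in particular, pairing the window ending at month t with the transport at month t+lead is an assumption about how forecasting samples could be built, not a description of the authors' code.

```python
def make_samples(ssh_maps, transport, n_months=3, lead=0):
    """Pair windows of consecutive SSH maps with a transport target.

    Each sample uses the maps at months t-(n_months-1) .. t as input
    channels and the transport at month t+lead as the target.
    """
    samples = []
    for t in range(n_months - 1, len(ssh_maps) - lead):
        window = ssh_maps[t - n_months + 1 : t + 1]
        samples.append((window, transport[t + lead]))
    return samples


# Toy data: strings stand in for 2-D SSH maps, one per month.
maps = ["ssh_m0", "ssh_m1", "ssh_m2", "ssh_m3", "ssh_m4"]
itf = [10.0, 11.0, 12.0, 13.0, 14.0]

inference_samples = make_samples(maps, itf)         # lead = 0
forecast_samples = make_samples(maps, itf, lead=2)  # 2-month lead
```

With five months of toy data, the zero-lead case yields three samples and the two-month-lead case yields one, because the target month must exist in the record.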

Figure 3. CNN architecture diagram.

The convolution layer works by sliding a small convolution filter (Figure 3) over the input image and then passing each output pixel through the activation function; stacked together, the layers map the input (SSH image) to the output (ITF transport). The CNN’s convolutional filter matrices are not preset; instead, they are optimized by a gradient descent algorithm, using the input and output data, until the target error function reaches a minimum (Kingma and Ba, 2014).
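The filtering step described above can be sketched in plain Python for a single channel. This is an illustration only: it uses no padding and a stride of 1 (both assumptions), a 2×2 filter instead of the 4×4 filter used in the paper to keep the toy example small, and fixed filter values, whereas in the CNN the filter values are learned.

```python
def relu(x):
    """Rectified linear unit: pass positive values, zero out the rest."""
    return x if x > 0.0 else 0.0


def conv2d_valid(image, kernel):
    """Slide a filter over a 2-D image (no padding, stride 1).

    As in common deep-learning libraries, the filter is not flipped,
    so this is strictly a cross-correlation. ReLU is applied to each
    output pixel.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            for a in range(kh):
                for b in range(kw):
                    acc += image[i + a][j + b] * kernel[a][b]
            row.append(relu(acc))
        out.append(row)
    return out


# Toy 3x3 "SSH image" and a 2x2 filter that measures a diagonal difference.
image = [[1.0, 2.0, 3.0],
         [4.0, 5.0, 6.0],
         [7.0, 8.0, 9.0]]
kernel = [[-1.0, 0.0],
          [0.0, 1.0]]
feature_map = conv2d_valid(image, kernel)
```

Each output pixel here is the lower-right value of a 2×2 window minus its upper-left value, which is 4 everywhere for this toy image.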

The horizontal dimension of the pooling kernel is 4×4. The pooling kernel progressively reduces the size of the previous layer’s data by selecting the most significant pixel among the locally selected pixels. The gradient of the transport prediction error (the loss) is calculated by taking its derivative with respect to the network’s weights. The weights of the convolution filters and the fully connected layers are then trained using backpropagation, which updates each weight to reduce the loss. The power of the CNN lies in the fact that the filters of each convolutional layer are learned from data as part of the training process rather than being prespecified.
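A minimal sketch of the pooling step follows, assuming that "the most significant pixel" means the maximum within each window (max pooling) and that windows do not overlap. A 2×2 window is used instead of the paper's 4×4 to keep the toy example small.

```python
def max_pool2d(feature_map, size):
    """Non-overlapping max pooling over size x size windows.

    Trailing rows or columns that do not fill a whole window are
    dropped (an assumption about edge handling).
    """
    n_rows = len(feature_map) // size
    n_cols = len(feature_map[0]) // size
    out = []
    for i in range(n_rows):
        row = []
        for j in range(n_cols):
            window = [feature_map[i * size + a][j * size + b]
                      for a in range(size)
                      for b in range(size)]
            row.append(max(window))
        out.append(row)
    return out


# Toy 4x4 feature map reduced to 2x2.
toy_map = [[1, 2, 3, 4],
           [5, 6, 7, 8],
           [9, 10, 11, 12],
           [13, 14, 15, 16]]
pooled = max_pool2d(toy_map, 2)
```

Each pooling layer shrinks the feature map by the window size in both directions, which is how the network condenses a basin-scale SSH field into a small set of features.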

Deep learning necessitates the selection of hyperparameters to optimize the network. Specifically, the horizontal dimension of the convolution matrix is 4×4, the Adam optimizer (Kingma and Ba, 2014) is used to perform gradient descent, and the default learning rate is set to 0.001. The dropout probability is set to 30% to reduce overfitting, and dropout is implemented between the first and second fully connected layers. The neural network loss function is defined as the mean square error between the actual ITF transport and the transport predicted by the CNN. The network is coded in Python and employs Google’s machine-learning package TensorFlow (Abadi et al., 2016).
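To make the optimizer and loss concrete, here is a minimal sketch of one Adam update applied to a mean-square-error loss for a one-weight model y = w·x. The decay rates β1 = 0.9 and β2 = 0.999 and ε = 1e-8 are the defaults suggested in the Adam paper and are assumed here; only the learning rate of 0.001 comes from the text. The toy model and data are made up and unrelated to the CNN.

```python
import math


def mse(w, xs, ys):
    """Mean square error of the one-weight model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def mse_grad(w, xs, ys):
    """Derivative of the mean square error with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def adam_step(theta, grad, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; `state` holds the step count and moment estimates."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1.0 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1.0 - beta2) * grad * grad
    m_hat = state["m"] / (1.0 - beta1 ** state["t"])  # bias correction
    v_hat = state["v"] / (1.0 - beta2 ** state["t"])
    return theta - lr * m_hat / (math.sqrt(v_hat) + eps)


xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]  # toy data, true weight is 2
state = {"t": 0, "m": 0.0, "v": 0.0}

w = 0.0
initial_loss = mse(w, xs, ys)
w_after_one = adam_step(w, mse_grad(w, xs, ys), state)

w = w_after_one
for _ in range(99):
    w = adam_step(w, mse_grad(w, xs, ys), state)
loss_after_100 = mse(w, xs, ys)
```

A property worth noting is that the first Adam step has a magnitude very close to the learning rate regardless of the gradient size, which is one reason the learning rate is an important hyperparameter.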

Insufficient training data causes overfitting or reduced skill in any neural network. High-complexity networks with more trainable parameters typically achieve better prediction skill, but they require more training data (George et al., 2021). The number of free parameters in the CNN in this paper was O(10⁶); these were updated iteratively using stochastic gradient descent with O(10⁴) training samples. Regularization techniques are used in CNN optimization to detect and prevent overfitting: the data are divided into independent train, verification, and test sets, and neurons are randomly dropped out during training.

We also compared results from various methods, including: (1) support vector machines (SVM); (2) logistic regression with the penalty term set to L2 and the regularization coefficient set to 1; (3) random forest, in which we implemented a random forest with 1,000 tree estimators; (4) deep fully connected neural networks (DNN), in which we used four layers of neural networks with 8,000, 1,250, 256, and 64 neurons, ReLU activation function, mean square error as the loss function, and no dropout; (5) a residual network (ResNet) with over 17 million parameters; and (6) a convolutional LSTM network with two convolutional LSTM layers, one convolutional layer, and two fully connected layers. All of the methods described above used the same dataset.

To evaluate the performance of CNN and other data-driven methods, skill S and correlation coefficient R were defined as:

\begin{equation}
\begin{aligned}
S &= 1 - \left( \frac{\frac{1}{N} \sum_{i=1}^{N} (y_{p,i} - y_{t,i})^{2}}{\sigma_{y_{t}}^{2}} \right)^{\frac{1}{2}}, \\
R &= \frac{\frac{1}{N} \sum_{i=1}^{N} (y_{p,i} - \bar{y}_{p}) (y_{t,i} - \bar{y}_{t})}{\sigma_{y_{p}} \sigma_{y_{t}}},
\end{aligned}
\tag{2}
\end{equation}

where y_p and y_t are the predicted transport and the actual transport, ȳ_p and ȳ_t are their means, σ_{y_p} and σ_{y_t} are the standard deviations of the predicted and actual ITF transport, respectively, and N is the number of samples.

For a perfect prediction, both the skill and the correlation coefficient approach 1. However, there are significant differences between these two indicators. Skill S is a monotonically decreasing function of the mean square error and becomes negative when the prediction is worse than the data average (George et al., 2021). The correlation coefficient R, in contrast, is insensitive to prediction accuracy in some cases: a prediction can correlate well with the actual transport and still need to be multiplied by a constant to match it.
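The two metrics in Equation 2 can be sketched as follows. The function name and the toy series are made up for illustration; the standard deviations are computed with a 1/N normalization to match the 1/N factors in Equation 2.

```python
import math


def skill_and_correlation(y_pred, y_true):
    """Return (S, R) as defined in Equation 2, using 1/N statistics."""
    n = len(y_true)
    mean_p = sum(y_pred) / n
    mean_t = sum(y_true) / n
    var_p = sum((p - mean_p) ** 2 for p in y_pred) / n
    var_t = sum((t - mean_t) ** 2 for t in y_true) / n
    mse = sum((p - t) ** 2 for p, t in zip(y_pred, y_true)) / n
    cov = sum((p - mean_p) * (t - mean_t)
              for p, t in zip(y_pred, y_true)) / n
    skill = 1.0 - math.sqrt(mse / var_t)
    corr = cov / math.sqrt(var_p * var_t)
    return skill, corr


y_true = [1.0, 2.0, 3.0, 4.0]            # toy "actual" transport
perfect = list(y_true)                   # perfect prediction
half_amplitude = [0.5 * y for y in y_true]  # right shape, wrong amplitude

s_perfect, r_perfect = skill_and_correlation(perfect, y_true)
s_half, r_half = skill_and_correlation(half_amplitude, y_true)
```

The half-amplitude case shows the difference between the indicators: R stays at 1 because the prediction is off only by a constant factor, while S drops below zero because the errors are larger than the spread of the actual data.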

3 Results

The CNN trained on CMIP6 data was used to infer the ITF transport of the verification and test data. The number of training epochs was set to 100. One epoch means that all training data have passed through the network once, completing the forward computation and backpropagation process. Figure 4 shows that the predictive skill on the verification data reached a high level around epoch 30. The average S peaked at 0.69 (Figure 4), corresponding to a high correlation coefficient of 0.95, significant at the 99% confidence level, between the deep-learning transport and the actual transport of CMIP6. This suggests that the CNN was very efficient at extracting the required information from the SSH to infer the ITF transport of CMIP6. Training for more epochs did not yield better verification and test results: although the training skill was expected to improve further as training continued, the skill on the verification and test data remained at approximately 0.69. The skill stabilized within a small number of epochs, indicating that excessive training may lead to overfitting.
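The skill of 0.69 and the correlation of 0.95 quoted above can be related through a short worked calculation. It assumes that the fraction of explained variance is defined as one minus the mean square error normalized by the variance of the actual transport; this definition is an assumption made here for illustration, not one stated in the paper.

```latex
\begin{aligned}
(1 - S)^{2} &= \frac{\frac{1}{N} \sum_{i=1}^{N} (y_{p,i} - y_{t,i})^{2}}{\sigma_{y_{t}}^{2}}, \\
1 - (1 - S)^{2} &= 1 - (1 - 0.69)^{2} = 1 - 0.0961 \approx 0.90, \\
R^{2} &= 0.95^{2} \approx 0.90 .
\end{aligned}
```

Under this assumption, both indicators correspond to roughly 90% of the variance, even though S = 0.69 may look modest next to R = 0.95.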

Figure 4. Evolution of skills with deep learning. The solid lines represent deep-learning skills with CMIP6, whereas the dotted lines (except black) represent SODA skills. The solid blue line represents the verification data, the orange line the test data, and the black dotted line represents the average of the top 10 skills using CMIP6. The green dotted line represents verification data, the red dotted line represents test data, and the pink dotted line denotes the average of the top 10 skills with SODA.

Figures 5A, B show the distribution of inferred and actual ITF of CMIP6 qualitatively and quantitatively, and the inferred ITF corresponded well with the actual ITF. A comparison of CNN-inferred and actual ITF transports of CMIP6 revealed that the CNN is capable of producing a reasonable ITF transport time series (Figure 5C). Although the CNN explained up to 90% of ITF transport variance (Figure 5A), the results inferred by the CNN appeared to have a systematic bias, which may be a result of the limitations of using only SSH data. The CNN consistently underestimated the peak values of ITF transport (Figures 5B, C). This underestimation persisted even when we increased the number of extreme ITF transport training examples and tested various optimizers (e.g., stochastic gradient descent [SGD], Adam), loss functions (mean absolute error and mean square error), and weight regularization (L1, L2).

Figure 5. A comparison of CNN-inferred and actual ITF transports of CMIP6. (A) The x-axis represents the range of CMIP6 actual ITF, while the y-axis represents the inferred ITF for each actual ITF. The black dotted lines show where the inferred ITF equals the CMIP6 actual ITF. The scatter diagram shows that the CNN explains approximately 90% of the transport variance (maximum R² = 0.90). (B) The histogram demonstrates the bias of underestimated transport extremes. The actual ITF transport is shown in black, while the inferred ITF transport is shown in red. (C) Time series of actual transport (black) and inferred ITF transport (red) when the skill is 0.69.

The CNN showed excellent performance in inferring the ITF transport, so we then directly substituted the data from SODA and AVISO into the model trained with CMIP6 (ITF without transfer, i.e., all the samples for training, test, and prediction were from CMIP6 simulations). The average test skill of SODA data acquired by the CNN peaked at 0.54 (Figure 4). Owing to overfitting, the skill dropped rapidly after the training epoch reached approximately 70, so the number of training epochs should be set between 30 and 70. Compared with the CNN model based on CMIP6, the test skill for SODA was significantly lower, and the inference was less stable.

To compensate for the small sample size of the reanalysis data, we conducted transfer learning on SODA using the model trained with CMIP6 data. The R of inferred SODA ITF transport with transfer learning was 0.91, while the R of inferred SODA ITF transport without transfer learning was 0.86. Transfer learning thus slightly improved the correlation (Figure 6). In inferring extreme transport, the model with transfer learning was slightly inferior to the model without it, but its overall fit was better (Figure 6).

Figure 6. Comparison between CNN-inferred SODA with transfer (red), CNN-inferred SODA without transfer (blue), and actual ITF transports of SODA (black).

Figure 7 shows a comparison of various statistical methods. We found that the CNN explains more than 90% of the variance and shows a better performance than other statistical methods (logistic regression, random forest, SVM, or DNN), as expected (Figure 7). We also put different CNN variants to the test, such as the ResNet and convolutional LSTM. It is interesting to note that ResNet performed similarly to the CNN, despite having more parameters (Figure 7). The addition of recurrent neural networks did not improve the CNN’s capability. Given that the CNN has an excellent ability to infer ITF transport with the SSH, we used the same model to further investigate the deep-learning approach of predicting long-term ITF transport with the CNN. It should be noted that these various statistical methods contain very different parameters that may potentially influence the comparison.

Figure 7. A comparison of various statistical methods for inferring with CMIP6. The y-axis represents the inference abilities of the various methods: SVM, logistic regression, random forest, DNN, CNN, ResNet, and convolutional LSTM network. The R² of each column represents the proportion of inferred variance.

We then generated a long-term, continuous time series of ITF transport using the CNN-based deep-learning approach and updated satellite observations of SSH (Figure 8). The CNN model’s inference of ITF transport was validated by comparing it with observations (Figure 8). The ITF from deep learning appeared to capture the general variability of the ITF: the correlation coefficient was 0.43 between IX1-observed ITF and the 13-month-running-mean time series of CNN-inferred ITF with satellite observations, 0.57 between INSTANT-observed ITF and CNN-inferred ITF with satellite observations, and 0.52 between MITF-observed ITF and CNN-inferred ITF with satellite observations, all of which were significant at the 99% confidence level (Figure 8). The deep-learning ITF differed from the observations primarily in terms of peaks and valleys, which may be associated with the ability of CMIP6 models to reproduce the ITF’s extremes (figure not shown).

Figure 8. Comparison between CNN-inferred and actual ITF transports of observations. (A) CNN-inferred ITF transport (black), ITF transports from INSTANT observations (red), and MITF (blue). (B) Thirteen-month-running-mean CNN-inferred (black) and actual ITF transports of IX1 (red).

Figure 9 compares the predicted ITF transport of SODA with the actual ITF after transfer learning with different time leads. As shown in Figure 9, R decreased overall as the time lead increased, and the CNN model could make a valid prediction (R>0.5) with a lead time of up to approximately 7 months. It should be noted that a 12-month moving average is applied before calculating the correlation coefficient between the actual and predicted ITF transports to reduce the influence of the ITF’s strong seasonality; including the seasonality leads to a higher correlation coefficient between the actual and predicted ITF transports. Figure 10 compares the predicted ITF transport from various models with the actual ITF transport (time series are 12-month smoothed to focus on interannual variability). The CNN was more effective and produced a longer forecast than other models (Figures 9 and 10).
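The effect of the 12-month moving average can be illustrated with a minimal sketch. The function name and the synthetic series (an idealized annual cycle plus a linear trend) are made up for illustration, and dropping the incomplete windows at the ends is an assumption about edge handling.

```python
import math


def running_mean(series, window=12):
    """Simple moving average over complete windows only.

    Returns len(series) - window + 1 values; the incomplete windows at
    the ends of the record are dropped.
    """
    out = []
    for start in range(len(series) - window + 1):
        out.append(sum(series[start:start + window]) / window)
    return out


# Synthetic monthly series: annual cycle plus a slow linear trend.
months = range(36)
seasonal = [math.sin(2.0 * math.pi * t / 12.0) for t in months]
trend = [0.1 * t for t in months]
series = [s + g for s, g in zip(seasonal, trend)]

smoothed = running_mean(series)
seasonal_only = running_mean(seasonal)
```

Because each 12-month window spans exactly one period of the annual cycle, the seasonal component averages to zero and only the slower signal remains, which is why correlations computed on smoothed series reflect interannual variability.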

Figure 9. Comparison of actual ITF and predicted ITF transport of SODA using various models.

Figure 10. Comparison between actual ITF and inferred ITF transport of SODA with different models. (A) CNN. (B) ResNet. (C) Convolutional LSTM. (D) Random forest. (E) Logistic regression. (F) DNN.

4 Discussion and conclusion

In this study, we investigated the deep-learning approach for inferring and predicting ITF transport with SSH images using model simulations from CMIP6 and reanalysis data products. We found that the CNN-based deep-learning approach with SSH images can generate a reasonable time series of ITF transport that captures approximately 90% of the actual ITF variance. CNN-based deep learning with reanalysis data sets performed similarly well, reproducing approximately 80% of the variance of actual ITF transport. These findings imply that the CNN, which explicitly relies on two-dimensional pattern analysis, outperforms other traditional data-driven methods, such as logistic regression (R² = 0.56), random forest (R² = 0.74), support vector machines (R² = 0.22), and simple fully connected neural networks (R² = 0.84).

Although the CNN performed admirably in predicting ITF transport, it appears that the CNN’s prediction has a systematic bias, and the peak values of ITF transport were consistently underestimated by the CNN. Even when some methods, such as increasing the number of extreme ITF transport training examples, testing various optimizers, loss functions, and weight regularization, were used, bias and underestimation remained unavoidable. The bias and underestimation indicate that the skill limitations are due to the inherent incompleteness of the information in SSH, rather than a lack of training data or weaknesses in the CNN architecture.

Network hyperparameters are known to influence deep-learning performance, so different settings were also tested in this study. We employed various sizes of convolutional filters (3×3, 4×4, 5×5, and 7×7) and pooling filters (2×2 and 4×4). Larger convolutional filters produced worse predictions than smaller convolutional filters. We found that a 3×3 convolutional filter predicts similarly to a 4×4 convolutional filter, but a larger convolutional filter means fewer parameters. Additionally, we tested various optimizers (SGD and Adam), loss functions (mean absolute error and mean square error), and weight regularization (L1 and L2), and the CNN was found to be the most efficient choice. We attempted to improve the performance of the CNN by increasing the number of parameters and by adding a recurrent neural network, both of which can improve the performance at month 0. However, as prediction time increased, the prediction ability of the two remained inferior to that of the CNN. This means that in some cases, more complex networks do not produce better results. We also tried extending the input to span from t−9 months to time t (in months), and the performance of ResNet and convolutional LSTM improved slightly.

The performance of the CNN in the model and reanalysis data demonstrates that there is enough information in SSH to predict ITF transport. However, further improvement in predicting ITF transport using a deep-learning approach is required and is dependent on at least two factors. On the one hand, the amount of training data required for the CNN supervised learning is quite large, necessitating a sampling size of O(10⁴). As a result, advanced deep-learning techniques that reduce the amount of essential training data by at least one order of magnitude are required, as is the ability to forecast in the long term. On the other hand, considering that the ITF is influenced by baroclinic processes as well, the subsurface information is also important for inferring and predicting ITF transport. As a result, a better deep-learning approach should use subsurface information in training the model, even though observing the subsurface ocean is obviously very different from satellite-based sea surface observation. Furthermore, as the large-scale pressure gradient between the Pacific Ocean and the Indian Ocean is the driver of ITF and some key regions seem to determine the ITF (e.g., Susanto et al., 2007; Tillinger and Gordon, 2009), the deep-learning approach might be further improved if we additionally consider the SSH in these key regions. All of this points to a bright future for monitoring and predicting important large-scale ocean circulations, such as the ITF.

Data availability statement

The original contributions presented in the study are included in the article/supplementary material. Further inquiries can be directed to the corresponding author.

Author contributions

SH designed the study. LX and SH conducted the analysis and wrote the initial draft of the paper. All authors contributed to the article and approved the submitted version.

Funding

This study is supported by the Shandong Provincial Natural Science Foundation (ZR2020JQ18), the Strategic Priority Research Program of the Chinese Academy of Sciences (CAS) (XDB42010403), the National Natural Science Foundation of China (42022040), and the Laoshan Laboratory (2022LSL010304). S. Hu is a member of the CAS Interdisciplinary Innovation Team (JCTD-2020-12) and the Youth Innovation Promotion Association of CAS. We are grateful to Qin-Yan Liu for providing the time series of ITF transport based on IX1 observations and to Mingting Li for helping us access the MITF time series.

Acknowledgments

The altimeter data were produced by SSALTO/Duacs and are available at https://sso.altimetry.fr/ (readers may need to register for an account with AVISO to access the data). The CMIP6 simulations can be accessed at https://esgf-node.llnl.gov/search/cmip6/. The International Nusantara Stratification and Transport (INSTANT) data can be accessed at http://www.marine.csiro.au/~cow074/instantdata.htm. The SODA 2.2.4 and OSCAR datasets were obtained from the Asia-Pacific Data Research Center of the International Pacific Research Center at the University of Hawaiʻi at Mānoa (http://apdrc.soest.hawaii.edu/index.php).

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

Abadi M., Agarwal A., Barham P., Brevdo E., Chen Z., Citro C., et al. (2016). TensorFlow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467. doi: 10.48550/arXiv.1603.04467