<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" article-type="research-article" dtd-version="2.3" xml:lang="EN">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Mar. Sci.</journal-id>
<journal-title>Frontiers in Marine Science</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Mar. Sci.</abbrev-journal-title>
<issn pub-type="epub">2296-7745</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fmars.2023.1151796</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Marine Science</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>MCSTNet: a memory-contextual spatiotemporal transfer network for prediction of SST sequences and fronts with remote sensing data</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Ma</surname><given-names>Ying</given-names>
</name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<uri xlink:href="https://loop.frontiersin.org/people/1989480"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Liu</surname><given-names>Wen</given-names>
</name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<uri xlink:href="https://loop.frontiersin.org/people/2295537"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Chen</surname><given-names>Ge</given-names>
</name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="aff" rid="aff3"><sup>3</sup></xref>
<xref ref-type="author-notes" rid="fn001"><sup>*</sup></xref>
<uri xlink:href="https://loop.frontiersin.org/people/631912"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Zhong</surname><given-names>Guoqiang</given-names>
</name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<xref ref-type="author-notes" rid="fn001"><sup>*</sup></xref>
<uri xlink:href="https://loop.frontiersin.org/people/1055446"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Tian</surname><given-names>Fenglin</given-names>
</name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="aff" rid="aff3"><sup>3</sup></xref>
<uri xlink:href="https://loop.frontiersin.org/people/1263681"/>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>Frontiers Science Center for Deep Ocean Multispheres and Earth System, School of Marine Technology, Ocean University of China</institution>, <addr-line>Qingdao</addr-line>, <country>China</country></aff>
<aff id="aff2"><sup>2</sup><institution>Faculty of Information Science and Engineering, Ocean University of China</institution>, <addr-line>Qingdao</addr-line>, <country>China</country></aff>
<aff id="aff3"><sup>3</sup><institution>Laboratory for Regional Oceanography and Numerical Modeling, Laoshan Laboratory</institution>, <addr-line>Qingdao</addr-line>, <country>China</country></aff>
<author-notes>
<fn fn-type="edited-by">
<p>Edited by: Toru Miyama, Japan Agency for Marine-Earth Science and Technology, Japan</p>
</fn>
<fn fn-type="edited-by">
<p>Reviewed by: Claudia Fanelli, Italian National Research Council - Institute of Marine Sciences (CNR-ISMAR), Italy; Takuro Matsuta, The University of Tokyo, Japan</p>
</fn>
<fn fn-type="corresp" id="fn001">
<p>*Correspondence: Ge Chen, <email xlink:href="mailto:gechen@ouc.edu.cn">gechen@ouc.edu.cn</email>;  Guoqiang Zhong, <email xlink:href="mailto:gqzhong@ouc.edu.cn">gqzhong@ouc.edu.cn</email>
</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>16</day>
<month>05</month>
<year>2023</year>
</pub-date>
<pub-date pub-type="collection">
<year>2023</year>
</pub-date>
<volume>10</volume>
<elocation-id>1151796</elocation-id>
<history>
<date date-type="received">
<day>26</day>
<month>01</month>
<year>2023</year>
</date>
<date date-type="accepted">
<day>18</day>
<month>04</month>
<year>2023</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#xa9; 2023 Ma, Liu, Chen, Zhong and Tian</copyright-statement>
<copyright-year>2023</copyright-year>
<copyright-holder>Ma, Liu, Chen, Zhong and Tian</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p>
</license>
</permissions>
<abstract>
<p>Ocean fronts are a response to the variability of marine hydrographic elements and an important mesoscale ocean phenomenon, playing a significant role in fish farming and fishing, sea-air exchange, marine environmental protection, etc. The horizontal gradients of sea surface temperature (SST) are frequently applied to reveal ocean fronts. Up to now, existing spatiotemporal prediction approaches have suffered from low prediction precision and poor prediction quality for non-stationary data, particularly for long-term prediction, and medium- and long-term fine-grained prediction of SST sequences and fronts remains a challenging task in oceanographic research. In this study, future variation trends of SST sequences and fronts are predicted based on continuous mean daily satellite remote sensing SST data. To enhance the precision of the predicted SST sequences and fronts, this paper proposes a novel memory-contextual spatiotemporal transfer network (MCSTNet) for SST sequence and front prediction. MCSTNet involves three components: an encoder-decoder structure, a time transfer module, and a memory-contextual module. The encoder-decoder structure extracts the rich contextual and semantic information in SST sequences and frontal structures from the SST data. The time transfer module transfers temporal information and fuses low-level, fine-grained temporal information with high-level semantic information to improve medium- and long-term prediction precision, while the memory-contextual module fuses low-level spatiotemporal information with high-level semantic information to enhance short-term prediction precision. In the training process, mean squared error (MSE) loss and contextual loss are combined to jointly guide the training of MCSTNet. Extensive experiments demonstrate that MCSTNet predicts more authentic and reasonable SST sequences and fronts than the state-of-the-art (SOTA) models on the SST data.</p>
</abstract>
<kwd-group>
<kwd>encoder-decoder structure</kwd>
<kwd>time transfer module</kwd>
<kwd>memory-contextual module</kwd>
<kwd>MCSTNet</kwd>
<kwd>SST sequence and front prediction tasks</kwd>
</kwd-group>
<contract-num rid="cn002">2018AAA0100400</contract-num>
<contract-num rid="cn004">ZR2020MF131</contract-num>
<contract-sponsor id="cn002">National Key Research and Development Program of China<named-content content-type="fundref-id">10.13039/501100012166</named-content></contract-sponsor>
<contract-sponsor id="cn004">Natural Science Foundation of Shandong Province<named-content content-type="fundref-id">10.13039/501100007129</named-content></contract-sponsor>
<counts>
<fig-count count="10"/>
<table-count count="4"/>
<equation-count count="10"/>
<ref-count count="51"/>
<page-count count="19"/>
<word-count count="10140"/>
</counts>
<custom-meta-wrap>
<custom-meta>
<meta-name>section-in-acceptance</meta-name>
<meta-value>Physical Oceanography</meta-value>
</custom-meta>
</custom-meta-wrap>
</article-meta>
</front>
<body>
<sec id="s1" sec-type="intro">
<label>1</label>
<title>Introduction</title>
<p>Comprehending complex ocean phenomena is a difficult task, since many natural processes must be taken into account. Numerical models based on physical equations have long been used in the field of ocean phenomena prediction. In-depth research is made possible by the abundance of ocean products derived from satellites, which also emphasizes the need for practical techniques for studying time-series observations. As one of the most time-honored ocean products, sea surface temperature (SST) is a key factor in comprehending scientific issues concerning sea-air interaction and biological, chemical, and physical oceanography. SST is frequently applied to disclose a variety of important ocean phenomena, such as ocean fronts (<xref ref-type="bibr" rid="B21">Legeckis, 1977</xref>). When dealing with phenomena of a complex nature, traditional statistical analysis approaches are restricted by their shallow model structure.</p>
<p>Deep neural networks (DNNs), an extension of artificial neural networks (ANNs), are among the most widely used and currently prominent deep learning (DL) approaches, relying on effective parameter optimization techniques and architecture designs (<xref ref-type="bibr" rid="B15">Jordan and Mitchell, 2015</xref>; <xref ref-type="bibr" rid="B20">LeCun et&#xa0;al., 2015</xref>). Compared to conventional statistical models, DL approaches are far more sophisticated. They have been widely applied in oceanography domains (<xref ref-type="bibr" rid="B4">Ducournau and Fablet, 2016</xref>; <xref ref-type="bibr" rid="B10">Ham et&#xa0;al., 2019</xref>; <xref ref-type="bibr" rid="B33">Reichstein et&#xa0;al., 2019</xref>; <xref ref-type="bibr" rid="B22">Li et&#xa0;al., 2020</xref>; <xref ref-type="bibr" rid="B1">Buongiorno Nardelli et&#xa0;al., 2022</xref>), as they can efficiently discover and mine the complex patterns in time series of large amounts of remote sensing data (<xref ref-type="bibr" rid="B51">Zheng et&#xa0;al., 2020</xref>; <xref ref-type="bibr" rid="B24">Liu et&#xa0;al., 2021</xref>). Inspired by video frame prediction using DL approaches (<xref ref-type="bibr" rid="B42">Wang et&#xa0;al., 2021</xref>; <xref ref-type="bibr" rid="B5">Gao et&#xa0;al., 2022</xref>), in this paper, we develop an SST pattern prediction method by building a DL model.</p>
<p>Among the many DL algorithms, two typical ones are convolutional neural networks (CNNs) (<xref ref-type="bibr" rid="B19">Kunihiko and Sei, 1982</xref>) and recurrent neural networks (RNNs) (<xref ref-type="bibr" rid="B6">Glorot et&#xa0;al., 2011</xref>). CNNs are generally used in computer vision and are essentially input-to-output mappings that learn a large number of mapping relationships between inputs and outputs. RNNs, on the other hand, are commonly applied in natural language processing (NLP) and various sequence processing tasks, where information is passed through recurrent loops so that the network can remember sequence information and analyze patterns of data variation across the sequence.</p>
<p>Improved RNN structures, such as long short-term memory (LSTM), have better memory capability and can capture long-term dependencies in sequence prediction problems, whereas traditional RNN structures suffer from vanishing and exploding gradients when dealing with long sequences, resulting in ineffective information transfer. LSTM enhances the hidden layer of RNNs, and short and long time series are stored and retrieved by its memory blocks (<xref ref-type="bibr" rid="B12">Hochreiter and Schmidhuber, 1997</xref>). Recurrently connected cells are applied to study the relationships between two time frames and then transfer the probabilistic inference to the subsequent frame. Recently, these methods have been enhanced so that many architectures for temporal sequence processing tasks are available. Specifically, convolutional long short-term memory (ConvLSTM) is proposed by <xref ref-type="bibr" rid="B35">Shi et&#xa0;al. (2015)</xref>, which replaces the fully connected operations in the LSTM gates with convolutional ones, so that it obtains higher prediction accuracy compared to LSTM; the convolutions improve the spatial feature extraction capability of LSTM. An encoder-decoder LSTM is proposed by <xref ref-type="bibr" rid="B37">Srivastava et&#xa0;al. (2015)</xref>, which reconstructs and predicts video sequences. These developments open up a number of inspiring opportunities. <xref ref-type="bibr" rid="B50">Zhang et&#xa0;al. (2017)</xref> apply a fully connected LSTM (FC-LSTM) structure to model sequence dependencies and tackle the issue of SST pattern prediction. As far as we are aware, this is the first study to employ a cutting-edge sequence prediction technique to predict SST. Nevertheless, FC-LSTM only considers the temporal sequence. As a matter of fact, SST pattern prediction is a spatiotemporal sequence problem, which inputs previous SST patterns and outputs future SST patterns. The predictive performance is restricted by this flaw of FC-LSTM, and it is difficult to increase prediction accuracy because a great deal of information is lost during prediction. Generally, previous SST pattern prediction approaches neglected the spatial structure in images, leading to low prediction accuracy (<xref ref-type="bibr" rid="B37">Srivastava et&#xa0;al., 2015</xref>; <xref ref-type="bibr" rid="B50">Zhang et&#xa0;al., 2017</xref>). To tackle this issue, based on the SST data of Chinese coastal waters and the Bohai Sea, <xref ref-type="bibr" rid="B47">Yang et&#xa0;al. (2017)</xref> develop an SST forecast network called CFCC-LSTM that consists of one convolutional and one fully connected LSTM layer. <xref ref-type="bibr" rid="B44">Wei et&#xa0;al. (2020)</xref> employ a neural network to forecast South China Sea temperature based on the Operational Sea Surface Temperature and Sea Ice Analysis (OSTIA) data. Likewise, <xref ref-type="bibr" rid="B26">Meng et&#xa0;al. (2021)</xref> propose a generative adversarial network (GAN) based on physics-guided learning and apply observation data from the South China Sea to calibrate parameters, improving the prediction performance of sea subsurface temperature. In addition, <xref ref-type="bibr" rid="B51">Zheng et&#xa0;al. (2020)</xref> propose a DL network with a bias correction and a DNN to predict SST data and then tropical instability waves (TIWs) based on the predicted SST data. In other oceanic areas, SST patterns are also forecasted by DL-based approaches (<xref ref-type="bibr" rid="B29">Patil and Deo, 2017</xref>; <xref ref-type="bibr" rid="B50">Zhang et&#xa0;al., 2017</xref>; <xref ref-type="bibr" rid="B30">Patil and Deo, 2018</xref>). Although the long-sequence prediction performance and the authenticity of the images predicted by the aforementioned approaches remain unsatisfactory, they demonstrate that predicting SST by DL approaches based on spatiotemporal sequences of remote sensing images offers promise for building a data-driven model to tackle this issue.</p>
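<p>For concreteness, the gate mechanism of a single LSTM cell discussed above can be sketched in NumPy as follows; the parameter names, dimensions, and random toy inputs are illustrative assumptions and not the configuration of FC-LSTM, ConvLSTM, or MCSTNet.</p>

```python
import numpy as np

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step in the classic formulation.

    W, U, b stack the parameters of the input, forget, and output gates
    and the candidate cell state (four blocks of rows); all names here
    are illustrative.
    """
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    z = W @ x + U @ h_prev + b            # all four pre-activations at once
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    c = f * c_prev + i * np.tanh(g)       # memory cell: retain old + write new
    h = o * np.tanh(c)                    # hidden state read out of the cell
    return h, c

# toy dimensions: input size 3, hidden size 2, sequence length 5
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 3))
U = rng.standard_normal((8, 2))
b = np.zeros(8)
h, c = np.zeros(2), np.zeros(2)
for t in range(5):                        # unroll over a short sequence
    h, c = lstm_step(rng.standard_normal(3), h, c, W, U, b)
```

The forget gate `f` is what lets the cell state `c` carry information across many steps, which is why LSTM mitigates the vanishing-gradient problem of plain RNNs.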
<p>Since SST can be observed easily and with high precision, its horizontal gradients are frequently employed to describe fronts (<xref ref-type="bibr" rid="B34">Ruiz et&#xa0;al., 2019</xref>). SST fronts are narrow transition zones between two or more bodies of water with distinctly different temperature characteristics, including fronts associated with small-scale meteorological forcing, submesoscale fronts, tidal fronts, shelf-break SST fronts, and planetary-scale SST fronts (<xref ref-type="bibr" rid="B25">Mauzole, 2022</xref>). Fronts can divide SST images into multiple regions and produce nonlinear flows and processes on different temporal and spatial scales. Therefore, monitoring and predicting front activity is a considerable challenge. Continuous changes in front activity can be obtained by processing a time series of daily SST, and these changes can be used to predict future front activity, which is important for sea-air exchange, marine fish farming, and fishing (<xref ref-type="bibr" rid="B38">Toggweiler and Russell, 2008</xref>; <xref ref-type="bibr" rid="B46">Woodson and Litvin, 2015</xref>). However, previous research has only conducted preliminary investigations of front forecasts in different oceanic areas. In the Kuroshio region, although direct forecasting of Kuroshio fronts is relatively rare, changes in the position of the Kuroshio can directly affect the extent of Kuroshio intrusion on the shelf and therefore the position of the Kuroshio fronts. Thus, several studies on forecasting the path of the Kuroshio have been conducted. For instance, <xref ref-type="bibr" rid="B18">Komori et&#xa0;al. (2003)</xref> make use of a 1-1/2 layer primitive equation model to forecast short-term Kuroshio path variabilities south of Japan and reproduce the characteristic evolution of the Kuroshio into a large-amplitude path off Enshu-nada. <xref ref-type="bibr" rid="B16">Kagimoto et&#xa0;al. (2008)</xref> successfully forecast the Kuroshio large meander path variations using a high-resolution (approximately 10&#xa0;km) ocean forecasting system, the Japan Coastal Ocean Predictability Experiment (JCOPE). <xref ref-type="bibr" rid="B17">Kamachi et&#xa0;al. (2004)</xref> develop a more complex ocean data assimilation forecasting system for operational use by the Japan Meteorological Agency. Moreover, Gulf of Mexico eddy frontal positions (<xref ref-type="bibr" rid="B28">Oey et&#xa0;al., 2005</xref>; <xref ref-type="bibr" rid="B49">Yin and Oey, 2007</xref>; <xref ref-type="bibr" rid="B3">Counillon and Bertino, 2009</xref>; <xref ref-type="bibr" rid="B7">Gopalakrishnan et&#xa0;al., 2013</xref>) and Iceland-Faroe front variability (<xref ref-type="bibr" rid="B27">Miller et&#xa0;al., 1995</xref>; <xref ref-type="bibr" rid="B31">Popova et&#xa0;al., 2002</xref>; <xref ref-type="bibr" rid="B23">Liang and Robinson, 2004</xref>) have been forecasted. From the perspective of global forecast systems, <xref ref-type="bibr" rid="B36">Smedstad et&#xa0;al. (2003)</xref> establish a global real-time eddy forecasting system capable of forecasting fronts and eddies using the assimilation method of optimal interpolation (OI) based on SST and sea surface height (SSH) data. The interpolation results are corrected based on daily data changes and rely on SST data from satellite infrared radiometers to locate the fronts; however, infrared SST data are highly disturbed by cloud cover, resulting in low prediction performance. To improve the prediction performance, several model-based global ocean forecasting systems have been developed, such as those based on the Hybrid Coordinate Ocean Model (HYCOM) (<xref ref-type="bibr" rid="B2">Chassignet et&#xa0;al., 2009</xref>) and the Nucleus for European Modelling of the Ocean Model (NEMO) (<xref ref-type="bibr" rid="B13">Hurlburt et&#xa0;al., 2009</xref>). To date, research using DL-based models to predict fronts has been rare. <xref ref-type="bibr" rid="B48">Yang et&#xa0;al. (2022)</xref> employ GoogLeNet to categorize the front trend as attenuating or enhancing, but this is only a classification task and cannot predict future front variation trends.</p>
<p>Existing DL-based approaches for spatiotemporal sequence prediction are mainly divided into four categories: RNN-based (<xref ref-type="bibr" rid="B41">Wang et&#xa0;al., 2017</xref>; <xref ref-type="bibr" rid="B43">Wang et&#xa0;al., 2019</xref>; <xref ref-type="bibr" rid="B42">Wang et&#xa0;al., 2021</xref>), CNN-based (<xref ref-type="bibr" rid="B5">Gao et&#xa0;al., 2022</xref>), combined CNN- and RNN-based (<xref ref-type="bibr" rid="B35">Shi et&#xa0;al., 2015</xref>), and DL with physical constraints-based (<xref ref-type="bibr" rid="B9">Guen and Thome, 2020</xref>) approaches. CNN-based approaches are not good at predicting long-term changes in the data because they cannot learn continuous change features in the sequence well (<xref ref-type="bibr" rid="B41">Wang et&#xa0;al., 2017</xref>). RNN-based approaches predict future sequences by learning the change features of previous sequences, but the quality of the predicted images decreases with increasing prediction time, resulting in poor prediction quality for complex long-term prediction tasks. In combined CNN- and RNN-based approaches, the CNNs discard some fine-grained information when extracting features to reduce the computational complexity of the network, which degrades prediction quality. DL with physical constraints-based approaches constrain data with simple change patterns through a specific physical model, but the quality of the predicted images is low for non-stationary data. In short, existing spatiotemporal prediction approaches have suffered from low prediction precision and poor prediction quality for non-stationary data, especially for long-term prediction, making long-term fine-grained prediction of SST sequences and fronts a challenging task.</p>
<p>In this study, future variation trends of SST sequences and fronts are predicted based on continuous mean daily SST data. Encouraged by the excellent performance of U-Net in oceanography (<xref ref-type="bibr" rid="B22">Li et&#xa0;al., 2020</xref>), this paper proposes a memory-contextual spatiotemporal transfer network (MCSTNet) for continuous spatiotemporal prediction of SST sequences and fronts to improve prediction precision. MCSTNet involves three components: the encoder-decoder structure, the time transfer module, and the memory-contextual module. During the training phase, mean squared error (MSE) loss and contextual loss are combined to jointly guide the stable training of MCSTNet. Extensive experiments demonstrate that the SST sequences and fronts predicted by our proposed MCSTNet are more authentic and reasonable than those of the state-of-the-art (SOTA) models and that MCSTNet outperforms the SOTA models on the SST data. Furthermore, sea surface salinity (SSS) data are applied to verify the performance and generalization ability of MCSTNet.</p>
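<p>The combined objective guiding training can be sketched as a weighted sum of the two terms; this is a minimal illustration in which the contextual term is a hypothetical cosine-similarity stand-in and <monospace>lam</monospace> is an assumed balancing weight, not the exact formulation or value used by MCSTNet.</p>

```python
import numpy as np

def mse_loss(pred, target):
    # pixel-wise mean squared error between predicted and true fields
    return float(np.mean((pred - target) ** 2))

def contextual_loss(pred_feat, target_feat, eps=1e-8):
    # Hypothetical stand-in: one minus the mean cosine similarity between
    # per-sample feature vectors; the paper's contextual loss may differ.
    num = np.sum(pred_feat * target_feat, axis=-1)
    den = (np.linalg.norm(pred_feat, axis=-1)
           * np.linalg.norm(target_feat, axis=-1) + eps)
    return float(np.mean(1.0 - num / den))

def total_loss(pred, target, pred_feat, target_feat, lam=0.1):
    # weighted combination jointly guiding training (lam is an assumption)
    return mse_loss(pred, target) + lam * contextual_loss(pred_feat, target_feat)

# toy demo: a prediction off by 1 everywhere, with matching features
pred, target = np.zeros((8, 8)), np.ones((8, 8))
feat_p = np.ones((4, 16))
feat_t = np.ones((4, 16))
loss = total_loss(pred, target, feat_p, feat_t)
```

The MSE term anchors the pixel values while the contextual term rewards structural similarity, so neither blurriness (favored by MSE alone) nor value drift dominates.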
<p>Our contributions are as follows:</p>
<list list-type="bullet">
<list-item>
<p>Based on continuous mean daily SST data, a continuous spatiotemporal prediction framework, MCSTNet, is proposed to predict short-, medium-, and long-term future variation trends of SST sequences and fronts.</p>
</list-item>
<list-item>
<p>The methodology of MCSTNet contains three components: the encoder-decoder structure, the time transfer module, and the memory-contextual module. The encoder-decoder structure extracts the rich contextual and semantic information in SST sequences and frontal structures from the SST data. The time transfer module transfers temporal information and fuses low-level, fine-grained temporal information with high-level semantic information, and the memory-contextual module fuses low-level spatiotemporal information with high-level semantic information, which enhances the prediction precision of SST sequences and fronts.</p>
</list-item>
<list-item>
<p>Qualitative and quantitative experimental results demonstrate that the performance of MCSTNet is superior to the SOTA models on both SST and SSS data. The ablation studies demonstrate the effectiveness of each module within MCSTNet.</p>
</list-item>
</list>
<p>The remainder of this paper is organized as follows. In Section 2, the SST data are preprocessed and a continuous spatiotemporal prediction network, called MCSTNet, is built. Section 3 presents the experimental results, which display the excellent spatiotemporal SST sequence and front prediction capability of MCSTNet on the SST and SSS data, combining the medium- and long-term prediction benefits of the time transfer module with the short-term prediction capability of the memory-contextual module. We conclude this paper with remarks and future work in Section 4.</p>
</sec>
<sec id="s2" sec-type="materials|methods">
<label>2</label>
<title>Materials and methods</title>
<p>In this section, we preprocess the SST data based on the derivation of physical quantities. Moreover, a continuous spatiotemporal prediction network called MCSTNet is proposed for the SST sequence and front prediction tasks.</p>
<sec id="s2_1">
<label>2.1</label>
<title>Data preprocessing</title>
<p>The daily SST data, with a spatial resolution of 0.05&#xb0; &#xd7; 0.05&#xb0;, are generated by the Operational Sea Surface Temperature and Sea Ice Analysis (OSTIA) system. A representative oceanic area, the Oyashio Current (OC) region (30&#xb0; N &#x2013; 45&#xb0; N, 142&#xb0; E &#x2013; 157&#xb0; E), is selected, whose SST is shown in <xref ref-type="fig" rid="f1"><bold>Figure&#xa0;1A</bold></xref>. We use 8 years of SST data, from 1 January, 2006, to 31 December, 2013, as a training set for the learning models, and 2 years of SST data, from 1 January, 2014, to 31 December, 2015, as a testing set. Physical quantities are employed to derive gradients of SST to obtain front structures, which are our target data. In this study, the SST gradient map is referred to as the SST front structure because the SST gradient can reflect the SST front structures (<xref ref-type="bibr" rid="B43">Guan et al., 2010</xref>). The formulas are:</p>
<fig id="f1" position="float">
<label>Figure&#xa0;1</label>
<caption>
<p>Data processing procedures in the OC region. <bold>(A)</bold> The SST data on 2 January, 2015, <bold>(B)</bold> front structures derived from physical quantities, and <bold>(C)</bold> fronts derived from threshold segmentation. The background is represented by category 0 and the SST front structure is represented by category 1.</p>
</caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fmars-10-1151796-g001.tif"/>
</fig>
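<p>The formulas below use central differences in the grid interior and one-sided differences at the boundaries, which is exactly the default scheme of <monospace>numpy.gradient</monospace> for unit grid spacing. As a minimal sketch (the array name <monospace>sst</monospace> and the toy values are illustrative), the front-structure map can be computed as:</p>

```python
import numpy as np

def sst_front_structure(sst):
    # Eqs. (2)/(3): (SST[j+1] - SST[j-1]) / 2 in the interior,
    # one-sided differences at the first and last rows/columns;
    # np.gradient implements this scheme for unit spacing.
    g_y, g_x = np.gradient(sst)   # g_y along i (rows), g_x along j (columns)
    # Eq. (1): gradient magnitude G = sqrt(Gx^2 + Gy^2)
    return np.sqrt(g_x ** 2 + g_y ** 2)

# toy 3x3 "SST" field (values in degrees C, purely illustrative)
sst = np.array([[10.0, 10.5, 12.0],
                [10.2, 11.0, 13.0],
                [10.8, 12.0, 14.0]])
front = sst_front_structure(sst)
```

On the real 0.05&#xb0; grid one would divide by the physical grid spacing to obtain a gradient in &#xb0;C per unit distance, but for detecting front structures the relative magnitude is what matters.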
<disp-formula>
<label>(1)</label>
<mml:math display="block" id="M1">
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd columnalign="left">
<mml:mi>G</mml:mi>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mo>=</mml:mo>
<mml:msqrt>
<mml:mrow>
<mml:msubsup>
<mml:mi>G</mml:mi>
<mml:mi>x</mml:mi>
<mml:mn>2</mml:mn>
</mml:msubsup>
<mml:mo>+</mml:mo>
<mml:msubsup>
<mml:mi>G</mml:mi>
<mml:mi>y</mml:mi>
<mml:mn>2</mml:mn>
</mml:msubsup>
</mml:mrow>
</mml:msqrt>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:math>
</disp-formula>
<disp-formula>
<label>(2)</label>
<mml:math display="block" id="M2">
<mml:mrow>
<mml:msub>
<mml:mi>G</mml:mi>
<mml:mi>x</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mtable columnalign="left">
<mml:mtr columnalign="left">
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>S</mml:mi>
<mml:mi>S</mml:mi>
<mml:mi>T</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>S</mml:mi>
<mml:mi>S</mml:mi>
<mml:mi>T</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo stretchy="false">/</mml:mo>
<mml:mn>2</mml:mn>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mo>&#xa0;</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo>&#x2260;</mml:mo>
<mml:mn>0</mml:mn>
<mml:mtext>&#x2009;and&#x2009;</mml:mtext>
<mml:mi>j</mml:mi>
<mml:mo>&#x2260;</mml:mo>
<mml:msub>
<mml:mi>j</mml:mi>
<mml:mi>m</mml:mi>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr columnalign="left">
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mi>S</mml:mi>
<mml:mi>S</mml:mi>
<mml:mi>T</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>S</mml:mi>
<mml:mi>S</mml:mi>
<mml:mi>T</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mo>&#xa0;</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr columnalign="left">
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mi>S</mml:mi>
<mml:mi>S</mml:mi>
<mml:mi>T</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>j</mml:mi>
<mml:mi>m</mml:mi>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>S</mml:mi>
<mml:mi>S</mml:mi>
<mml:mi>T</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>j</mml:mi>
<mml:mi>m</mml:mi>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>2</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mo>&#xa0;</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mi>j</mml:mi>
<mml:mi>m</mml:mi>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:math>
</disp-formula>
<disp-formula>
<label>(3)</label>
<mml:math display="block" id="M3">
<mml:mrow>
<mml:msub>
<mml:mi>G</mml:mi>
<mml:mi>y</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mtable columnalign="left">
<mml:mtr columnalign="left">
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>S</mml:mi>
<mml:mi>S</mml:mi>
<mml:mi>T</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>S</mml:mi>
<mml:mi>S</mml:mi>
<mml:mi>T</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo stretchy="false">/</mml:mo>
<mml:mn>2</mml:mn>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mo>&#xa0;</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>&#x2260;</mml:mo>
<mml:mn>0</mml:mn>
<mml:mtext>&#x2009;and&#x2009;</mml:mtext>
<mml:mi>i</mml:mi>
<mml:mo>&#x2260;</mml:mo>
<mml:msub>
<mml:mi>i</mml:mi>
<mml:mi>m</mml:mi>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr columnalign="left">
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mi>S</mml:mi>
<mml:mi>S</mml:mi>
<mml:mi>T</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>S</mml:mi>
<mml:mi>S</mml:mi>
<mml:mi>T</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mo>&#xa0;</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr columnalign="left">
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mi>S</mml:mi>
<mml:mi>S</mml:mi>
<mml:mi>T</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>i</mml:mi>
<mml:mi>m</mml:mi>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>S</mml:mi>
<mml:mi>S</mml:mi>
<mml:mi>T</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>i</mml:mi>
<mml:mi>m</mml:mi>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>2</mml:mn>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mo>&#xa0;</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mi>i</mml:mi>
<mml:mi>m</mml:mi>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:math>
</disp-formula>
<p>where <inline-formula>
<mml:math display="inline" id="im1">
<mml:mi>G</mml:mi>
</mml:math>
</inline-formula> denotes the final gradients of the SST data in equation (1). In equation (2), <inline-formula>
<mml:math display="inline" id="im2">
<mml:mrow>
<mml:msub>
<mml:mi>G</mml:mi>
<mml:mi>x</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> denotes the zonal gradient of the SST data, computed as half the difference between the two neighboring pixels in the zonal direction, and <inline-formula>
<mml:math display="inline" id="im3">
<mml:mrow>
<mml:msub>
<mml:mi>j</mml:mi>
<mml:mi>m</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is the index of the last pixel in the zonal direction. In equation (3), <inline-formula>
<mml:math display="inline" id="im4">
<mml:mrow>
<mml:msub>
<mml:mi>i</mml:mi>
<mml:mi>m</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is the index of the last pixel in the meridional direction, and <inline-formula>
<mml:math display="inline" id="im5">
<mml:mrow>
<mml:msub>
<mml:mi>G</mml:mi>
<mml:mi>y</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> represents the meridional gradient of the SST data, computed as half the difference between the two neighboring pixels in the meridional direction.</p>
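<p>Equations (2)-(3) describe central differences in the grid interior and one-sided differences at the first and last rows/columns, which is exactly the default behavior of <monospace>numpy.gradient</monospace>. A minimal sketch follows; note that equation (1) is not shown in this excerpt, so combining the components as a Euclidean magnitude is an assumption:</p>

```python
import numpy as np

def sst_gradient(sst):
    """Zonal (Gx) and meridional (Gy) gradients of a 2D SST grid.

    np.gradient reproduces equations (2)-(3): (f[k+1] - f[k-1]) / 2 in the
    interior and one-sided differences f[1] - f[0], f[-1] - f[-2] at the
    boundaries.  Axis 0 is the meridional index i, axis 1 the zonal index j.
    The Euclidean magnitude stands in for equation (1), which is not shown
    in this excerpt.
    """
    gy, gx = np.gradient(sst)
    return gx, gy, np.hypot(gx, gy)
```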
<p>
<xref ref-type="fig" rid="f1"><bold>Figure&#xa0;1B</bold></xref> shows the front structures derived from the SST data by computing physical gradients. Nearshore front structures are excluded because the nearshore environment interferes with SST and introduces inaccuracies into the experiments. The magnitude of a front structure reflects its strength, which is significantly larger than that of the surrounding hydrographic elements. The final fronts are obtained by setting a threshold <inline-formula>
<mml:math display="inline" id="im6">
<mml:mi>&#x3b8;</mml:mi>
</mml:math>
</inline-formula>, where <inline-formula>
<mml:math display="inline" id="im7">
<mml:mrow>
<mml:mn>0</mml:mn>
<mml:mo>&#x2264;</mml:mo>
<mml:mi>&#x3b8;</mml:mi>
<mml:mo>&#x2264;</mml:mo>
<mml:mn>1.8</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>&#xb0;C/km. The lower bound corresponds to the minimum gradient value of 0&#xb0;C/km, and the upper bound to 1.8&#xb0;C/km, the 10-year mean of the daily maximum SST front gradient in the OC region. When using SST gradient data to obtain SST fronts, the <inline-formula>
<mml:math display="inline" id="im8">
<mml:mi>&#x3b8;</mml:mi>
</mml:math>
</inline-formula> value needs to be selected based on experience and practical application scenarios. <xref ref-type="fig" rid="f1"><bold>Figure&#xa0;1C</bold></xref> displays the SST fronts when the threshold <inline-formula>
<mml:math display="inline" id="im9">
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>0.5</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>&#xb0;C/km is used to segment front structures. Values exceeding 0.5&#xb0;C/km are detected as SST fronts and assigned a value of 1, while values lower than 0.5&#xb0;C/km are detected as non-SST fronts and assigned a value of 0.</p>
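<p>The thresholding step above can be sketched as a one-line binary segmentation (a hedged illustration; the variable names are not from the paper):</p>

```python
import numpy as np

def detect_fronts(grad_mag, theta=0.5):
    """Segment SST fronts from a gradient-magnitude field (deg C/km):
    pixels strictly exceeding the threshold theta are fronts (value 1),
    all other pixels are non-fronts (value 0)."""
    return (grad_mag > theta).astype(np.uint8)
```

With the paper's default of 0.5&#xb0;C/km, a pixel at exactly the threshold is treated as a non-front, matching the "values exceeding" wording.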
<p>In addition, sea surface salinity (SSS) data are used to verify the reliability and generalizability of the proposed MCSTNet. As with the SST data, 8 years of SSS data (1 January 2006 to 31 December 2013) serve as the training set, and 2 years (1 January 2014 to 31 December 2015) as the testing set. The daily SSS data, with a spatial resolution of 0.08&#xb0; &#xd7; 0.08&#xb0;, are reanalysis products generated by the Operational Mercator global ocean reanalysis system. The target front structures are likewise obtained by computing physical gradients of SSS. Hereafter, &#x201c;front&#x201d; is used in place of &#x201c;front structure&#x201d;.</p>
</sec>
<sec id="s2_2">
<label>2.2</label>
<title>The MCSTNet framework</title>
<p>The overall framework of MCSTNet is displayed in <xref ref-type="fig" rid="f2"><bold>Figure&#xa0;2</bold></xref>. It consists of three parts: the encoder-decoder structure, the time transfer module, and the memory-contextual module. The encoder-decoder structure forms the backbone of the network and comprises a data encoder, a feature decoder, and a multi-task generation module; it extracts rich spatial features from the input sequences and generates high-quality predicted spatial information. The time transfer module transfers temporal information and fuses low-level, fine-grained temporal information with high-level semantic information to improve medium- and long-term prediction precision. The memory-contextual module fuses low-level spatiotemporal information with high-level semantic information to enhance short-term prediction precision. Together, these modules allow MCSTNet to transfer spatiotemporal information effectively for sequence prediction. As shown in <xref ref-type="fig" rid="f2"><bold>Figure&#xa0;2</bold></xref>, MCSTNet receives previous SST sequences and predicts future SST sequences and fronts.</p>
<fig id="f2" position="float">
<label>Figure&#xa0;2</label>
<caption>
<p>The MCSTNet framework. It includes three parts: the encoder-decoder structure, the time transfer module, and the memory-contextual module. MCSTNet receives SST sequences and then outputs the predicted SST sequences along with SST fronts.</p>
</caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fmars-10-1151796-g002.tif"/>
</fig>
<sec id="s2_2_1">
<label>2.2.1</label>
<title>The encoder-decoder structure</title>
<p>The encoder-decoder structure is made up of a data encoder, a feature decoder, and a multi-task generation module.</p>
<sec id="s2_2_1_1">
<label>2.2.1.1</label>
<title>The data encoder</title>
<p>The convolutional block uses a 2D convolution, which accepts only four-dimensional data, whereas the input to the sequence prediction task is five-dimensional ([batch, sequence length, number of input image channels, input image height, input image width]). The batch and sequence length dimensions are therefore merged into one, yielding four-dimensional data ([batch &#xd7; sequence length, number of input image channels, input image height, input image width]) in which every frame is treated as an independent image. The transformed four-dimensional data are fed into the data encoder, which includes four convolutional blocks and four max-pooling operations. Each convolutional block consists of a 2D convolution with a 3 &#xd7; 3 kernel, a GroupNorm with a group size of 2, and a LeakyReLU. The 2D convolution uses zero padding so that the input and output sizes match. A max-pooling operation with a stride of 2 is applied between convolutional blocks to halve the size of the feature maps.</p>
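<p>The dimension merging can be sketched with plain array reshapes (NumPy stands in for the tensor library; the shapes are illustrative, not the paper's):</p>

```python
import numpy as np

# Five-dimensional sequence batch: [batch, seq_len, channels, height, width]
x = np.zeros((4, 10, 1, 64, 64))
b, t, c, h, w = x.shape

# Merge batch and sequence length so a 2D CNN can process every frame
# as an independent image: [batch * seq_len, channels, height, width]
frames = x.reshape(b * t, c, h, w)

# The inverse reshape restores the sequence structure after the encoder
restored = frames.reshape(b, t, c, h, w)
```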
</sec>
<sec id="s2_2_1_2">
<label>2.2.1.2</label>
<title>The feature decoder</title>
<p>After the data encoder, the feature decoder decodes the obtained high-level spatial sequence information. It is made up of four convolutional blocks and four up-sampling layers with a factor of 2, each of which doubles the size of the feature maps. The contextual feature maps produced by the feature decoder lack rich semantic information, while the semantic feature maps produced by the data encoder lack rich contextual information. The multi-scale feature maps from the data encoder and the feature decoder are therefore connected through the memory-contextual module and the time transfer module, exploiting both the low-level contextual features and the high-level semantic features of the input data and improving the quality of the predicted images.</p>
</sec>
<sec id="s2_2_1_3">
<label>2.2.1.3</label>
<title>The multi-task generation module</title>
<p>The multi-task generation module contains two sub-networks, each of which is made up of a convolutional block and a 2D convolution with a convolution kernel of 1 &#xd7; 1. Two sub-networks receive the feature maps obtained by the feature decoder to generate the predicted SST and fronts, respectively.</p>
</sec>
</sec>
<sec id="s2_2_2">
<label>2.2.2</label>
<title>The time transfer module</title>
<p>The time transfer module extracts temporal information from the shallow data encoder and transfers it to the deep feature decoder. To re-establish the temporal relationship between the image sequences, the features output by the shallow data encoder are reshaped from [batch &#xd7; sequence length, number of input image channels, input image height, input image width] to [batch, sequence length &#xd7; number of input image channels, input image height, input image width], fusing the temporal and channel dimensions. Global self-adaptive max-pooling and global self-adaptive average pooling are then performed over the channel dimension. The two pooled features are passed through a two-layer, fully connected network with shared weights for a non-linear transformation, and the resulting features are added together to obtain temporal feature information. After this temporal learning, the data are reshaped back to the form they had before entering the time transfer module so that the temporal information can be fused with the spatiotemporal features output by the deep feature decoder: the features are first expanded to [batch, sequence length &#xd7; number of input image channels, input image height, input image width] and then reshaped to [batch &#xd7; sequence length, number of input image channels, input image height, input image width]. Finally, the temporal features and the deep spatial semantic features are multiplied element-wise and added, achieving the transfer and fusion of temporal information.</p>
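<p>A hedged sketch of this pooling-and-gating path follows. The exact wiring (where the pooled signal is added, and how the gate is squashed) is not fully specified in the text, so the sigmoid gate and the multiply-then-add fusion below are assumptions; <monospace>w1</monospace> and <monospace>w2</monospace> are the shared weights of the hypothetical two-layer network:</p>

```python
import numpy as np

def time_transfer(enc_feat, dec_feat, w1, w2):
    """Sketch of the time transfer module (hedged reconstruction).

    enc_feat, dec_feat: [batch, seq_len * channels, H, W] maps from the
    shallow data encoder and the deep feature decoder.  Global max- and
    average-pooling produce two channel descriptors; a shared two-layer
    fully connected network (w1, w2) transforms each; the summed result is
    squashed to a gate in (0, 1) that reweights the decoder features
    channel by channel, which is then fused by multiply-and-add.
    """
    mx = enc_feat.max(axis=(2, 3))                     # [batch, C]
    av = enc_feat.mean(axis=(2, 3))                    # [batch, C]
    mlp = lambda v: np.maximum(v @ w1, 0.0) @ w2       # shared weights, ReLU
    gate = 1.0 / (1.0 + np.exp(-(mlp(mx) + mlp(av))))  # sigmoid gate
    # element-wise multiply, then add (transfer and fusion step)
    return dec_feat * gate[:, :, None, None] + dec_feat
```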
</sec>
<sec id="s2_2_3">
<label>2.2.3</label>
<title>The memory-contextual module</title>
<p>The memory-contextual module consists of ConvLSTMs with three hidden layers. It learns spatiotemporal sequence information in the feature space of the shallow data encoder and transfers this information to the deep feature decoder, capturing how object positions change across the image sequence. Specifically, the four-dimensional features output by the shallow data encoder are reshaped into five-dimensional spatiotemporal sequence features and fed into the memory-contextual module. The resulting five-dimensional features are reshaped back into four dimensions and concatenated with the spatiotemporal features output by the deep feature decoder to obtain the combined spatiotemporal sequence features.</p>
</sec>
<sec id="s2_2_4">
<label>2.2.4</label>
<title>Loss function</title>
<p>In this study, predicting future SST sequences and front variation trends from previous SST data is a multi-task problem, so each sub-network needs its own loss function to guide the training of MCSTNet. Essentially, the SST sequence and front prediction tasks are image generation problems: the generated image sequence must be similar to the real image sequence. Image similarity is commonly measured with the MSE, and the loss function is written as</p>
<disp-formula>
<label>(4)</label>
<mml:math display="block" id="M4">
<mml:mrow>
<mml:mtable columnalign="left">
<mml:mtr columnalign="left">
<mml:mtd columnalign="left">
<mml:mrow>
<mml:msub>
<mml:mi>L</mml:mi>
<mml:mrow>
<mml:mi>M</mml:mi>
<mml:mi>S</mml:mi>
<mml:mi>E</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>I</mml:mi>
<mml:mo>,</mml:mo>
<mml:mover accent="true">
<mml:mi>I</mml:mi>
<mml:mo>^</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mrow>
<mml:mi>M</mml:mi>
<mml:mo>&#xb7;</mml:mo>
<mml:mi>N</mml:mi>
</mml:mrow>
</mml:mfrac>
<mml:munderover>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>M</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:munderover>
<mml:munderover>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>N</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:munderover>
<mml:msup>
<mml:mrow>
<mml:mo stretchy="false">[</mml:mo>
<mml:mi>I</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mover accent="true">
<mml:mi>I</mml:mi>
<mml:mo>^</mml:mo>
</mml:mover>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo stretchy="false">]</mml:mo>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msup>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:math>
</disp-formula>
<p>In equation (4), <inline-formula>
<mml:math display="inline" id="im10">
<mml:mi>M</mml:mi>
</mml:math>
</inline-formula> denotes the number of rows of predicted image pixels; <inline-formula>
<mml:math display="inline" id="im11">
<mml:mi>N</mml:mi>
</mml:math>
</inline-formula> denotes the number of columns of predicted image pixels; <inline-formula>
<mml:math display="inline" id="im12">
<mml:mi>I</mml:mi>
</mml:math>
</inline-formula> denotes the target SST or front images; <inline-formula>
<mml:math display="inline" id="im13">
<mml:mover accent="true">
<mml:mi>I</mml:mi>
<mml:mo>^</mml:mo>
</mml:mover>
</mml:math>
</inline-formula> denotes the predicted SST or front images; and <inline-formula>
<mml:math display="inline" id="im14">
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> denotes the position of each pixel. The lower the MSE value, the more similar the two images. MSE measures whether the predicted and target images agree pixel by pixel; the feature similarity between the two images should also be measured. Specifically, the predicted and target images are each fed into a ResNet-50 feature extractor, pre-trained on ImageNet with the weights of <xref ref-type="bibr" rid="B11">He et al. (2016)</xref>, and the features output by the last, <italic>l</italic>th, layer are obtained. Contextual loss is then employed to measure the difference between features via their cosine similarity. Because contextual loss measures the overall feature similarity of images, it promotes the prediction of non-stationary information.</p>
<p>When the cosine distance between two feature maps is small, they can be considered similar; otherwise, they are considered dissimilar. The question of whether two features are similar is thus transformed into minimizing the cosine distance (equivalently, maximizing the cosine similarity) between the feature maps of the predicted and target images. Contextual loss is written as:</p>
<disp-formula>
<label>(5)</label>
<mml:math display="block" id="M5">
<mml:mrow>
<mml:mtable columnalign="left">
<mml:mtr columnalign="left">
<mml:mtd columnalign="left">
<mml:mrow>
<mml:msub>
<mml:mi>L</mml:mi>
<mml:mrow>
<mml:mi>C</mml:mi>
<mml:mi>X</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mi>p</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mi>l</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>c</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>s</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mtext>&#x3a6;</mml:mtext>
<mml:mi>l</mml:mi>
</mml:msup>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mi>p</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mtext>&#x3a6;</mml:mtext>
<mml:mi>l</mml:mi>
</mml:msup>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:math>
</disp-formula>
<p>where <inline-formula>
<mml:math display="inline" id="im15">
<mml:mrow>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mi>p</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> represents the predicted SST or fronts, <inline-formula>
<mml:math display="inline" id="im16">
<mml:mrow>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> represents the target SST or fronts, <inline-formula>
<mml:math display="inline" id="im17">
<mml:mtext>&#x3a6;</mml:mtext>
</mml:math>
</inline-formula> represents a ResNet-50 pre-trained feature extractor, and <inline-formula>
<mml:math display="inline" id="im18">
<mml:mi>l</mml:mi>
</mml:math>
</inline-formula> denotes the index of the last layer of the ResNet-50 in equation (5).</p>
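<p>Equation (5) reduces to one minus a cosine similarity between feature vectors. A minimal sketch, with the feature extractor &#x3a6; omitted and the feature maps passed in directly:</p>

```python
import numpy as np

def contextual_loss(feat_p, feat_t, eps=1e-12):
    """Equation (5): one minus the cosine similarity between the flattened
    feature maps of the predicted (feat_p) and target (feat_t) images.
    The ResNet-50 extractor Phi is omitted; its outputs are the inputs here."""
    p, t = feat_p.ravel(), feat_t.ravel()
    cos = (p @ t) / (np.linalg.norm(p) * np.linalg.norm(t) + eps)
    return 1.0 - cos
```

The loss is 0 for identical features and approaches 2 for anti-parallel ones.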
<p>Combining the advantages of MSE in tackling stationary information and the virtues of contextual loss in tackling non-stationary information, the overall loss function is jointly guided by two loss functions and is written as:</p>
<disp-formula>
<label>(6)</label>
<mml:math display="block" id="M6">
<mml:mrow>
<mml:mtable columnalign="left">
<mml:mtr columnalign="left">
<mml:mtd columnalign="left">
<mml:mrow>
<mml:msub>
<mml:mi>L</mml:mi>
<mml:mrow>
<mml:mi>F</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>n</mml:mi>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mi>L</mml:mi>
<mml:mrow>
<mml:mi>S</mml:mi>
<mml:mi>S</mml:mi>
<mml:mi>T</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mi>&#x3bb;</mml:mi>
<mml:msub>
<mml:mi>L</mml:mi>
<mml:mrow>
<mml:mi>C</mml:mi>
<mml:mi>X</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>+</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>&#x3bb;</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:msub>
<mml:mi>L</mml:mi>
<mml:mrow>
<mml:mi>M</mml:mi>
<mml:mi>S</mml:mi>
<mml:mi>E</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:math>
</disp-formula>
<p>In equation (6), <inline-formula>
<mml:math display="inline" id="im20">
<mml:mi>&#x3bb;</mml:mi>
</mml:math>
</inline-formula> is an equilibrium factor to balance the two loss functions, which is usually determined based on experimentation and experience (<inline-formula>
<mml:math display="inline" id="im21">
<mml:mrow>
<mml:mi>&#x3bb;</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>0.2</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> in our experiments). MCSTNet makes use of a multi-task training mode to predict SST sequence and front images, respectively. Although MCSTNet has two prediction sub-networks, i.e., the SST prediction sub-network and the front prediction sub-network, the predicted results only differ in data, so the loss functions of these two prediction sub-networks can be the same. The final loss function of MCSTNet is the sum of the loss functions of the SST prediction sub-network and the front prediction sub-network and is written as:</p>
<disp-formula>
<label>(7)</label>
<mml:math display="block" id="M7">
<mml:mrow>
<mml:mtable columnalign="left">
<mml:mtr columnalign="left">
<mml:mtd columnalign="left">
<mml:mrow>
<mml:msub>
<mml:mi>L</mml:mi>
<mml:mrow>
<mml:mi>M</mml:mi>
<mml:mi>C</mml:mi>
<mml:mi>S</mml:mi>
<mml:mi>T</mml:mi>
<mml:mi>N</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mi>L</mml:mi>
<mml:mrow>
<mml:mi>S</mml:mi>
<mml:mi>S</mml:mi>
<mml:mi>T</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>+</mml:mo>
<mml:msub>
<mml:mi>L</mml:mi>
<mml:mrow>
<mml:mi>F</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>n</mml:mi>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:math>
</disp-formula>
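<p>Equations (4), (6), and (7) combine as follows; a minimal sketch in which the contextual and MSE terms are passed in as precomputed scalars:</p>

```python
import numpy as np

def mse_loss(target, pred):
    """Equation (4): mean squared error over the M x N image pixels."""
    return float(np.mean((target - pred) ** 2))

def mcstnet_loss(cx_sst, mse_sst, cx_front, mse_front, lam=0.2):
    """Equations (6)-(7): each task blends the contextual and MSE losses
    with the equilibrium factor lambda (0.2 in the paper's experiments);
    the network loss is the sum of the SST and front task losses."""
    l_sst = lam * cx_sst + (1.0 - lam) * mse_sst
    l_front = lam * cx_front + (1.0 - lam) * mse_front
    return l_sst + l_front
```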
</sec>
</sec>
</sec>
<sec id="s3">
<label>3</label>
<title>Experiment</title>
<p>In the experiments, we evaluate the spatiotemporal sequence prediction capability of the proposed MCSTNet against SOTA sequence prediction models on the SST data. The generalizability of MCSTNet is verified on the SSS data, and the effectiveness of each module in MCSTNet is evaluated.</p>
<sec id="s3_1">
<label>3.1</label>
<title>Experimental settings</title>
<p>In this section, we introduce the MCSTNet training process and experimental platform configuration.</p>
<sec id="s3_1_1">
<label>3.1.1</label>
<title>The training process of MCSTNet</title>
<p>In this study, MCSTNet was used for the SST sequence and front prediction tasks, which require learning both the stationary and the non-stationary information in SST sequences. This makes training challenging, and the network may be difficult to fit. To facilitate fitting, MCSTNet introduces a probability of using the target SST during training: a probability value that decreases as the iteration steps increase, so that the target sequences substitute for the predicted sequences with a certain probability when training the memory-contextual module.</p>
<p>The training process of MCSTNet is shown in <xref ref-type="other" rid="algo1"><bold>Algorithm 1</bold></xref>. During training, MCSTNet receives randomly sampled input SST sequences of length 10, the target SST sequences, and the probability of using the target SST. The input and target SST sequences are fed into the data encoder to extract features. When each predicted SST feature is generated, the target SST sequence features are substituted for the predicted features according to this probability, which lets the memory-contextual module learn the spatiotemporal feature information well and keeps the whole network stable during training. The initial value of the probability is close to 1.0; it decreases as the iteration steps increase and eventually reaches zero, at which point the network predicts using only the input SST sequences. During testing, the probability is always set to zero.</p>
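<p>The decaying schedule in Algorithm 1 is an inverse-sigmoid decay, <italic>prob</italic> = &#x3c5;/(&#x3c5; + exp((<italic>step</italic> + &#x3b7;)/&#x3c5;)). A sketch with illustrative hyperparameter values (the paper leaves &#x3c5; and &#x3b7; unspecified):</p>

```python
import math

def teacher_forcing_prob(step, upsilon=2000.0, eta=-6000.0):
    """Inverse-sigmoid decay of the probability of feeding target SST
    features, following the schedule in Algorithm 1:
        prob = v / (v + exp((step + eta) / v)).
    upsilon and eta here are illustrative values, not the paper's;
    a negative eta keeps prob near 1 for the first iterations."""
    return upsilon / (upsilon + math.exp((step + eta) / upsilon))
```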
</sec>
<sec id="s3_1_2">
<label>3.1.2</label>
<title>Experimental platform configuration</title>
<p>The experimental platform configuration is as follows. The server runs Ubuntu 20.04.4 LTS with two physical CPUs (56 logical cores), 128 GB of memory, and two NVIDIA 3090 Ti graphics cards. Development is done on Linux in Python 3.7.11 with PyTorch 1.12.0, a flexible and efficient DL framework with an underlying C++ implementation. For a fair comparison, the hyperparameters of MCSTNet and the comparative methods are kept consistent; a detailed list is presented in <xref ref-type="supplementary-material" rid="SM1"><bold>Table S2</bold></xref>.</p>
<statement id="algo1">
<label>Algorithm 1 The training process of MCSTNet.</label>
<p>
<preformat><bold>Require:</bold> <italic>SST<sub>x</sub></italic>: input SST sequences; <italic>SST<sub>y</sub></italic>: target SST sequences; <italic>SST<sub>&#x177;</sub></italic>: predicted SST sequences; <italic>Front<sub>y</sub></italic>: target fronts; <italic>Front<sub>&#x177;</sub></italic>: predicted fronts; <italic>N</italic>: number of epochs; <italic>B</italic>: batch size; <italic>L</italic>: length of the predicted sequences; &#x398;: MCSTNet&#x2019;s parameters; <italic>prob</italic>: probability of using <italic>SST<sub>y</sub></italic> to assist temporal learning during MCSTNet training; <italic>step</italic>: iteration step; &#x3c5;, &#x3b7;: hyperparameters of <italic>prob</italic>;
<bold>Require:</bold> Adam optimizer: &#x3b2;<sub>1</sub> = 0.9, &#x3b2;<sub>2</sub> = 0.999, learning rate = 0.001;
1: Initialize MCSTNet&#x2019;s parameters &#x398;;
2: <italic>step</italic> = 0;
3: <italic>prob</italic> = &#x3c5;/(&#x3c5; + exp((<italic>step</italic> + &#x3b7;)/&#x3c5;));
4: <bold>while</bold> not converged <bold>do</bold>
5:   <bold>for</bold> <italic>i</italic> = 1, 2, &#x2026;, <italic>N</italic>/<italic>B</italic> <bold>do</bold>
6:     Sample training data <italic>SST<sub>x</sub></italic>, <italic>SST<sub>y</sub></italic>;
7:     Input the training data into MCSTNet to extract the features of <italic>SST<sub>x</sub></italic> and <italic>SST<sub>y</sub></italic>, respectively;
8:     When data reach the memory-contextual module in MCSTNet:
9:     <bold>for</bold> <italic>j</italic> = 1, 2, &#x2026;, <italic>L</italic> <bold>do</bold>
10:      Sample a temporary variable <italic>temp</italic> &#x2208; [0, 1];
11:      <bold>if</bold> <italic>temp</italic> &lt; <italic>prob</italic> <bold>then</bold>
12:        Use the features of <italic>SST<sub>y</sub></italic>[<italic>j</italic>] in place of the predicted features of <italic>SST<sub>x</sub></italic>[<italic>j</italic>] and pass them to the next layer of the network;
13:      <bold>else</bold>
14:        Use the features of <italic>SST<sub>x</sub></italic>[<italic>j</italic>] as the predicted features and pass them to the next layer of the network;
15:      <bold>end if</bold>
16:    <bold>end for</bold>
17:    When data reach the time transfer module in MCSTNet, fuse the temporal features with the feature decoder output and pass them to the next layer of the network;
18:    Guide MCSTNet training by equation (7);
19:    Update MCSTNet&#x2019;s parameters &#x398;;
20:    Obtain <italic>SST<sub>&#x177;</sub></italic> and <italic>Front<sub>&#x177;</sub></italic>;
21:    <italic>step</italic> += 1 and update <italic>prob</italic> as in line 3;
22:  <bold>end for</bold>
23: <bold>end while</bold></preformat>
</p>
</statement>
</sec>
</sec>
<sec id="s3_2">
<label>3.2</label>
<title>Comparative approaches</title>
<p>In our experiments, the comparative SOTA models were the physics-constrained PhyDNet (<xref ref-type="bibr" rid="B9">Guen and Thome, 2020</xref>); the RNN-based ConvLSTM (<xref ref-type="bibr" rid="B35">Shi et&#xa0;al., 2015</xref>), PredRNN (<xref ref-type="bibr" rid="B41">Wang et&#xa0;al., 2017</xref>), PredRNNv2 (<xref ref-type="bibr" rid="B42">Wang et&#xa0;al., 2021</xref>), and MIM (<xref ref-type="bibr" rid="B43">Wang et&#xa0;al., 2019</xref>); and the CNN-based SimVP (<xref ref-type="bibr" rid="B5">Gao et&#xa0;al., 2022</xref>).</p>
<p>PhyDNet introduces a recurrent physical cell that models physical dynamics by discretizing a partial differential equation (PDE) constraint within a global sequence-to-sequence DL framework. It was the first study to achieve good predictive performance by combining physical constraints with DL.</p>
<p>ConvLSTM innovatively combines CNN and LSTM for predicting image sequences, capturing both spatial and temporal dependencies, and has been applied to a real-life radar echo dataset for precipitation nowcasting. Each layer of the CNN structure encodes spatial information, while the memory units encode temporal information independently.</p>
<p>PredRNN (<xref ref-type="bibr" rid="B41">Wang et&#xa0;al., 2017</xref>) is a novel recurrent network in which two memory cells extract the variation of spatiotemporal information, improving predictive power on spatiotemporal sequences. <xref ref-type="bibr" rid="B42">Wang et&#xa0;al. (2021)</xref> propose an enhancement of the PredRNN structure, dubbed PredRNNv2, extended to predict action-conditioned video. During training, reverse scheduled sampling is employed to learn the dependencies between non-adjacent frames by randomly hiding training frames with certain probabilities.</p>
<p>Memory in memory (MIM), proposed by <xref ref-type="bibr" rid="B43">Wang et&#xa0;al. (2019)</xref>, is an upgraded form of LSTM in which two built-in long short-term memories substitute for the forget gate of the LSTM. With its two cascaded, self-renewed memory structures, MIM exploits the differential information between neighboring recurrent states to model both nearly stationary and non-stationary spatiotemporal features. Higher-order non-stationarity can be handled by stacking MIM structures.</p>
<p>SimVP, proposed by <xref ref-type="bibr" rid="B5">Gao et&#xa0;al. (2022)</xref>, uses only a CNN structure and a simple MSE loss for video prediction. SimVP learns the spatial information of the images with an ordinary CNN and the temporal information in the video sequences with an inception-style CNN, enabling the prediction of spatiotemporal information in video sequences.</p>
</sec>
<sec id="s3_3">
<label>3.3</label>
<title>Experimental results</title>
<p>In this section, to test the capability of MCSTNet to predict sequences and transfer spatiotemporal information, we conducted preliminary experiments on the Moving MNIST dataset (see <xref ref-type="supplementary-material" rid="SM1"><bold>Figures S1</bold></xref>, <xref ref-type="supplementary-material" rid="SM1"><bold>S2</bold></xref>; <xref ref-type="supplementary-material" rid="SM1"><bold>Table S1</bold></xref>). Moreover, we conducted comparison experiments to verify the strong spatiotemporal sequence prediction capability of MCSTNet on the SST data. The SSS data were used to verify the reliability and generalizability of MCSTNet, and ablation studies were conducted to demonstrate the effectiveness of each module in MCSTNet.</p>
<sec id="s3_3_1">
<label>3.3.1</label>
<title>SST sequence and front predictions based on the SST data</title>
<p>To demonstrate the advantages of the MCSTNet framework in predicting SST sequences and fronts, we compared MCSTNet to comparative approaches for both quantitative and qualitative assessments based on the SST data.</p>
<sec id="s3_3_1_1">
<label>3.3.1.1</label>
<title>Qualitative assessment</title>
<p>
<xref ref-type="fig" rid="f3"><bold>Figure&#xa0;3</bold></xref> displays the predicted results of the future 10-day SST sequences and front images using MCSTNet based on the previous 10-day satellite SST sequences, and only the images for the even-numbered days are plotted. The time horizon of the SST images and front images is 20 days, from 6 March to 25 March 2015, with a latitude range of 39.3&#xb0;N&#x2013;42.5&#xb0;N and a longitude range of 145.2&#xb0;E&#x2013;148.4&#xb0;E. The predicted front images corresponding to the SST sequences in the future 10 days were obtained by MCSTNet through learning the SST variation laws in the previous 10 days. At the beginning of the prediction (T = 11 and 13), the predicted SST sequences and fronts are similar to the target SST and fronts. As the prediction time increases, the predicted SST sequences and fronts deviate somewhat from the targets, but the overall variation trend remains consistent with the targets. This is because the SST data span a long time period and therefore exhibit large variability. The variation trend of the SST sequences and fronts predicted by MCSTNet is reasonable and authentic, which reveals that MCSTNet is effective for SST sequence prediction and front variation trend prediction.</p>
<fig id="f3" position="float">
<label>Figure&#xa0;3</label>
<caption>
<p>Satellite and predicted SST sequences, physics-based and predicted fronts at five successive sequences with 1-day intervals. The previous 10-day satellite SST sequences are input into MCSTNet to predict future 10-day SST sequences and fronts. <bold>(A)</bold> Input SST sequences on 6, 8, 10, 12, and 14 March, 2015. <bold>(B)</bold> Target SST, <bold>(C)</bold> the predicted SST, <bold>(D)</bold> target fronts, and <bold>(E)</bold> the predicted fronts, on 16, 18, 20, 22, and 24 March, 2015.</p>
</caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fmars-10-1151796-g003.tif"/>
</fig>
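Because the front errors reported below are given in &#xb0;C/km, a gradient-based front indicator is the natural point of reference. The following is an illustrative sketch only, assuming fronts are flagged by the horizontal SST gradient magnitude; it is not necessarily the physics-based detection method used in this work, and the grid spacings are placeholder values:

```python
import numpy as np

def front_map(sst: np.ndarray, dx_km: float = 1.0, dy_km: float = 1.0) -> np.ndarray:
    """Horizontal SST gradient magnitude (deg C / km) as a simple front indicator.

    sst: 2-D SST field (deg C). dx_km, dy_km: illustrative grid spacings in km.
    """
    # Finite-difference gradients along rows (y) and columns (x).
    dT_dy, dT_dx = np.gradient(sst, dy_km, dx_km)
    # Gradient magnitude per pixel; large values mark frontal zones.
    return np.hypot(dT_dx, dT_dy)
```

Thresholding such a map would yield a binary front mask comparable to the front images shown in the figures.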
<p>
<xref ref-type="fig" rid="f4"><bold>Figure&#xa0;4</bold></xref> displays the predicted results of the future 10-day SST sequences using MCSTNet and the comparative approaches based on the previous 10-day satellite SST sequences, and only the images for the even-numbered days are plotted. From the third line, each line displays the future 10-day SST sequences predicted by MCSTNet, SimVP, PhyDNet, PredRNNv2, MIM, PredRNN, and ConvLSTM, based on SST variation laws, respectively. Because of the normalization technique used during training on the SST data, some of the training data were close to zero, making PredRNNv2 and MIM difficult to train and producing poorer visualization results. The visual results show that the CNN-based models predict significantly better than those relying only on RNNs, indicating that CNNs are helpful for processing image details. Compared to SimVP, which uses only a CNN structure, MCSTNet predicts SST sequences better; in particular, the medium-term SST sequences predicted by MCSTNet are more accurate. MCSTNet also predicts significantly better than PhyDNet (<xref ref-type="fig" rid="f4"><bold>Figure&#xa0;4E</bold></xref>); this is because it is difficult for a fixed physical model to constrain non-stationary SST data, which degrades PhyDNet&#x2019;s SST predictions. The RNN-based models, including PredRNN, PredRNNv2, MIM, and ConvLSTM, are good at predicting data with a stable and constant shape in the image but perform poorly in predicting image details on non-stationary SST data, leading to poor quality of the predicted images. MCSTNet combines the detail-learning ability of the CNN module with the variation-pattern-learning ability of the RNN module, which takes into account both the non-stationary information and the detailed features of SST sequences, and obtains more authentic and reasonable predicted results than the comparative approaches. This demonstrates that MCSTNet is the most appropriate approach for the SST sequence prediction task.</p>
<fig id="f4" position="float">
<label>Figure&#xa0;4</label>
<caption>
<p>Satellite and predicted SST sequences at five successive sequences with 1-day intervals by MCSTNet and the compared approaches. The previous 10-day satellite SST sequences are employed to predict future 10-day SST sequences. <bold>(A)</bold> Input SST sequences on 6, 8, 10, 12, and 14 March, 2015. <bold>(B)</bold> Target SST sequences, and the predicted SST sequences by <bold>(C)</bold> MCSTNet, <bold>(D)</bold> SimVP, <bold>(E)</bold> PhyDNet, <bold>(F)</bold> PredRNNv2, <bold>(G)</bold> MIM, <bold>(H)</bold> PredRNN, as well as <bold>(I)</bold> ConvLSTM on 16, 18, 20, 22, and 24 March, 2015.</p>
</caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fmars-10-1151796-g004.tif"/>
</fig>
<p>
<xref ref-type="fig" rid="f5"><bold>Figure&#xa0;5</bold></xref> displays the 1-day results as well as the SST and front error images between target and prediction. The target SST and front denote the future 1-day SST and physics-based fronts, respectively, on 27 October, 2014, and 13 February, 2015. MCSTNet received the previous 10-day satellite SST sequences to predict the future 10-day SST sequences and front images. The SST and front error images were obtained by taking the absolute value of the differences between the target and predicted SST and between the target and predicted fronts. For each line in the error images, the maximum and average errors are: the first line with a maximum error of 1.572&#xb0;C and an average error of 0.350&#xb0;C; the second line with a maximum error of 0.282&#xb0;C/km and an average error of 0.039&#xb0;C/km; the third line with a maximum error of 1.157&#xb0;C and an average error of 0.197&#xb0;C; and the fourth line with a maximum error of 0.302&#xb0;C/km and an average error of 0.040&#xb0;C/km. From the statistical data and visual results, despite the relatively high maximum errors, the small average errors indicate that the overall prediction performance of MCSTNet is stable. This indicates that MCSTNet is appropriate for SST sequence and front prediction tasks.</p>
<fig id="f5" position="float">
<label>Figure&#xa0;5</label>
<caption>
<p>Satellite and predicted SST sequences, physics-based and predicted fronts, and the error between them. The previous 10-day satellite SST sequences are employed to predict future 10-day SST sequences and fronts, but only one day is plotted. From left to right, each column represents target SST sequences and fronts, the predicted SST sequences and fronts by MCSTNet, and the SST and front error between target and predicted by MCSTNet (absolute value). There are two examples of SST and front predictions: <bold>(A, B)</bold> show images from 26 October, 2014 to 27 October, 2014, while <bold>(C, D)</bold> show images from 12 February, 2015 to 13 February, 2015.</p>
</caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fmars-10-1151796-g005.tif"/>
</fig>
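The error statistics above follow directly from the absolute-difference error images: the error map is the pixelwise absolute difference between target and prediction, and the maximum and average are taken over that map. A minimal NumPy sketch (array and function names are illustrative):

```python
import numpy as np

def error_stats(target: np.ndarray, pred: np.ndarray):
    """Per-pixel absolute error map plus its maximum and average values."""
    # Error image, same shape as the inputs (e.g., deg C for SST, deg C/km for fronts).
    err = np.abs(target - pred)
    return err, float(err.max()), float(err.mean())
```

Applied to an SST field, `err.max()` and `err.mean()` correspond to the per-line maximum and average errors quoted for Figure 5.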
<p>We predicted not only short- and medium-term SST sequences and fronts but also long-term SST sequences and fronts. <xref ref-type="fig" rid="f6"><bold>Figure&#xa0;6</bold></xref> displays the predicted results of the future 30-day SST sequences using MCSTNet and the comparative approaches based on the previous 30-day satellite SST sequences, and only the images at five successive sequences with 5-day intervals are plotted. From the third line, each line displays the future 30-day SST sequences predicted by MCSTNet, SimVP, PhyDNet, PredRNNv2, MIM, PredRNN, and ConvLSTM, based on SST variation laws, respectively. All models show that the performance of the short-term prediction is better than that of the medium-term, and that of the medium-term is better than that of the long-term. As the predicted time increases, the authenticity and precision of the SST sequences predicted by all models decrease. However, our proposed MCSTNet predicts higher-quality SST sequences than the comparative approaches, with clearer images, richer detail, and more accurate SST variation patterns. This demonstrates that MCSTNet is beneficial for learning fine-grained, long-term spatiotemporal information about SST sequence variation laws.</p>
<fig id="f6" position="float">
<label>Figure&#xa0;6</label>
<caption>
<p>Satellite and predicted SST sequences at five successive sequences with 1-day intervals by MCSTNet and the compared approaches. The previous 30-day satellite SST sequences are used to predict future 30-day SST sequences. <bold>(A)</bold> Input SST sequences on 16, 22, 28 October, and 3, 9 November, 2014. <bold>(B)</bold> Target SST sequences, and the predicted SST sequences by <bold>(C)</bold> MCSTNet, <bold>(D)</bold> SimVP, <bold>(E)</bold> PhyDNet, <bold>(F)</bold> PredRNNv2, <bold>(G)</bold> MIM, <bold>(H)</bold> PredRNN, as well as <bold>(I)</bold> ConvLSTM on 15, 21, 27 November, and 3, 9 December, 2014.</p>
</caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fmars-10-1151796-g006.tif"/>
</fig>
</sec>
<sec id="s3_3_1_2">
<label>3.3.1.2</label>
<title>Quantitative assessment</title>
<p>To quantitatively evaluate the authenticity and quality of the SST and front images predicted by MCSTNet and the comparative approaches, the MSE (<xref ref-type="bibr" rid="B32">Prasad and Rao, 1990</xref>), mean absolute error (MAE) (<xref ref-type="bibr" rid="B45">Willmott and Matsuura, 2005</xref>), structural similarity (SSIM) (<xref ref-type="bibr" rid="B40">Wang et al., 2004</xref>), and peak signal-to-noise ratio (PSNR) (<xref ref-type="bibr" rid="B14">Huynh-Thu and Ghanbari, 2008</xref>) were selected as evaluation indices. Specifically, the deviation between the predicted and target images was measured by MSE and MAE. MSE is more sensitive to abnormal (outlier) data, whereas MAE reflects the typical error over most of the data. MAE is calculated as:</p>
<disp-formula>
<label>(8)</label>
<mml:math display="block" id="M8">
<mml:mrow>
<mml:mtable columnalign="left">
<mml:mtr columnalign="left">
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mi>M</mml:mi>
<mml:mi>A</mml:mi>
<mml:mi>E</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>I</mml:mi>
<mml:mo>,</mml:mo>
<mml:mover accent="true">
<mml:mi>I</mml:mi>
<mml:mo>^</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mrow>
<mml:mi>M</mml:mi>
<mml:mo>&#xb7;</mml:mo>
<mml:mi>N</mml:mi>
</mml:mrow>
</mml:mfrac>
<mml:munderover>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>M</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:munderover>
<mml:munderover>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>N</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:munderover>
<mml:mo>|</mml:mo>
<mml:mi>I</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mover accent="true">
<mml:mi>I</mml:mi>
<mml:mo>^</mml:mo>
</mml:mover>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>|</mml:mo>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:math>
</disp-formula>
<p>where <inline-formula>
<mml:math display="inline" id="im22">
<mml:mi>I</mml:mi>
</mml:math>
</inline-formula> is the target images, <inline-formula>
<mml:math display="inline" id="im23">
<mml:mover accent="true">
<mml:mi>I</mml:mi>
<mml:mo>^</mml:mo>
</mml:mover>
</mml:math>
</inline-formula> is the predicted images, <inline-formula>
<mml:math display="inline" id="im24">
<mml:mi>M</mml:mi>
</mml:math>
</inline-formula> is the number of rows of images, and <inline-formula>
<mml:math display="inline" id="im25">
<mml:mi>N</mml:mi>
</mml:math>
</inline-formula> is the number of columns of images in equation (8).</p>
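Equation (8) can be implemented directly as a mean over the M &#xb7; N pixels. A minimal NumPy sketch (function and array names are illustrative):

```python
import numpy as np

def mae(target: np.ndarray, pred: np.ndarray) -> float:
    """MAE of equation (8): mean of |I(i, j) - I_hat(i, j)| over an M x N image."""
    # np.mean divides the summed absolute differences by M * N.
    return float(np.mean(np.abs(target - pred)))
```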
<p>The structural similarity between the target and predicted images was measured by SSIM, which assesses image similarity in terms of luminance, contrast, and structure. Specifically, the luminance comparison measures differences in the mean brightness (pixel intensity) between the two images; the contrast comparison measures differences in the standard deviation of pixel intensities within small regions of the images; and the structure comparison compares the spatial patterns of the pixels by calculating the correlation between corresponding regions of the images. SSIM is written as:</p>
<disp-formula>
<label>(9)</label>
<mml:math display="block" id="M9">
<mml:mrow>
<mml:mtable columnalign="left">
<mml:mtr columnalign="left">
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mi>S</mml:mi>
<mml:mi>S</mml:mi>
<mml:mi>I</mml:mi>
<mml:mi>M</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>I</mml:mi>
<mml:mo>,</mml:mo>
<mml:mover accent="true">
<mml:mi>I</mml:mi>
<mml:mo>^</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">[</mml:mo>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>I</mml:mi>
<mml:mo>,</mml:mo>
<mml:mover accent="true">
<mml:mi>I</mml:mi>
<mml:mo>^</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:msup>
<mml:mo stretchy="false">]</mml:mo>
<mml:mi>&#x3b1;</mml:mi>
</mml:msup>
</mml:mrow>
<mml:mo stretchy="false">[</mml:mo>
</mml:mrow>
<mml:mi>c</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>I</mml:mi>
<mml:mo>,</mml:mo>
<mml:mover accent="true">
<mml:mi>I</mml:mi>
<mml:mo>^</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:msup>
<mml:mo stretchy="false">]</mml:mo>
<mml:mi>&#x3b2;</mml:mi>
</mml:msup>
<mml:msup>
<mml:mrow>
<mml:mo stretchy="false">[</mml:mo>
<mml:mi>s</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>I</mml:mi>
<mml:mo>,</mml:mo>
<mml:mover accent="true">
<mml:mi>I</mml:mi>
<mml:mo>^</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo stretchy="false">]</mml:mo>
</mml:mrow>
<mml:mi>&#x3b3;</mml:mi>
</mml:msup>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr columnalign="left">
<mml:mtd columnalign="left">
<mml:mrow/>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mn>2</mml:mn>
<mml:msub>
<mml:mi>&#x3bc;</mml:mi>
<mml:mi>I</mml:mi>
</mml:msub>
<mml:msub>
<mml:mi>&#x3bc;</mml:mi>
<mml:mover accent="true">
<mml:mi>I</mml:mi>
<mml:mo>^</mml:mo>
</mml:mover>
</mml:msub>
<mml:mo>+</mml:mo>
<mml:msub>
<mml:mi>c</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c3;</mml:mi>
<mml:mrow>
<mml:mi>I</mml:mi>
<mml:mover accent="true">
<mml:mi>I</mml:mi>
<mml:mo>^</mml:mo>
</mml:mover>
</mml:mrow>
</mml:msub>
<mml:mo>+</mml:mo>
<mml:msub>
<mml:mi>c</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mi>&#x3bc;</mml:mi>
<mml:mi>I</mml:mi>
<mml:mn>2</mml:mn>
</mml:msubsup>
<mml:mo>+</mml:mo>
<mml:msubsup>
<mml:mi>&#x3bc;</mml:mi>
<mml:mover accent="true">
<mml:mi>I</mml:mi>
<mml:mo>^</mml:mo>
</mml:mover>
<mml:mn>2</mml:mn>
</mml:msubsup>
<mml:mo>+</mml:mo>
<mml:msub>
<mml:mi>c</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mi>&#x3c3;</mml:mi>
<mml:mi>I</mml:mi>
<mml:mn>2</mml:mn>
</mml:msubsup>
<mml:mo>+</mml:mo>
<mml:msubsup>
<mml:mi>&#x3c3;</mml:mi>
<mml:mover accent="true">
<mml:mi>I</mml:mi>
<mml:mo>^</mml:mo>
</mml:mover>
<mml:mn>2</mml:mn>
</mml:msubsup>
<mml:mo>+</mml:mo>
<mml:msub>
<mml:mi>c</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfrac>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:math>
</disp-formula>
<p>where <inline-formula>
<mml:math display="inline" id="im26">
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>I</mml:mi>
<mml:mo>,</mml:mo>
<mml:mover accent="true">
<mml:mi>I</mml:mi>
<mml:mo>^</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> represents the luminance comparison between the target and predicted images; <inline-formula>
<mml:math display="inline" id="im27">
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>I</mml:mi>
<mml:mo>,</mml:mo>
<mml:mover accent="true">
<mml:mi>I</mml:mi>
<mml:mo>^</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> represents the contrast comparison between the target and predicted images; <inline-formula>
<mml:math display="inline" id="im28">
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>I</mml:mi>
<mml:mo>,</mml:mo>
<mml:mover accent="true">
<mml:mi>I</mml:mi>
<mml:mo>^</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> represents the structure comparison between the target and predicted images; <inline-formula>
<mml:math display="inline" id="im29">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3bc;</mml:mi>
<mml:mi>I</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula>
<mml:math display="inline" id="im30">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3bc;</mml:mi>
<mml:mover accent="true">
<mml:mi>I</mml:mi>
<mml:mo>^</mml:mo>
</mml:mover>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> are the mean values of <inline-formula>
<mml:math display="inline" id="im31">
<mml:mi>I</mml:mi>
</mml:math>
</inline-formula>, <inline-formula>
<mml:math display="inline" id="im32">
<mml:mover accent="true">
<mml:mi>I</mml:mi>
<mml:mo>^</mml:mo>
</mml:mover>
</mml:math>
</inline-formula>, respectively; <inline-formula>
<mml:math display="inline" id="im33">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c3;</mml:mi>
<mml:mi>I</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula>
<mml:math display="inline" id="im34">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c3;</mml:mi>
<mml:mover accent="true">
<mml:mi>I</mml:mi>
<mml:mo>^</mml:mo>
</mml:mover>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, and <inline-formula>
<mml:math display="inline" id="im35">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c3;</mml:mi>
<mml:mrow>
<mml:mi>I</mml:mi>
<mml:mover accent="true">
<mml:mi>I</mml:mi>
<mml:mo>^</mml:mo>
</mml:mover>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> denote the covariances of <inline-formula>
<mml:math display="inline" id="im36">
<mml:mi>I</mml:mi>
</mml:math>
</inline-formula>, <inline-formula>
<mml:math display="inline" id="im37">
<mml:mover accent="true">
<mml:mi>I</mml:mi>
<mml:mo>^</mml:mo>
</mml:mover>
</mml:math>
</inline-formula>, as well as <inline-formula>
<mml:math display="inline" id="im38">
<mml:mrow>
<mml:mi>I</mml:mi>
<mml:mover accent="true">
<mml:mi>I</mml:mi>
<mml:mo>^</mml:mo>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula>, respectively; <inline-formula>
<mml:math display="inline" id="im39">
<mml:mi>&#x3b1;</mml:mi>
</mml:math>
</inline-formula>, <inline-formula>
<mml:math display="inline" id="im40">
<mml:mi>&#x3b2;</mml:mi>
</mml:math>
</inline-formula>, and <inline-formula>
<mml:math display="inline" id="im41">
<mml:mi>&#x3b3;</mml:mi>
</mml:math>
</inline-formula> are hyper-parameters; and constants <inline-formula>
<mml:math display="inline" id="im42">
<mml:mrow>
<mml:msub>
<mml:mi>c</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> as well as <inline-formula>
<mml:math display="inline" id="im43">
<mml:mrow>
<mml:msub>
<mml:mi>c</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> are used to avoid a zero denominator in equation (9).</p>
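A minimal global-statistics sketch of equation (9), taking the hyper-parameters as &#x3b1; = &#x3b2; = &#x3b3; = 1 so the three comparison terms collapse into the fractional form; the covariance term carries the conventional factor of 2 from the standard SSIM formulation, and the stabilizing constants follow the common choice c1 = (0.01L)&#xb2;, c2 = (0.03L)&#xb2; for dynamic range L, which are assumptions rather than the paper&#x2019;s settings:

```python
import numpy as np

def ssim(img: np.ndarray, img_hat: np.ndarray, dynamic_range: float = 255.0) -> float:
    """Global-statistics SSIM of equation (9) with alpha = beta = gamma = 1."""
    i = np.asarray(img, dtype=np.float64)
    j = np.asarray(img_hat, dtype=np.float64)
    c1 = (0.01 * dynamic_range) ** 2           # stabilizer for the luminance term
    c2 = (0.03 * dynamic_range) ** 2           # stabilizer for the contrast/structure term
    mu_i, mu_j = i.mean(), j.mean()            # mean intensities (luminance)
    var_i, var_j = i.var(), j.var()            # variances (contrast)
    cov = ((i - mu_i) * (j - mu_j)).mean()     # covariance (structure)
    num = (2 * mu_i * mu_j + c1) * (2 * cov + c2)
    den = (mu_i ** 2 + mu_j ** 2 + c1) * (var_i + var_j + c2)
    return float(num / den)
```

In practice SSIM is usually computed over local sliding windows and averaged; the global version above is the simplest instance of the same formula.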
<p>PSNR assesses the quality of the predicted images relative to the maximum possible pixel value. It is calculated as <inline-formula>
<mml:math display="inline" id="im44">
<mml:mrow>
<mml:mn>10</mml:mn>
<mml:mo>&#xb7;</mml:mo>
<mml:msub>
<mml:mi>log</mml:mi>
<mml:mn>10</mml:mn>
</mml:msub>
<mml:mfrac>
<mml:mrow>
<mml:mi>M</mml:mi>
<mml:mi>A</mml:mi>
<mml:msubsup>
<mml:mi>X</mml:mi>
<mml:mi>I</mml:mi>
<mml:mn>2</mml:mn>
</mml:msubsup>
</mml:mrow>
<mml:mrow>
<mml:mi>M</mml:mi>
<mml:mi>S</mml:mi>
<mml:mi>E</mml:mi>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
</inline-formula>, where <inline-formula>
<mml:math display="inline" id="im45">
<mml:mrow>
<mml:mi>M</mml:mi>
<mml:mi>A</mml:mi>
<mml:msubsup>
<mml:mi>X</mml:mi>
<mml:mi>I</mml:mi>
<mml:mn>2</mml:mn>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula> is the square of the maximum pixel value of the images. Smaller MSE and MAE values, together with larger SSIM and PSNR values, represent higher image quality.</p>
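The PSNR expression above can be sketched as follows, where the default `max_val` assumes an 8-bit dynamic range and is not necessarily the paper&#x2019;s setting:

```python
import numpy as np

def psnr(target: np.ndarray, pred: np.ndarray, max_val: float = 255.0) -> float:
    """PSNR in dB: 10 * log10(MAX_I^2 / MSE)."""
    mse = np.mean((np.asarray(target, float) - np.asarray(pred, float)) ** 2)
    if mse == 0.0:
        return float("inf")                    # identical images: no distortion
    return float(10.0 * np.log10(max_val ** 2 / mse))
```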
<p>
<xref ref-type="table" rid="T1"><bold>Tables&#xa0;1</bold></xref>, <xref ref-type="table" rid="T2"><bold>2</bold></xref> show the values of MSE, MAE, PSNR, and SSIM obtained by MCSTNet and the compared approaches for 10- and 30-day SST sequence prediction on the SST data, respectively. Compared to the other models, our proposed MCSTNet obtains the best values on all evaluation indices for both the 10- and 30-day prediction results. As the predicted time increases, the authenticity and accuracy of the predicted SST sequences decrease for all models. MCSTNet&#x2019;s MSE and MAE values are markedly lower than those of the comparative approaches, and its PSNR and SSIM values are slightly higher. This reveals that MCSTNet is superior to the comparative approaches and that the SST sequences it predicts are the most authentic and reasonable.</p>
<table-wrap id="T1" position="float">
<label>Table&#xa0;1</label>
<caption>
<p>Comparison of the results of 10-day SST spatiotemporal sequence prediction between MCSTNet and other approaches in terms of MSE, MAE, PSNR and SSIM on the SST data.</p>
</caption>
<table frame="hsides">
<thead>
<tr>
<th valign="top" align="center">Models</th>
<th valign="top" align="center">MSE <inline-formula>
<mml:math display="inline" id="im46">
<mml:mo>&#x2193;</mml:mo>
</mml:math>
</inline-formula>
</th>
<th valign="top" align="center">MAE <inline-formula>
<mml:math display="inline" id="im47">
<mml:mo>&#x2193;</mml:mo>
</mml:math>
</inline-formula>
</th>
<th valign="top" align="center">PSNR <inline-formula>
<mml:math display="inline" id="im48">
<mml:mo>&#x2191;</mml:mo>
</mml:math>
</inline-formula>
</th>
<th valign="top" align="center">SSIM <inline-formula>
<mml:math display="inline" id="im49">
<mml:mo>&#x2191;</mml:mo>
</mml:math>
</inline-formula>
</th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="center">ConvLSTM</td>
<td valign="top" align="center">16.2</td>
<td valign="top" align="center">193.8</td>
<td valign="top" align="center">29.6</td>
<td valign="top" align="center">0.880</td>
</tr>
<tr>
<td valign="top" align="center">PredRNN</td>
<td valign="top" align="center">12.8</td>
<td valign="top" align="center">170.1</td>
<td valign="top" align="center">30.2</td>
<td valign="top" align="center">0.893</td>
</tr>
<tr>
<td valign="top" align="center">MIM</td>
<td valign="top" align="center">10.9</td>
<td valign="top" align="center">155.9</td>
<td valign="top" align="center">30.5</td>
<td valign="top" align="center">0.893</td>
</tr>
<tr>
<td valign="top" align="center">PredRNNv2</td>
<td valign="top" align="center">10.4</td>
<td valign="top" align="center">151.8</td>
<td valign="top" align="center">30.5</td>
<td valign="top" align="center">0.890</td>
</tr>
<tr>
<td valign="top" align="center">PhyDNet</td>
<td valign="top" align="center">12.0</td>
<td valign="top" align="center">166.2</td>
<td valign="top" align="center">30.1</td>
<td valign="top" align="center">0.892</td>
</tr>
<tr>
<td valign="top" align="center">SimVP</td>
<td valign="top" align="center">12.6</td>
<td valign="top" align="center">172.1</td>
<td valign="top" align="center">29.9</td>
<td valign="top" align="center">0.885</td>
</tr>
<tr>
<td valign="top" align="center">MCSTNet</td>
<td valign="top" align="center"><bold>9.8</bold>
</td>
<td valign="top" align="center"><bold>144.5</bold>
</td>
<td valign="top" align="center"><bold>31.0</bold>
</td>
<td valign="top" align="center"><bold>0.908</bold>
</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn>
<p>The best results are highlighted in boldface, and the down/up arrow indicates the lower the better/the higher the better.</p>
</fn>
</table-wrap-foot>
</table-wrap>
<table-wrap id="T2" position="float">
<label>Table&#xa0;2</label>
<caption>
<p>Comparison of the results of 30-day SST spatiotemporal sequence prediction between MCSTNet and other methods in terms of MSE, MAE, PSNR and SSIM on the SST data.</p>
</caption>
<table frame="hsides">
<thead>
<tr>
<th valign="top" align="center">Models</th>
<th valign="top" align="center">MSE <inline-formula>
<mml:math display="inline" id="im50">
<mml:mo>&#x2193;</mml:mo>
</mml:math>
</inline-formula>
</th>
<th valign="top" align="center">MAE <inline-formula>
<mml:math display="inline" id="im51">
<mml:mo>&#x2193;</mml:mo>
</mml:math>
</inline-formula>
</th>
<th valign="top" align="center">PSNR <inline-formula>
<mml:math display="inline" id="im52">
<mml:mo>&#x2191;</mml:mo>
</mml:math>
</inline-formula>
</th>
<th valign="top" align="center">SSIM <inline-formula>
<mml:math display="inline" id="im53">
<mml:mo>&#x2191;</mml:mo>
</mml:math>
</inline-formula>
</th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="center">ConvLSTM</td>
<td valign="top" align="center">24.3</td>
<td valign="top" align="center">239.9</td>
<td valign="top" align="center">29.0</td>
<td valign="top" align="center">0.844</td>
</tr>
<tr>
<td valign="top" align="center">PredRNN</td>
<td valign="top" align="center">23.9</td>
<td valign="top" align="center">239.3</td>
<td valign="top" align="center">29.1</td>
<td valign="top" align="center">0.859</td>
</tr>
<tr>
<td valign="top" align="center">MIM</td>
<td valign="top" align="center">19.0</td>
<td valign="top" align="center">217.7</td>
<td valign="top" align="center">29.1</td>
<td valign="top" align="center">0.861</td>
</tr>
<tr>
<td valign="top" align="center">PredRNNv2</td>
<td valign="top" align="center">19.7</td>
<td valign="top" align="center">215.4</td>
<td valign="top" align="center">29.2</td>
<td valign="top" align="center">0.859</td>
</tr>
<tr>
<td valign="top" align="center">PhyDNet</td>
<td valign="top" align="center">20.6</td>
<td valign="top" align="center">226.4</td>
<td valign="top" align="center">29.1</td>
<td valign="top" align="center">0.868</td>
</tr>
<tr>
<td valign="top" align="center">SimVP</td>
<td valign="top" align="center">22.7</td>
<td valign="top" align="center">242.1</td>
<td valign="top" align="center">28.9</td>
<td valign="top" align="center">0.850</td>
</tr>
<tr>
<td valign="top" align="center">MCSTNet</td>
<td valign="top" align="center"><bold>16.9</bold>
</td>
<td valign="top" align="center"><bold>203.7</bold>
</td>
<td valign="top" align="center"><bold>29.3</bold>
</td>
<td valign="top" align="center"><bold>0.868</bold>
</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn>
<p>The best results are highlighted in boldface. The down/up arrow indicates the lower the better/the higher the better.</p>
</fn>
</table-wrap-foot>
</table-wrap>
</sec>
</sec>
<sec id="s3_3_2">
<label>3.3.2</label>
<title>Verification of the generalization of MCSTNet based on the SSS data</title>
<p>To verify the generalization of our proposed MCSTNet, in addition to the SST data, we also made use of MCSTNet to predict SSS sequences and fronts on the SSS data.</p>
<p>
<xref ref-type="fig" rid="f7"><bold>Figure&#xa0;7</bold></xref> depicts the predicted future 10-day SSS sequences and front images for the even-numbered days by MCSTNet, based on the previous 10-day satellite SSS sequences. The SSS sequences and front images have a temporal window of 20 days, from 6 March to 25 March 2015, with latitudes ranging from 35.6&#xb0;N to 40.8&#xb0;N and longitudes ranging from 142.0&#xb0;E to 147.2&#xb0;E. The fronts were predicted by MCSTNet based on learning the SSS variation laws in the previous 10 days. Compared to the targets, the SSS sequences and fronts predicted by MCSTNet are authentic and reasonable, which demonstrates that MCSTNet is effective and appropriate for SSS sequence prediction and front variation tendency prediction tasks.</p>
<fig id="f7" position="float">
<label>Figure&#xa0;7</label>
<caption>
<p>Satellite and predicted SSS sequences, physics-based and predicted fronts at five successive sequences with 1-day intervals. The previous 10-day satellite SSS sequences are employed to predict future 10-day SSS sequences and fronts. <bold>(A)</bold> Input SSS sequences on 6, 8, 10, 12, and 14 March, 2015. <bold>(B)</bold> Target SSS, <bold>(C)</bold> the predicted SSS, <bold>(D)</bold> target fronts, and <bold>(E)</bold> the predicted fronts on 16, 18, 20, 22, and 24 March, 2015.</p>
</caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fmars-10-1151796-g007.tif"/>
</fig>
<p>
<xref ref-type="fig" rid="f8"><bold>Figure&#xa0;8</bold></xref> shows single-day SSS and front images together with the corresponding error maps. MCSTNet predicted the future 10-day SSS and front images based on the previous 10-day SSS data. The SSS and front errors were computed as the absolute difference between the target and the SSS and front images predicted by MCSTNet. The errors are concentrated at the locations of extreme SSS values and along the fronts, and are numerically small. The results demonstrate that MCSTNet is accurate for SSS sequence and front prediction tasks.</p>
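<p>The error maps described above can be reproduced with a pixel-wise absolute difference; the following is a minimal Python/numpy sketch (the function name is ours for illustration, not from the released code):</p>

```python
import numpy as np

def error_map(target, predicted):
    """Pixel-wise absolute error between a target field (an SSS or
    front image) and the corresponding prediction, as plotted in the
    error columns of Figure 8."""
    return np.abs(np.asarray(target, dtype=np.float64)
                  - np.asarray(predicted, dtype=np.float64))
```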
<fig id="f8" position="float">
<label>Figure&#xa0;8</label>
<caption>
<p>Satellite and predicted SSS sequences, physics-based and predicted fronts, and the error between them. The previous 10-day satellite SSS sequences are used to predict future 10-day SSS sequences and fronts, but only one day is plotted. From left to right, each column represents target SSS sequences and fronts, the predicted SSS sequences and fronts by MCSTNet, and the SSS and front error between target and predicted by MCSTNet (absolute value). There are two examples of SSS and front predictions: <bold>(A, B)</bold> show images from 26 October, 2014 to 27 October, 2014, while <bold>(C, D)</bold> show images from 12 February, 2015 to 13 February, 2015.</p>
</caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fmars-10-1151796-g008.tif"/>
</fig>
<p>The values of MSE, MAE, PSNR, and SSIM of the SSS sequence prediction images obtained by MCSTNet and the comparative approaches are shown in <xref ref-type="table" rid="T3"><bold>Table&#xa0;3</bold></xref>. MCSTNet obtains the best values on all evaluation indices for the 10-day SSS sequence prediction images: its MSE and MAE values are significantly lower than those of the comparative approaches, and its PSNR and SSIM values are somewhat higher. This demonstrates that MCSTNet outperforms the comparative approaches and that using MCSTNet to predict SSS sequences is reasonable.</p>
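<p>As a rough illustration of how the four evaluation indices can be computed for one image pair, consider the Python/numpy sketch below. It is a simplification under stated assumptions: the SSIM here uses a single global window rather than the sliding-window SSIM commonly used in practice, and the data range of 255 is an assumption about image scaling, not a detail taken from the paper.</p>

```python
import numpy as np

def evaluation_indices(target, pred, data_range=255.0):
    """MSE, MAE, PSNR and a global (single-window) SSIM between a
    target and a predicted image, both 2-D arrays on [0, data_range]."""
    target = np.asarray(target, dtype=np.float64)
    pred = np.asarray(pred, dtype=np.float64)
    mse = np.mean((target - pred) ** 2)
    mae = np.mean(np.abs(target - pred))
    psnr = np.inf if mse == 0 else 10.0 * np.log10(data_range ** 2 / mse)
    # SSIM constants from the standard formulation (k1=0.01, k2=0.03).
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mu_t, mu_p = target.mean(), pred.mean()
    var_t, var_p = target.var(), pred.var()
    cov = np.mean((target - mu_t) * (pred - mu_p))
    ssim = ((2 * mu_t * mu_p + c1) * (2 * cov + c2)) / (
        (mu_t ** 2 + mu_p ** 2 + c1) * (var_t + var_p + c2))
    return mse, mae, psnr, ssim
```

<p>For identical images the sketch gives MSE = MAE = 0 and SSIM = 1, matching the intuition that lower is better for the first two indices and higher is better for the last two.</p>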
<table-wrap id="T3" position="float">
<label>Table&#xa0;3</label>
<caption>
<p>Comparison of the results of 10-day SSS sequence prediction images between MCSTNet and other approaches in terms of MSE, MAE, PSNR and SSIM on the SSS data.</p>
</caption>
<table frame="hsides">
<thead>
<tr>
<th valign="top" align="center">Models</th>
<th valign="top" align="center">MSE <inline-formula>
<mml:math display="inline" id="im54">
<mml:mo>&#x2193;</mml:mo>
</mml:math>
</inline-formula>
</th>
<th valign="top" align="center">MAE <inline-formula>
<mml:math display="inline" id="im55">
<mml:mo>&#x2193;</mml:mo>
</mml:math>
</inline-formula>
</th>
<th valign="top" align="center">PSNR <inline-formula>
<mml:math display="inline" id="im56">
<mml:mo>&#x2191;</mml:mo>
</mml:math>
</inline-formula>
</th>
<th valign="top" align="center">SSIM <inline-formula>
<mml:math display="inline" id="im57">
<mml:mo>&#x2191;</mml:mo>
</mml:math>
</inline-formula>
</th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="center">ConvLSTM</td>
<td valign="top" align="center">0.656</td>
<td valign="top" align="center">42.2</td>
<td valign="top" align="center">38.0</td>
<td valign="top" align="center">0.990</td>
</tr>
<tr>
<td valign="top" align="center">PredRNN</td>
<td valign="top" align="center">0.649</td>
<td valign="top" align="center">39.3</td>
<td valign="top" align="center">38.1</td>
<td valign="top" align="center">0.989</td>
</tr>
<tr>
<td valign="top" align="center">MIM</td>
<td valign="top" align="center">0.576</td>
<td valign="top" align="center">37.2</td>
<td valign="top" align="center">38.8</td>
<td valign="top" align="center">0.989</td>
</tr>
<tr>
<td valign="top" align="center">PredRNNv2</td>
<td valign="top" align="center">0.653</td>
<td valign="top" align="center">40.7</td>
<td valign="top" align="center">38.1</td>
<td valign="top" align="center">0.990</td>
</tr>
<tr>
<td valign="top" align="center">SimVP</td>
<td valign="top" align="center">0.426</td>
<td valign="top" align="center">33.4</td>
<td valign="top" align="center">39.9</td>
<td valign="top" align="center">0.990</td>
</tr>
<tr>
<td valign="top" align="center">MCSTNet</td>
<td valign="top" align="center"><bold>0.380</bold>
</td>
<td valign="top" align="center"><bold>29.3</bold>
</td>
<td valign="top" align="center"><bold>40.6</bold>
</td>
<td valign="top" align="center"><bold>0.991</bold>
</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn>
<p>The best results are highlighted in boldface. A down arrow indicates that lower values are better; an up arrow indicates that higher values are better.</p>
</fn>
</table-wrap-foot>
</table-wrap>
</sec>
<sec id="s3_3_3">
<label>3.3.3</label>
<title>Ablation studies</title>
<p>To demonstrate the effectiveness of the encoder-decoder structure, the memory-contextual module (MCM), and the time transfer module (TTM) in MCSTNet, we conducted ablation studies.</p>
<p>
<xref ref-type="fig" rid="f9"><bold>Figure&#xa0;9</bold></xref> shows the future 10-day front images (odd-numbered days) predicted by each module combination in MCSTNet based on the previous 10-day satellite SST sequences. From the third row onward, each row displays the future 10-day fronts predicted by MCSTNet, MCSTNet without TTM, MCSTNet without MCM, and the encoder-decoder structure alone. The results show that the model using only the encoder-decoder structure is essentially unable to predict the trend of fronts. MCSTNet without MCM, i.e., TTM combined with the encoder-decoder structure, outperforms the encoder-decoder structure alone, because TTM transfers temporal information and fuses low-level, fine-grained temporal information with high-level semantic information to improve medium- and long-term prediction precision. MCSTNet without TTM, i.e., MCM combined with the encoder-decoder structure, achieves good predictive results, because MCM fuses low-level spatiotemporal information with high-level semantic information; although the quality of the predicted fronts decreases with increasing prediction time, the fine-grained features extracted by the encoder-decoder structure partly compensate for this shortcoming. In contrast to the aforementioned models, the full MCSTNet model, which uses the MCM, the TTM, and the encoder-decoder structure together, not only predicts the front variation trend but also generates high-quality fronts. This is important for front prediction and illustrates that the proposed encoder-decoder structure, MCM, and TTM all contribute to the front prediction task.</p>
<fig id="f9" position="float">
<label>Figure&#xa0;9</label>
<caption>
<p>Visualization of the benefit of each module in MCSTNet on the SST data. The previous 10-day satellite SST sequences are used to predict future 10-day fronts. <bold>(A)</bold> Input SST sequences on 17, 19, 21, 23, and 25 February, 2014. <bold>(B)</bold> Target fronts, and the predicted fronts by <bold>(C)</bold> MCSTNet, <bold>(D)</bold> MCSTNet without TTM, <bold>(E)</bold> MCSTNet without MCM, as well as <bold>(F)</bold> encoder-decoder structure on 27 February, and 1, 3, 5, 7 March, 2014.</p>
</caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fmars-10-1151796-g009.tif"/>
</fig>
<p>To further investigate the effectiveness of each module in MCSTNet for the front prediction task, we calculated the precision of front prediction, which represents the fraction of predicted front pixels that coincide with target front pixels and is written as:</p>
<disp-formula>
<label>(10)</label>
<mml:math display="block" id="M10">
<mml:mrow>
<mml:mtable columnalign="left">
<mml:mtr columnalign="left">
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mi>P</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>n</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>I</mml:mi>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mover accent="true">
<mml:mi>I</mml:mi>
<mml:mo>^</mml:mo>
</mml:mover>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msubsup>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>M</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msubsup>
<mml:msubsup>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>N</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msubsup>
<mml:mo stretchy="false">[</mml:mo>
<mml:msub>
<mml:mi>I</mml:mi>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>*</mml:mo>
<mml:msub>
<mml:mover accent="true">
<mml:mi>I</mml:mi>
<mml:mo>^</mml:mo>
</mml:mover>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo stretchy="false">]</mml:mo>
<mml:mo>+</mml:mo>
<mml:mi>&#x3f5;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msubsup>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>M</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msubsup>
<mml:msubsup>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>N</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msubsup>
<mml:mo stretchy="false">[</mml:mo>
<mml:msub>
<mml:mover accent="true">
<mml:mi>I</mml:mi>
<mml:mo>^</mml:mo>
</mml:mover>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo stretchy="false">]</mml:mo>
<mml:mo>+</mml:mo>
<mml:mi>&#x3f5;</mml:mi>
</mml:mrow>
</mml:mfrac>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:math>
</disp-formula>
<p>In equation (10), <inline-formula>
<mml:math display="inline" id="im58">
<mml:mrow>
<mml:msub>
<mml:mi>I</mml:mi>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula>
<mml:math display="inline" id="im59">
<mml:mrow>
<mml:msub>
<mml:mover accent="true">
<mml:mi>I</mml:mi>
<mml:mo>^</mml:mo>
</mml:mover>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> represent mask maps of target and predicted front images, respectively, obtained by setting a threshold value to fronts. In the mask maps, the region with fronts was set to 1, and the region without fronts was set to 0. <inline-formula>
<mml:math display="inline" id="im60">
<mml:mi>M</mml:mi>
</mml:math>
</inline-formula> and <inline-formula>
<mml:math display="inline" id="im61">
<mml:mi>N</mml:mi>
</mml:math>
</inline-formula> denote the total number of pixels in the meridional and zonal directions of images, respectively; <inline-formula>
<mml:math display="inline" id="im62">
<mml:mi>i</mml:mi>
</mml:math>
</inline-formula> and <inline-formula>
<mml:math display="inline" id="im63">
<mml:mi>j</mml:mi>
</mml:math>
</inline-formula> are the pixel positions; and <inline-formula>
<mml:math display="inline" id="im64">
<mml:mrow>
<mml:mi>&#x3f5;</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>0.00001</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> is the parameter to prevent the division-by-zero issue. The precision value ranges from 0 to 1, with larger values indicating better performance.</p>
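<p>Equation (10) translates directly into code. The Python/numpy sketch below binarizes a front image with a threshold and computes the precision; the function names are ours for illustration and are not taken from the released code.</p>

```python
import numpy as np

def to_mask(front_image, threshold):
    """Binarize a front-intensity image: pixels above the threshold
    are front (1), all others background (0)."""
    return (np.asarray(front_image) > threshold).astype(np.float64)

def front_precision(i_mask, i_mask_hat, eps=1e-5):
    """Equation (10): the overlap between target and predicted front
    masks divided by the total predicted front area, with a small eps
    guarding against division by zero when no fronts are predicted."""
    overlap = np.sum(i_mask * i_mask_hat)
    return (overlap + eps) / (np.sum(i_mask_hat) + eps)
```

<p>For example, if the prediction marks two front pixels and only one of them coincides with the target, the precision is close to 0.5.</p>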
<p>The variation of the predicted front precision with the front threshold and the number of predicted days, for each module combination in MCSTNet, is shown in <xref ref-type="fig" rid="f10"><bold>Figure&#xa0;10</bold></xref>. Comparing <xref ref-type="fig" rid="f10"><bold>Figures&#xa0;10A, B</bold></xref>, using TTM improves the predicted front precision over the encoder-decoder structure alone; the improvement is especially pronounced for medium-term front prediction, indicating that TTM learns the temporal variation information in the sequence. Comparing <xref ref-type="fig" rid="f10"><bold>Figures&#xa0;10A, C</bold></xref>, using MCM substantially improves the sequence prediction ability; here the improvement is more pronounced for short-term front prediction than for medium-term front prediction. Comparing <xref ref-type="fig" rid="f10"><bold>Figures&#xa0;10A-D</bold></xref>, the predicted front precision of MCSTNet is significantly higher than that of the models using only a single module, because MCSTNet combines the medium- and long-term sequence prediction benefits of TTM with the short-term, high-quality prediction capability of MCM, yielding improvements across short-, medium-, and long-term prediction. For all models, the precision of front prediction decreases as the prediction time increases, which is consistent with the characteristics of the prediction task.</p>
<fig id="f10" position="float">
<label>Figure&#xa0;10</label>
<caption>
<p>The precision of fronts predicted by MCSTNet and its subnetwork. Precision of front for <bold>(A)</bold> the encoder-decoder structure, <bold>(B)</bold> MCSTNet without MCM, <bold>(C)</bold> MCSTNet without TTM, and <bold>(D)</bold> MCSTNet.</p>
</caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fmars-10-1151796-g010.tif"/>
</fig>
<p>The fronts predicted with MCM and TTM were evaluated objectively using the MSE, MAE, PSNR, and SSIM evaluation indices, as shown in <xref ref-type="table" rid="T4"><bold>Table&#xa0;4</bold></xref>. Both MCM and TTM enhance the front prediction ability, and the enhancement from MCM is greater than that from TTM, because TTM transfers only temporal feature information, while MCM transfers both temporal and spatial information. Using both modules simultaneously obtains the best values on all evaluation indices.</p>
<table-wrap id="T4" position="float">
<label>Table&#xa0;4</label>
<caption>
<p>Quantitative comparison of the future 10-day fronts predicted using the modules, i.e., MCM and TTM in MCSTNet, concerning MSE, MAE, PSNR, and SSIM on the SST data.</p>
</caption>
<table frame="hsides">
<thead>
<tr>
<th valign="top" colspan="2" align="center">Modules</th>
<th valign="bottom" align="center" rowspan="2">MSE <inline-formula>
<mml:math display="inline" id="im65">
<mml:mo>&#x2193;</mml:mo>
</mml:math>
</inline-formula>
</th>
<th valign="bottom" align="center" rowspan="2">MAE <inline-formula>
<mml:math display="inline" id="im66">
<mml:mo>&#x2193;</mml:mo>
</mml:math>
</inline-formula>
</th>
<th valign="bottom" align="center" rowspan="2">PSNR <inline-formula>
<mml:math display="inline" id="im67">
<mml:mo>&#x2191;</mml:mo>
</mml:math>
</inline-formula>
</th>
<th valign="bottom" align="center" rowspan="2">SSIM <inline-formula>
<mml:math display="inline" id="im68">
<mml:mo>&#x2191;</mml:mo>
</mml:math>
</inline-formula>
</th>
</tr>
<tr>
<th valign="top" align="center">TTM</th>
<th valign="top" align="center">MCM</th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="center">&#x2715;</td>
<td valign="top" align="center">&#x2715;</td>
<td valign="top" align="center">16.5</td>
<td valign="top" align="center">190.0</td>
<td valign="top" align="center">29.3</td>
<td valign="top" align="center">0.618</td>
</tr>
<tr>
<td valign="top" align="center">&#x2713;</td>
<td valign="top" align="center">&#x2715;</td>
<td valign="top" align="center">14.8</td>
<td valign="top" align="center">174.4</td>
<td valign="top" align="center">29.6</td>
<td valign="top" align="center">0.637</td>
</tr>
<tr>
<td valign="top" align="center">&#x2715;</td>
<td valign="top" align="center">&#x2713;</td>
<td valign="top" align="center">12.9</td>
<td valign="top" align="center">154.9</td>
<td valign="top" align="center">30.2</td>
<td valign="top" align="center">0.688</td>
</tr>
<tr>
<td valign="top" align="center">&#x2713;</td>
<td valign="top" align="center">&#x2713;</td>
<td valign="top" align="center"><bold>11.8</bold>
</td>
<td valign="top" align="center"><bold>151.9</bold>
</td>
<td valign="top" align="center"><bold>30.4</bold>
</td>
<td valign="top" align="center"><bold>0.699</bold>
</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn>
<p>The best results are highlighted in boldface. A down arrow indicates that lower values are better; an up arrow indicates that higher values are better. &#x2018;&#x2715;&#x2019; means the module is not used, while &#x2018;&#x2713;&#x2019; means it is used.</p>
</fn>
</table-wrap-foot>
</table-wrap>
</sec>
</sec>
</sec>
<sec id="s4" sec-type="conclusions">
<label>4</label>
<title>Conclusion and future work</title>
<p>Inspired by the virtues of the U-Net and ConvLSTM architectures, this paper proposes a continuous spatiotemporal prediction model, called MCSTNet, for predicting future spatiotemporal variation sequences of SST and fronts based on continuous daily mean SST data. MCSTNet comprises three components: the encoder-decoder structure, the time transfer module, and the memory-contextual module. The encoder-decoder structure consists of the data encoder, the feature decoder, and the multi-task generation module, and extracts the rich contextual and semantic information of SST sequences and frontal structures from the SST data. The time transfer module transfers temporal information and fuses low-level, fine-grained temporal information with high-level semantic information to enhance the medium- and long-term prediction precision of SST sequences and fronts. The memory-contextual module fuses low-level spatiotemporal information with high-level semantic information to improve the short-term prediction precision of SST sequences and fronts. Combining the virtues of the MSE loss and the contextual loss jointly guides MCSTNet toward stable training. Qualitative and quantitative experimental results demonstrate that MCSTNet outperforms SOTA models on the SST data, including the physics-constrained PhyDNet, the RNN-based ConvLSTM, PredRNN, PredRNNv2, and MIM, and the CNN-based SimVP. This is because MCSTNet combines the detail learning ability of the CNN module with the variation pattern learning ability of the RNN module, thereby accounting for both the variation patterns and the detail features of SST sequences. Methods such as PhyDNet still have room for improvement in long-term spatiotemporal prediction: adding skip connections to the shallow encoder module of PhyDNet, similar to the time transfer module proposed in this study, would allow fine-grained spatiotemporal information to be transferred to the deep decoder module and thus improve its long-term prediction ability. Moreover, the SSS data were applied to verify the performance and generalization ability of MCSTNet, and the results show that the SSS sequences and fronts predicted by MCSTNet are realistic and reasonable. Ablation studies demonstrate the effectiveness of each module in MCSTNet, including the excellent feature extraction capability of the encoder-decoder structure, the short-term prediction capability of the memory-contextual module, and the medium- and long-term prediction benefits of the time transfer module.</p>
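<p>The skip-connection idea recommended above can be sketched schematically in Python/numpy. This is only an illustration of the data flow under our own naming, not the authors' implementation: coarse decoder features are upsampled to the encoder resolution and concatenated with the fine-grained encoder features, so that a subsequent convolution (omitted here) can mix low-level detail with high-level semantics.</p>

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def skip_fuse(decoder_feat, encoder_feat):
    """U-Net-style skip connection: upsample the coarse decoder
    features to the encoder resolution, then concatenate along the
    channel axis. The fine-grained encoder detail is thereby made
    available to the deeper decoder stages."""
    up = upsample2x(decoder_feat)
    if up.shape[1:] != encoder_feat.shape[1:]:
        raise ValueError("spatial sizes must match after upsampling")
    return np.concatenate([up, encoder_feat], axis=0)
```

<p>For instance, fusing an 8-channel 4&#xd7;4 decoder map with a 4-channel 8&#xd7;8 encoder map yields a 12-channel 8&#xd7;8 map carrying both semantic and fine-grained information.</p>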
<p>Due to limited computing power, we have studied fronts only in the Oyashio Current region. DL methods support transfer learning, in which knowledge learnt from one dataset can be applied to another, making them well suited to being trained on SST data from one region and then applied to other regions. The model&#x2019;s prediction effectiveness can be further enhanced by including data from the target area during training. In the future, we will expand our work to a larger or even global scale to predict global SST sequences and fronts. Furthermore, we intend to add self-attention modules (<xref ref-type="bibr" rid="B39">Vaswani et al., 2017</xref>) to further improve the performance of MCSTNet. The DL methodology holds the promise of guiding the development of next-generation &#x201c;smart&#x201d; SST sequence and front prediction by harnessing our observational and theoretical knowledge.</p>
</sec>
<sec id="s5" sec-type="data-availability">
<title>Data availability statement</title>
<p>The original contributions presented in the study are included in the article/<xref ref-type="supplementary-material" rid="SM1"><bold>Supplementary Material</bold></xref>. Further inquiries can be directed to the corresponding authors.</p>
</sec>
<sec id="s6" sec-type="author-contributions">
<title>Author contributions</title>
<p>YM contributed to conceptualization, methodology, writing-original draft preparation and software. WL contributed to investigation, software and verification. GC contributed to editing and funding acquisition. GZ contributed to supervision, writing-reviewing and funding acquisition. FT contributed to data preparation, visualization and project administration. All authors contributed to the article and approved the submitted version.</p>
</sec>
</body>
<back>
<sec id="s7" sec-type="funding-information">
<title>Funding</title>
<p>This work was partially supported by the International Research Center of Big Data for Sustainable Development Goals under Grant No. CBAS2022GSP01, the National Key Research and Development Program of China under Grant No. 2018AAA0100400, the Science and Technology Innovation Project for Laoshan Laboratory under Grants No. LSKJ202204303 and No. LSKJ202201406, HY Project under Grant No. LZY2022033004, the Natural Science Foundation of Shandong Province under Grants No. ZR2020MF131 and No. ZR2021ZD19, Project of the Marine Science and Technology cooperative Innovation Center under Grant No. 22-05-CXZX-04-03-17, the Science and Technology Program of Qingdao under Grant No. 21-1-4-ny-19-nsh, Project of Associative Training of Ocean University of China under Grant No. 202265007, and the Fundamental Research Funds for the Central Universities under Grant No. 202261006.</p>
</sec>
<ack>
<title>Acknowledgments</title>
<p>We want to thank &#x201c;Qingdao AI Computing Center&#x201d; and &#x201c;Eco-Innovation Center&#x201d; for providing inclusive computing power and technical support of MindSpore during the completion of this paper.</p>
</ack>
<sec id="s8" sec-type="COI-statement">
<title>Conflict of interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec id="s9" sec-type="disclaimer">
<title>Publisher&#x2019;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<sec id="s10" sec-type="supplementary-material">
<title>Supplementary material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link ext-link-type="uri" xlink:href="https://www.frontiersin.org/articles/10.3389/fmars.2023.1151796/full#supplementary-material">https://www.frontiersin.org/articles/10.3389/fmars.2023.1151796/full#supplementary-material</ext-link>
</p>
<supplementary-material xlink:href="DataSheet_1.pdf" id="SM1" mimetype="application/pdf"/>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Buongiorno Nardelli</surname> <given-names>B.</given-names>
</name>
<name>
<surname>Cavaliere</surname> <given-names>D.</given-names>
</name>
<name>
<surname>Charles</surname> <given-names>E.</given-names>
</name>
<name>
<surname>Ciani</surname> <given-names>D.</given-names>
</name>
</person-group> (<year>2022</year>). <article-title>Super-resolving ocean dynamics from space with computer vision algorithms</article-title>. <source>Remote Sens.</source> <volume>14</volume>, <elocation-id>1159</elocation-id>. doi:&#xa0;<pub-id pub-id-type="doi">10.3390/rs14051159</pub-id>
</citation>
</ref>
<ref id="B2">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chassignet</surname> <given-names>E. P.</given-names>
</name>
<name>
<surname>Hurlburt</surname> <given-names>H. E.</given-names>
</name>
<name>
<surname>Metzger</surname> <given-names>E. J.</given-names>
</name>
<name>
<surname>Smedstad</surname> <given-names>O. M.</given-names>
</name>
<name>
<surname>Cummings</surname> <given-names>J. A.</given-names>
</name>
<name>
<surname>Halliwell</surname> <given-names>G. R.</given-names>
</name>
<etal/>
</person-group>. (<year>2009</year>). <article-title>Us godae: global ocean prediction with the hybrid coordinate ocean model (hycom)</article-title>. <source>Oceanogr.</source> <volume>22</volume>, <fpage>64</fpage>&#x2013;<lpage>75</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1007/1-4020-4028-8_16</pub-id>
</citation>
</ref>
<ref id="B3">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Counillon</surname> <given-names>F.</given-names>
</name>
<name>
<surname>Bertino</surname> <given-names>L.</given-names>
</name>
</person-group> (<year>2009</year>). <article-title>High-resolution ensemble forecasting for the gulf of mexico eddies and fronts</article-title>. <source>Ocean Dyn.</source> <volume>59</volume>, <fpage>83</fpage>&#x2013;<lpage>95</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1007/s10236-008-0167-0</pub-id>
</citation>
</ref>
<ref id="B4">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ducournau</surname> <given-names>A.</given-names>
</name>
<name>
<surname>Fablet</surname> <given-names>R.</given-names>
</name>
</person-group> (<year>2016</year>). &#x201c;<article-title>Deep learning for ocean remote sensing: an application of convolutional neural networks for super-resolution on satellite-derived sst data</article-title>,&#x201d; in <source>2016 9th IAPR Workshop Pattern Recogniton Remote Sensing (PRRS) (IEEE)</source>, <fpage>1</fpage>&#x2013;<lpage>6</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1109/PRRS.2016.7867019</pub-id>
</citation>
</ref>
<ref id="B5">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Gao</surname> <given-names>Z.</given-names>
</name>
<name>
<surname>Tan</surname> <given-names>C.</given-names>
</name>
<name>
<surname>Wu</surname> <given-names>L.</given-names>
</name>
<name>
<surname>Li</surname> <given-names>S. Z.</given-names>
</name>
</person-group> (<year>2022</year>). &#x201c;<article-title>Simvp: simpler yet better video prediction</article-title>,&#x201d; in <conf-name>Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)</conf-name> (<publisher-loc>New Orleans</publisher-loc>: <publisher-name>IEEE</publisher-name>), <volume>3170&#x2013;3180</volume>. doi:&#xa0;<pub-id pub-id-type="doi">10.1109/CVPR52688.2022.00317</pub-id>
</citation>
</ref>
<ref id="B6">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Glorot</surname> <given-names>X.</given-names>
</name>
<name>
<surname>Bordes</surname> <given-names>A.</given-names>
</name>
<name>
<surname>Bengio</surname> <given-names>Y.</given-names>
</name>
</person-group> (<year>2011</year>). &#x201c;<article-title>Deep sparse rectifier neural networks</article-title>,&#x201d; in <conf-name>Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics (AISTATS)</conf-name> (<publisher-loc>Fort Lauderdale</publisher-loc>: <publisher-name>JMLR.org</publisher-name>) <volume>15</volume>, <fpage>315</fpage>&#x2013;<lpage>323</lpage>. Available at: <uri xlink:href="http://proceedings.mlr.press/v15/glorot11a/glorot11a.pdf">http://proceedings.mlr.press/v15/glorot11a/glorot11a.pdf</uri>
</citation>
</ref>
<ref id="B7">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gopalakrishnan</surname> <given-names>G.</given-names>
</name>
<name>
<surname>Cornuelle</surname> <given-names>B. D.</given-names>
</name>
<name>
<surname>Hoteit</surname> <given-names>I.</given-names>
</name>
<name>
<surname>Rudnick</surname> <given-names>D. L.</given-names>
</name>
<name>
<surname>Owens</surname> <given-names>W. B.</given-names>
</name>
</person-group> (<year>2013</year>). <article-title>State estimates and forecasts of the loop current in the gulf of mexico using the mitgcm and its adjoint</article-title>. <source>J. Geophys. Res. Oceans</source> <volume>118</volume>, <fpage>3292</fpage>&#x2013;<lpage>3314</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1002/jgrc.20239</pub-id>
</citation>
</ref>
<ref id="B8">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Guan</surname> <given-names>B.</given-names>
</name>
<name>
<surname>Molotch</surname> <given-names>N. P.</given-names>
</name>
<name>
<surname>Waliser</surname> <given-names>D. E.</given-names>
</name>
<name>
<surname>Fetzer</surname> <given-names>E. J.</given-names>
</name>
<name>
<surname>Neiman</surname> <given-names>P. J.</given-names>
</name>
</person-group> (<year>2010</year>). <article-title>Extreme snowfall events linked to atmospheric rivers and surface air temperature <italic>via</italic> satellite measurements</article-title>. <source>Geophys. Res. Lett.</source> <volume>37</volume>, <elocation-id>L20401</elocation-id>. doi:&#xa0;<pub-id pub-id-type="doi">10.1029/2010GL044696</pub-id>
</citation>
</ref>
<ref id="B9">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Guen</surname> <given-names>V. L.</given-names>
</name>
<name>
<surname>Thome</surname> <given-names>N.</given-names>
</name>
</person-group> (<year>2020</year>). &#x201c;<article-title>Disentangling physical dynamics from unknown factors for unsupervised video prediction</article-title>,&#x201d; in <conf-name>Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition</conf-name> (CVPR) (<publisher-loc>Seattle</publisher-loc>: <publisher-name>Computer Vision Foundation / IEEE</publisher-name>), <fpage>11474</fpage>&#x2013;<lpage>11484</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1109/CVPR42600.2020.01149</pub-id>
</citation>
</ref>
<ref id="B10">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ham</surname> <given-names>Y. G.</given-names>
</name>
<name>
<surname>Kim</surname> <given-names>J. H.</given-names>
</name>
<name>
<surname>Luo</surname> <given-names>J. J.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Deep learning for multi-year enso forecasts</article-title>. <source>Nature</source> <volume>573</volume>, <fpage>568</fpage>&#x2013;<lpage>572</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1038/s41586-019-1559-7</pub-id>
</citation>
</ref>
<ref id="B11">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>He</surname> <given-names>K.</given-names>
</name>
<name>
<surname>Zhang</surname> <given-names>X.</given-names>
</name>
<name>
<surname>Ren</surname> <given-names>S.</given-names>
</name>
<name>
<surname>Sun</surname> <given-names>J.</given-names>
</name>
</person-group> (<year>2016</year>). &#x201c;<article-title>Deep residual learning for image recognition</article-title>,&#x201d; in <conf-name>Proceedings of the IEEE conference on computer vision and pattern recognition</conf-name> (CVPR) (<publisher-loc>Las Vegas</publisher-loc>: <publisher-name>IEEE Computer Society</publisher-name>), <fpage>770</fpage>&#x2013;<lpage>778</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1109/CVPR.2016.90</pub-id>
</citation>
</ref>
<ref id="B12">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hochreiter</surname> <given-names>S.</given-names>
</name>
<name>
<surname>Schmidhuber</surname> <given-names>J.</given-names>
</name>
</person-group> (<year>1997</year>). <article-title>Long short-term memory</article-title>. <source>Neural Comput.</source> <volume>9</volume>, <fpage>1735</fpage>&#x2013;<lpage>1780</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1162/neco.1997.9.8.1735</pub-id>
</citation>
</ref>
<ref id="B13">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hurlburt</surname> <given-names>H. E.</given-names>
</name>
<name>
<surname>Brassington</surname> <given-names>G. B.</given-names>
</name>
<name>
<surname>Drillet</surname> <given-names>Y.</given-names>
</name>
<name>
<surname>Kamachi</surname> <given-names>M.</given-names>
</name>
<name>
<surname>Benkiran</surname> <given-names>M.</given-names>
</name>
</person-group> (<year>2009</year>). <article-title>High-resolution global and basin-scale ocean analyses and forecasts</article-title>. <source>Oceanogr.</source> <volume>22</volume>, <fpage>110</fpage>&#x2013;<lpage>127</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.5670/oceanog.2009.70</pub-id>
</citation>
</ref>
<ref id="B14">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Huynh-Thu</surname> <given-names>Q.</given-names>
</name>
<name>
<surname>Ghanbari</surname> <given-names>M.</given-names>
</name>
</person-group> (<year>2008</year>). <article-title>Scope of validity of psnr in image/video quality assessment</article-title>. <source>Electron. Lett.</source> <volume>44</volume>, <fpage>800</fpage>&#x2013;<lpage>801</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1049/el:20080522</pub-id>
</citation>
</ref>
<ref id="B15">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jordan</surname> <given-names>M. I.</given-names>
</name>
<name>
<surname>Mitchell</surname> <given-names>T. M.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>Machine learning: trends, perspectives, and prospects</article-title>. <source>Science</source> <volume>349</volume>, <fpage>255</fpage>&#x2013;<lpage>260</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1126/science.aaa8415</pub-id>
</citation>
</ref>
<ref id="B16">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Kagimoto</surname> <given-names>T.</given-names>
</name>
<name>
<surname>Miyazawa</surname> <given-names>Y.</given-names>
</name>
<name>
<surname>Guo</surname> <given-names>X.</given-names>
</name>
<name>
<surname>Kawajiri</surname> <given-names>H.</given-names>
</name>
</person-group> (<year>2008</year>). <source>High resolution kuroshio forecast system: description and its applications</source> (<publisher-loc>New York</publisher-loc>: <publisher-name>Springer</publisher-name>), <fpage>69</fpage>.</citation>
</ref>
<ref id="B17">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kamachi</surname> <given-names>M.</given-names>
</name>
<name>
<surname>Kuragano</surname> <given-names>T.</given-names>
</name>
<name>
<surname>Ichikawa</surname> <given-names>H.</given-names>
</name>
<name>
<surname>Nakamura</surname> <given-names>H.</given-names>
</name>
<name>
<surname>Nishina</surname> <given-names>A.</given-names>
</name>
<name>
<surname>Isobe</surname> <given-names>A.</given-names>
</name>
<etal/>
</person-group>. (<year>2004</year>). <article-title>Operational data assimilation system for the kuroshio south of japan: reanalysis and validation</article-title>. <source>J. Oceanogr.</source> <volume>60</volume>, <fpage>303</fpage>&#x2013;<lpage>312</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1023/B:JOCE.0000038336.87717.b7</pub-id>
</citation>
</ref>
<ref id="B18">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Komori</surname> <given-names>N.</given-names>
</name>
<name>
<surname>Awaji</surname> <given-names>T.</given-names>
</name>
<name>
<surname>Ishikawa</surname> <given-names>Y.</given-names>
</name>
<name>
<surname>Kuragano</surname> <given-names>T.</given-names>
</name>
</person-group> (<year>2003</year>). <article-title>Short-range forecast experiments of the kuroshio path variabilities south of japan using topex/poseidon altimetric data</article-title>. <source>J. Geophys. Res. Oceans</source> <volume>108</volume>, <fpage>10</fpage>&#x2013;<lpage>11</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1029/2001JC001282</pub-id>
</citation>
</ref>
<ref id="B19">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fukushima</surname> <given-names>K.</given-names>
</name>
<name>
<surname>Miyake</surname> <given-names>S.</given-names>
</name>
</person-group> (<year>1982</year>). <article-title>Neocognitron: a self-organizing neural network model for a mechanism of visual pattern recognition</article-title>. <source>Compet. Coop. Neural Nets</source> <volume>36</volume>, <fpage>267</fpage>&#x2013;<lpage>285</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1007/BF00344251</pub-id>
</citation>
</ref>
<ref id="B20">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>LeCun</surname> <given-names>Y.</given-names>
</name>
<name>
<surname>Bengio</surname> <given-names>Y.</given-names>
</name>
<name>
<surname>Hinton</surname> <given-names>G.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>Deep learning</article-title>. <source>Nature</source> <volume>521</volume>, <fpage>436</fpage>&#x2013;<lpage>444</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1038/nature14539</pub-id>
</citation>
</ref>
<ref id="B21">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Legeckis</surname> <given-names>R.</given-names>
</name>
</person-group> (<year>1977</year>). <article-title>Long waves in the eastern equatorial pacific ocean: a view from a geostationary satellite</article-title>. <source>Science</source> <volume>197</volume>, <fpage>1179</fpage>&#x2013;<lpage>1181</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1126/science.197.4309.1179</pub-id>
</citation>
</ref>
<ref id="B22">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Li</surname> <given-names>X.</given-names>
</name>
<name>
<surname>Liu</surname> <given-names>B.</given-names>
</name>
<name>
<surname>Zheng</surname> <given-names>G.</given-names>
</name>
<name>
<surname>Ren</surname> <given-names>Y.</given-names>
</name>
<name>
<surname>Zhang</surname> <given-names>S.</given-names>
</name>
<name>
<surname>Liu</surname> <given-names>Y.</given-names>
</name>
<etal/>
</person-group>. (<year>2020</year>). <article-title>Deep learning-based information mining from ocean remote sensing imagery</article-title>. <source>Natl. Sci. Rev.</source> <volume>7</volume>, <fpage>1584</fpage>&#x2013;<lpage>1605</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1093/nsr/nwaa047</pub-id>
</citation>
</ref>
<ref id="B23">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Liang</surname> <given-names>X. S.</given-names>
</name>
<name>
<surname>Robinson</surname> <given-names>A. R.</given-names>
</name>
</person-group> (<year>2004</year>). <article-title>A study of the iceland-faeroe frontal variability using the multiscale energy and vorticity analysis</article-title>. <source>J. Phys. Oceanogr.</source> <volume>34</volume>, <fpage>2571</fpage>&#x2013;<lpage>2591</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1175/JPO2661.1</pub-id>
</citation>
</ref>
<ref id="B24">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Liu</surname> <given-names>Y.</given-names>
</name>
<name>
<surname>Zheng</surname> <given-names>Q.</given-names>
</name>
<name>
<surname>Li</surname> <given-names>X.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Characteristics of global ocean abnormal mesoscale eddies derived from the fusion of sea surface height and temperature data by deep learning</article-title>. <source>Geophys. Res. Lett.</source> <volume>48</volume>, <elocation-id>e2021GL094772</elocation-id>. doi:&#xa0;<pub-id pub-id-type="doi">10.1029/2021GL094772</pub-id>
</citation>
</ref>
<ref id="B25">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mauzole</surname> <given-names>Y.</given-names>
</name>
</person-group> (<year>2022</year>). <article-title>Objective delineation of persistent sst fronts based on global satellite observations</article-title>. <source>Remote Sens. Environ.</source> <volume>269</volume>, <elocation-id>112798</elocation-id>. doi:&#xa0;<pub-id pub-id-type="doi">10.1016/j.rse.2021.112798</pub-id>
</citation>
</ref>
<ref id="B26">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Meng</surname> <given-names>Y.</given-names>
</name>
<name>
<surname>Rigall</surname> <given-names>E.</given-names>
</name>
<name>
<surname>Chen</surname> <given-names>X.</given-names>
</name>
<name>
<surname>Gao</surname> <given-names>F.</given-names>
</name>
<name>
<surname>Dong</surname> <given-names>J.</given-names>
</name>
<name>
<surname>Chen</surname> <given-names>S.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Physics-guided generative adversarial networks for sea subsurface temperature prediction</article-title>. <source>IEEE Trans. Neural Netw. Learn. Syst.</source>, <fpage>1</fpage>&#x2013;<lpage>14</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.48550/arXiv.2111.03064</pub-id>
</citation>
</ref>
<ref id="B27">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Miller</surname> <given-names>A. J.</given-names>
</name>
<name>
<surname>Poulain</surname> <given-names>P. M.</given-names>
</name>
<name>
<surname>Warn-Varnas</surname> <given-names>A.</given-names>
</name>
<name>
<surname>Arango</surname> <given-names>H. G.</given-names>
</name>
<name>
<surname>Robinson</surname> <given-names>A. R.</given-names>
</name>
<name>
<surname>Leslie</surname> <given-names>W. G.</given-names>
</name>
</person-group> (<year>1995</year>). <article-title>Quasigeostrophic forecasting and physical processes of iceland-faroe frontal variability</article-title>. <source>J. Phys. Oceanogr.</source> <volume>25</volume>, <fpage>1273</fpage>&#x2013;<lpage>1295</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1175/1520-0485(1995)025&lt;1273:QFAPPO&gt;2.0.CO;2</pub-id>
</citation>
</ref>
<ref id="B28">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Oey</surname> <given-names>L. Y.</given-names>
</name>
<name>
<surname>Ezer</surname> <given-names>T.</given-names>
</name>
<name>
<surname>Forristall</surname> <given-names>G.</given-names>
</name>
<name>
<surname>Cooper</surname> <given-names>C.</given-names>
</name>
<name>
<surname>Dimarco</surname> <given-names>S.</given-names>
</name>
<name>
<surname>Fan</surname> <given-names>S.</given-names>
</name>
</person-group> (<year>2005</year>). <article-title>An exercise in forecasting loop current and eddy frontal positions in the gulf of mexico</article-title>. <source>Geophys. Res. Lett.</source> <volume>32</volume>, <elocation-id>L12611</elocation-id>. doi:&#xa0;<pub-id pub-id-type="doi">10.1029/2005GL023253</pub-id>
</citation>
</ref>
<ref id="B29">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Patil</surname> <given-names>K.</given-names>
</name>
<name>
<surname>Deo</surname> <given-names>M. C.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Prediction of daily sea surface temperature using efficient neural networks</article-title>. <source>Ocean Dyn.</source> <volume>67</volume>, <fpage>357</fpage>&#x2013;<lpage>368</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1007/s10236-017-1032-9</pub-id>
</citation>
</ref>
<ref id="B30">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Patil</surname> <given-names>K.</given-names>
</name>
<name>
<surname>Deo</surname> <given-names>M. C.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Basin-scale prediction of sea surface temperature with artificial neural networks</article-title>. <source>J. Atmos. Ocean. Technol.</source> <volume>35</volume>, <fpage>1441</fpage>&#x2013;<lpage>1455</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1175/JTECH-D-17-0217.1</pub-id>
</citation>
</ref>
<ref id="B31">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Popova</surname> <given-names>E. E.</given-names>
</name>
<name>
<surname>Srokosz</surname> <given-names>M. A.</given-names>
</name>
<name>
<surname>Smeed</surname> <given-names>D. A.</given-names>
</name>
</person-group> (<year>2002</year>). <article-title>Real-time forecasting of biological and physical dynamics at the iceland-faeroes front in june 2001</article-title>. <source>Geophys. Res. Lett.</source> <volume>29</volume>, <fpage>14-1</fpage>&#x2013;<lpage>14-4</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1029/2001GL013706</pub-id>
</citation>
</ref>
<ref id="B32">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Prasad</surname> <given-names>N. N.</given-names>
</name>
<name>
<surname>Rao</surname> <given-names>J. K.</given-names>
</name>
</person-group> (<year>1990</year>). <article-title>The estimation of the mean squared error of small-area estimators</article-title>. <source>J. Am. Stat. Assoc.</source> <volume>85</volume>, <fpage>163</fpage>&#x2013;<lpage>171</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1080/01621459.1990.10475320</pub-id>
</citation>
</ref>
<ref id="B33">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Reichstein</surname> <given-names>M.</given-names>
</name>
<name>
<surname>Camps-Valls</surname> <given-names>G.</given-names>
</name>
<name>
<surname>Stevens</surname> <given-names>B.</given-names>
</name>
<name>
<surname>Jung</surname> <given-names>M.</given-names>
</name>
<name>
<surname>Denzler</surname> <given-names>J.</given-names>
</name>
<name>
<surname>Carvalhais</surname> <given-names>N.</given-names>
</name>
<etal/>
</person-group>. (<year>2019</year>). <article-title>Deep learning and process understanding for data-driven earth system science</article-title>. <source>Nature</source> <volume>566</volume>, <fpage>195</fpage>&#x2013;<lpage>204</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1038/s41586-019-0912-1</pub-id>
</citation>
</ref>
<ref id="B34">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ruiz</surname> <given-names>S.</given-names>
</name>
<name>
<surname>Claret</surname> <given-names>M.</given-names>
</name>
<name>
<surname>Pascual</surname> <given-names>A.</given-names>
</name>
<name>
<surname>Olita</surname> <given-names>A.</given-names>
</name>
<name>
<surname>Troupin</surname> <given-names>C.</given-names>
</name>
<name>
<surname>Capet</surname> <given-names>A.</given-names>
</name>
<etal/>
</person-group>. (<year>2019</year>). <article-title>Effects of oceanic mesoscale and submesoscale frontal processes on the vertical transport of phytoplankton</article-title>. <source>J. Geophys. Res. Oceans</source> <volume>124</volume>, <fpage>5999</fpage>&#x2013;<lpage>6014</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1029/2019JC015034</pub-id>
</citation>
</ref>
<ref id="B35">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Shi</surname> <given-names>X.</given-names>
</name>
<name>
<surname>Chen</surname> <given-names>Z.</given-names>
</name>
<name>
<surname>Wang</surname> <given-names>H.</given-names>
</name>
<name>
<surname>Yeung</surname> <given-names>D. Y.</given-names>
</name>
<name>
<surname>Wong</surname> <given-names>W. K.</given-names>
</name>
<name>
<surname>Woo</surname> <given-names>W. C.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>Convolutional lstm network: a machine learning approach for precipitation nowcasting</article-title>. <source>Adv. Neural Inf. Process. Syst.</source> <volume>28</volume>, <fpage>802</fpage>&#x2013;<lpage>810</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.5555/2969239.2969329</pub-id>
</citation>
</ref>
<ref id="B36">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Smedstad</surname> <given-names>O. M.</given-names>
</name>
<name>
<surname>Hurlburt</surname> <given-names>H. E.</given-names>
</name>
<name>
<surname>Metzger</surname> <given-names>E. J.</given-names>
</name>
<name>
<surname>Rhodes</surname> <given-names>R. C.</given-names>
</name>
<name>
<surname>Shriver</surname> <given-names>J. F.</given-names>
</name>
<name>
<surname>Wallcraft</surname> <given-names>A. J.</given-names>
</name>
<etal/>
</person-group>. (<year>2003</year>). <article-title>An operational eddy resolving 1/16&#xb0; global ocean nowcast/forecast system</article-title>. <source>J. Mar. Syst.</source> <volume>40</volume>, <fpage>341</fpage>&#x2013;<lpage>361</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1016/S0924-7963(03)00024-1</pub-id>
</citation>
</ref>
<ref id="B37">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Srivastava</surname> <given-names>N.</given-names>
</name>
<name>
<surname>Mansimov</surname> <given-names>E.</given-names>
</name>
<name>
<surname>Salakhudinov</surname> <given-names>R.</given-names>
</name>
</person-group> (<year>2015</year>). &#x201c;<article-title>Unsupervised learning of video representations using lstms</article-title>,&#x201d; in <conf-name>Proceedings of the International Conference on Machine Learning (ICML)</conf-name> <volume>37</volume>, <fpage>843</fpage>&#x2013;<lpage>852</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.48550/arXiv.1502.04681</pub-id>
</citation>
</ref>
<ref id="B38">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Toggweiler</surname> <given-names>J. R.</given-names>
</name>
<name>
<surname>Russell</surname> <given-names>J.</given-names>
</name>
</person-group> (<year>2008</year>). <article-title>Ocean circulation in a warming climate</article-title>. <source>Nature</source> <volume>451</volume>, <fpage>286</fpage>&#x2013;<lpage>288</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1038/nature06590</pub-id>
</citation>
</ref>
<ref id="B39">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Vaswani</surname> <given-names>A.</given-names>
</name>
<name>
<surname>Shazeer</surname> <given-names>N.</given-names>
</name>
<name>
<surname>Parmar</surname> <given-names>N.</given-names>
</name>
<name>
<surname>Uszkoreit</surname> <given-names>J.</given-names>
</name>
<name>
<surname>Jones</surname> <given-names>L.</given-names>
</name>
<name>
<surname>Gomez</surname> <given-names>A. N.</given-names>
</name>
<etal/>
</person-group>. (<year>2017</year>). <article-title>Attention is all you need</article-title>. <source>Adv. Neural Inf. Process. Syst.</source> <volume>30</volume>, <fpage>5998</fpage>&#x2013;<lpage>6008</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.48550/arXiv.1706.03762</pub-id>
</citation>
</ref>
<ref id="B40">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname> <given-names>Z.</given-names>
</name>
<name>
<surname>Bovik</surname> <given-names>A. C.</given-names>
</name>
<name>
<surname>Sheikh</surname> <given-names>H. R.</given-names>
</name>
<name>
<surname>Simoncelli</surname> <given-names>E. P.</given-names>
</name>
</person-group> (<year>2004</year>). <article-title>Image quality assessment: from error visibility to structural similarity</article-title>. <source>IEEE Trans. image Process.</source> <volume>13</volume>, <fpage>600</fpage>&#x2013;<lpage>612</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1109/TIP.2003.819861</pub-id>
</citation>
</ref>
<ref id="B41">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Wang</surname> <given-names>Y.</given-names>
</name>
<name>
<surname>Long</surname> <given-names>M.</given-names>
</name>
<name>
<surname>Wang</surname> <given-names>J.</given-names>
</name>
<name>
<surname>Gao</surname> <given-names>Z.</given-names>
</name>
<name>
<surname>Yu</surname> <given-names>P. S.</given-names>
</name>
</person-group> (<year>2017</year>). &#x201c;<article-title>Predrnn: recurrent neural networks for predictive learning using spatiotemporal lstms</article-title>,&#x201d; in <source>Advances in neural information processing systems</source> (<publisher-loc>Long Beach</publisher-loc>), <volume>30</volume>. Available at: <uri xlink:href="https://proceedings.neurips.cc/paper/2017/file/e5f6ad6ce374177eef023bf0c018b6-Paper.pdf">https://proceedings.neurips.cc/paper/2017/file/e5f6ad6ce374177eef023bf0c018b6-Paper.pdf</uri>.</citation>
</ref>
<ref id="B42">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname> <given-names>Y.</given-names>
</name>
<name>
<surname>Wu</surname> <given-names>H.</given-names>
</name>
<name>
<surname>Zhang</surname> <given-names>J.</given-names>
</name>
<name>
<surname>Gao</surname> <given-names>Z.</given-names>
</name>
<name>
<surname>Wang</surname> <given-names>J.</given-names>
</name>
<name>
<surname>Yu</surname> <given-names>P.</given-names>
</name>
<etal/>
</person-group>. (<year>2021</year>). <article-title>Predrnn: a recurrent neural network for spatiotemporal predictive learning</article-title>. <source>IEEE Trans. Pattern Anal. Mach. Intell.</source> <volume>45</volume> (<issue>2</issue>), <fpage>2208</fpage>&#x2013;<lpage>2225</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.48550/arXiv.2103.09504</pub-id>
</citation>
</ref>
<ref id="B43">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Wang</surname> <given-names>Y.</given-names>
</name>
<name>
<surname>Zhang</surname> <given-names>J.</given-names>
</name>
<name>
<surname>Zhu</surname> <given-names>H.</given-names>
</name>
<name>
<surname>Long</surname> <given-names>M.</given-names>
</name>
<name>
<surname>Wang</surname> <given-names>J.</given-names>
</name>
<name>
<surname>Yu</surname> <given-names>P. S.</given-names>
</name>
</person-group> (<year>2019</year>). &#x201c;<article-title>Memory in memory: a predictive neural network for learning higher-order non-stationarity from spatiotemporal dynamics</article-title>,&#x201d; in <conf-name>Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR)</conf-name>, <fpage>9154</fpage>&#x2013;<lpage>9162</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1109/CVPR.2019.00937</pub-id>
</citation>
</ref>
<ref id="B44">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wei</surname> <given-names>L.</given-names>
</name>
<name>
<surname>Guan</surname> <given-names>L.</given-names>
</name>
<name>
<surname>Qu</surname> <given-names>L. Q.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Prediction of sea surface temperature in the south china sea by artificial neural networks</article-title>. <source>IEEE Geosci. Remote Sens. Lett.</source> <volume>17</volume>, <fpage>558</fpage>&#x2013;<lpage>562</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1109/LGRS.2019.2926992</pub-id>
</citation>
</ref>
<ref id="B45">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Willmott</surname> <given-names>C. J.</given-names>
</name>
<name>
<surname>Matsuura</surname> <given-names>K.</given-names>
</name>
</person-group> (<year>2005</year>). <article-title>Advantages of the mean absolute error (mae) over the root mean square error (rmse) in assessing average model performance</article-title>. <source>Clim. Res.</source> <volume>30</volume>, <fpage>79</fpage>&#x2013;<lpage>82</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.3354/cr030079</pub-id>
</citation>
</ref>
<ref id="B46">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Woodson</surname> <given-names>C. B.</given-names>
</name>
<name>
<surname>Litvin</surname> <given-names>S. Y.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>Ocean fronts drive marine fishery production and biogeochemical cycling</article-title>. <source>Proc. Natl. Acad. Sci. U. S. A.</source> <volume>112</volume>, <fpage>1710</fpage>&#x2013;<lpage>1715</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1073/pnas.1417143112</pub-id>
</citation>
</ref>
<ref id="B47">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yang</surname> <given-names>Y.</given-names>
</name>
<name>
<surname>Dong</surname> <given-names>J.</given-names>
</name>
<name>
<surname>Sun</surname> <given-names>X.</given-names>
</name>
<name>
<surname>Lima</surname> <given-names>E.</given-names>
</name>
<name>
<surname>Mu</surname> <given-names>Q.</given-names>
</name>
<name>
<surname>Wang</surname> <given-names>X.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>A cfcc-lstm model for sea surface temperature prediction</article-title>. <source>IEEE Geosci. Remote Sens. Lett.</source> <volume>15</volume>, <fpage>207</fpage>&#x2013;<lpage>211</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1109/LGRS.2017.2780843</pub-id>
</citation>
</ref>
<ref id="B48">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yang</surname> <given-names>Y.</given-names>
</name>
<name>
<surname>Lam</surname> <given-names>K. M.</given-names>
</name>
<name>
<surname>Sun</surname> <given-names>X.</given-names>
</name>
<name>
<surname>Dong</surname> <given-names>J.</given-names>
</name>
<name>
<surname>Lguensat</surname> <given-names>R.</given-names>
</name>
</person-group> (<year>2022</year>). <article-title>An efficient algorithm for ocean-front evolution trend recognition</article-title>. <source>Remote Sens.</source> <volume>14</volume>, <elocation-id>259</elocation-id>. doi:&#xa0;<pub-id pub-id-type="doi">10.3390/rs14020259</pub-id>
</citation>
</ref>
<ref id="B49">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yin</surname> <given-names>X. Q.</given-names>
</name>
<name>
<surname>Oey</surname> <given-names>L. Y.</given-names>
</name>
</person-group> (<year>2007</year>). <article-title>Bred-ensemble ocean forecast of loop current and rings</article-title>. <source>Ocean Model.</source> <volume>17</volume>, <fpage>300</fpage>&#x2013;<lpage>326</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1016/j.ocemod.2007.02.005</pub-id>
</citation>
</ref>
<ref id="B50">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhang</surname> <given-names>Q.</given-names>
</name>
<name>
<surname>Wang</surname> <given-names>H.</given-names>
</name>
<name>
<surname>Dong</surname> <given-names>J.</given-names>
</name>
<name>
<surname>Zhong</surname> <given-names>G.</given-names>
</name>
<name>
<surname>Sun</surname> <given-names>X.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Prediction of sea surface temperature using long short-term memory</article-title>. <source>IEEE Geosci. Remote Sens. Lett.</source> <volume>14</volume>, <fpage>1745</fpage>&#x2013;<lpage>1749</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1109/LGRS.2017.2733548</pub-id>
</citation>
</ref>
<ref id="B51">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zheng</surname> <given-names>G.</given-names>
</name>
<name>
<surname>Li</surname> <given-names>X.</given-names>
</name>
<name>
<surname>Zhang</surname> <given-names>R. H.</given-names>
</name>
<name>
<surname>Liu</surname> <given-names>B.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Purely satellite data&#x2013;driven deep learning forecast of complicated tropical instability waves</article-title>. <source>Sci. Adv.</source> <volume>6</volume>, <elocation-id>eaba1482</elocation-id>. doi:&#xa0;<pub-id pub-id-type="doi">10.1126/sciadv.aba1482</pub-id>
</citation>
</ref>
</ref-list>
</back>
</article>
