A Novel Initialization Technique for Decadal Climate Predictions

Model initialization is a matter of transferring the observed information available at the start of a forecast to the model. An optimal initialization is generally recognized to be able to improve climate predictions up to a few years ahead. However, systematic errors in models make the initialization process challenging. When the observed information is transferred to the model at the initialization time, the discrepancy between the observed and model mean climate causes the drift of the prediction toward the model-biased attractor. Although such drifts can be generally accounted for with a posteriori bias correction techniques, the bias evolving along the prediction might affect the variability that we aim at predicting, and disentangling the small magnitude of the climate signal from the initial drift to be removed represents a challenge. In this study, we present an innovative initialization technique that aims at reducing the initial drift by performing a quantile matching between the observed state at the initialization time and the model state distribution. The adjusted initial state belongs to the model attractor and the observed variability amplitude is scaled toward the model one. Multi-annual climate predictions integrated for 5 years and run with the EC-Earth3 Global Coupled Model have been initialized with this novel methodology, and their prediction skill has been compared with the non-initialized historical simulations from CMIP6 and with the same decadal prediction system but based on full-field initialization. We perform a skill assessment of the surface temperature, the heat content in the ocean upper layers, the sea level pressure, and the barotropic ocean circulation. The added value of the quantile matching initialization is shown in the North Atlantic subpolar region and over the North Pacific surface temperature as well as for the ocean heat content up to 5 years. Improvements are also found in the predictive skill of the Atlantic Meridional Overturning Circulation and the barotropic stream function in the Labrador Sea throughout the 5 forecast years when compared to the full field method.


INTRODUCTION
Providing reliable climate information for the near-term future is of paramount importance for many socioeconomic sectors, such as agriculture, energy, health, and insurance. Incorporating this information in the decision-making process is a key goal for the climate service community (Goddard, 2016;Otto et al., 2016). Decadal predictions cover 1-year to 10-year timescales.
The information extracted from this timescale is involved in the process of planning for adaptation strategies (Goddard et al., 2012;Brasseur and Gallardo, 2016). There is an internationally coordinated effort to produce and study predictions that cover such a timescale known as the Decadal Climate Prediction Project (DCPP, Boer et al., 2016), which contributes to the Coupled Model Intercomparison Project Phase 6 (CMIP6, Eyring et al., 2016).
Decadal predictability can arise from two main sources, namely the radiative forcings and the internal variability. Analogously to climate projections, decadal predictions are partly considered a boundary value problem whose crucial goal is to estimate the response of the system to variations in external forcings (Meehl et al., 2009). Apart from the anthropogenic contribution to changes in the external forcing (Booth et al., 2012), there also exist changes of natural origins, such as solar variability and volcanic eruptions, which are known to strongly affect the natural variability in the North Atlantic (Borchert et al., 2021;Mann et al., 2021).
The sources of interannual to decadal predictability originating from the slow components of the internal climate variability are associated mainly with the sea surface temperature (SST) state and ocean heat content (Doblas-Reyes et al., 2013a;Guemas et al., 2013;Merryfield et al., 2020). El Niño Southern Oscillation (ENSO) is the main process that contributes to the forecast quality in the Tropics as well as in large parts of the world, thanks to its expanded remote impacts, known as teleconnections (Doblas-Reyes et al., 2013b;Beverley et al., 2019). Another source of predictability is the sea ice, which is predictable 3 years ahead (Tietsche et al., 2014;Day et al., 2015) and whose main source of predictability is given by its persistence (Blanchard-Wrigglesworth et al., 2011). Climate predictability over the extratropics is controlled by teleconnections with the Tropics and the Arctic (Jung et al., 2015), as well as with soil moisture and land snow (Bellucci et al., 2015a).
The leading modes of decadal variability that dominate the Atlantic and the Pacific oceans are, respectively, the Atlantic multidecadal variability (AMV) and the Pacific decadal variability (PDV) (Kushnir et al., 2019). Although the processes responsible for the AMV and its predictability are not fully understood (Latif and Keenlyside, 2011;Cassou et al., 2018), climate models suggest that this mode of variability is linked to the variations in strength of the Atlantic Meridional Overturning Circulation (AMOC) (Zhang and Wang, 2013), which in turn is driven mainly by the convection activity that takes place in the northern high latitudes (Ortega et al., 2015;Robson et al., 2016), by a wind-dependent contribution (Mignot et al., 2006) and also by volcanic eruptions (Borchert et al., 2021;Mann et al., 2021). Since the AMOC exerts significant influences on the European climate through its net northward heat transport in the Atlantic, having good levels of prediction skill over the North Atlantic is of crucial importance. The North Atlantic upper-ocean heat content has an impact on Atlantic hurricanes and inland temperature and precipitation (Dunstone et al., 2011;Gastineau and Frankignoul, 2015;Buckley et al., 2019).
In order to capture the oscillations and the impacts of these variability modes, an effective initialization is crucial. Initialized predictions generally show improved forecast skill with respect to historical simulations, which only respond to changes in radiative forcings (Corti et al., 2015;Boer et al., 2016). Initializing a climate prediction consists of incorporating into the model the observed state information at the initial time of the forecast. However, models have systematic errors, which are associated with misrepresentation of key processes that are unresolved at the particular model grid and need to be parameterized (Nadiga et al., 2019). The main consequence of model errors is the difference between the model and observed mean state. Such a difference complicates the initialization task and one open issue is how to provide the model with the best estimate of the real initial state, without introducing inconsistencies that could compromise the prediction quality (Brune and Baehr, 2020).
One of the common initialization strategies is the full-field initialization (FFI), where the initial state is the best estimate of the observed climate state-the reanalysis (Pohlmann et al., 2009). After initialization, the prediction drifts away from the real-world attractor toward the mean model-biased state. To account for such a bias, a posteriori bias correction needs to be applied; various techniques that take into account the forecast time, start date, or initial condition dependence of the bias have been designed and implemented (Kharin et al., 2012;Fuĉkar et al., 2014). The correction of the bias in interannual predictions takes up the challenge of disentangling the small magnitude of climate signal to be predicted from the initial drift to be removed (Smith et al., 2013).
An alternative technique to limit the drift is the anomaly initialization (AI) that aims at phasing the model variability with the observed one by assimilating the observed anomaly onto the model mean state (Smith et al., 2008;Pohlmann et al., 2013). Previous studies have applied these initialization techniques to different models to highlight the relative strengths and limitations (Hazeleger et al., 2013;Smith et al., 2013;Bellucci et al., 2015b;Marotzke et al., 2016). The results show that in their standard implementation, at interannual time scales, the differences in skill between these techniques are small and limited to specific regions (Smith et al., 2013). The best strategy has been suggested to be model dependent because models have different biases (Magnusson et al., 2013;Polkova et al., 2014).
In this work, we present a new initialization method, the quantile matching (QM) technique that aims at reducing the drift and at limiting inconsistencies coming from the differences between the model and the observed variability amplitude. After performing the decadal hindcasts with EC-Earth3, we explore the impact of the new initialization method on the forecast skill by comparing the predictions initialized with QM, with a set of predictions initialized with FFI and a set of historical simulations. Section 2 introduces the initialization method, its implementation, and describes the model set-up in use. Section 3 is organized as follows: section 3.1 provides an overview of the behavior of the decadal predictions based on the new initialization method in terms of mean bias and drift. The prediction skill of the surface and sub-surface fields is presented in section 3.2, and the skill in the North Atlantic region is explored in section 3.3. Finally, the main findings are summarized in section 4.

Quantile Matching as Initialization Method
Predictions after initialization can experience two potential problems that can affect the forecast skill. On the one hand, rapid model adjustments known as initial shocks can occur (He et al., 2017). They can be generated from different mechanisms, one of them being the imbalance caused by the use of inconsistent atmosphere and ocean reanalyses as initial states. Previous studies suggested that the impact of initial shocks on the forecast skill is negligible at seasonal timescale (Mulholland et al., 2015). Nevertheless, this is not the case at longer timescales, where initial shock plays an important role in sensitive areas such as the North Atlantic subpolar region (Kröger et al., 2018;Bilbao et al., 2021). On the other hand, the model tends to adjust toward its biased mean climate after being initialized. Such a drift, which is a common feature of predictions initialized with FFI, is known to affect the forecast quality (Hazeleger et al., 2013).
While initial shocks can occur with both FFI and AI, the drift is expected to be largely reduced in predictions initialized with AI, as such technique employs an initial state that belongs to the model attractor and only imposes the observed variability (i.e., the initial state is the sum of the model mean climate and the observed anomalies). However, the use of AI can introduce observed anomalies whose amplitude does not belong to the range of the internal variability generated by the model. Volpi et al. (2017b) addressed this issue by weighting the observed anomalies with the ratio between the model and observed standard deviations. This technique of weighting, together with the anomaly initialization of the ocean density, showed improved skill in predicting the sea-ice variability, the AMV and the SST in the Labrador Sea, in parts of the North Pacific and Southern Ocean.
However, such refinement employed the standard deviation to characterize the variability amplitude, although the statistical distribution of variability might be skewed. The QM introduced here, therefore, expands over the idea of weighting the observed anomalies by respecting the distribution of the model variability. The QM consists of initializing the prediction with the model state whose percentile in the model distribution is the same as the percentile of the observed state in the observed distribution at the initialization time. The added value of this method is as follows: • The initial state belongs to the model attractor (as any other AI technique), • By matching the cumulative distributions of the model and observations, the observed initial state is effectively scaled with respect to the model variability.
Since with the current version of the model we do not have available an analogous set of decadal predictions as in Volpi et al. (2017b), we cannot evaluate the impact of respecting the distribution of the model variability. Therefore, the objective of this study is to assess the relative benefits and drawbacks of the novel method with respect to the state-of-the-art decadal predictions initialized with FFI. The QM is applied to all the grid-points of all ocean prognostic variables, which are the ones that are directly predicted by the model. For the implementation of the technique, the ocean reanalysis NEMOVAR-ORAS4 (Mogensen et al., 2012) is taken as the observational truth. Although NEMOVAR-ORAS4 is subject to some uncertainties, it has the advantage of providing observationally constrained and physically consistent values for all the prognostic ocean variables. At the time of this study, there was only one EC-Earth3 historical simulation available that stored the initial conditions for November. Therefore, the model distribution of each variable and grid point is computed using that historical simulation (r4i1p1f1). Figure 1 illustrates an example of the implementation of the QM method for sea surface temperature (SST) at one grid point. The blue curve represents the cumulative distribution function (defined as the probability of a variable to take a value smaller than the value given in the x-axis) of SST calculated with one member of the ocean reanalysis NEMOVAR-ORAS4, over the period 1960-2014, for the grid point considered. Similarly, the SST cumulative distribution function calculated with the historical simulation of EC-Earth3 is shown in red, for the same grid point. The circles in the NEMOVAR-ORAS4 distribution indicate the value taken by the reanalysis on the 1 st of November of the years marked in the figure. Assuming November 1960 as the target initial date, the model is initialized with the model value (marked with a yellow star) whose cumulative distribution function matches the observed one at the initialization time.
The ratio between the model and the reanalysis SST variance is shown in Figure 2. Regions with the highest variance mismatch (darkest colors in Figure 2) are the areas where the QM method applies larger corrections to scale the observed variability onto the model amplitude.

Model Description and Experimental Set-Up
The model in use for this study is the CMIP6 version of EC-Earth3 GCM (Döscher et al., 2021). Its atmospheric component is the European Centre for Medium-Range Weather Forecasts Integrated Forecasting System (IFS cycle cy36r4) in its standard resolution, with 91 vertical levels and a T255 horizontal resolution. The ocean component is the NEMO model version 3.6 (Madec and NEMO System Team, 2015), with ORCA1 configuration (about 1 degree with enhanced tropical resolution) and 75 vertical levels. The sea-ice component is LIM3 (Rousset et al., 2015) directly embedded into NEMO. The atmospheric and ocean components are coupled via OASIS3 (Craig et al., 2017). Information on the dynamic vegetation is prescribed from a previous simulation with LPJ-GUESS (Smith et al., 2014).
The benchmark hindcasts are a full field initialized experiment and an ensemble of 15 uninitialized CMIP6 historical simulations (Histo), assessed by Bilbao et al. (2021). All the experiments, including our QM, are carried out with the same model version. In the FFI experiment, all the variables from each model component are initialized with observational estimates (reanalysis). The atmosphere and land surface initial conditions are taken from the ERA-40 reanalysis (Uppala et al., 2005) for start dates before 1979 and ERA-Interim (Dee et al., 2011) FIGURE 1 | Example of the implementation of the quantile matching method: cumulative distribution function for the SST in one grid point of the Tropical Pacific in blue for NEMOVAR-ORAS4 and in red for a historical simulation of EC-Earth3. The circles in the reanalysis distribution indicate the value on November 1 of the year indicated in the figure. The prediction at that grid point will be initialized with the value from the historical simulation, which has the same cumulative distribution as NEMOVAR-ORAS4 at the initialization time (the yellow stars indicate the value for the start dates of 1960 and 2010). This calculation is made for all the grid points and all the ocean prognostic variables.
FIGURE 2 | Ratio of SST model variance over the NEMOVAR-ORAS4 variance calculated over the November 1960-2014 start dates. The NEMOVAR-ORAS4 variance is calculated concatenating the 5 ensemble members available, while for the model one EC-Earth3 historical simulation is used.
Frontiers in Climate | www.frontiersin.org afterwards. The ocean initial conditions are taken from the 3D-Var five-member ocean reanalysis NEMOVAR-ORAS4, while the sea-ice initial conditions are produced with a NEMO-LIM simulation driven by DFS5.2 forcing fluxes (Brodeau et al., 2009). The DFS5.2 forcing data corresponds to a corrected version of ERA-interim accounting for radiative and precipitation observations in the Arctic. The solar irradiance, volcanic and anthropogenic aerosol load, and greenhouse gas concentrations are prescribed using the CMIP6 radiative forcing estimates up to 2014. After that date, the SSP2-4.5 scenario (O'Neill et al., 2016) is used.
Both initialized experiments (FFI and QM) are composed by 10 ensemble members, initialized every November from 1960 until 2014 and running for 5 years. The ensemble is generated using perturbations in the 3-dimensional temperature field in the atmospheric component, and using the 5-member ensemble from the NEMOVAR-ORAS4 reanalysis in the ocean.
The QM is applied to the ocean component and is performed for all the ocean prognostic variables and grid points, by matching the 5 members of NEMOVAR-ORAS4 separately, in order to obtain 5 different initial conditions. The atmospheric and sea-ice components of the QM experiment have identical initial condition as the FFI.

Bias and Skill Estimation
The forecast time-dependent climatologies are computed for the longest common period . For each forecast year (from 1 to 5), a set of predictions is available within the 1965-2014 period (e.g., 1965 is predicted at forecast year 1 for the prediction starting in November 1964, and also as forecast year 5 for the prediction starting in November 1960). The forecast timedependent bias is calculated as the difference between the model and the observed climatologies. The drift, i.e., the evolution of the bias with forecast time, is estimated as the difference between the first and the last forecast year climatologies for the model. This estimate of the drift provides a simplistic picture but it already highlights the main features of the model drift.
To measure the forecast quality, we use the anomaly correlation (AC) and the root mean square error (RMSE). The confidence interval is calculated with a t-distribution for the AC and with a χ 2 distribution for the RMSE. The serial dependence between the hindcasts is accounted for in the computation of Frontiers in Climate | www.frontiersin.org the confidence interval using the Von Storch and Zwiers (2001) formula. The confidence interval also takes into account the trend, that is not removed in the computation of the skill. The skill scores are computed either on annual mean values, or after smoothing the timeseries with a 1-year running mean in order to filter out seasonal climate variability and focus on interannual prediction skill. To measure the impact of the QM on the forecast quality, we compute the differences of ACs between the QM and FFI, and between QM and Histo. We use the method by Siegert et al. (2017) to identify the statistically significant differences in skill between the various experiments.
We have used several observational datasets for verification purposes: the GISTEMPv4 (Lenssen et al., 2019) NASA gridded surface temperature anomaly, the EN4 (Good et al., 2013) for the ocean heat content, and HadISST (Rayner et al., 2003) for the computation of the AMV. For the study of the initial SST bias and drift, we have used the NEMOVAR-ORAS4 reanalysis (as it represents the initial reference state for QM and the initial condition for FFI). Since observations of the full AMOC cell and the barotropic stream function are not available, they are validated against the NEMOVAR-ORAS4 reanalysis. The sea level pressure is validated against the ERA-40 and ERA-Interim reanalyses.

Forecast Drift and Mean State Biases
Initializing the predictions with a state that belongs to the model attractor implies that the mean state of the predictions is biased with respect to the observations since the beginning of the forecast. The top left panel of Figure 3 shows the SST model bias with respect to the NEMOVAR-ORAS4 reanalysis in the first forecast month. The model displays a widespread warm bias over the Tropical Pacific extending toward the extratropics in the eastern basin. An even more pronounced warm bias appears in the Southern Ocean, while the Subpolar North Atlantic region presents an intense cold bias. These features are consistent with the ones found in the bias of Histo (Supplementary Figure 1). The top right panel of Figure 3 shows the annual mean climatologies of the global SST as a function of the forecast time. Consistently with the bias map, the FIGURE 4 | Anomaly correlation for the surface temperature calculated against the GISTEMPv4 dataset. The first column represents the quantile matching (QM) skill in the first forecast year (top) and the 2-5 forecast years (bottom). The middle column shows the difference between QM and Histo skill, and the third column the difference between QM and full-field initialization (FFI) skill. The black dots indicate regions of significant anomaly correlation (AC) (first column), and regions where the difference in AC is statistically significant with 95% confident level (second and third column). QM climatology (in red) results warmer than the NEMOVAR-ORAS4 climatology (black line), and remains close to the Histo climatology (in yellow) throughout the whole forecast period. In contrast, the climatology of the FFI experiment starts closer to NEMOVAR-ORAS4 but it drifts after the first forecast year. The reason for which the FFI experiment drifts away from the Histo climatologies is extensively investigated in Bilbao et al. (2021), where it is suggested to be an effect of the initial shock that leads to a collapse in Labrador Sea convection. This issue will be further discussed in section 3.3. However, the global drift toward colder temperatures is not occurring all around the globe as it is shown in the bottom right panel of Figure 3. Although a positive drift in SST is present in the Southern Ocean, the largest negative ones occur in the North Atlantic, particularly in the subpolar gyre region. Conversely, and as expected, the QM largely prevents the model drifts, with values of even an order of magnitude lower with respect to the FFI (bottom left panel of Figure 3).

Global Skill
We first evaluate the global skill in predicting the surface fields. The QM experiment exhibits high forecast quality in surface temperature (first column Figure 4) and sea level pressure (first column Figure 5) as indicated by the AC against the GISTEMPv4 dataset and ERA-40-ERA-Interim, shown in the first columns of Figures 4, 5, respectively. The skill in surface temperature in the first forecast year is globally significant, the main exceptions being a few regions over Asia and the Southern Ocean and a small region in the North Atlantic. At longer forecast times (2-5 forecast years), the region of non-significant skill in the North Atlantic expands, as well as in the Southern Ocean, and a new region in the central Tropical Pacific loses significance. Also the sea level pressure partly loses skill after the first forecast year and the regions with significant skill remain the subtropical gyre region in the Pacific and the Southern Ocean, which are the regions with the highest model biases during the first forecast month (Figure 3). When compared to Histo, the QM predictions show significant improvements over a predominant part of the Pacific and over the North Atlantic subpolar region (NASP) as well as the Southern Ocean, during the first forecast year for both the surface temperature (middle column Figure 4) and the sea level pressure fields (middle column Figure 5). The significant added-value with respect to Histo in the NASP surface temperature is maintained at longer forecast years (bottom panels of Figure 4). However, the improvement is partially lost at forecast years 2-5 in the zone of the inter-gyre position, where the subpolar and subtropical FIGURE 6 | Skill difference in the upper 300 m ocean heat content, computed against EN4 observational dataset. The first column shows the anomaly correlation (AC) difference between quantile matching (QM) and Histo for the first forecast year (top) and the 2-5 forecast years (bottom). The second column shows the difference between QM and full-field initialization (FFI). The third column is a zoom on the regional mean of the western subpolar North Atlantic sector (WSPNA; 50-65 • N, 60-30 • W) as indicated by the black box. The AC and root mean square error (RMSE) are calculated with yearly mean data along the forecast time and are shown, respectively, in the top and bottom panel. In red is shown the QM experiment, in blue the FFI, and in yellow the Histo. The thin lines represent the 95% confidence interval obtained with a t-distribution for the correlation and a χ 2 distribution for the RMSE. gyres meet. The QM also outperforms the FFI in predicting the NASP surface temperature and, more importantly, the improved skill maintains significance throughout the whole forecast time (third column Figure 4). However, analogously to what is shown in the comparison with Histo, there is a degradation of skill along the Gulf stream for the forecast years 2-5, which could potentially be due to a wrong positioning of the current in the QM predictions. Differences between QM and FFI or Histo inland temperature are marginal. The AC results are broadly consistent with the residual correlations computed according to Smith et al. (2019), which are shown in Supplementary Figure 2 of the Supplementary Material.
The comparison of QM with FFI for the sea level pressure (third column Figure 5) highlights a statistically significant skill improvement in the Antarctic circumpolar current and a degradation of skill in the Tropical Pacific and the Indian Ocean during the first forecast year, while significance in the skill difference is lost at forecast years 2-5.
We now focus on the upper layer ocean heat content (0-300 m) because of its crucial role in the ocean meridional transport: Figure 6 shows the skill comparison between QM and Histo/FFI. Similarly to what is shown for the surface temperature, the QM significantly improves the skill in the Pacific and the NASP region with respect to Histo, although there is not a clear improvement over the Southern Ocean. Besides, skill over the Indian Ocean is improved. Part of those improvements are also maintained at longer timescales. When comparing with the FFI prediction, we find that the added value of the QM is mainly located over the NASP region, consistently with the surface temperature results.
We now zoom over the western subpolar North Atlantic region (50-65 • N, 60-30 • W, WSPNA), as this is a region in which SST and ocean heat content have been shown to influence the temperature and precipitation over the neighboring continents (Buckley et al., 2019). The third column of Figure 6 shows the AC (first row) and the RMSE (second row) over the WSPNA region as a function of the forecast time. The highest values of ACs and the lowest values of RMSE over the WSPNA region and during the entire forecast time are obtained with the QM (red lines). Conversely, the Histo simulations (yellow lines) do not show significant AC in any of the forecast years, whereas the FFI loses its skill after the first forecast year.

Skill in the North Atlantic
In the previous section, we have shown significant improvements in the NASP region, specifically in the Western sector of QM with respect to FFI. Bilbao et al. (2021) performed an assessment for an analogous decadal prediction experiment based on FFI, and highlighted the collapse of deep convection in the Labrador Sea with a consequent weakening of the AMOC, subsequent to initialization. This led to a lack of prediction skill for the upper ocean heat content and the surface temperature in the NASP. We therefore look at the deep convection activity in the Labrador Sea in an attempt to explain the origins of the QM skill improvements presented in the previous section. Figure 7 shows the ensemble mean of the mixed layer depth in the Labrador Sea (55-65 • N, 56-46 • W), as a proxy for the deep convection intensity, for the February-March-April mean, which are the months of deepest mixing. In order to better understand the behavior of the Histo simulations, we have isolated three members that show no deep convection throughout most of the historical period (top right panel of Figure 7), from the rest of the Histo ensemble (top left panel). The Histo ensemble mean with regular convection is representative of the response to external forcings only and shows lower values than the NEMOVAR-ORAS4 reference. In the experiments with FFI, deep convection collapses very quickly with forecast time (bottom right panel of Figure 7) preventing FFI to follow the NEMOVAR-ORAS4 variability, consistently with what was found by Bilbao et al. (2021). In contrast, QM (bottom left panel of Figure 7) is able to partly reproduce the variability present in NEMOVAR-ORAS4: after initialization, it shows a slight increase in convective activity during the two peaks observed in the 1980s and in the 1990s. However, this does not directly translate into an increase of forecast skill, as the QM does not improve with respect to the Histo ensemble that preserves the convection. While Histo has a constant skill throughout the whole forecast time, QM presents statistically significant AC only during the first year (top left panel of Figure 8). This is probably due to the fact that the skill in predicting the mixed layer depth is dominated by the external forcing changes that have induced an increase in stratification hindering convection activity. The additional model variability that attempts at capturing the observed one rather constitutes an additional noise, which is detrimental to the forecast performance. The results in terms of RMSE (bottom left panel of Figure 8) are consistent with what is shown by the AC.
We assess the skill in predicting the barotropic circulation by looking at the barotropic stream function, which characterizes the horizontal water transport integrated vertically in the Labrador Sea (right column of Figure 8). We find that Histo do not show any skill. The initialized predictions, on the other hand, have similar skill at the beginning of the forecast. However, the skill of FFI deteriorates with forecast time and is lost at forecast year 3, while the skill of QM is maintained throughout the whole forecast time.
In summary, we have shown that the QM avoids the Labrador Sea deep convection collapse that occurs in the FFI experiment, and improves the prediction skill of the barotropic stream function with respect to both the FFI and the Histo simulations. . The second row shows the respective root mean square error (RMSE). Quantile matching (QM) is shown in red, the full-field initialization (FFI) in blue, the Histo with convection is shown in yellow, and in purple the Histo members with no convection. The skill are computed against NEMOVAR-ORAS4, using the February-March-April mean for the mixed layer depth, and a running mean of 12 months for the barotropic stream function. The thin lines represent the 95% confidence interval obtained with a t-distribution for the correlation and a χ 2 distribution for the RMSE.
We are now interested in investigating the effect of these improvements on the AMOC. Since the AMOC skill depends on the latitude range at which it is considered, we show the results in a Hovmoller diagram of the AC (Figure 9). We have computed the AMOC index as the maximum of the Atlantic meridional overturning stream function in adjacent latitude-depth boxes of latitude range of 2 degrees (from 20 • to 60 • N) and depth range of 900-3,000 m depth. If we focus north of 40 • N, the FFI starts with high skill that progressively deteriorates until the third forecast year down to an AC smaller than 0.1, probably due to the collapse in deep convection. This is largely improved by the QM that shows high skill in predicting the AMOC, particularly at subpolar latitudes, for the entire forecast time. The skill of FFI at lower latitudes is marginally higher than the QM one throughout the whole forecast time. One reason to explain this is that the detrimental effect of the convection collapse might need longer time to propagate at lower latitudes than the 5 years that we have available. The skill of the Histo simulations is constant with forecast time. This is consistent with the results in Borchert et al. (2021), where the good representation of the NASP SST in the CMIP6 historical simulations is attributed to the AMOCrelated response to the forcings (volcanic eruptions and partly solar forcing).
Finally, we complement the North Atlantic analysis with the assessment of the skill in predicting the AMV. The AMV index is calculated as the difference between the regional SST anomalies in the North Atlantic (0 • to 60 • N and 80 • to 0 • W) and the global mean SST anomalies (between 60 • S and 60 • N), following the definition by Trenberth and Shea (2006). The ensemble QM predictions successfully capture the observed warming trend (left panel of Figure 10). In terms of skill, both the initialized predictions and the Histo simulations succeed in predicting the AMV index with skill significantly different than 0 along the whole forecast period as shown by the confidence intervals (thin lines in Figure 10). The prediction skill of FFI starts close to the skill of Histo and it might be generated FIGURE 9 | Hovmoller diagrams of the AMOC anomaly correlation with NEMOVAR-ORAS4. The Atlantic Meridional Overturning Circulation (AMOC) is calculated as the maximum of the Atlantic meridional overturning stream function over adjacent boxes of 2 • latitude (from 20 • to 60 • N) and 900-3,000 m depth. The panels show, respectively, the quantile matching (QM) (top left), the full-field initialization (FFI) (top right), the Histo (bottom left), and the difference between the QM and the FFI skill (bottom right).
FIGURE 10 | Atlantic multidecadal variability (AMV) index for the quantile matching (QM) predictions (left). The index computed from HadISST is shown in black. The different colors represent the different start dates. The anomaly correlation (AC) and the root mean square error (RMSE) of the AMV index are shown, respectively, in the central and right panels. The skill scores are calculated against the HadISST dataset, applying a 2-year running mean to the data. As in previous figures, the thin lines represent the 95% confidence interval obtained with a t-distribution for the correlation and a χ 2 distribution for the RMSE. by the AMV persistence. During the first two forecast years, QM outperforms FFI, both in terms of AC and RMS (second and third panel of Figure 10), although the difference is not statistically significant.

DISCUSSION AND CONCLUSIONS
In this study, we have presented a novel initialization method for decadal climate predictions, namely the quantile matching. This method aims at tackling two major issues: • The drift coming from initializing the predictions away from the model preferred trajectories, similarly to any other anomaly initialization method; • The potential inconsistencies between the observed/model distribution of variability.
The quantile matching method selects the initial condition of the prediction as the model state, which is identically located in the model distribution as the observed initial state in the observed distribution. Therefore, the initial state belongs to the model attractor, reducing the drift. Moreover, matching in the observed and model statistical distributions scales the observed variability toward the model one, correcting any potential amplitude incompatibilities. The forecast skill assessment of a decadal prediction with EC-Earth3, initialized with the quantile matching method, has led to the following findings: • Surface temperature has high significant skill throughout the forecast years, in agreement with other studies (e.g., Smith et al., 2019). • In the North Atlantic subpolar region, the quantile matching enhances the surface temperature and ocean heat content skill, compared to the historical simulations and the full field initialized predictions. • The quantile matching avoids a collapse of deep convection in the Labrador Sea that occurred in the full field initialized predictions. This could be due to the positive effect of initializing the predictions on the model attractor. However, initialization does not seem to have a direct impact on the skill of the mixed layer depth, as no significant skill improvements are detected compared to the historical simulations. This is probably due to the fact that the skill in predicting the mixed layer depth is dominated by the trend that the historical simulations is able to capture. • The skill of the barotropic stream function in the Western subpolar North Atlantic sector for the quantile matching predictions is significant throughout the whole forecast period and is the highest compared to both the full field initialized predictions and the historical simulations. • The quantile matching has a positive impact on the Atlantic Meridional Overturning Circulation, which is skillfully predicted throughout the whole forecast time. The Atlantic Multidecadal Variability is well-forecasted along the whole forecast time by both the initialized predictions and the historical simulations.
The effect of the reduced prediction drift translates into an improved forecast skill of the meridional and barotropic ocean transports in the North Atlantic.
Considering that the ocean holds most of the memory in the climate system, we have performed the quantile matching on the ocean model component only. However, previous studies have highlighted the importance of having a sea-ice initial state consistent with the ocean state (Volpi et al., 2017a;Tian et al., 2020). Therefore, the use of different initialization techniques for the ocean and sea-ice components could have generated some instabilities in the quantile matching decadal predictions. A step further for this work would be performing the quantile matching also in the sea-ice model component. Besides, the use of more sophisticated atmosphere-ocean coupled initialization technique (Counillon et al., 2014;Brune and Baehr, 2020) would address the issue of the initial shock originated by the use of disjoint reanalyses for the initialization of the model components (Mulholland et al., 2015).

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

AUTHOR CONTRIBUTIONS
DV, VG, PO, and FD-R designed the quantile matching experiment. DV performed the decadal climate predictions and performed the analysis with the contributions of RB, VM, and RM. PE assisted with the settings of the workflow manager. AA contributed to the cmorizations of the outputs. RM and AA ran the benchmark simulations. PO, VG, FD-R, VM, and SC contributed to the discussion and the interpretation of the results. DV and VM prepared the manuscript with the contribution of all co-authors.

FUNDING
This study was supported by the project LISTEN funded by the European Commission Horizon 2020 Marie Skłodowska-Curie Actions -IF (GA 799930).

ACKNOWLEDGMENTS
The authors thankfully acknowledge the computer resources from the ECMWF special project INCIPIT (spitvolp) and the technical assistance provided by ECMWF and BSC. The climate simulations have been performed using Autosubmit workflow manager (Manubens-Gil et al., 2016). The authors thank Paolo Davini for providing the restart files of the historical simulation used to implement the quantile matching. We acknowledge Saskia Loosveldt and the earthdiags and ESMValTool suite developers, as well as the startR and s2dverification (Manubens et al., 2018) software packages developers, as these tools were used to postprocess, analyze, and visualize the results presented in this work. PO acknowledges support by the Spanish Ministry of Economy, Industry and Competitiveness through the Ramon y Cajal grant (RYC-2017-22772).