Multi-Model Multi-Physics Ensemble: A Futuristic Way to Extended Range Prediction System

In an endeavor to design better forecasting tools for real-time prediction, the present work highlights the strength of the multi-model multi-physics ensemble over its operational predecessor. The existing operational extended range prediction system (ERPv1) combines the coupled model and an atmospheric model forced with bias-corrected sea-surface temperature from the coupled model, each running at two resolutions with a perturbed initial condition ensemble. This system has accomplished important goals in skillful sub-seasonal forecasting; however, its skill is limited to about 2 weeks. The next version of this ERP system is seamless in resolution and based on a multi-physics multi-model ensemble (MPMME). Similar to the earlier version, this system includes the coupled climate forecast system version 2 (CFSv2) and the atmospheric global forecast system forced with real-time bias-corrected sea-surface temperature from CFSv2. In the newer version, model integrations are performed six times a month for real-time prediction, using selected combinations of convective and microphysics parameterization schemes. Additionally, more than 15 years of hindcasts are generated for these initial conditions. The preliminary results from this system demonstrate appreciable improvements over its predecessor in predicting the large-scale low-frequency variability signal and weekly mean rainfall up to the week 3 lead. The subdivision-wise skill analysis shows that MPMME performs better, especially in the northwest and central parts of India.


INTRODUCTION
The Indian summer monsoon is an economically prodigious phenomenon that substantially affects the gross domestic product (GDP) of the world's second most populous country (Gadgil and Gadgil, 2006). A voluminous scientific literature unveils the manifold aspects and theories linking the events of this significant annual occurrence (Raghavan, 1973; Rao, 1976; Sikka and Gadgil, 1980; Parthasarathy et al., 1992; Webster and Yang, 1992; Goswami et al., 1999; Wang and Fan, 1999; Jiang et al., 2004; Joseph and Sijikumar, 2004; Goswami, 2005; Annamalai, 2010; Rajeevan et al., 2010). Apart from being a decisive economic factor, the monsoon has sustained research in recent decades into the emerging climate changes and accompanying extreme weather conditions (Goswami et al., 2006, 2019; Ajayamohan and Rao, 2008; Guhathakurta and Rajeevan, 2008; Rajeevan et al., 2008; Joseph et al., 2015; Parker et al., 2016; Sooraj et al., 2016; Houze et al., 2017; Roxy et al., 2017). In view of the above, the prediction of the monsoon is not only urgent but essential. The significant rainfall contribution from intraseasonal variability in the monsoon highlights the importance of sub-seasonal to seasonal (S2S) scale prediction (Abhilash et al., 2014b; Vitart and Robertson, 2018; Robertson et al., 2019). Deterministic prediction on the S2S scale has limitations, and therefore probabilistic methods or ensemble prediction systems are considered (Molteni et al., 1996; Buizza et al., 2007, 2008; Vitart and Molteni, 2009; Rashid et al., 2011). Prediction from an ensemble of perturbed initial conditions (ICs) is one of the popular techniques. Originating from the extratropical cyclogenesis problem (Bjerknes and Solberg, 1922) and the theory of baroclinic instability (Charney, 1947; Eady, 1949), perturbations in the atmospheric flow became central to the initial value problem of numerical weather prediction (O'Malley, 1988).
Later, it was refined into a well-established technique for generating an ensemble of ICs to enhance prediction skill across various weather scales (Kalnay, 1993, 1997; Palmer, 1995, 1998). The atmospheric lagged average is another traditional ensemble generation method (Hoffman and Kalnay, 1983; Kalnay and Dalcher, 1987; Chen et al., 2013), in which forecasts from different initializations for the same target period are combined into an ensemble mean. These two techniques are well known for addressing the uncertainties arising from ICs.
Some of the recent literature inclines toward a grand ensemble based on multiple models (Krishnamurti et al., 2000; Sahai et al., 2013; Abhilash et al., 2015; Kalnay, 2019). The advantages of one model formulation over another in one or more aspects could provide better assistance in the multi-model approach. The concept of inter-model diversity arises from the need to address another class of errors recognized as model-errors. Although there are varying perspectives on the nature and origin of these errors, they are largely attributed to the representation of physical processes in the model. The approximations made while formulating parameterization schemes and the misrepresentation of significant sub-grid scale phenomena in the model could cause biases in the predicted fields. Further, it has been proposed that a multi-physics ensemble scheme can be an alternative way to account for these model-errors (Richardson, 1997; Harrison et al., 1999; Orrell et al., 2001). The intra-model diversification introduced by using more than one physical parameterization has shown significant improvement over single-physics predictions (Stensrud and Fritsch, 1994; Berner et al., 2011; Tapiador et al., 2012; Greybush et al., 2017; Xu et al., 2020).
The above-mentioned ensemble prediction techniques have advantages as well as limitations when it comes to real-time prediction. For example, an ensemble based on perturbed initial conditions could mitigate the growth of initial errors, but such an ensemble tends to be under-dispersive, leading to overconfident probabilistic prediction and underestimated larger weather anomalies (Stensrud et al., 2000). Similarly, lagged ensembles with improper weights for older initializations can degrade the mean forecast (Abhilash et al., 2014b). Further, the model-error ensemble techniques require physical consistency among the members in terms of errors but are known to increase the ensemble spread (Green et al., 2017). Therefore, careful examination of these techniques is required to achieve the desired improvement.
The efficacy of any prediction tool is determined by its validity and reliability, measured as the forecast skill (Murphy, 1991; Casati et al., 2008). Many skill assessment and verification methods are available to evaluate and compare various prediction strategies (Ghelli and Ebert, 2008; Jolliffe and Stephenson, 2011; Ebert et al., 2013). These methods increase the confidence in any prediction approach and motivate efforts to understand and overcome the limitations in the hypothesis formulation.
The skill analysis is vital, especially for the complex monsoon systems that contribute a significant share of the annual rain. In the present study, we evaluate the skill of a multi-model multi-physics ensemble prediction strategy for the Indian summer monsoon. This strategy is part of the development of a new extended range ensemble forecasting framework, and here we compare it to its current operational version. The operational version is only a multi-model ensemble prediction system (Sahai et al., 2016; Abhilash et al., 2014c, 2015) developed under the "National Monsoon Mission (NMM)" project (Rao et al., 2020) and has received acclaim since its successful implementation in 2016. This operational version is being used for extended range prediction (ERP) at the India Meteorological Department (IMD) and provides outlooks for rainfall, heatwaves, cyclones, and other meteorological parameters for various sectoral applications. The next ERP version under development uses a multi-physics approach along with the multi-model framework. The results presented here are from the preliminary runs of this new version generated from unperturbed ICs. The comparison with the older version highlights its usefulness and drawbacks. This documentation will be handy for further improvements and modifications in the new framework.
The next section elaborates more on both the prediction systems as well as methodologies and datasets utilized in the study. The skill of ERP systems is discussed in the subsequent section, followed by conclusions.

DATA AND METHODOLOGY
The operational ERP system at IMD is a multi-model ensemble framework. It comprises two horizontal resolution variants (T382 and T126 truncations) of two models: the climate forecast system version 2 (CFSv2) and the atmospheric global forecast system (GFS) from the National Centers for Environmental Prediction (NCEP) (Saha et al., 2014). Each of these four variants runs with a four-member ensemble of perturbed atmospheric ICs. The atmospheric ICs are obtained from the National Center for Medium Range Weather Forecasting, and the oceanic ICs for CFSv2 from the Indian National Center for Ocean Information Services. Additionally, the real-time sea-surface temperature (SST) from CFSv2, after bias-correction, is used as forcing to GFS (the detailed technique can be seen in Abhilash et al., 2014a, and Kaur et al., 2020). This ERP system was developed and thoroughly tested for skill at the Indian Institute of Tropical Meteorology (IITM) under NMM. The operational forecasts are generated every week with Wednesday ICs for the next 32 days; in addition, an on-the-fly hindcast for 2003 to 2015 is produced for each IC. This system is henceforth referred to as ERPv1 in the paper.
The successor version of the above-mentioned prediction system is in the final development stage. This new ERP system also has two model variants, CFSv2 and GFS, but the two resolution variants are now replaced with one seamless mode in which the horizontal resolution of T574 transitions into the coarser T382 resolution after 15 days. Additionally, a multi-physics strategy is adopted for generating the ensemble. We have used three convective parameterization schemes in combination with two microphysics parameterizations. The convection schemes include Simplified Arakawa-Schubert (SAS) (Pan and Wu, 1995), revised deep-convection SAS (NSAS) (Han and Pan, 2011), and revised SAS with modified shallow-convection (NSAS_SC) (Han and Pan, 2011). Zhao and Carr (ZC) (Zhao and Carr, 1997) and Ferrier (FER) (Ferrier et al., 2002) are the two microphysics schemes incorporated in the new formulation. The resultant six physics combinations are SASZC, SASFER, NSASZC, NSASFER, NSASZC_SC, and NSASFER_SC. CFSv2 runs with all six combinations, whereas GFS has only four and does not include SASZC and SASFER. Similar to ERPv1, GFS is forced with bias-corrected CFSv2 real-time SST. The NCEP climate forecast system reanalysis ICs are utilized for both CFSv2 and GFS. The new multi-physics multi-model prediction contains 36-day forecasts initialized on the 1st, 6th, 11th, 16th, 21st, and 26th of each month for the hindcast period 2001-2015. We label this physics-based multi-model ensemble as MPMME hereafter.
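The member layout described above can be enumerated in a few lines. The following is a minimal sketch for illustration only; the function names and the label-building rule are assumptions introduced here and are not part of the ERP software.

```python
from itertools import product

# Scheme names as given in the text.
CONVECTION = ["SAS", "NSAS", "NSAS_SC"]
MICROPHYSICS = ["ZC", "FER"]
START_DAYS = [1, 6, 11, 16, 21, 26]  # initialization days of each month


def combo_label(conv, micro):
    """Build labels in the paper's style, e.g. NSAS_SC + ZC -> NSASZC_SC."""
    if conv.endswith("_SC"):
        return conv[:-3] + micro + "_SC"
    return conv + micro


def physics_combinations():
    """All convection x microphysics pairings (six in total)."""
    return [combo_label(c, m) for c, m in product(CONVECTION, MICROPHYSICS)]


def ensemble_members():
    """CFSv2 uses all six combinations; GFS omits SASZC and SASFER."""
    combos = physics_combinations()
    cfs = [("CFSv2", c) for c in combos]
    gfs = [("GFS", c) for c in combos if c not in ("SASZC", "SASFER")]
    return cfs + gfs


if __name__ == "__main__":
    for model, combo in ensemble_members():
        print(model, combo)
    print("members:", len(ensemble_members()))
```

The enumeration yields six CFSv2 and four GFS control members, which is the 10-member MPMME used in the comparison.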
The anomaly correlation coefficient (ACC), Pearson correlation, Heidke skill score (HSS) (Barnston, 1992), root mean square error (RMSE), root mean square skill score (RMSS), and Brier skill score (BSS) (Brier, 1950) are the verification metrics used to compare MPMME skill with ERPv1. The verification is done for the weekly mean rainfall forecast at four weekly leads. The week 1 lead corresponds to the initial 7 days of the forecast, and the subsequent 8-14 days constitute week 2; similarly, the 15-21 and 22-28 day forecasts define weeks 3 and 4, respectively. The common hindcast period 2003-2015 from both versions of the ERP system is used. The sample size considered for ERPv1 is 22 weeks × 13 years = 286 forecasts for each lead. Similarly, MPMME has a sample of 24 weeks × 13 years = 312 forecasts. The skill scores are computed against observed daily rainfall from the Tropical Rainfall Measuring Mission (TRMM) merged rainfall provided by IMD (Mitra et al., 2009; Pai et al., 2014). The Monsoon Intraseasonal Oscillation (MISO) indices are computed following Sahai et al. (2013) and Suhas et al. (2013). Note that the MPMME includes only control runs (i.e., six members from CFSv2 and four members from GFS). Therefore, we have selected only 10 members from ERPv1 (three from each variant of the CFS model and two from each GFS variant) for a fair comparison with the 10-member MPMME.
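The weekly lead definition and the deterministic scores can be sketched as follows. This is an illustrative sketch, not the verification code used in the study: the uncentered form of the ACC and the use of a zero-anomaly (climatological) forecast as the RMSS reference are assumptions, and the data in the usage part are synthetic.

```python
import numpy as np


def weekly_means(daily, n_weeks=4):
    """Average (n_forecasts, n_days) daily values into weekly leads.

    Week 1 is days 1-7, week 2 is days 8-14, and so on.
    """
    daily = np.asarray(daily, dtype=float)
    weeks = [daily[:, 7 * w:7 * (w + 1)].mean(axis=1) for w in range(n_weeks)]
    return np.stack(weeks, axis=1)


def acc(fcst_anom, obs_anom):
    """Anomaly correlation coefficient (uncentered form, an assumption)."""
    f = np.asarray(fcst_anom, dtype=float)
    o = np.asarray(obs_anom, dtype=float)
    return float(np.sum(f * o) / np.sqrt(np.sum(f ** 2) * np.sum(o ** 2)))


def rmse(fcst_anom, obs_anom):
    """Root mean square error between forecast and observed anomalies."""
    f = np.asarray(fcst_anom, dtype=float)
    o = np.asarray(obs_anom, dtype=float)
    return float(np.sqrt(np.mean((f - o) ** 2)))


def rmss(fcst_anom, obs_anom):
    """Skill relative to a zero-anomaly reference (assumed definition)."""
    o = np.asarray(obs_anom, dtype=float)
    return 1.0 - rmse(fcst_anom, o) / rmse(np.zeros_like(o), o)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_forecasts = 24 * 13  # MPMME sample size quoted in the text
    obs_daily = rng.gamma(2.0, 4.0, size=(n_forecasts, 28))  # synthetic rain
    fcst_daily = obs_daily + rng.normal(0.0, 3.0, size=obs_daily.shape)
    obs_wk, fcst_wk = weekly_means(obs_daily), weekly_means(fcst_daily)
    # Crude anomalies: remove the mean over all forecasts at each lead.
    obs_an = obs_wk - obs_wk.mean(axis=0)
    fcst_an = fcst_wk - fcst_wk.mean(axis=0)
    for w in range(4):
        print(f"week {w + 1}: ACC={acc(fcst_an[:, w], obs_an[:, w]):.2f} "
              f"RMSS={rmss(fcst_an[:, w], obs_an[:, w]):.2f}")
```

With this RMSS definition, a value above zero means the forecast error is smaller than that of climatology.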

RESULTS
The hindcast from both ERPv1 and MPMME is analyzed for skill in predicting Indian summer monsoon weekly mean rainfall (ISMR) from June to September. The region over central India defined by Rajeevan et al. (2010), known as the core monsoon zone, is the widely used prototypical monsoon region. Figure 1 and Table 1 illustrate the skill of predicted weekly mean rainfall averaged over the monsoon zone at four weekly leads. The ERPv1 has an ACC of 0.78 and 0.63 at the week 1 and week 2 leads, respectively. It is improved by almost 0.1 in MPMME for both weeks. Although the skill drops in the third week, it is still above 0.4, the practical skill limit. In the 4th week, the skill declines further. The difference between the deterministic prediction skill of the two systems over the monsoon zone is statistically significant at the 99.9%, 95%, and above 90% confidence levels for weeks 1, 2, and 3, respectively, and the difference in skill is not significant in the 4th week. Apart from spatial non-uniformity, the monsoonal rainfall has well-documented temporal variability that arises from intraseasonal fluctuations. These fluctuations are recognized as spells of increased rain and of minimal to no rain over the monsoon zone. The transitions between these two spells are challenging but crucial, and models have difficulties predicting such transitions, limiting the predictability of monthly rainfall. Figure 2 compares the monthly skill of weekly averaged rainfall over the monsoon zone for both systems. June and September have higher skill than July and August in both systems, which could be attributed to the models' inability to predict the frequent synoptic-scale systems in the latter months. However, the coefficient values are >0.6 for both systems in the first 2 weeks and are reduced at the following leads.
Regarding improvement, the month of June (Figure 2A) records the highest increase in skill: at the four leads, MPMME shows 10, 18, 32, and 12% improvement over ERPv1, which is significant at the 95% confidence level. The significant phenomena during June, such as monsoon onset and cyclonic system genesis, impact the subsequent progress of the monsoon. Predicting these events is important, especially for dam management (planning the release and storage of water), for agro-met services (deciding when to begin sowing), and for mitigating disasters due to extreme rainfall. Hence, improvement in the prediction skill for June will be highly beneficial for real-time ERP of monsoon onset and extreme rainfall conditions. Further, July, August, and September witness an increase in ACC up to the week 3 lead (except the week 2 lead during August) for MPMME. For these months, MPMME shows relatively less skill than ERPv1 in the 4th week, but the difference is insignificant as the ERPv1 skill is also <0.4.
HSS gives the fractional improvement of the forecast over a reference forecast, which is climatology in our case. HSS for deterministic forecast verification at various thresholds of weekly mean rainfall over the monsoon zone is plotted in Figure 3. The skill decreases for higher rainfall thresholds at all weekly leads, indicating the limitation of both versions in predicting heavy rainfall. However, the MPMME performs better than the reference forecast at least up to the week 3 lead for the given thresholds. The figure affirms the improvement in MPMME performance over ERPv1.
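For a single rainfall threshold, the HSS can be sketched from the counts of correct and expected-correct forecasts. The sketch below assumes the generic form HSS = (H - E) / (T - E), with E estimated from the marginal frequencies of forecast and observed exceedance; the exact reference forecast and thresholds used in the study may differ, and the function name is hypothetical.

```python
import numpy as np


def heidke_skill_score(fcst, obs, threshold):
    """HSS for the binary event 'weekly mean rainfall >= threshold'.

    H is the number of correct forecasts, E the number expected to be
    correct by chance given the marginal frequencies, T the sample size.
    """
    f = np.asarray(fcst, dtype=float) >= threshold
    o = np.asarray(obs, dtype=float) >= threshold
    total = f.size
    correct = np.sum(f == o)
    expected = (f.sum() * o.sum() + (~f).sum() * (~o).sum()) / total
    if total == expected:  # event never (or always) occurs in both series
        return float("nan")
    return float((correct - expected) / (total - expected))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    obs = rng.gamma(2.0, 4.0, size=312)           # synthetic weekly rainfall
    fcst = obs + rng.normal(0.0, 4.0, size=312)   # synthetic forecast
    for thr in (5.0, 10.0, 15.0, 20.0):
        print(f"threshold {thr:>4}: HSS={heidke_skill_score(fcst, obs, thr):.2f}")
```

A score of 1 indicates a perfect categorical forecast, 0 indicates no improvement over the reference, and negative values indicate a forecast worse than the reference.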

Skill Assessment Over Meteorological Subdivisions
The ERP broadens the application spectrum of the meteorological forecast. The different sector-specific applications require meteorological information at a finer spatial scale. However, the generation of stakeholder-required forecast products requires skill assessment at the smaller spatial scale because the area-averaged precipitation skill will not be sufficient. Therefore, we look into the skill for the meteorological subdivisions of India. There are 36 such subdivisions across the country; for further details, please refer to Joseph et al. (2019).
The standard signal-to-noise ratio (SNR) is considered here to find the limit of rainfall predictability (the lead at which SNR becomes one) for these subdivisions. Figure 4 shows the spatial map of predictability, with color indicating the number of predictable days. In both ERPv1 (Figure 4A) and MPMME (Figure 4B), most subdivisions show predictability of 10-14 days. Very few subdivisions (five) have predictability >16 days. The number of such subdivisions with predictability higher than 16 days is almost doubled in MPMME. Simultaneously, the number of subdivisions with <8 predictable days is reduced by two in MPMME compared to ERPv1. In total, more than 12 subdivisions show an improvement in predictability of 2-4 days in MPMME; these subdivisions fall in north and northwest India. A similar increment is also seen for a few subdivisions in southern peninsular and northeast India. In contrast, for many subdivisions in central India, the predictability remains unchanged in MPMME, except for a very few subdivisions (four) where predictability dropped by 1-2 days.
The week-wise anomaly correlations for the subdivisions are shown in Figure 5, where ACC >0.2 is statistically significant at the 99.9% level. Both ERPv1 and MPMME have good skill in the week 1 and 2 forecasts, with MPMME outperforming ERPv1 for most subdivisions. The lead in prediction skill is maintained in week 3 by MPMME, where many subdivisions have ACC >0.2 and 0.3, in contrast to ERPv1. Week 4 is less skillful than the first 3 weeks in both ERPv1 and MPMME, and most of the subdivisions show ACC smaller than 0.2. Figure 6 illustrates the RMSS values from ERPv1 and MPMME at the four leads for the meteorological subdivisions; the shaded values (i.e., >0) indicate reasonable prediction skill. Similar to ACC, RMSS is also better in MPMME than in ERPv1 for up to 3 weeks. Subdivisions in northeast India are less skillful in both ERPv1 and MPMME; previous authors have linked the lower predictability to a larger rainfall contribution from less predictable synoptic systems over the northeast regions (Abhilash et al., 2018; Joseph et al., 2019).
Overall, MPMME shows reasonable improvement in deterministic skill over northwest and central India compared to ERPv1. Studies have reported a more frequent occurrence of extreme rainfall over these regions (Singh et al., 2011; Woo et al., 2019; Joseph et al., under revision; Rai et al., 2020). Therefore, improved predictability and prediction skill for these regions in MPMME can effectively improve extreme event prediction (to be addressed in a separate study).
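The predictability limit described above can be sketched as the first lead day at which the SNR falls to one. The definitions used here are assumptions for illustration: the signal is taken as the variance of the ensemble mean across forecasts and the noise as the mean squared deviation of members about their ensemble mean; the study's exact formulation is not given in the text, and the function name is hypothetical.

```python
import numpy as np


def predictability_limit(ens):
    """Return (limit_in_days, snr) for an ensemble anomaly array.

    ens has shape (n_forecasts, n_members, n_leads). The limit is the
    first lead day (1-based) where SNR <= 1, or None if it never drops.
    """
    ens = np.asarray(ens, dtype=float)
    ens_mean = ens.mean(axis=1)                    # (n_forecasts, n_leads)
    signal = ens_mean.var(axis=0)                  # variance of ensemble mean
    noise = ((ens - ens_mean[:, None, :]) ** 2).mean(axis=(0, 1))
    snr = signal / noise
    below = np.nonzero(snr <= 1.0)[0]
    limit = int(below[0]) + 1 if below.size else None
    return limit, snr


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    n_forecasts, n_members, n_leads = 312, 10, 28
    # Synthetic anomalies: a common signal plus member noise that grows
    # with lead time (purely illustrative numbers).
    signal = rng.normal(0.0, 1.0, size=(n_forecasts, 1, n_leads))
    spread = np.linspace(0.2, 2.0, n_leads)[None, None, :]
    ens = signal + spread * rng.normal(size=(n_forecasts, n_members, n_leads))
    limit, snr = predictability_limit(ens)
    print("predictability limit (days):", limit)
```

Applied subdivision by subdivision, such a calculation would give one limit per subdivision, which is the quantity mapped in Figure 4.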

Prediction Skill for Monsoon Intraseasonal Oscillation
MISO is one of the most dominant modes of low-frequency intraseasonal variability, known to provide predictability in the extended range during the Indian summer monsoon. The enhanced skill witnessed in earlier sections could be explained by analyzing the model's ability to capture this large-scale signal.
The MISO prediction skill is computed in terms of the bivariate anomaly correlation coefficient (BVCC) and root mean square error (RMSE), as described in Rashid et al. (2011). The leading pair of model-predicted MISO indices from all ICs and their observed counterparts are used for the BVCC and RMSE computation. The BVCC is plotted in Figure 7 along with the RMSE for the MISO indices as a function of lead days. We consider BVCC >0.5 and RMSE lower than 1.4 as the thresholds for skillful MISO prediction. The horizontal line intersects the BVCC axis at 0.5 and the RMSE axis at 1.41 to mark the significant skill and error limits. The black line represents the combined skill for all MISO phases, whereas blue and cyan show the evolution of the correlation and RMSE for the transition to the active and break phases, respectively. The figure clearly shows a gain in skill for MPMME over ERPv1. The ERPv1 reaches the prediction limit in around 19 days, whereas the MPMME has this limit beyond 21 days.
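The bivariate scores for a pair of indices at a single lead can be sketched as below. This follows the commonly used bivariate correlation and RMSE formulation for two-component indices; the function names are hypothetical and the data are synthetic. If both observed indices have unit variance, a climatological (zero) forecast gives an RMSE of √2 ≈ 1.41, which is consistent with the error limit marked in Figure 7.

```python
import numpy as np


def bvcc(obs, fcst):
    """Bivariate correlation for (n_forecasts, 2) arrays of MISO1, MISO2."""
    obs = np.asarray(obs, dtype=float)
    fcst = np.asarray(fcst, dtype=float)
    num = np.sum(obs[:, 0] * fcst[:, 0] + obs[:, 1] * fcst[:, 1])
    den = np.sqrt(np.sum(obs ** 2)) * np.sqrt(np.sum(fcst ** 2))
    return float(num / den)


def bivariate_rmse(obs, fcst):
    """Bivariate RMSE: root of the mean summed squared error of both indices."""
    obs = np.asarray(obs, dtype=float)
    fcst = np.asarray(fcst, dtype=float)
    return float(np.sqrt(np.mean(np.sum((obs - fcst) ** 2, axis=1))))


if __name__ == "__main__":
    rng = np.random.default_rng(3)
    obs = rng.normal(size=(312, 2))                  # synthetic observed indices
    for lead, err in zip((5, 10, 15, 20), (0.3, 0.6, 1.0, 1.5)):
        fcst = obs + rng.normal(0.0, err, size=obs.shape)  # error grows with lead
        print(f"day {lead}: BVCC={bvcc(obs, fcst):.2f} "
              f"RMSE={bivariate_rmse(obs, fcst):.2f}")
```

Evaluating both functions at each lead day and locating where BVCC falls below 0.5 gives a prediction limit of the kind quoted for ERPv1 and MPMME.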
The study by Goswami and Xavier (2003) reveals that the potential predictability of break (little to no rain) conditions during the monsoon is high compared to that of active conditions. They also suggested that the higher predictability of the transition to the break phase arises because error growth in this phase is governed by the low-frequency (30-60 days) signal. Abhilash et al. (2014b) also showed that ERP (from a CFSv2-based 11-member ensemble) of breaks is more skillful. A similar inference can be made from Figure 7 for the phase-dependent prediction skill of both systems; ERPv1 and MPMME also show slightly better predictability for the break transition (cyan line in Figures 7A,B) than for the active transition (blue). MPMME has improved skill for both (active and break) phases in comparison to ERPv1. The higher predictability in the northwest regions of India is associated with low-frequency monsoon oscillations, which is evident in Figure 5. Therefore, the 2-4 day increase in predictability in MPMME over these regions can be attributed to the gain of about 2 days in the skill of this low-frequency signal, i.e., MISO.

Probabilistic Forecast Skill
In the previous sections, we have evaluated the deterministic prediction skill of ERPv1 and MPMME; in this section, we look into some probabilistic verification. The BSS is calculated for categorical rainfall probabilistic prediction (Figure 8). Based on the tercile method, three categories are defined: above normal (when the rainfall amount is more than the upper tercile value), near normal (when it is between the upper and lower tercile values), and below normal (when it is below the lower tercile value). For the observation, the probability is 100% for the category that occurred and 0% for the other two.
FIGURE 8 | Brier skill score from extended range prediction system (ERPv1) and multi-physics multi-model ensemble (MPMME) for (A) above normal, (B) near normal, and (C) below normal categorical rainfall forecast over monsoon zone.
The BSS compares the Brier score of the forecast with that of the reference forecast (climatology), assuming an equal 33% occurrence probability for each category. A BSS value >0 indicates an improvement over climatology. Both ERPv1 and MPMME have better skill in predicting the above normal (Figure 8A) and below normal (Figure 8C) categories up to 2 weeks. The near-normal (Figure 8B) rainfall predictions are comparatively less skillful in both systems. It is interesting to note from the figure that MPMME shows considerable improvement over ERPv1 in almost all categories at the first two weekly leads. The skill of both systems reduces at longer leads.
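The tercile-based BSS can be sketched as follows. This is a minimal illustration under stated assumptions: forecast probabilities are taken as the fraction of ensemble members falling in each category, the same tercile thresholds are applied to forecasts and observations (i.e., forecasts are assumed to be bias-corrected), and the reference forecast assigns a probability of 1/3 to every category. The function names and the synthetic data are hypothetical.

```python
import numpy as np

CATEGORIES = ("below normal", "near normal", "above normal")


def category_index(x, lower, upper):
    """0 = below the lower tercile, 1 = between terciles, 2 = above upper."""
    x = np.asarray(x, dtype=float)
    return (x >= lower).astype(int) + (x > upper).astype(int)


def brier_skill_scores(ens, obs, lower, upper):
    """BSS per tercile category against a climatological 1/3 reference.

    ens: (n_forecasts, n_members) forecasts; obs: (n_forecasts,) observations.
    """
    ens_cat = category_index(ens, lower, upper)
    obs_cat = category_index(obs, lower, upper)
    scores = {}
    for k, name in enumerate(CATEGORIES):
        prob = (ens_cat == k).mean(axis=1)        # fraction of members
        outcome = (obs_cat == k).astype(float)    # 1 if category occurred
        bs = np.mean((prob - outcome) ** 2)
        bs_ref = np.mean((1.0 / 3.0 - outcome) ** 2)
        scores[name] = float(1.0 - bs / bs_ref)
    return scores


if __name__ == "__main__":
    rng = np.random.default_rng(4)
    obs = rng.normal(size=312)                                # synthetic anomalies
    ens = obs[:, None] + rng.normal(0.0, 0.8, size=(312, 10))  # 10 members
    lower, upper = np.percentile(obs, [100.0 / 3.0, 200.0 / 3.0])
    for name, value in brier_skill_scores(ens, obs, lower, upper).items():
        print(f"{name}: BSS={value:.2f}")
```

In this sketch, an ensemble that always splits evenly across the three categories scores exactly zero, and positive values indicate sharper and more accurate probabilities than climatology.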
The analysis presented in this section favors MPMME until almost the 21-day lead. However, the current results are only from a multi-physics ensemble, i.e., one that addresses model-errors to some extent. The model also suffers from initial condition errors at longer leads. A few earlier studies have concluded that a physics ensemble combined with a perturbed initial conditions ensemble could provide better skill by addressing two major bias components (Stensrud and Fritsch, 1994; Stensrud et al., 2000). Therefore, the results of this study can be improved further with the careful selection of physics and initial condition ensembles.

CONCLUSIONS
The present work highlights the improvements of a physics-based multi-model extended range ensemble prediction system over its predecessor operational version in predicting ISMR. This new MPMME framework distinguishes itself from the ERPv1 in its single seamless horizontal resolution, and most importantly, for considering multiple realizations of atmospheric dynamics achieved by permutations of convection and microphysics parameterizations.
The skill of MPMME and ERPv1 for ISMR is compared using different verification scores for spatiotemporal forecast evolution at weekly leads. ACC for hindcast of 2003-2015 from MPMME signifies an improvement for monsoon zone rainfall over ERPv1 up to 3-week lead. The MPMME prediction skill for 4 months from June to September also witnesses increment up to 3-week lead. The month of June has the highest skill for MPMME at all 4-week leads, which will come in handy for predicting monsoon onset and extreme rain-producing systems in the onset phase. The HSS for rainfall over monsoon zone elucidates the enhanced skill at all thresholds of the weekly mean rainfall for MPMME over ERPv1.
The MPMME extends the predictability limit by 2-4 days compared to ERPv1, as indicated by the subdivision map. Similarly, the conclusion drawn from subdivision-wise ACC and RMSS favors MPMME in most regions through the 7, 14, and 21-day leads. The subdivisions in the northwest and central parts of the country exhibit the maximum increase in skill. All the phases of large-scale MISOs are better predicted by MPMME than by ERPv1, which is reflected in the overall gain in predictability at the subdivision level. The tercile-based categorical rainfall prediction is verified for the probabilistic skill of both systems. BSS for these categorical rainfall occurrences exhibits the superiority of the physics-based MPMME over ERPv1.
Although the results presented here are from the preliminary development stage, the different verifications used in the study support the MPMME over its operational version ERPv1 up to the 21-day lead. Further assessment of the signal and noise added per physics combination could assist in considering a weighted average of these ensembles to generate the forecast products. Since we have only considered the unperturbed control initial condition in MPMME, adding a few perturbed IC members could further help improve the prediction, especially at longer leads, by controlling the growth of initial condition uncertainty (Stensrud and Fritsch, 1994; Stensrud et al., 2000). The study demonstrates the utility of the physics-based ensemble and identifies scope for further exploration. It is anticipated that the enhanced temporal skill for June and spatial skill for the northwest and central regions of India could improve extreme event prediction.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

AUTHOR CONTRIBUTIONS
AS, SJ, RP, and RC conceptualized the MPMME strategy. RP, RM, AD, and MK were involved in the model runs, data handling, and processing. MK did the analysis and plotting and structured the manuscript draft. All authors contributed to writing and editing the final manuscript.

ACKNOWLEDGMENTS
Research at IITM was fully supported by the Ministry of Earth Sciences, and the authors sincerely acknowledge it. The analysis and model integrations were performed on the Aditya and Pratyush HPCS. We are thankful to Dr. D. R. Pattanaik for the ERPv1 runs. The present work was part of the Ph.D. thesis of MK. We express our gratitude to the two reviewers for their insightful review, which contributed significantly to our manuscript's quality.