Application progress of ensemble forecast technology in influenza forecast based on infectious disease model

This review surveys the application progress of ensemble forecast technology in influenza forecasting based on infectious disease models, in order to provide a scientific reference for further research. Relevant literature was retrieved using the two keywords “influenza” and “ensemble forecast”, and then outlined and summarized. In recent years, a number of studies on ensemble forecast technology for influenza have been reported, and some well-performing influenza ensemble forecast systems have already been operationally implemented and provide references for scientific prevention and control. In general, ensemble forecasts can represent the various uncertainties in forecasting influenza cases with infectious disease models, and can provide more accurate forecasts and more valuable information than a single deterministic forecast. However, current studies still have some shortcomings. It is suggested that scientists engaged in influenza forecasting based on infectious disease models strengthen cooperation with scholars in the field of numerical weather forecast, which is expected to further improve the skill and application level of ensemble forecasts for influenza.

(i.e., the probability that a patient infects others), the probability of conversion from a person in the latent period to an infected person, the recovery rate of infected persons, the mortality rate and the coefficient of government interventions. After the relevant sensitivity parameters and the initial values of the differential equations (such as the current number of cases) have been set, the number of future influenza cases can be obtained by numerical integration of the differential equations.
After decades of continuous development, infectious disease models have shown good potential for application. However, the initial values in infectious disease models inevitably contain errors, and the relevant sensitivity parameters in the models are all set according to user experience. Because infectious disease models are highly nonlinear, errors in the initial values and the sensitivity parameters are amplified as the forecast lead time increases and eventually lead to large biases in the forecast results, which limits the accuracy of the model forecasts. It is therefore worthwhile to represent the uncertainty of the initial values and sensitivity parameters quantitatively, so as to address the uncertainty inherent in a single deterministic forecast and to improve the accuracy and application level of infectious disease model forecasts. In view of this, learning from and applying the ensemble forecast technology developed in the field of numerical weather forecast is expected to address these problems effectively.
In recent years, ensemble forecast technology has been applied to forecasting influenza cases based on infectious disease models, and this paper reviews that literature. The two keywords "influenza" and "ensemble forecast" were used to search for and select the relevant literature, which is then outlined and summarized. In addition, some suggestions are put forward based on the author's several years of experience in the research and application of ensemble forecast technology.
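As a minimal illustration of this numerical integration, the following Python sketch advances a generic SEIR-type compartment model with a forward Euler scheme. It is not the model of any cited study: the function names, the parameter values (`beta`, `sigma`, `gamma`) and the initial state are hypothetical, and mortality and government intervention terms are omitted for brevity.

```python
def seir_step(state, beta, sigma, gamma, dt):
    """Advance (S, E, I, R) by one forward Euler step of length dt (days)."""
    s, e, i, r = state
    n = s + e + i + r
    new_exposed = beta * s * i / n      # infection: S -> E
    new_infected = sigma * e            # end of latent period: E -> I
    new_recovered = gamma * i           # recovery: I -> R
    return (
        s - new_exposed * dt,
        e + (new_exposed - new_infected) * dt,
        i + (new_infected - new_recovered) * dt,
        r + new_recovered * dt,
    )


def integrate_seir(initial_state, beta, sigma, gamma, days, dt=0.1):
    """Return the daily states from day 0 to day `days` inclusive."""
    steps_per_day = round(1 / dt)
    state = initial_state
    trajectory = [state]
    for _ in range(days):
        for _ in range(steps_per_day):
            state = seir_step(state, beta, sigma, gamma, dt)
        trajectory.append(state)
    return trajectory


# Hypothetical setting: 10 infectious cases in a population of 10,000.
trajectory = integrate_seir(
    (9990.0, 0.0, 10.0, 0.0), beta=0.5, sigma=0.25, gamma=0.2, days=120
)
peak_day = max(range(len(trajectory)), key=lambda d: trajectory[d][2])
print("forecast peak day:", peak_day)
```

Changing the initial state or any of the three rates and rerunning the integration gives a different trajectory, which is the sensitivity that motivates the ensemble approach.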

Introduction of ensemble forecast technology
Ensemble forecast technology was developed in the field of numerical weather forecast. The essence of numerical weather forecast is to calculate future values by repeatedly integrating, from the initial values, the differential equations that represent atmospheric motion, which is consistent with the essence of forecasting influenza cases based on infectious disease models. Because of the chaotic nature of the atmosphere, any small error in the initial values may cause the outcomes to diverge quickly after a period of integration, and may sometimes even lead to completely opposite results. To address this problem, the concept of ensemble forecast was put forward in the 1970s (3). Based on a certain mathematical method, a set of initial values following a certain probability density function (PDF) is first generated (as shown in Figure 1), each of which may represent the real state of the atmosphere. Ensemble forecast results are then obtained by numerically integrating each initial value (usually combined with different physical process parameterization schemes or planetary boundary layer conditions, or even based on different models), thereby inferring the evolution of the PDF of atmospheric states over different forecast lead times.
An ensemble forecast is no longer a single deterministic forecast but a group of forecasts, each of which is called an ensemble member. The degree of divergence among the ensemble members' forecasts (i.e., the ensemble spread) can be used as a quantitative representation of the forecast uncertainty (i.e., the likely forecast error). Appropriate post-processing of the ensemble members' forecasts yields post-processed deterministic forecast products, whose forecast performance is usually significantly better than that of the original single deterministic forecast. In addition, modern ensemble forecasts are expressed probabilistically rather than deterministically. More decision mistakes can be avoided if decisions are based on whether the forecast probability exceeds a predetermined threshold for action, which is an important aspect of the application of ensemble forecast technology.
Ensemble forecast has become a relatively mature technology in the field of numerical weather forecast and is widely used in operational forecasting practice (4). Meanwhile, as a scientific way to address the uncertainty inherent in a single deterministic forecast, it has also been widely used in recent years in the fields of aviation (5), biology (6), hydrology (7), electricity (8), economics (9) and infectious disease prevention and control, providing valuable insight for solving prediction problems in related fields.
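The procedure described above can be sketched for an infectious disease model as follows. This is an illustrative Python example under stated assumptions, not an operational scheme: the initial number of infected persons and the infection rate are drawn from Gaussian distributions whose means and standard deviations are hypothetical, a simple SIR model integrated with forward Euler stands in for the forecast model, and the names `sir_forecast` and `make_ensemble` are invented for this sketch.

```python
import random


def sir_forecast(i0, beta, gamma=0.25, population=10000.0, days=60, dt=0.1):
    """Integrate a simple SIR model; return daily numbers of infected persons."""
    s, i = population - i0, i0
    daily_infected = [i]
    for _ in range(days):
        for _ in range(round(1 / dt)):
            new_infections = beta * s * i / population * dt
            recoveries = gamma * i * dt
            s, i = s - new_infections, i + new_infections - recoveries
        daily_infected.append(i)
    return daily_infected


def make_ensemble(i0_mean, i0_sd, beta_mean, beta_sd, size, seed=0):
    """Sample perturbed initial values and parameters, then integrate each member."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    members = []
    for _ in range(size):
        i0 = max(1.0, rng.gauss(i0_mean, i0_sd))         # perturbed initial value
        beta = max(0.0, rng.gauss(beta_mean, beta_sd))   # perturbed parameter
        members.append(sir_forecast(i0, beta))
    return members


control = sir_forecast(20.0, 0.5)                        # single deterministic forecast
ensemble = make_ensemble(20.0, 5.0, 0.5, 0.05, size=50)  # 50 ensemble members

for day in (0, 10, 20):
    values = [member[day] for member in ensemble]
    print(day, round(min(values)), round(max(values)))
```

The printed range of the members at each lead time is a crude view of how the sampled initial PDF evolves during the integration.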
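The three products mentioned here (a post-processed deterministic forecast, the ensemble spread and a probability forecast) can be illustrated with a short Python sketch. The member values, the case threshold of 1,200 and the action probability of 0.3 are hypothetical, and the ensemble mean is used as the simplest possible post-processing step.

```python
import statistics


def ensemble_mean(values):
    """Simplest post-processed deterministic product: the ensemble mean."""
    return statistics.fmean(values)


def ensemble_spread(values):
    """Ensemble spread, taken here as the sample standard deviation of the members."""
    return statistics.stdev(values)


def exceedance_probability(values, threshold):
    """Fraction of members forecasting more cases than the threshold."""
    return sum(value > threshold for value in values) / len(values)


# Hypothetical forecasts of weekly cases from eight ensemble members.
member_forecasts = [820.0, 950.0, 1010.0, 1100.0, 1240.0, 1380.0, 1500.0, 1760.0]

mean_forecast = ensemble_mean(member_forecasts)
spread = ensemble_spread(member_forecasts)
probability = exceedance_probability(member_forecasts, threshold=1200.0)

# Act when the forecast probability reaches a predetermined level (assumed 0.3).
take_action = probability >= 0.3
print(mean_forecast, round(spread, 1), probability, take_action)
```

In this example half of the members exceed the threshold, so a decision rule based on probability would trigger action even though the ensemble mean alone (1,220 cases) is only marginally above it.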
Application progress of influenza ensemble forecast

Application progress of influenza ensemble forecast in the United States
The United States is one of the countries most seriously affected by seasonal influenza, and the Department of Environmental Health Sciences of Columbia University has carried out several studies on influenza ensemble forecast for some megacities in the past decade. Shaman and Karspeck (2) established an influenza ensemble forecast system based on the SIRS model and the ensemble adjustment Kalman filter (EAKF) assimilation technology developed in the field of numerical weather forecast. The system uses the EAKF method to assimilate data on current influenza cases, updated on relevant websites in real time, and thereby generates 250 sets of initial values. The SIRS model is then integrated from these initial values to produce 250 sets of forecast values. The ensemble forecast system was tested and evaluated for forecasting influenza cases in New York City from 2003 to 2008. In general, the system can accurately forecast the peak timing about 7 weeks in advance of the actual peak, and the spread of the ensemble members' forecasts can be used to gauge confidence in the accuracy of the forecast results.
In the influenza epidemic seasons of 2012 and 2013, the above-mentioned influenza ensemble forecast system (2) was operationally implemented in real time and provided forecasts of influenza cases for 108 cities of the United States (10), making it the first operational ensemble forecast system for influenza. According to the related evaluation results, the system could accurately forecast the peak timing about 9 weeks in advance of the actual peak. In general, the forecast accuracy gradually increased as the season progressed. By week 52, prior to the peak for the majority of cities, 63% of all ensemble forecasts were accurate.
The nonlinear growth of errors is the main source of forecast errors in infectious disease models. To further optimize the influenza ensemble forecast system, Pei and Shaman (11) built on the previous work by quantitatively estimating the nonlinear errors of the above-mentioned system through the error breeding analysis method and then correcting the forecast errors accordingly. Ensemble forecast experiments were then conducted for influenza cases in 95 cities of the United States from 2003 to 2008. The evaluation results indicate that, in general, the nonlinear error correction improves the forecast accuracy of both the peak time and the peak intensity of influenza outbreaks.
Building on this work, Pei et al. (12) found through several diagnostic analyses that the initial value error and the random error in the infectious disease model have similar growth characteristics during model integration, which further confirmed that nonlinear dynamic error growth is the main source of forecast error in infectious disease models. The direction of fastest growth of the initial value error was then identified by the singular vector analysis method and used to optimize the initial value perturbation scheme. As a result, the ensemble spread increased significantly, so that the forecast uncertainty was better represented, and the ensemble forecast accuracy was further improved.
To sum up, the United States is the country with the most research on influenza ensemble forecast technology. In recent years, an influenza ensemble forecast system has been built, and ensemble forecast research has been carried out on forecast evaluation, diagnosis and analysis of error evolution characteristics, and optimization of the initial value perturbation scheme. The newly developed influenza ensemble forecast system has been operationally implemented and provides a reference for scientific prevention and control.
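The assimilation step can be illustrated with a heavily simplified Python sketch of an ensemble adjustment update for a single, directly observed quantity with Gaussian observation error. It is not the implementation of Shaman and Karspeck (2): the six-member ensemble, the observed case count, the observation error variance and the function names are all hypothetical, and the update of an unobserved parameter by linear regression on the observed-variable increments is included only to show how parameters can be adjusted alongside the state.

```python
import statistics


def adjust_observed(prior, observation, obs_variance):
    """Shift and contract the prior ensemble so that its mean and variance
    match the Gaussian posterior given one observation."""
    prior_mean = statistics.fmean(prior)
    prior_var = statistics.variance(prior)  # must be non-zero
    post_var = 1.0 / (1.0 / prior_var + 1.0 / obs_variance)
    post_mean = post_var * (prior_mean / prior_var + observation / obs_variance)
    contraction = (post_var / prior_var) ** 0.5
    return [post_mean + contraction * (x - prior_mean) for x in prior]


def adjust_unobserved(unobserved, prior_obs, posterior_obs):
    """Update an unobserved variable by regressing it on the observed one."""
    mean_u = statistics.fmean(unobserved)
    mean_o = statistics.fmean(prior_obs)
    n = len(prior_obs)
    covariance = sum(
        (u - mean_u) * (o - mean_o) for u, o in zip(unobserved, prior_obs)
    ) / (n - 1)
    gain = covariance / statistics.variance(prior_obs)
    return [
        u + gain * (post - prior)
        for u, prior, post in zip(unobserved, prior_obs, posterior_obs)
    ]


# Hypothetical six-member prior: modelled weekly cases and infection rates.
prior_infected = [80.0, 95.0, 100.0, 110.0, 120.0, 135.0]
prior_beta = [0.40, 0.45, 0.48, 0.52, 0.55, 0.60]
observed_cases, observation_variance = 130.0, 100.0

posterior_infected = adjust_observed(
    prior_infected, observed_cases, observation_variance
)
posterior_beta = adjust_unobserved(prior_beta, prior_infected, posterior_infected)
print([round(x, 1) for x in posterior_infected])
print([round(b, 3) for b in posterior_beta])
```

The adjusted members serve as the initial values from which the model is integrated to produce the next set of ensemble forecasts.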

Application progress of influenza ensemble forecast in subtropical regions
Influenza outbreaks in temperate regions usually show a seasonal cycle, whereas those in tropical and subtropical regions are irregular and non-seasonal and can break out throughout the year. The forecast of influenza cases in tropical and subtropical regions is therefore more difficult.
Yang et al. (13) established an influenza ensemble forecast system with an ensemble size of 500 for Hong Kong, a city in the subtropics, based on the SIRS model and the EAKF assimilation technology, similar to the ensemble forecast system constructed by Shaman and Karspeck (2). The system was tested and evaluated for influenza cases in Hong Kong from 1998 to 2013. Overall, it was able to predict the peak timing and peak intensity of 44 influenza epidemics caused by a single influenza strain or multiple influenza strains over the 16 years. The overall forecast accuracy 1-3 weeks in advance was 37%, and the forecast accuracy increased with the ensemble spread. The maximum accuracy of the peak time (intensity) of the epidemics caused by different strains is 43-93% (45-89%). In general, for non-seasonal influenza epidemics in subtropical regions, which are difficult to predict, the influenza ensemble forecast system can forecast accurately at least three weeks in advance.
The influenza ensemble forecast system for Hong Kong is generally similar to that established by the Department of Environmental Health Sciences of Columbia University, but its overall forecast accuracy is clearly lower, which may be mainly due to the lower predictability of influenza outbreaks in subtropical regions compared with temperate regions.

Application progress of super ensemble forecast technology for influenza
In addition to establishing an ensemble forecast system based on a single model, the forecast results of different models can be combined directly to form ensemble forecasts, which is called multi-model super ensemble forecast in the field of numerical weather forecast. Generally speaking, each model has its own advantages and disadvantages. A super ensemble forecast may therefore retain the advantages and avoid the disadvantages of each single model, so as to achieve more accurate forecast results. In recent years, several studies have been carried out on multi-model super ensemble forecast for influenza.
To incorporate all available data and methods and achieve more accurate forecasts of influenza cases, the Centers for Disease Control and Prevention of the United States has organized seasonal influenza forecasting challenges since the 2013 season. In the 2017 and 2018 influenza seasons, the 22 teams participating in the challenge combined the forecast results of their respective models through a machine learning method (14), with the specific weight for each model determined by its forecast accuracy in previous seasons. The forecast results after weighted integration were clearly better than those of the 22 individual teams, which shows good potential for operational implementation. Yamana et al. (15) completed a similar study on seasonal influenza, but applied the same weight to each model during the weighted integration of the multi-model super ensemble forecast results. The results showed that the multi-model ensemble forecasts outperformed those of each single model, and that very poor forecast results were less likely to occur. In contrast to the above schemes for determining the weight of each single model, McAndrew and Reich (16) generated the weight of each model from its forecast accuracy, updated weekly in real time, and found that the forecast accuracy of this weighting scheme was better than that of the two schemes mentioned above (14, 15).
To sum up, since each weighting scheme has its own advantages and disadvantages, the scheme should be selected according to specific needs or situations when carrying out weighted integration of multi-model super ensemble forecast results. In general, the development of super ensemble forecast and a proper weighted integration process can achieve more accurate forecast results.
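The difference between equal weighting and accuracy-based weighting can be illustrated with a short Python sketch. The weighting rule used here (weights proportional to the inverse of each model's past mean absolute error) is an assumption chosen for simplicity and is not the method of the cited studies (14-16); the three model forecasts and their past errors are hypothetical.

```python
def equal_weights(n_models):
    """Same weight for every model."""
    return [1.0 / n_models] * n_models


def inverse_error_weights(past_errors):
    """Weights proportional to 1 / past error, normalised to sum to one.
    Assumed rule for illustration only; past errors must be positive."""
    inverse = [1.0 / error for error in past_errors]
    total = sum(inverse)
    return [value / total for value in inverse]


def combine(forecasts, weights):
    """Weighted integration of the single-model forecasts."""
    return sum(f * w for f, w in zip(forecasts, weights))


# Hypothetical peak-intensity forecasts from three models and their
# mean absolute errors over previous seasons.
model_forecasts = [1000.0, 1200.0, 1600.0]
past_mean_abs_errors = [100.0, 200.0, 400.0]

equal_combined = combine(model_forecasts, equal_weights(3))
skill_weights = inverse_error_weights(past_mean_abs_errors)
skill_combined = combine(model_forecasts, skill_weights)

print(round(equal_combined, 1), round(skill_combined, 1))
```

Recomputing the weights each week from updated error estimates would give an adaptive scheme in the spirit of weekly weight updating, again only as a schematic analogue.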

Discussion
In recent years, several influenza ensemble forecast systems have been established, and related research has been conducted on forecast evaluation, diagnosis and analysis of error evolution characteristics, optimization of the initial value perturbation scheme, super ensemble forecast and so on. Some well-performing influenza ensemble forecast systems have been operationally implemented and provide references for scientific prevention and control. In general, ensemble forecasts can represent the various uncertainties in forecasting influenza cases based on infectious disease models and provide more accurate forecasts and more valuable information than a single deterministic forecast, showing good prospects for application. In addition, the development of super ensemble forecast and a proper weighted integration process can achieve more accurate forecast results.
However, the above-mentioned works still have some weaknesses. Firstly, some of the above-mentioned influenza ensemble forecast systems use the EAKF assimilation method to generate initial values. In fact, many other initial value perturbation technologies (17) in the field of numerical weather forecast can be applied to establish influenza ensemble forecast systems, and these are expected to reflect the forecast uncertainty of infectious disease models more reasonably and improve the corresponding ensemble forecast skill. Secondly, the current post-processing technologies for influenza ensemble forecast products are mostly the simple ensemble average or a weighted average based on the super ensemble forecast. The accuracy and application level of influenza ensemble forecast products are expected to improve further by learning from and applying other mature post-processing technologies (18) in the field of numerical weather forecast, such as the probability-matching ensemble mean, the merged optimal ensemble quantile and the Bayesian average. Thirdly, modern ensemble forecasts are expressed probabilistically rather than deterministically, and more decision mistakes can be avoided if decisions are based on whether the forecast probability exceeds a predetermined threshold for action, which is an important aspect of the application of ensemble forecast technology (19). At present, however, probability forecasts are rarely used in influenza ensemble forecast systems. Strengthening the application of ensemble probability forecasts is expected to further improve the application level of influenza ensemble forecast and reduce decision-making errors.
To further improve the skill and application level of ensemble forecast for influenza, I strongly suggest that scientists engaged in influenza forecasting based on infectious disease models strengthen cooperation with scientists in the field of numerical weather forecast. Such interdisciplinary cooperation is expected to produce innovative academic ideas and achieve new breakthroughs.
Owing to space limitations, this study reviews only the application progress of ensemble forecast technology in influenza forecasting based on infectious disease models. There are many similar studies involving other infectious diseases, such as dengue (20) and COVID-19 (21), which may be reviewed in the future.

FIGURE 1 Schematic diagram of ensemble forecast (black line: deterministic forecast; blue lines: ensemble members' forecasts; red dotted line: observation).