BRIEF RESEARCH REPORT article

Front. Phys., 17 July 2020

Sec. Social Physics

Volume 8 - 2020 | https://doi.org/10.3389/fphy.2020.00304

Country-Wise Forecast Model for the Effective Reproduction Number Rt of Coronavirus Disease

  • 1. Centre for Biotechnology and Bioengineering, Universidad de Chile, Santiago, Chile

  • 2. Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, CA, United States

  • 3. Escuela de Ingeniería en Bioinformática, Universidad de Talca, Talca, Chile

  • 4. Departamento de Ingeniería Química, Biotecnología y Materiales, Facultad de Ciencias Físicas y Matemáticas, Universidad de Chile, Santiago, Chile

Abstract

Due to the particularities of SARS-CoV-2, public health policies have played a crucial role in the control of the COVID-19 pandemic. Epidemiological parameters for assessing the stage of the outbreak, such as the Effective Reproduction Number (Rt), are not always straightforward to calculate, raising barriers between the scientific community and non-scientific decision-making actors. Combining estimators of Rt with elaborate Machine Learning-based forecasting techniques provides a way to support decision-making when assessing governmental plans of action. In this work, we develop forecast models applying logistic growth strategies and auto-regression techniques based on Auto-Regressive Integrated Moving Average (ARIMA) models for each country that records information about the COVID-19 outbreak. Using the forecast for the main variables of the outbreak, namely the number of infected (I), recovered (R), and dead (D) individuals, we provide a real-time estimation of Rt and its temporal evolution within a timeframe. With such models, we evaluate Rt trends at the continental and country levels, providing a clear picture of the effect governmental actions have had on the spread. We expect this methodology of combining forecast models for raw data to calculate Rt to serve as valuable input to support decision-making related to controlling the spread of SARS-CoV-2.

1. Introduction

Different aspects of modern society favored the rapid spread of COVID-19 at a global level [1], so that it was declared a pandemic by the World Health Organization in March 2020 [2]. SARS-CoV-2, the novel coronavirus associated with this disease, was identified for the first time in the region of Wuhan, China, by sequence sampling from patients showing symptoms similar to pneumonia [3]. Genomic studies of SARS-CoV-2 suggest a phylogenetic relation with RaTG13, an endogenous variant reported in bats, based on the 96.2% identity between the two genomes [4]. Three different variants of SARS-CoV-2 have been reported, which are distributed across Asia, Europe, and America [5], to date accounting for 54 strains [6]. Additionally, among 103 strains of SARS-CoV-2 analyzed by Tang et al. [7], 101 exhibited a complete link between two specific Single Nucleotide Polymorphisms (SNPs): 72 strains exhibited a “CT” haplotype (defined as lineage L, because it is at the Leucine codon) and 29 strains exhibited a “TC” haplotype (defined as lineage S, because it is at the Serine codon) at these two SNPs. These lineages differ significantly in prevalence (70% and 30% for L and S, respectively), and evolutionary analyses suggested that the S lineage is more closely related to coronaviruses in animals, leaving open the question of whether these lineages might have different rates of transmission or replication [7]. All of the variability and particularities of SARS-CoV-2 mentioned above make the development of a vaccine or effective treatments more difficult, demanding a considerable effort from governmental actors to control the COVID-19 outbreak.

In the current scenario, mathematical models, data mining, and pattern recognition techniques play fundamental roles in understanding and forecasting the evolution of the spread and in supporting public health policies. Herein, we present some remarkable examples of these. Hu et al. [8] proposed a prognosis model to estimate in real time the number of contagious people and the time when the propagation of COVID-19 will finish. Guo et al. [9] developed predictive models for early detection and generation of alerts to avoid SARS-CoV-2 outbreaks. Following the same objective, mathematical models based on the well-known SIR model proposed by Kermack and McKendrick [10] have been employed to assess the situations in different countries and as a support for health policies [11, 12]. Nevertheless, the use of these models requires the resolution of inverse problems, demanding extensive volumes of data and elaborate strategies to identify their parameters. Moreover, these models fail to represent the spread in countries with heterogeneous demographics [13]. Machine Learning approaches have been extensively used in the diagnosis of COVID-19, especially in the fields of X-ray and image analysis using deep convolutional neural network techniques [14–18], to predict critical patients in order to optimize hospital resources [19, 20], and to search for candidate drugs for the treatment of SARS-CoV-2 [21, 22].

Despite enormous efforts to make a prognosis of different variables to support and guide health policies, relevant parameters for studying the evolution of this outbreak are not always adequately delivered to the decision-making actors. The Effective Reproduction Number Rt, for example, is a well-known parameter used to evaluate the propagation of a disease. In previous work [23], we proposed a simple and fast methodology to estimate this rate directly from raw data. In this work, we applied a different approach to study Rt and its evolution. Through data mining and forecasting techniques, based on Auto-Regressive Integrated Moving Average (ARIMA) models, we identify different spreading behaviors of the pandemic in countries around the world and develop models to forecast the spread of this pandemic. Using the forecast for the number of infected (I), recovered (R), and dead (D) individuals, we calculate Rt and its temporal evolution.

2. Methods

The workflow to create forecast models of the variables needed to estimate the Effective Reproduction Number Rt can be summarized as follows. First, the variables Infected (I), Dead (D), and Recovered (R) are processed to obtain the daily values. Next, Logistic Growth models are applied to estimate Infected (I) cases, and ARIMA models are used to create forecast models of the Dead (D) and Recovered (R) values. Finally, all predicted variables are employed to estimate Rt.

2.1. Preparation of Datasets

All datasets were gathered from public repositories, which are updated on a daily basis [24]. Data pre-processing, such as filtering and scaling, was performed with scripts written in Python version 3.6 [25].
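As a minimal sketch of the first pre-processing step (turning cumulative counts into daily values), the following assumes that the repository series are cumulative and that the count before the first record is zero. The function name, the clipping of negative increments to zero, and the sample numbers are illustrative assumptions, not the authors' scripts.

```python
def daily_increments(cumulative):
    """Convert a cumulative series into daily new counts."""
    daily = []
    previous = 0  # assumption: no cases before the first record
    for value in cumulative:
        # Assumption: retrospective corrections can make a cumulative series
        # dip, so negative increments are clipped to zero.
        daily.append(max(value - previous, 0))
        previous = value
    return daily


# Hypothetical cumulative confirmed cases for one country
cumulative_cases = [1, 3, 3, 8, 15, 14, 20]
daily_cases = daily_increments(cumulative_cases)
print(daily_cases)
```

Other treatments of negative increments (for example, redistributing the correction over earlier days) are equally plausible; clipping is simply the shortest option to show.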

2.2. Estimation of Rt

Using the data gathered for each country, we proceed to estimate Rt using the methodology proposed by Contreras et al. [23]. Assuming that the spreading dynamics of COVID-19 in a certain territory are well-described by a SIR model, represented by Equations (1)–(3), we can easily derive an expression for Rt.

Assuming that function I, active cases, can be expressed as a function of the susceptible fraction S, I(S), applying the chain rule in Equation (2) and replacing Equation (1), we obtain:

where . Following the formalism of Contreras et al. [23], after using the hypothesis , we write the discrete version of the equation in a given timeframe [ti−1, ti] that is consistent with the temporal resolution of the data:

As the different reported fractions must sum up the total population, applying a mass balance, we may state the following dynamic condition:

By using Equation (6) in Equation (5), we obtain Equation (7):

where ΔI, ΔR, and ΔD represent the new reported infections, recoveries, and deaths in the estimation timeframe. To smooth the different trends, we apply moving averages, which is also our variability estimation method. From its definition, Rt > 1 indicates that the outbreak might have exponential growth, while Rt < 1 would indicate a disappearing infection. The above results from the analysis of Equation (2),

which, under the hypothesis , has an unstable bifurcation when Rt = 1, exhibiting exponential growth or decay depending on whether Rt is greater or lower than 1, respectively.
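The chain of steps described in this section can be sketched as follows. This is a reconstruction under stated assumptions, not a transcription of the authors' equations: it assumes the standard Kermack–McKendrick form of the SIR model with transmission rate β and removal rate γ (both symbols are assumptions), takes Rt as approximately constant within each timeframe, writes ΔI_act for the change in active cases, and lets ΔR + ΔD play the role of the removed compartment.

```latex
\begin{align}
\frac{dS}{dt} &= -\beta S I, \\
\frac{dI}{dt} &= \beta S I - \gamma I, \\
\frac{dR}{dt} &= \gamma I, \\
\frac{dI}{dS} &= -1 + \frac{\gamma}{\beta S} = -1 + \frac{1}{R_t},
  \qquad R_t = \frac{\beta S}{\gamma}, \\
\frac{\Delta I_{\mathrm{act}}}{\Delta S} &\approx -1 + \frac{1}{R_t}, \\
\Delta S + \Delta I_{\mathrm{act}} + \Delta R + \Delta D &= 0, \\
R_t &\approx \frac{-\Delta S}{\Delta R + \Delta D}
     = 1 + \frac{\Delta I_{\mathrm{act}}}{\Delta R + \Delta D}, \\
\frac{dI}{dt} &= \gamma \left( R_t - 1 \right) I .
\end{align}
```

Under these assumptions, −ΔS is the number of new infections in the timeframe, so the seventh line reads as new infections divided by new recoveries plus new deaths. The last line makes the threshold explicit: for constant Rt, active cases grow exponentially when Rt > 1 and decay when Rt < 1.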

2.3. Forecast Models

Auto-Regressive Integrated Moving Average (ARIMA) models, which are related to auto-regression techniques [26], were used to develop forecast models for the number of deaths (D) and the number of recovered individuals (R). The hyperparameters of the algorithm were selected by optimizing the performance metric of the resulting models, in this case by minimizing the Root Mean Square Error (RMSE). All models were implemented using Python version 3.6 [25] and the libraries statsmodels [27] and scikit-learn [28].
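To make the idea concrete, here is a minimal, dependency-free sketch of an ARIMA(1,1,0)-style forecast scored by RMSE on a hold-out window. It is a deliberate simplification and not the statsmodels implementation used in the paper: the order (1,1,0) is fixed for illustration, the auto-regressive coefficient is fitted by ordinary least squares rather than maximum likelihood, and the data are hypothetical.

```python
import math


def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1]."""
    x_prev, x_next = series[:-1], series[1:]
    n = len(x_prev)
    mean_prev = sum(x_prev) / n
    mean_next = sum(x_next) / n
    var = sum((a - mean_prev) ** 2 for a in x_prev)
    cov = sum((a - mean_prev) * (b - mean_next) for a, b in zip(x_prev, x_next))
    phi = cov / var if var > 0 else 0.0
    c = mean_next - phi * mean_prev
    return c, phi


def forecast_arima_110(series, steps):
    """Forecast `steps` values: difference once (d=1), AR(1) on the differences."""
    diffs = [b - a for a, b in zip(series[:-1], series[1:])]
    c, phi = fit_ar1(diffs)
    level, last_diff = series[-1], diffs[-1]
    forecast = []
    for _ in range(steps):
        last_diff = c + phi * last_diff  # predict the next difference
        level += last_diff               # integrate back to the original scale
        forecast.append(level)
    return forecast


def rmse(actual, predicted):
    """Root mean square error between two equally long sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical cumulative deaths; the last three days are held out for scoring
deaths = [2, 4, 7, 11, 16, 22, 29, 37, 46, 56, 67, 79]
train, test = deaths[:9], deaths[9:]
pred = forecast_arima_110(train, len(test))
error = rmse(test, pred)
print(pred, error)
```

In a hyperparameter search of the kind described above, the same hold-out RMSE would be computed for each candidate order and the order with the lowest value retained.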

Logistic Growth models [29], which follow Equation (9), were applied to create predictive models of the number of confirmed cases (I). Parameters r, P, and K were fitted for each country model by Non-linear Least Squares estimation.
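A small sketch of fitting a logistic curve to case counts follows. It assumes the common closed form P(t) = K / (1 + ((K − P0)/P0) e^(−rt)), reading the parameter P as the initial value P0; the exact form of Equation (9) is an assumption here. Instead of a full non-linear least-squares solver, the sketch scans candidate capacities K and fits r and P0 by linear regression on the logit-transformed data, which is a crude stand-in for the estimation named in the text. All names and numbers are illustrative.

```python
import math


def logistic(t, r, p0, k):
    """Logistic growth curve with rate r, initial value p0 and capacity k."""
    return k / (1.0 + (k - p0) / p0 * math.exp(-r * t))


def fit_logistic(times, values, k_candidates):
    """Return (sse, r, p0, k) with the smallest squared error over the K scan.

    Observations must be strictly positive for the logit transform.
    """
    best = None
    n = len(times)
    t_mean = sum(times) / n
    t_var = sum((t - t_mean) ** 2 for t in times)
    for k in k_candidates:
        if k <= max(values):
            continue  # the logit is undefined unless k exceeds every observation
        # ln(P / (K - P)) = ln(P0 / (K - P0)) + r * t is linear in t
        y = [math.log(v / (k - v)) for v in values]
        y_mean = sum(y) / n
        r = sum((t - t_mean) * (yi - y_mean) for t, yi in zip(times, y)) / t_var
        intercept = y_mean - r * t_mean
        p0 = k / (1.0 + math.exp(-intercept))
        sse = sum((logistic(t, r, p0, k) - v) ** 2 for t, v in zip(times, values))
        if best is None or sse < best[0]:
            best = (sse, r, p0, k)
    return best


# Synthetic data generated from known parameters (r=0.3, P0=10, K=1000)
times = list(range(30))
values = [logistic(t, 0.3, 10.0, 1000.0) for t in times]
sse, r_fit, p0_fit, k_fit = fit_logistic(times, values, range(1000, 2001, 50))
print(r_fit, p0_fit, k_fit)
```

On noisy real data the logit transform over-weights points near zero and near the capacity, so a proper non-linear least-squares routine started from these values would be the natural refinement.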

Finally, Rt for each country is estimated using Equation (7) with the variables predicted by the forecast models described above.
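The final step can be sketched as below, assuming Equation (7) takes the form Rt ≈ ΔI / (ΔR + ΔD) with ΔI the new reported infections, which is what the SIR mass balance gives when Rt is treated as constant within the timeframe. The exact form of Equation (7), the 7-day trailing window, and the sample numbers are assumptions made for illustration.

```python
def moving_average(values, window):
    """Trailing moving average; early points use the data available so far."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def estimate_rt(new_infected, new_recovered, new_dead, window=7):
    """Estimate Rt per day as smoothed new infections over smoothed removals."""
    inf_s = moving_average(new_infected, window)
    rec_s = moving_average(new_recovered, window)
    dead_s = moving_average(new_dead, window)
    rt = []
    for d_i, d_r, d_d in zip(inf_s, rec_s, dead_s):
        removed = d_r + d_d
        # Rt is undefined on days without any recovery or death
        rt.append(d_i / removed if removed > 0 else float("nan"))
    return rt


# Hypothetical daily values (observed or forecast) for one country
new_infected = [10, 12, 15, 20]
new_recovered = [4, 5, 6, 8]
new_dead = [1, 1, 2, 2]
rt_raw = estimate_rt(new_infected, new_recovered, new_dead, window=1)
rt_smooth = estimate_rt(new_infected, new_recovered, new_dead, window=2)
print(rt_raw, rt_smooth)
```

Feeding the function forecast series instead of observed ones gives the projected evolution of Rt; days with no removals are returned as NaN rather than silently set to zero.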

3. Results: Forecast Models

Forecast models of the variables Infected (I), Recovered (R), and Dead (D) were developed for 185 countries that track the progression of the COVID-19 outbreak, including the United States, Italy, Australia, Chile, and Brazil, among others. Using the predictions generated by the forecast models, we estimate Rt and its evolution over time (Figure 2). The performance of each model was assessed using a root mean square error (RMSE)-based criterion. Figure 1 shows the RMSE histograms for each forecast variable in the different countries considered. Each histogram presents a division marked by a red line at RMSE = 1, setting a threshold for considering only those countries where the quality of the data provided was sufficient to obtain reliable predictors.

Figure 1

A more detailed assessment of the models can be made through the use of the statistical distributions of the RMSE for each variable under study. Table 1 shows the error ranges obtained for each model divided into quartiles. Forecast models for variables D (Dead) and Rt present narrower ranges and lower values, mainly because of the low variability that these variables present in each country. Moreover, I and R sometimes exhibit abrupt increases on particular days and are more susceptible to presenting errors in data acquisition, as the distribution of resources (sampling capabilities) and the criteria for clinical recovery are not homogeneous.

Table 1

Variable | Lower Q1 | Q1–Q2 | Q2–Q3 | Higher Q3
Dead (D) | <0.18 | 0.18; 0.44 | 0.44; 1.45 | >1.45
Infected (I) | <0.96 | 0.96; 2.72 | 2.72; 7.41 | >7.41
Recovered (R) | <0.73 | 0.73; 1.97 | 1.97; 4.84 | >4.84
Rt | <0.12 | 0.12; 2.14 | 2.14; 4.32 | >4.32

Summary of performance measures by quartile based on RMSE distributions.

Data quality and the performance of the generated forecast model are deeply connected. In this example, if the forecast model for D has an RMSE ranking in the first quartile (Q1), the forecast models for the other variables are also likely to be satisfactory.
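The quartile-based assessment can be illustrated with a short sketch. The country labels and RMSE values are hypothetical, the quartile cut points are computed with the standard-library `statistics.quantiles` (Python 3.8 or later, default "exclusive" method), and the threshold of 1 is the one quoted in the text.

```python
import statistics


def quartile_label(value, cuts):
    """Place an RMSE value in one of the four quartile ranges."""
    q1, q2, q3 = cuts
    if value < q1:
        return "Lower Q1"
    if value < q2:
        return "Q1-Q2"
    if value < q3:
        return "Q2-Q3"
    return "Higher Q3"


# Hypothetical RMSE of the Dead (D) forecast model per country
rmse_by_country = {"A": 0.1, "B": 0.2, "C": 0.3, "D": 0.4,
                   "E": 0.5, "F": 0.6, "G": 0.7}

# Three cut points that split the RMSE distribution into quartiles
cuts = statistics.quantiles(list(rmse_by_country.values()), n=4)
labels = {name: quartile_label(err, cuts) for name, err in rmse_by_country.items()}

# Countries whose model falls under the RMSE = 1 reliability threshold
reliable = [name for name, err in rmse_by_country.items() if err < 1.0]
print(cuts, labels, reliable)
```

The same labelling applied to each variable yields a per-country quality profile of the kind summarized in Table 1.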

4. Evolution of COVID-19, Public Policies, and Tendencies of Countries

Figure 3A shows the SARS-CoV-2 propagation trend for different countries, divided by continents. To date, countries such as South Korea, China, and Australia have successfully controlled the spread of the pandemic, as they have reached the Rt < 1 zone. However, attention should be paid to slight increases in Rt weeks after reaching control of the spread, as they could signal new outbreaks. Nevertheless, such outbreaks can occur regardless of the stage of evolution of the pandemic. For example, countries like France and Ecuador, which have not yet reached the control threshold but are approaching it, have shown patterns indicating new contagion peaks (see Figure 3A). The USA and Ecuador show values far above the control threshold Rt = 1, without a clear decreasing tendency. Countries such as Chile, Canada, and Brazil, although presenting lower Rt values, are still fighting to control the spread of the virus. It is possible to associate differences in the Rt values with the actions applied to combat the SARS-CoV-2 outbreak. Moreover, Figure 3A highlights the effect of different health policies or government actions, such as border closings, periods of isolation or quarantine, and cancellation of mass events, on the spread of the virus. The effects of the action plans are not immediate due to the incubation and spread dynamics of the virus, among other reasons. However, the trend is clear: Rt curves decrease, on average, over time, which is consistent with the progressive actions countries have executed. A detailed analysis of Chilean trends in Rt is presented in Contreras et al. [31], and key dates for control measures in countries from Figures 2 and 3A are listed in Table 2.

Figure 2

Figure 3

Table 2

Country | Date | Event description | Source
Chile | 2020-04-01 | Ministry of Health takes control of the management of public and private infrastructure. | [32]
Chile | 2020-04-08 | Compulsory use of masks in public transport and crowded places. |
Chile | 2020-04-22 | Sanitary customs at airports. |
Chile | 2020-05-15 | Extension of quarantine in different districts of Santiago |
China | 2020-04-08 | China lifts lockdown on Wuhan | [33]
China | 2020-05-01 | Hubei province authorities state that lockdown measures introduced due to the coronavirus disease (COVID-19) outbreak will be loosened | [34]
China | 2020-06-01 | Chinese authorities ban behaviors deemed “uncivilized,” including placing a prohibition on sneezing or coughing without covering the nose or mouth and imposing a requirement to “dress properly.” | [35]
China | 2020-06-08 | Chinese authorities announce that 95 foreign airlines will be permitted to resume commercial flights to Chinese destinations | [36]
USA | 2020-04-21 | Total closure of borders | [37]
USA | 2020-06-07 | New York is out of quarantine | [38]
USA | 2020-05-08 | Reopening of business in California | [39]
USA | 2020-05-15 | Reopening of business in New York | [40]
USA | 2020-05-18 | Reopening of business in Florida | [41]

Summary of the main governmental actions carried out by iconic countries to control the SARS-CoV-2 outbreak.

A statistical analysis of the value of Rt for the most recent day of analysis (June 21) is presented in Figure 3B. A limited number of countries, such as China or S. Korea, have controlled the spread of the virus. However, a significant number of countries present Rt values >4.6, belonging to the third quartile of the local distribution. In other words, most of the countries reporting progression of the COVID-19 outbreak have not reached the control threshold. At the continental level, Europe and Asia have a greater tendency toward higher quartiles, while most African states belong to the first quartile, indicating satisfactory control of the outbreak. Nevertheless, those values should be analyzed carefully, as the latter result might be a sampling effect rather than the outcome of planned control, since the testing capabilities of most African countries have proven to be overwhelmed by the contingency [42, 43]. Moreover, there are several sources of error to be considered in the analysis of Rt, some of them associated with the data processing and reporting protocols and others with the nature of the virus.

Despite the several applications of Rt for the evaluation of government action plans and health policies and the assessment of the SARS-CoV-2 outbreak in a country, the estimators used remain somewhat naive, as they rely on the quality of the data. For example, some peaks that can be explained by incorrect data reporting or other sampling errors can be spotted in Figures 2 and 3A. As the estimators do not consider possible errors related to the COVID-19 detection tests, temporal delays between diagnosis and records, or discrepancies among the clinical recovery criteria, proper data pre-treatment should be carried out before using them in order to correct some of these errors. Moreover, in countries with limited resources that do not have sufficient testing capacity to apply screening tests, Rt trends will be altered and negatively affected, since the real dynamics will remain masked and uncertain.

5. Discussion

We have developed prognostic models for the variables infected (I), recovered (R), and dead (D) to enable the estimation of the rate of spread of novel SARS-CoV-2 through the Effective Reproduction Number Rt in different countries worldwide. The models implemented are based on the use of logistic growth techniques in combination with auto-regression, assessing their performance by using the root mean square error (RMSE). Of the models generated for the 185 countries that record data related to the COVID-19 outbreak, 25% have RMSE values under the typical threshold of 1, therefore having predictions for Rt with minimal errors. The source code is available on request.

Asian countries such as China and S. Korea have controlled the spread in recent weeks, while in Europe, the average trend approaches control. However, new data provide evidence of new outbreaks of COVID-19. At the same time, the panorama in America is much more complicated, since the trends clearly show Rt remaining far above the control threshold Rt = 1.

Despite the usability of Rt, work should be done on estimating the magnitude of sources of error and the variability of the data. For instance, uncertainties in diagnosis and differences in testing strategies and clinical recovery criteria, among others, might lead to temporal misclassification of patients, heavily impacting the reported value of Rt. Moreover, we found discrepancies between the data provider servers of Dong et al. [24] and Info [30] that should be carefully studied. The lack of a protocol to assess and incorporate such errors can lead to unrealistic estimations of Rt, which are particularly dangerous. In this way, new strategies for estimating sources of error in Rt, together with the proposed forecasting methodology, can provide a robust tool for decision-making agents in the COVID-19 pandemic.

Statements

Data availability statement

Publicly available datasets were analyzed in this study. This data can be found here: https://www.worldometers.info/coronavirus/, https://www.gob.cl/coronavirus/cifrasoficiales/, and https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data.

Author contributions

DM-O and SC: conceptualization. DM-O: methodology. ÁO-N, DM-O, and SC: validation and writing, review, and editing. DM-O, SC, GC-M, and YB-S: investigation. ÁO-N and DM-O: supervision and project administration. ÁO-N: funding resources. All authors contributed to the article and approved the submitted version.

Acknowledgments

The authors gratefully acknowledge support from the Centre for Biotechnology and Bioengineering-CeBiB (PIA project FB0001, Conicyt, Chile). DM-O gratefully acknowledges Conicyt, Chile, for Ph.D. fellowship 21181435. DM-O greatly thanks Cristofer Quiroz for his support in the search for databases and related services for the development of forecast models.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Keywords

COVID-19, SARS-CoV-2, effective reproduction number Rt, public-health policies, epidemiologic modeling

Citation

Medina-Ortiz D, Contreras S, Barrera-Saavedra Y, Cabas-Mora G and Olivera-Nappa Á (2020) Country-Wise Forecast Model for the Effective Reproduction Number Rt of Coronavirus Disease. Front. Phys. 8:304. doi: 10.3389/fphy.2020.00304

Received

02 May 2020

Accepted

02 July 2020

Published

17 July 2020

Volume

8 - 2020

Edited by

Matjaž Perc, University of Maribor, Slovenia

Reviewed by

Michele Bellingeri, University of Parma, Italy; Xiao Han, University of California, Davis, United States

Copyright

*Correspondence: Álvaro Olivera-Nappa

This article was submitted to Social Physics, a section of the journal Frontiers in Physics

Disclaimer

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
