Forecasting the 2020 COVID-19 Epidemic: A Multivariate Quasi-Poisson Regression to Model the Evolution of New Cases in Chile

Objectives: To understand and forecast the evolution of COVID-19 (Coronavirus disease 2019) in Chile, and to analyze alternative simulated scenarios, in order to inform policy solutions that stop the spread and minimize damage. Methods: We specified a novel multi-parameter generalized logistic growth model, which not only captures the trend of the data but also includes explanatory covariates, using a quasi-Poisson regression specification to account for overdispersion of the count data. We fitted our model to data from the onset of the disease (February 28) until September 15, 2020. Using the estimated parameters, we predicted the evolution of the epidemic until the end of October 2020. We also evaluated, via simulations, different hypothetical scenarios for the outcome of alternative policies (those analyses are included in the Supplementary Material). Results and Conclusions: The evolution of the disease has not followed exponential growth; rather, it stabilized and moved downward after July 2020, and started to increase again after the implementation of the Step-by-Step policy. The lockdown policy implemented in the majority of the country has proven effective in stopping the spread, and the lockdown-relaxation policies, however gradual, appear to have caused an upward break in the trend.


INTRODUCTION
The pathogen SARS-CoV-2 has caused the infection called Coronavirus disease 2019 (COVID-19), spreading worldwide in just a few months. On January 30, 2020, the World Health Organization (WHO) declared the COVID-19 outbreak a Public Health Emergency of International Concern, considering the occurrence of cases in five WHO regions within 1 month (1).
Chile is an interesting case study to monitor the evolution of COVID-19: It is not a developed country, in spite of its membership in the Organization for Economic Co-operation and Development (OECD). Income inequality has been a persistent discussion topic for decades (2). Yet, its authorities took early measures to augment emergency room capacity and to restrict individual freedom of movement, in order to be able to face the pandemic.
On February 8, 2020, the Chilean government declared a national health emergency in the country, beginning Phase 1 of the epidemic ("as no cases had been reported yet") to deal with the imminent arrival of the virus. On March 3, 2020, the first case (an international traveler) was announced (3), meaning that the country had entered Phase 2 of the epidemic ("all cases corresponding to people who had traveled abroad"). On March 6, 2020, the Chilean Ministry of Health (CMoH) issued a new legal order to increase its attributions in order to mitigate the imminent local spread of the virus. On March 18, 2020, one week after the WHO declared COVID-19 a pandemic on March 11 (1), the Chilean government decreed a state of constitutional exception due to national catastrophe, enabling it to restrict free movement and association. Policies were implemented at both the national level (curfews and prohibition of crowded events) and the local level (zonal weekly quarantines and regional sanitary blockages), depending on the evolution of the disease (4).
By the end of June 2020, Chile had over two hundred thousand confirmed cases of COVID-19. The majority of new cases were concentrated in the Metropolitan Region (RM, where Santiago, the capital city, is located), which is home to about a third of the country's population. Chile had the highest testing rate in Latin America, with nearly one million tests carried out by June 23 (4,5). The cumulative number of deaths was 12,278 as of September 15, 2020.
Full lockdown in the RM was implemented on May 13th 2020. Progressive improvements in the daily number of both infected individuals and casualties prompted the government to announce on July 16th a new policy called Paso-a-Paso (Step-by-Step), aimed at a slow relaxation of the confinement measures. It was devised as a five-stage program: Lockdown, Transition, Preparation, Early Reopening, and Advanced Reopening, each with specific restrictions and obligations for individuals. Progress from one stage to the next is gradual: each municipality is centrally assigned to a stage according to its epidemiological statistics, and those indicators are monitored continuously to decide whether the municipality advances or returns to a previous stage. On July 28th, the plan was finally implemented, when seven municipalities in the RM and two others in the Valparaíso region exited Lockdown and were assigned to Transition. This policy slowed down the previous downward trend in the number of new cases.
With this reality in mind, we set out to model and forecast the evolution of the disease in Chile. Several studies have modeled and predicted the spread of COVID-19 in different countries, using data that begin with the first reported cases. Some studies (6,7) have fitted the data using Gaussian models or other standard regression models, which are inappropriate given the discrete nature of count data.
Phenomenological models (8-13) have been previously applied to various infectious disease outbreaks, including other respiratory illnesses such as severe acute respiratory syndrome (SARS) and pandemic influenza. These models, including the sub-epidemic growth model, can capture empirical patterns of past epidemics, and are useful in generating short-term forecasts of the daily trajectory of the epidemic. These approaches are especially useful when epidemiological data are limited. Real-time short-term forecasts generated from such models can help guide the allocation of resources that are critical to bring the epidemic under control. Remuzzi and Remuzzi (14) used exponential growth models to predict the early propagation of the virus in Italy. Canals et al. (3) also used an exponential growth model to generate predictions for Chile. Maier and Brockmann (15) modeled the sub-exponential growth in confirmed cases of the COVID-19 outbreak in Mainland China. Exponential growth models, however, are unrealistic in scenarios where additional information is available: Once an epidemic has progressed and mitigation measures start to take effect, contagion rates slow down and the count of new cases falls, making exponential growth models less appropriate for modeling purposes. Hence, logistic growth models are a better option to model data in these instances. For example, Roosa et al. (16), Aviv-Sharon and Aharoni (17), and Chen et al. (18) have used generalized logistic growth models and the Richards model (19) to generate forecasts of the cumulative reported cases of COVID-19 in China, Asia, and USA, respectively.
For our study, we have used a novel multi-parametric method that extends the standard logistic growth curve, allowing us to understand the past and predict the future evolution of the disease in Chile. The model is a nonlinear quasi-Poisson regression specification that explicitly accounts for overdispersion of the count data. The trend has been estimated using a Richards growth curve, incorporating weekday-specific effects and policy interventions as control variables. This sort of specification has not been used so far in previous COVID-19 studies. Specifically, our approach allows for additional flexibility compared to other studies that analyze the evolution of COVID-19 in Chile [e.g., (3,20)]. That additional flexibility allows us to both forecast and simulate multiple alternative scenarios, such as the continuation of the Lockdown policy (in contrast to the Step-by-Step policy) and changes in the growth rate of the epidemic.
In section 2, we describe our data, our forecasting methodology and model; in section 3, we present our estimation results; in section 4, we discuss our findings and conclude. In the Supplementary Material document, we offer an Appendix with additional comparative statics for different scenarios (with additional tables and results included as well).

Description of the Data
Data used in this work come from the epidemiological reports from the CMoH, spanning from February 28 until September 15, 2020. These epidemiological reports are updated regularly by the independent expert panel working with the CMoH, in order to adjust for errors and misreports. The most accurate counts are collected on the Chilean Ministry of Science (CMoSc) website at https://www.minciencia.gob.cl/covid19. Considering the many issues regarding collection and publication of COVID-19 data in Chile, we believe that this data source is the best option available to analyze the Chilean case, as other authors have similarly done [e.g., (3,20,21)].
The dataset includes the total count of confirmed cases according to (a) the date that COVID-19 symptoms first appeared (as provided by the patient), and (b) the notification date of the polymerase chain reaction (PCR) test diagnosis (as registered by the physician on the CMoH surveillance system). It is important to mention that this case count is retroactively corrected as new cases are confirmed and the epidemiological situation evolves, as measured by the CMoH epidemiological department. In this study, we decided to use the daily count of cases according to PCR notification date, due to its higher reliability.
Since our study is based on secondary data from the CMoH's official daily public reports as published by the CMoSc, it did not require approval from an Ethics Committee.

Richards Growth Curve Models
The Richards growth curve model (19), a generalization of the logistic curve (22), is a growth curve model for population studies used in cases where growth is not symmetrical about the point of inflection (23,24). It has been widely used to describe epidemiological processes for real-time prediction of outbreak of diseases [e.g., SARS (25), dengue (26), influenza H1N1 (27), and COVID-19 (7,18)].
In the Richards curve, $K$ is the parameter corresponding to the total count of infected people by the end of the pandemic, $r$ is the daily hazard (infection) rate, $t_m$ is the lag phase, and $\alpha$ is a shape parameter that fixes the point of inflection and controls the asymmetry of the curve. The first derivative of the curve with respect to time $t$ allows us to model the number of new cases.
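For concreteness, one common parametrization of the Richards curve that is consistent with the parameters described above is $C(t) = K\,[1 + \alpha e^{-r(t - t_m)}]^{-1/\alpha}$, whose first derivative is $C'(t) = K r e^{-r(t - t_m)}\,[1 + \alpha e^{-r(t - t_m)}]^{-1/\alpha - 1}$. This specific functional form is an assumption made for illustration, and the parameter values below are hypothetical rather than estimates from this study. A minimal Python sketch of both curves:

```python
import numpy as np

def richards_cumulative(t, K, r, t_m, alpha):
    """Cumulative cases C(t) under the assumed Richards parametrization."""
    return K * (1.0 + alpha * np.exp(-r * (t - t_m))) ** (-1.0 / alpha)

def richards_new_cases(t, K, r, t_m, alpha):
    """First derivative dC/dt, i.e. the expected rate of new cases."""
    e = np.exp(-r * (t - t_m))
    return K * r * e * (1.0 + alpha * e) ** (-1.0 / alpha - 1.0)

# Hypothetical parameter values, chosen only to illustrate the shape.
K, r, t_m, alpha = 500_000.0, 0.045, 100.0, 0.5
t = np.arange(0, 400, dtype=float)
cumulative = richards_cumulative(t, K, r, t_m, alpha)
daily = richards_new_cases(t, K, r, t_m, alpha)
```

With $\alpha = 1$ this form reduces to the symmetric logistic curve; other values of $\alpha$ shift the inflection point and make the rise and decay of daily cases asymmetric.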

The Quasi-Poisson Approach
Expanding upon the asymmetric logistic Richards curve discussed previously, we fitted a generalized quasi-Poisson nonlinear regression with explanatory covariates to model and predict the daily number of COVID-19 cases in Chile. Poisson regressions are used to model count data, assuming that the response variable is Poisson distributed. Denote by $\{Y_t\}$ the number of confirmed COVID-19 cases at time $t$, by $\{X_t\}$ the vector of collected covariates at time $t$, and by $F_{t-1}$ the information available up to time $t-1$. The Poisson regression assumes that the response variable, conditional on the past, follows the probability model

$$P(Y_t = y \mid F_{t-1}) = \frac{\lambda_t^{y} e^{-\lambda_t}}{y!}, \qquad y = 0, 1, 2, \ldots$$

Here, the rate $\lambda_t = g(F_{t-1}, \beta)$ is a function of the covariates and of unknown parameters $\beta$ to be estimated. If $g(\cdot)$ is a linear combination of the $\beta$ parameters, then the model is considered a Poisson Generalized Linear Model.
A key assumption for the validity of Poisson regression models is that the conditional mean and the conditional variance are equal. In our case, the variance is larger than the mean. We address this using a quasi-Poisson regression. In this model, the count data are assumed to be generated by an exponential family distribution whose variance equals the mean multiplied by an over-dispersion parameter $\phi > 1$; thus,

$$E(Y_t \mid F_{t-1}) = \lambda_t, \qquad \mathrm{Var}(Y_t \mid F_{t-1}) = \phi\,\lambda_t.$$

In our proposed model, the covariates collected in $\{X_t\}$ include a weekday seasonal effect as well as holiday dummies. Both of these are crucial, considering that most PCR testing labs do not operate on weekends or holidays. Additionally, our proposal considers an intervention variable to capture the Step-by-Step confinement reduction policy, added to the Richards curve estimate.
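To illustrate how overdispersion can be detected in practice, the following sketch computes the usual Pearson-based estimate of the dispersion parameter, $\hat{\phi} = \sum_t (y_t - \hat{\lambda}_t)^2 / \hat{\lambda}_t \,/\, (n - p)$, on simulated counts. The simulated data, the constant mean, and the helper name `pearson_dispersion` are hypothetical; this is not the estimation code used in the study.

```python
import numpy as np

def pearson_dispersion(y, mu, n_params):
    """Pearson estimate of phi: sum((y - mu)^2 / mu) / (n - p)."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    return np.sum((y - mu) ** 2 / mu) / (y.size - n_params)

rng = np.random.default_rng(0)
mu = np.full(5000, 200.0)        # hypothetical known means
phi_true = 10.0                  # hypothetical dispersion

# Overdispersed counts: negative binomial with variance phi_true * mu.
size = mu / (phi_true - 1.0)
y_over = rng.negative_binomial(size, size / (size + mu))

# Equidispersed counts for comparison: plain Poisson.
y_pois = rng.poisson(mu)

phi_over = pearson_dispersion(y_over, mu, n_params=0)
phi_pois = pearson_dispersion(y_pois, mu, n_params=0)
```

An estimate close to 1 is compatible with the Poisson assumption, whereas a value well above 1 signals the kind of overdispersion that motivates the quasi-Poisson specification.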
As such, the model allows us to obtain an estimate for the basic reproduction number, $R_0(t)$.

Implementation of Modeling Analysis
To estimate the parameters of the generalized nonlinear model given by expression (1), we used the function gnm() from the "gnm" library in R. The iterative algorithm requires starting values for the parameters, which were obtained through the function nls() in the "stats" package. To obtain confidence intervals for out-of-sample predictions, we approximated the quasi-Poisson likelihood with negative binomial distributions via a bootstrap of size 10,000, using the ciTools library in R, as described by (28). The intervals are, thus, built as follows:
1. Model (1) is fitted to obtain estimates $\hat{\theta}$ and $\mathrm{Cov}(\hat{\theta})$. The number of simulations is set at 10,000.
2. Simulate 10,000 draws of the coefficients $\theta^{*} \sim N(\hat{\theta}, \mathrm{Cov}(\hat{\theta}))$.
3. Simulate $Y^{*} \mid F_{t-1}$ from the response distribution, approximated by a negative binomial distribution.
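The interval construction described above can be sketched in Python as follows. This is an illustrative re-implementation, not the ciTools code: it assumes that the negative binomial approximation matches the quasi-Poisson moments (mean $\mu$ and size $\mu/(\phi - 1)$, so that the variance is $\phi\mu$), and it adds a final quantile step to turn the simulated counts into interval bounds. The log-linear mean function and the coefficient values are hypothetical; only the dispersion value 68.79 is the estimate reported in this study.

```python
import numpy as np

def quasi_poisson_intervals(theta_hat, cov_hat, phi_hat, mean_fn,
                            n_sim=10_000, level=0.95, seed=0):
    """Simulation-based intervals for quasi-Poisson counts (phi_hat > 1)."""
    rng = np.random.default_rng(seed)
    # Step 2: draw coefficient vectors theta* ~ N(theta_hat, cov_hat).
    theta_star = rng.multivariate_normal(theta_hat, cov_hat, size=n_sim)
    # Conditional means for every draw, shape (n_sim, n_dates).
    mu_star = np.array([mean_fn(th) for th in theta_star])
    # Step 3 (assumed form): negative binomial with mean mu and
    # size mu / (phi - 1), so that the variance equals phi * mu.
    size = mu_star / (phi_hat - 1.0)
    y_star = rng.negative_binomial(size, size / (size + mu_star))
    # Empirical quantiles of the simulated counts give the bounds.
    tail = (1.0 - level) / 2.0
    lower, upper = np.quantile(y_star, [tail, 1.0 - tail], axis=0)
    return lower, upper

# Hypothetical log-linear mean and estimates, for illustration only.
days = np.arange(1, 31, dtype=float)
mean_fn = lambda th: np.exp(th[0] + th[1] * days)
theta_hat = np.array([7.5, -0.01])
cov_hat = np.diag([1e-4, 1e-8])
lower, upper = quasi_poisson_intervals(theta_hat, cov_hat, 68.79, mean_fn)
```

Because the simulation propagates both the parameter uncertainty (step 2) and the overdispersed sampling noise (step 3), the resulting bands are considerably wider than those implied by a plain Poisson model with the same mean.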
Because information is updated daily, for replication purposes, full data was updated in our spreadsheet on September 16, 2020. For the actual realizations of data from that date until October 30, 2020, we used the November 7, 2020 update.

RESULTS
The model was fitted to the observed daily cases from February 28 to September 15. Table 1 summarizes the estimated parameters, standard errors (SE), and 95% confidence intervals (CI). Parameters $\beta_4$ and $\beta_5$ were statistically significant at p < 0.05, and all other parameters were significant at p < 0.001. The point estimate of the quasi-Poisson over-dispersion parameter is $\hat{\phi} = 68.79$. For further illustration, Figure 1 depicts how well our estimated model fits the data. Figure 1A displays the daily count of new cases, and Figure 1B showcases the cumulative count of COVID-19 cases in Chile, with the black line denoting the true cumulative count and the blue line denoting fitted values. The goodness of fit of the model is assessed with the Heinzl-Mittlböck pseudo-$R^2$ (29). The pseudo-$R^2$ equals 95.3%, confirming the excellent fit of the model to the data.
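As an illustration of how a deviance-based pseudo-$R^2$ of this kind can be computed, the sketch below uses the Poisson deviance $D(y, \mu) = 2\sum_t [y_t \log(y_t/\mu_t) - (y_t - \mu_t)]$ and the form $1 - (D(y, \hat{\mu}) + k\hat{\phi}) / D(y, \bar{y})$, where $k$ is the number of covariates. The dispersion-adjusted form is stated here as an assumption about the measure in (29), and the function names and toy data are hypothetical.

```python
import numpy as np

def poisson_deviance(y, mu):
    """Poisson deviance, using the convention y * log(y / mu) = 0 when y = 0."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    ratio = np.where(y > 0, y / mu, 1.0)
    return 2.0 * np.sum(y * np.log(ratio) - (y - mu))

def pseudo_r2(y, mu_hat, n_covariates=0, phi_hat=0.0):
    """Deviance-based pseudo R^2; phi_hat = 0 gives the unadjusted version."""
    y = np.asarray(y, dtype=float)
    null_mu = np.full_like(y, y.mean())   # intercept-only fit
    fitted = poisson_deviance(y, mu_hat) + n_covariates * phi_hat
    return 1.0 - fitted / poisson_deviance(y, null_mu)

# Toy data: a fit that tracks the counts closely.
y = np.array([10.0, 20.0, 30.0, 40.0])
mu_hat = np.array([11.0, 19.0, 31.0, 39.0])
r2 = pseudo_r2(y, mu_hat)
```

A value near 1 means that the fitted means explain almost all of the deviance of the intercept-only model, while the penalty term $k\hat{\phi}$ lowers the measure as covariates are added.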
Note that the proposed model offers a good visual fit to the evolution of the epidemic, where the curve succeeds at capturing the exponential growth of the data as well as the seasonal effects corresponding to weekdays/weekends.
The intervention variable captures the (statistically significant) slowdown of the decay typically present toward the end of Richards curves, as caused by the introduction of the Step-by-Step policy. Table 2 displays the values of the predicted new cases for specific dates of Figure 2, between September 16 and October 30, including 95% CI. As can be observed from Table 2, all the observed daily cases fall within the 95% CI of our predictions until the end of October 2020. Our predicted total number of cases at that date was almost 488,000, against the actual count of 498,466 observed that day.

DISCUSSION AND CONCLUSIONS
The estimated model shows that the trend of the data changed once the Step-by-Step governmental policy was implemented. Before that date, the daily count of new cases was decreasing at a faster rate than after the policy. This is a measure of the effects that the implementation or removal of lockdown policies have on the infection curve in the short-to-medium run, understanding that new policy measures are likely to cause structural changes to the shape of the curve. From the proposed model, we can observe that the estimated daily growth rate of COVID-19 in Chile is about 4.5% (95% CI: [4%, 5%]). Compared to the rates observed in the US (16.9%, 95% CI: [15.9%, 17.8%]) (18) and China (17.12%) (30), we can conclude that the infection rate in Chile is between three and four times smaller. The growth rate in Chile in the second half of September 2020 implies that the cumulative count of COVID-19 cases in Chile doubles every 2 weeks.
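The link between the daily growth rate and the doubling time can be checked with a short calculation. The sketch below assumes sustained exponential growth at a constant daily rate $g$ compounded once per day, so that the doubling time is $\log 2 / \log(1 + g)$; this simplification and the function name are assumptions made for illustration.

```python
import math

def doubling_time(daily_growth_rate):
    """Days needed to double under constant daily compounding."""
    return math.log(2.0) / math.log(1.0 + daily_growth_rate)

# Point estimate and the two ends of the 95% CI reported above.
for rate in (0.04, 0.045, 0.05):
    print(f"{rate:.1%} per day -> {doubling_time(rate):.1f} days")
```

For the point estimate of 4.5% this gives a doubling time of just under 16 days, and the interval [4%, 5%] corresponds to roughly 14 to 18 days, in line with the doubling time of about 2 weeks.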
The accuracy of our model, naturally, is contingent on governmental policy decisions as they emerge. The restrictive lockdown policies imposed by the CMoH in the RM starting on May 15 had an impact on the slowdown of the epidemic's spread two weeks later (the disease's incubation period): We predict, using our model, that the count of COVID-19 cases would have been 491,096 by July 28 without lockdown policies, compared to the actual count (with lockdown in place) of 267,846 cases. This reduction in the total count made it possible for the government to launch the Step-by-Step policy. The Step-by-Step policy generates a break point in the downward trend observed before July 28. We forecast that the introduction of that change might increase the daily count of new cases up to ten times the expected count under lockdown. The short-run impact is thus relevant, particularly considering that our model does not consider the possibility of a second outbreak of the disease, which in practice cannot be ruled out. Indeed, a natural limitation of our study is the fact that the COVID-19 epidemic is still under development. Thus, the estimates of the parameters of the model have a substantial amount of uncertainty associated with them. For instance, the parameter K is interpreted as the total count of infected individuals once the epidemic has ended, an interpretation that is not entirely valid for our current data.
Our ability to understand the COVID-19 epidemic is essential in order to curb its global spread. Our study provides an important framework to inform public health decision-making designed to end the epidemic in different regions, by not only aiding decision makers in Chile, but also illustrating the usefulness of the quasi-Poisson modeling approach to follow the evolution of the disease when availability of data is limited.
Our selection of the growth model was based on intensive testing of other models: This functional specification produced the most accurate results, provided enough flexibility, and generated key information, including the exponential growth rate, the doubling time for the epidemic, and the effect of the governmental policy interventions on the level at which the rate of growth of the epidemic levels off. Our key contribution is methodological: We strongly believe that models from the generalized logistic family, such as the one we present here, are useful for tracking the future trends of diseases like COVID-19.
Naturally, our study has limitations. In particular, the time frame of the study corresponds to the data available until September 2020, and in the long run, it does not account for further interventions (as they indeed took place: Authorities implemented further relaxations of the containment measures in November and December, causing the growth rates of the epidemic to continue to increase until the date of this revision in March 2021).
One of the main limitations of the proposed approach is that it only produces accurate predictions in the short term, and it is unclear in general for how long such predictions remain reasonably accurate. However, it is also fair to recognize that this limitation is not unique to the proposed approach, as any other model-based method that serves similar purposes presents the same limitation. Table 2 shows in its final column the actual observed data, to allow comparisons for the out-of-sample observations: In each case, the true observed daily cases fell within the 95% forecast confidence intervals (as we also displayed in Figure 2). As expected, the model's short-term daily forecast of new cases is close to the observed new cases. All in all, given the aforementioned reasons, long-term inferences using this type of model should not be considered. In spite of its limitations, findings from this study provide useful information to inform public health decision-making and policies designed to end the epidemic.
From a policy perspective, as Hodgins and Saad (31) noted, the high-income countries' blueprint of suppression and maintenance is less likely to be effective in low- and middle-income countries. Specifically, strict lockdowns like the ones implemented in Chile have had substantial negative impacts on the economy and on access to education, and have disrupted routine clinical services. This was the motivation behind the introduction of policies like Step-by-Step. It is unrealistic to consider radical suppression a viable policy in the long run. Tools like the one we present in our article, however, enable policy makers to keep a close eye on the evolution of the disease. In any case, it is crucial that authorities understand that the relaxation of protective measures caused by policy announcements, such as the Vacation Permits released since December 2020 for the 2020/2021 summer season, had a direct effect on the count of new COVID-19 cases, as reflected by the upsurge of new cases throughout the first quarter of 2021 in Chile.

DATA AVAILABILITY STATEMENT
The original dataset used in the study is included in the Supplementary Material (ZIP file); further inquiries can be directed to the corresponding author.

AUTHOR CONTRIBUTIONS
MV, CV, and BQ conducted all forecasts and data analyses, wrote the text of the manuscript in September 2020, and revised subsequent corrections to the document in March 2021. All authors read and approved the final manuscript.