Estimation and monitoring of COVID-19 transmissibility from publicly available data

The COVID-19 pandemic began in the city of Wuhan, China, at the end of 2019 and quickly spread worldwide. The disease is caused by contact with the SARS-CoV-2 virus, which probably jumped from an animal host to humans. SARS-CoV-2 infects various tissues in the body, notably the lungs, and patients usually die from respiratory complications. Mathematical models of the disease have been instrumental to guide the implementation of mitigation strategies aimed at slowing the spread of the disease. One of the key parameters of mathematical models is the basic reproduction ratio R0, which measures the degree of infectivity of affected individuals. The goal of mitigation is to reduce R0 as close or below 1 as possible, as it means that new infections are in decline. In the present work, we use the recursive least-squares algorithm to establish the stochastic variability of a time-varying R0(t) from eight different countries: Argentina, Belgium, Brazil, Germany, Italy, New Zealand, Spain, and the United States of America. The proposed system can be implemented as an online tracking application providing information about the dynamics of the pandemic to health officials and the public at large.


INTRODUCTION
On the 11th of March 2020, the World Health Organization (WHO) declared the 2019 coronavirus disease Table 1. Sociodemographic data for the eight countries targeted in this study and associated COVID-19 data. COVID-19 data accessed on 22/05/20 on https://coronavirus.jhu.edu/map.html. Intervention data according to (36). GDP per capita in US$. Interventions are categorised into Alert Levels 1 to 4 according to the New Zealand framework (L1 = Level 1, Prepare; L2 = Level 2, Reduce; L3 = Level 3, Restrict; L4 =Level 4, Eliminate) (36). The COVID-19 data used in this report is publicly available from The Center for Systems Science

FEDERAL UNIVERSITY OF PARÁ 3
. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

Silveira & Pereira
Estimation and monitoring of COVID-19's trasmissibility d dt r(t) = γi(t) (4) whereλ = λ/P n , λ is the infection rate, γ is the remove rate, and κ is the incubation rate. From these 85 parameters, it is possible to calculate the basic reproduction ratio, R 0 = λ/γ. Thus, R 0 is not solely 86 dependent on the infection rate, but also on the frequency of removals due to death or recoveries.

87
Assuming that the incubation period of the disease is instantaneous and the duration of infectivity is the 88 same as the length of the disease, we can consider both groups E and I as contagious and E(t) := I(t).  For a general f (t) function, a Backward discrete-time derivative approximation is: where T s = 1 is the sampling interval in days and ∆ = 1 − q −1 is the discrete difference operator, defined 100 in q −1 , the backward shift operator domain. The discrete-time approximations of eqs. (5) to (7) are given 101 by the following difference equations, respectively: This is a provisional file, not the final typeset article 4 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted May 26, 2020. . https://doi.org/10.1101/2020.05.24.20112128 doi: medRxiv preprint

Estimation and monitoring of COVID-19's trasmissibility
The discrete-time SIR system described above considers time-varying parameters in order to continuously adapt the model as new data becomes available. Using the time-series of infections and removals (due to Eq. (12) is based on the estimation error of r(k). By applying the recursive least-squares method to 108 minimize J r , it is possible to optimally estimateγ(k) using the following equation: We assume that the estimation error is Gaussian, e r (k) ∼ (0, σ 2 er ), with zero mean and variance σ 2 er , such 110 that r(k) =r(k) + e r (k). The estimator gain, the parametric estimation and error variance minimization 111 are solved recursively, as follows: where f f is the forgetting factor of the recursive least-squares estimator and the closer it is to one the 113 lesser the estimator forgets. Similarly, the error covariance matrix can be reset periodically to prioritize 114 more recent data. Specifically in this work, p r (k) is a scalar vector, since only a single parameter is being 115 estimated. Choices to initialize the covariance matrix may vary, depending on prior available covariance 116 information or other positive definite matrix. The higher its magnitude the higher the estimator gain in the 117 transitory dynamical stage of the estimation procedure.

118
In regard to Eq. (11) and the estimation ofγ(k), since the infection numbers increase before any removal 119 report is available during the first stages of the pandemic, during this periodγ(k) tends to zero and 120 eventually makes the time-varying estimated reproduction number tend to infinity: Since Eq. (10) depends on both SIR parameters, due to its stronger dependence ofγ(k) and Eq. (11), its 122 estimated value is substituted into Eq. (10) and another recursive least-squares problem is constructed to 123 FEDERAL UNIVERSITY OF PARÁ 5 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted May 26, 2020. . https://doi.org/10.1101/2020.05.24.20112128 doi: medRxiv preprint

Silveira & Pereira
Estimation and monitoring of COVID-19's trasmissibility estimateλ(k), using The solution is akin to the minimizing the estimation error e i (k) = i(k) −î(k), using the following 125 equations for estimator gain, parametric estimation update, and error covariance minimization, respectively: Time-series data for Eq. (9) is not available and the evolution of the Susceptible compartment in time is 127 in fact estimated based on the known initial condition (i.e., the population P n ) and on the estimatedλ(k).

128
Thus, it is always estimated and fed back to (19), such that the correct form to represent it, within this 129 time-varying SIR model, is by rewriting Eq. (9) based on the estimated Susceptible: The estimation of the time-varying reproduction number, based on the derived discrete-time SIR model, 131 is given by: We also adopted two modifications to the nominalR 0 (k) equation: a moving 4-day average to compensate 133 for the randomness of daily updates on incidence data, as seen in the German Daily Situation Report of the known as the effective reproduction number (14): With this formulation, it is possible to analyze the transmission ratio of the pandemics on a daily basis,

142
This is a provisional file, not the final typeset article 6 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted May 26, 2020. Henceforth we assume that we are able to estimate the reproduction number on a daily basis and it is thus 144 possible to consider it as another output of the proposed pandemic model. Thus, relying on the time-series 145 ofR 0 (k) and knowing it is correlated with the number of infected and removed individuals, we deploy 146 machine learning techniques to identify a dynamic system that fits the data.

147
Differently from the real-time monitoring/sensing procedure adopted to estimateR 0 (k), now we are 148 interested in obtaining a general model that can describe the dynamics of a system,R 0 (q −1 ), for a certain 149 period of interest. In order to model such a system, we use the non-recursive least-squares estimation 150 technique (46) and propose a black-box polynomial model: where e R 0 (q −1 ) is the Gaussian process based on the estimation error e R 0 (k) of estimated polynomials 152 shown in (26).

153
This second-order autoregressive with exogenous inputs (ARX) based model structure is assumed 154 considering the fundamental simplicity of the SIR model, in which the Infected and Removed systems 155 together form a second-order system.

156
The non-recursive least-squares estimator is a batch processing technique, used to optimally estimate 157 the set of parameters that minimizes a quadratic performance index as the one shown in (12), but using a 158 vector-matrix form of error, e R 0 =R 0 − Φθ. This vector-matrix system is defined as: The above equations represent, respectively, the vector of errors, the vector of observed outputs, the 160 estimated parameters vector and the matrix of regressors. The latter is based on the vectors of regressors 161 formed up to N registered samples, with such vectors defined as:

FEDERAL UNIVERSITY OF PARÁ 7
. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

Estimation and monitoring of COVID-19's trasmissibility
The solution to obtain the estimated parameters is straightforward and given by By assuming an ARX linear model it is possible to evaluate the pandemics from the perspective of   The higher the variability associated with the stochastic ratio the higher we expect the variance or linear 171 power of the error to be and, consequently, the more uncertain is the official COVID-19 data reported by 172 health authorities.

RESULTS
We used publicly available data to validate the algorithms and the estimated time-varying SIR model

184
The estimators were implemented with a forgetting factor of 0.98 and the covariance matrices were reset 185 to ones every seven days to prioritize more recent data (47). Also, we adopted a moving 4-day average 186 forR 0 (k), shown in (24), to compensate for daily random effects (45). We only used data from 22 March 187 2020 onwards, when all eight countries already had more than 100 infections reported.
188 Figures 2 and 3 show the dynamics of both Infected and Removed cases using the SIR model. and New Zealand (see Table 2).

193
The trajectory of the curves shown in Fig. 4 can also be associated with some extraordinary events that   Fig. 2). This apparent contradiction is related to the low number 200 This is a provisional file, not the final typeset article 8 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted May 26, 2020.   . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted May 26, 2020.   Belgium with 26.6% and whose transmission ratio is below Argentina's, reinforcing the notion that the 203 transmission ratio is not a static parameter and is best approached by dynamical systems theory.

204
The results displayed in Figure 4 seem at odds with a recent report that claimed that Brazil's R 0 had  . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

Silveira & Pereira
Estimation and monitoring of COVID-19's trasmissibility stochasticity and uncertainties of R 0 estimates. Both Germany and Italy (Fig. 5)   to the pandemics (51). This outcome is captured by our R 0 sensors, with the Brazilian stochastic ratio 220 being in decrease, as Recovered data becomes more available (see Fig. 6).

221
In Fig. 7 we present linearized R 0 estimates for Spain and Belgium. Spain's estimates have suffered 222 the influence of annotation errors (by subtracting infected individuals in 24 Apr. 2020) which eventually 223 provoked oscillations in the real-time ratio estimate, making it zero-cross on April 27th. Such an error was 224 washed out by the linearized estimate and has certainly contributed to its increased degree of uncertainty 225 (see Table 3).

226
Belgium's degree of uncertainty was the worst among all the countries, at least during the period we 227 analyzed. One possible clue to understand why Belgium's stochastic ratio became so variable is to consider 228 the impossibility to linearize its dynamics and the associated increase in error. However, our estimation 229 procedure considers the error as Gaussian, and its mean value in fact approaches zero. Thus, the linearized 230 estimate based on 30-days of data has a high probability to be close to the values shown.

231
Belgium has also shown an increased R 0 during the analyzed period. The low number of recoveries 232 (see Table 2) influenced the frequency of removals and the basic reproduction number, which is the ratio 233 between the frequency of infections and the frequency of removals (see eq. (23)). Even though a recent 234 report (52) proposed that Belgium's R 0 at the beginning of May 2020 was 0.8, according to our adaptive 235 FEDERAL UNIVERSITY OF PARÁ 11 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted May 26, 2020.   SIR-based R 0 estimations, the current situation in Belgium is uncertain with a probable R 0 close to 3.0, 236 which is the mean value of 30-days linearized dynamics (see Table 4). the frequency of removals, which makes R 0 increase. 241 We compiled a table of the uncertainty of the estimated stochastic ratios by country (see Table 3), where 242 the percentage of uncertainty is proportional to the linear power of the Gaussian estimation error. Table 4 243 shows the estimated R 0 estimates by country and demonstrate the usefulness of our approach to provide a 244 real-time picture of the pandemic that can be used to support decision-making.

245
This is a provisional file, not the final typeset article 12 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted May 26, 2020.    . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted May 26, 2020. . https://doi.org/10.1101/2020.05.24.20112128 doi: medRxiv preprint provide some answers to that degree of uncertainty regarding the results presented by those epidemiological 261 models.

262
During this pandemic, we are interacting with it in an unprecedented manner, trying to control its spread in 263 real-time, as in feedback control systems. Therefore, we are interfering in COVID-19's dynamics, affecting 264 its behavior and parameters, and even discussing the development of real-time monitoring techniques to 265 work against it using automation and control systems technologies. However, this closed-loop control 266 system needs humans in the loop and some data annotation mistakes may occur, such as: seasonal dynamic 267 changes related to workers shifts, weekend reduced reports and even possible political interference in the 268 data log may compromise the stability of estimators due to these disturbances.

269
Our comparative approach shows that the strict measures adopted by some countries managed to stabilize  . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted May 26, 2020. . https://doi.org/10.1101/2020.05.24.20112128 doi: medRxiv preprint