Mobility data shows effectiveness of control strategies for COVID-19 in remote, sparse and diffuse populations

Data collected at the individual level from mobile phones is typically aggregated to the population level for privacy reasons. If we are interested in answering questions about the mean, or working with groups appropriately modeled by a continuum, then this data is immediately informative. However, coupling such population-level data to a model that requires information at the individual level raises a number of complexities. This is the case if we aim to characterize human mobility and simulate the spatial and geographical spread of a disease by dealing in discrete, absolute numbers. In this work, we highlight the hurdles faced and outline how they can be overcome to effectively leverage one specific dataset: the Google COVID-19 Aggregated Mobility Research Dataset (GAMRD). Using a case study of Western Australia, which has many sparsely populated regions with incomplete data, we first demonstrate how to overcome these challenges to approximate the absolute flow of people around a transport network from the aggregated data. Overlaying this evolving mobility network with a compartmental model for disease that incorporates vaccination status, we run simulations and draw meaningful conclusions about the spread of COVID-19 throughout the state without de-anonymizing the data. We find that towns in the Pilbara region are highly vulnerable to an outbreak originating in Perth. Further, we show that regional restrictions on travel are not enough to stop the virus from reaching regional Western Australia. The methods explained in this paper can therefore be used to analyze disease outbreaks in similarly sparse populations. We demonstrate that, used appropriately, this data can inform public health policies and have an impact on pandemic responses.

The parameters can be explained as follows. The rate of infection is δ. The parameter that governs movement from the Exposed (E) to the Infected (I) state is β, which is related to the latent period of the disease: the time between an individual's infection and the onset of infectiousness. The flow out of the I state depends on γ, which is the inverse of the period made up of the time the individual spends being infectious but asymptomatic plus the time they spend being symptomatic but not yet removed (such as after testing positive). People can move forward between the different vaccinated states (V_m, where m = 0, 1, 2, 3) as they receive more doses of their vaccine, but not backwards. This movement is characterized by the parameters µ_k(t), where k = 1, 2, 3, with µ_1(t) being the number of people who received the first dose of the vaccine on day t. Further, the rate of infection parameter, δ, decreases in proportion to the effectiveness of each dose. We take ε_m as the efficacy of dose m, so the rate of infection parameter becomes (1 − ε_m)δ for each V_m state, where ε_0 = 0. Lastly, due to waning immunity against the virus, there is a possibility of reinfection, albeit at a lower rate (Pulliam et al., 2021; Altarawneh et al., 2022; Turner, May 21, 2022). As such, we define λ as the parameter governing flow from the Removed (R) state to V_3. We set this as a connection to V_3 rather than V_0 because a recovered individual will still have some protection, and they will not be vaccinated again if they are already vaccinated; hence it would be misleading to put them in the Unvaccinated (V_0) state. Therefore, the equations for each town i are given below. Note that we also add the mobility aspect to these single-town equations, represented by the function f(·). This mobility component is described in the main text.
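As a rough illustration of the flows described above, the following Python sketch advances a single isolated town by one forward-Euler step. It is an assumption-laden reading of the prose, not the authors' equations: the mobility term f(·) is omitted, the function name `town_step` and the discrete-time update are hypothetical, and the values used for β, λ, ε_1, ε_2 and the daily doses are placeholders rather than the fitted values of Table S1.

```python
def town_step(state, delta, beta, gamma, lam, eps, doses, dt=1.0):
    """Advance one isolated town by one forward-Euler step (mobility f(.) omitted).

    state: dict with "V" (list of 4 counts for V_0..V_3), "E", "I", "R".
    eps:   efficacies [eps_0, ..., eps_3] with eps_0 = 0.
    doses: [mu_1, mu_2, mu_3], people receiving dose k during this step.
    """
    V, E, I, R = list(state["V"]), state["E"], state["I"], state["R"]
    N = sum(V) + E + I + R

    # Infection: each V_m is exposed at the reduced rate (1 - eps_m) * delta.
    exposed = [(1.0 - eps[m]) * delta * V[m] * I / N * dt for m in range(4)]
    V = [V[m] - exposed[m] for m in range(4)]

    # Vaccination: forward moves only, capped by who is left in the source state.
    moved = [min(doses[k], V[k]) for k in range(3)]
    for k in range(3):
        V[k] -= moved[k]
        V[k + 1] += moved[k]

    to_infectious = beta * E * dt   # E -> I
    removed = gamma * I * dt        # I -> R
    waned = lam * R * dt            # R -> V_3 (waning immunity)
    V[3] += waned

    return {
        "V": V,
        "E": E + sum(exposed) - to_infectious,
        "I": I + to_infectious - removed,
        "R": R + removed - waned,
    }


# Placeholder values for illustration only; they are not the fitted parameters.
state = {"V": [5000.0, 1000.0, 2000.0, 1990.0], "E": 0.0, "I": 10.0, "R": 0.0}
eps = [0.0, 0.2, 0.4, 0.566]  # eps_1 and eps_2 are hypothetical
for day in range(30):
    state = town_step(state, delta=0.442, beta=0.5, gamma=1 / 3.778,
                      lam=0.01, eps=eps, doses=[20.0, 20.0, 20.0])
total = sum(state["V"]) + state["E"] + state["I"] + state["R"]
```

Because every flow leaves one compartment and enters another, the town's total population is conserved by each step; only the mobility term would change it.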
Through a range of methods, including literature reviews and fitting Western Australia's outbreak data, we were able to derive values for each of the parameters, as shown in Table S1 (Berman, May 27, 2022). To work out ε_m, we used the vaccine efficacy presented in Andrews et al. (2022) and derived a general value for each dose based on the proportion of different vaccine types (e.g. AstraZeneca, Pfizer, Moderna) available in Australia at the time of writing. Since γ reflects the time taken before an individual is removed from the population rather than recovered, we could not rely on the recovery time for COVID-19 as reported in the literature. Therefore, we derived γ by fitting the compartmental model using SciPy's curve_fit function (Virtanen et al., 2020) to the daily infectious cases of the Omicron outbreak in South Australia. As this work was done while these outbreaks were occurring, we had to use the most recently available data, which initially came from the South Australia outbreak. The rationale was that the two states are similar in their population distribution and density. When estimating δ, we initially opted for a similar method. However, the result did not translate well between the states: the compartmental model did a poor job of estimating Western Australia's case numbers. This showed us that Western Australia and South Australia, as similar as they are in population density and distribution, are still different case studies, and it illustrates the danger of generalizing results from one outbreak to another. Therefore, we approximated δ by using SciPy's curve_fit to fit the model to the WA outbreak, treating it as a time-dependent parameter because of the different behaviour of the virus at different stages of the outbreak. We consider this reasonable, as the aim of the paper was not to estimate the rate of infection in Western Australia, but rather to show the validity of applying the GAMRD to a sparse population and to show how, when combined with a compartmental model, it can support meaningful conclusions about COVID-19 in Western Australia.
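The fitting step can be sketched as follows. This is a minimal example under stated assumptions, not the authors' fitting code: it uses a simplified SEIR run without vaccination or mobility, holds δ and a placeholder β fixed, and fits γ to synthetic data generated from the model itself in place of the real case counts. The helper name `infectious_curve` is hypothetical; `scipy.optimize.curve_fit` is the SciPy routine named in the text.

```python
import numpy as np
from scipy.optimize import curve_fit

DELTA, BETA = 0.442, 0.5   # held fixed here; BETA is a placeholder value


def infectious_curve(t, gamma):
    """Daily infectious fraction from a forward-Euler SEIR run of len(t) days."""
    s, e, i = 0.999, 0.0, 0.001
    out = []
    for _ in t:
        out.append(i)
        new_e = DELTA * s * i      # S -> E
        new_i = BETA * e           # E -> I
        s, e, i = s - new_e, e + new_e - new_i, i + new_i - gamma * i
    return np.array(out)


days = np.arange(60)
true_gamma = 1 / 3.778
observed = infectious_curve(days, true_gamma)   # synthetic stand-in for case data

# Least-squares fit of gamma, constrained to the interval [0, 1].
(gamma_hat,), _ = curve_fit(infectious_curve, days, observed,
                            p0=[0.2], bounds=(0.0, 1.0))
```

A time-dependent δ could be handled in the same way by fitting each phase of the outbreak separately, with the end state of one phase used as the initial condition of the next.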

Kalman filtering vs. a moving average
In our work, we chose to use a Kalman filter as the smoothing mechanism for our noisy time series of movement between two towns in a given week. We opted for the Kalman filter over simpler techniques, such as a moving average, because it is able to model and estimate the underlying state of a system using probabilistic inference (Walker and Mees, 1996; Anderson, 2012). Further, the Kalman filter can effectively estimate the true underlying value even in the presence of noisy observations. Conversely, a moving average is a simple smoothing technique that calculates the average of a subset of data points over a fixed window. It can reduce high-frequency noise but may not handle complex noise patterns as effectively as the Kalman filter. Due to the level of complexity present in the transport network, we chose to use the Kalman filter.
With a Kalman filter, we could also investigate whether adding the relationship between edges improved the model. To do this, we initialised the covariance matrix (Υ_0) with the Pearson correlation coefficients between pairs of edges. Hence, the covariance matrix contained information on how the time series of one edge (e.g. Perth to Broome) related to that of another (e.g. Albany to Kalgoorlie). After a few time steps, this model closely approximated the result of applying the Kalman filter with the covariance matrix initialised as the identity matrix. The key difference was that the correlation-based initialisation did much better at approximating the initial transient, and so it was chosen over the identity matrix initialisation.
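A correlation-based initial covariance matrix of this kind could be built as in the sketch below. The edge names come from the examples in the text, but the flow values are made up, and the function names `pearson` and `initial_covariance` are hypothetical. Note that a constant series has zero variance and would need special handling before this is used on real data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def initial_covariance(edge_series):
    """Build an initial covariance matrix from pairwise edge correlations."""
    names = list(edge_series)
    return [[pearson(edge_series[a], edge_series[b]) for b in names]
            for a in names]


# Hypothetical weekly flows on three edges (made-up numbers).
edges = {
    "Perth->Broome": [120.0, 90.0, 60.0, 80.0, 110.0],
    "Albany->Kalgoorlie": [30.0, 24.0, 15.0, 20.0, 28.0],
    "Perth->Albany": [200.0, 210.0, 190.0, 205.0, 195.0],
}
upsilon_0 = initial_covariance(edges)
```

The resulting matrix has ones on the diagonal and is symmetric, so it differs from the identity initialisation only through its off-diagonal entries.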
Figure S2 shows the difference between the two initialisations of Υ_0, as well as a comparison to a five-step moving average. The figure shows that the Kalman filter is less affected by noise. Further, since it takes into account the dynamics of the system, it often holds a pattern of behaviour for longer (such as rising more slowly after the initial dip in movement). Pragmatically, any of the methods illustrated would have been a reasonable choice to smooth this data. However, for the reasons mentioned above, we chose to employ the Kalman filter with the covariance matrix initialised as the Pearson correlation matrix.
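To make the comparison concrete, the sketch below applies a scalar random-walk Kalman filter and a five-step trailing moving average to a single made-up series. It is a simplification of the setup described above: the filter treats one edge in isolation, so there is no cross-edge covariance matrix, and the noise variances `q` and `r` are assumed values chosen only for illustration.

```python
def kalman_filter_1d(z, q=1.0, r=25.0, p0=1.0):
    """Scalar random-walk Kalman filter; returns the filtered estimate per step."""
    x, p = z[0], p0
    out = [x]
    for obs in z[1:]:
        p = p + q                 # predict: state unchanged, uncertainty grows
        k = p / (p + r)           # Kalman gain
        x = x + k * (obs - x)     # update towards the new observation
        p = (1.0 - k) * p         # posterior uncertainty
        out.append(x)
    return out


def moving_average(z, window=5):
    """Trailing moving average; the window shrinks at the start of the series."""
    out = []
    for i in range(len(z)):
        chunk = z[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


# Made-up weekly flows with an initial dip, for illustration only.
flows = [100.0, 40.0, 55.0, 35.0, 60.0, 45.0, 70.0, 65.0, 90.0, 80.0]
kf = kalman_filter_1d(flows)
ma = moving_average(flows)
```

The ratio of `q` to `r` controls how quickly the filter follows new observations: a small ratio gives heavier smoothing, while the moving average weights every point in its window equally.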

Figure S2. A comparison of how different methods perform in smoothing out the time series. We can see they are all relatively similar but have some small differences.

Transport network before and after adding stochastic effects
It is perhaps easiest to appreciate the full effect of the stochastic step by visualising the transport network for a given week. Figure 3 shows the transport network in Western Australia before and after adding the stochastic step. We can see that the stochastic step paints a much more realistic picture of travel in a given week. It therefore achieved our desired goal of smoothing the data by spreading the movement out across the different towns without artificially inflating it by adding new travelers.

USING EFFECTIVE REPRODUCTION NUMBER TO ESTIMATE VACCINATION LEVELS REQUIRED TO SUPPRESS OUTBREAK
On January 21st 2022, the Western Australian government reversed its decision to open the Western Australian hard border on February 5th (Kagi, January 21, 2022). The reasons cited for this were the emergence of the Omicron strain and the concern that two vaccine doses offered limited protection against catching Omicron. Hence, it was hoped that the delayed opening would allow for an increase in third dose vaccination rates.
With the emergence of Omicron in Western Australia and the inability to suppress it as had been done previously with other strains, the government ultimately opened the border a month later, on March 3rd (Marcus, March 3, 2022). We want to estimate the booster (third dose) vaccination rate that would have been required for Western Australia to effectively suppress the virus, had it been kept out.
To do this, we consider the basic reproduction number:

$$R_0 = \frac{\delta S_0}{\gamma},$$

where δ is the rate of infection, S_0 is the proportion of individuals initially in the Susceptible compartment, and γ is the recovery rate. The reproduction number is the average number of individuals that an infected person infects. If R_0 > 1, an epidemic ensues. If R_0 < 1, the disease dies out.
However, our model has four susceptible states rather than one, with different population sizes and different (yet related) rate of infection parameters. Further, after some time t we can have people in the Removed state, where they cannot be infected with the virus. Hence, we replace the basic reproduction number with the effective reproduction number (Delamater et al., 2019), which by definition does not assume complete susceptibility of the population. We want an effective reproduction number R_eff(t) that adapts with time t and takes into account the four different susceptible states (one for each vaccination level) and their rates of infection. Hence we first build a weighted, time-dependent transmissibility parameter that encompasses all four susceptible states and their effective transmissibility parameters. We also want to know the proportion of people in the four Susceptible states combined, which we define as S_0(t). Therefore, we define R_eff(t) for our model:

$$R_{\mathrm{eff}}(t) = \frac{1}{\gamma} \sum_{m=0}^{3} (1 - \varepsilon_m)\,\delta\,V_m(t).$$

Here, ε_m is the effectiveness of dose m of the vaccine (with ε_0 = 0), and V_m(t) represents the portion of the population in each of the four susceptible compartments, so that S_0(t) is their sum.
We assume that we wish to suppress an outbreak that has not yet spread, meaning that effectively all of Western Australia's population is in one of the four susceptible states. Hence $\sum_{m=0}^{3} V_m(t) = 1$.
We first look at an analytic solution that assumes an individual is either completely unvaccinated or triple vaccinated. This will help us find an upper limit for the percentage of triple vaccinated people required to keep R_eff(t) < 1. Hence, setting V_1 = V_2 = 0, V_0 + V_3 = 1, and setting R_eff(t) < 1, the equation becomes:

$$\frac{\delta}{\gamma}\left[(1 - V_3) + (1 - \varepsilon_3)V_3\right] < 1 \quad\Longleftrightarrow\quad V_3 > \frac{1}{\varepsilon_3}\left(1 - \frac{\gamma}{\delta}\right).$$

Earlier, we showed that γ = 1/3.778 and ε_3 = 0.566, and we found that δ_0 = 0.442, δ_1 = 0.455, δ_2 = 0.905, δ_3 = 0.567. We hence find the required proportion of people to be triple vaccinated for each phase of the outbreak in Table S2.
Table S2. Required percentage of triple vaccinated people in Western Australia to suppress the spread of Omicron based on the different infectivity rates (δ) found for the four different phases of the outbreak, starting at days 0, 16, 44 and 80 since the first case was detected.

The first two rows of Table S2 show that only 71% and 74% of people had to be triple vaccinated for R_eff(t) < 1 under those specific rates of infection. However, these numbers should be interpreted with caution: the low values of δ are in part due to the social restrictions in place at the time. The gradual relaxation of social restrictions may have contributed to the increases in δ, and we have to consider this when interpreting the numbers. This means that a 71% triple vaccination rate may have suppressed the outbreak in Western Australia, but it would have had to be combined with social restrictions as strict as those we saw in the early stages of the outbreak. This may not be the control method preferred by policy makers given the lower severity of the Omicron strain in comparison to previous strains; however, it is important to note for any new strains that may arise in the near future. As shown for δ = 0.905, no level of triple vaccination is enough to suppress the outbreak. Hence vaccination alone would not have been enough to suppress the Omicron outbreak, and vaccination levels should therefore not have been the sole consideration in opening the borders.

If we remove the assumption that people are either unvaccinated or fully (triple) vaccinated, we have a greater range of possible solutions for V_m, depending on the δ we set. For example, for the δ = 0.442 case, one possible solution is V_0 = 5%, V_1 = 21%, V_2 = 18%, V_3 = 56%. However, for δ = 0.905, there is still no possible combination that suppresses the spread of COVID-19 in Western Australia.
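The threshold calculation in this section can be checked numerically with the short sketch below. It assumes the form R_eff = (δ/γ) Σ_m (1 − ε_m) V_m with the vaccinated fractions summing to one, which is an interpretation of the description above; the function names `r_eff` and `required_v3` are hypothetical, and only γ, ε_3 and the δ values are taken from the text. Evaluating a mixed vaccination profile would additionally need ε_1 and ε_2, which are not quoted in this section.

```python
GAMMA = 1 / 3.778   # removal rate quoted in the text
EPS_3 = 0.566       # third-dose efficacy quoted in the text


def r_eff(delta, fractions, eps, gamma=GAMMA):
    """R_eff = (delta / gamma) * sum_m (1 - eps_m) * V_m, assuming sum_m V_m = 1."""
    return delta / gamma * sum((1.0 - e) * v for e, v in zip(eps, fractions))


def required_v3(delta, eps_3=EPS_3, gamma=GAMMA):
    """Triple-vaccinated fraction above which R_eff < 1 when V_0 + V_3 = 1.

    Returns None when even full triple vaccination leaves R_eff >= 1.
    """
    threshold = (1.0 - gamma / delta) / eps_3
    if threshold > 1.0:
        return None
    return max(threshold, 0.0)


# Infection rates for the four phases of the outbreak, as listed in the text.
for delta in (0.442, 0.455, 0.905, 0.567):
    v3 = required_v3(delta)
    print(delta, "not achievable" if v3 is None else f"{100 * v3:.0f}%")
```

Under these assumptions the first two phases give thresholds that round to the 71% and 74% quoted above, and δ = 0.905 is reported as not achievable, in line with the discussion of Table S2.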

WA TOWNS DATASET
Below is the dataset detailing the population, location and region of each WA town used. The color coding follows the colors shown in the colored transport network in the main text.

Figure S1. The chosen compartmental model for COVID-19 modeling in Western Australia.

Figure 3a. Transport network prior to adding stochastic effects for a given week.

Figure 3b. Transport network after adding stochastic effects for a given week.

Figure 3. Comparing the transport network before and after the stochastic step. Note the added edges that are now present in the network.

Table S1. Best fit parameters for the first 141 days of the Western Australian Omicron outbreak, the extent of the data available when this work was carried out.