Estimating Exposure Risk to Guide Behaviour During the SARS-COV2 Pandemic

The end of 2020 and the beginning of 2021 were a challenging time for many countries in Europe, as the combination of colder weather, holiday celebrations, and the emergence of more transmissible virus variants conspired to create a perfect storm for virus transmission across the continent. At the same time, lockdowns appeared to be less effective than they were earlier in the pandemic. In this paper we argue that one contributing factor is that existing ways of communicating risk (case numbers, test positivity rates, hospitalisations etc.) are difficult for individuals to translate into a level of personal risk, thereby limiting the ability of individuals to properly calibrate their own behaviour. We propose a new, more direct measure of personal risk, exposure risk, which estimates the likelihood that an individual will come into contact with an infected person, and we argue that it can play an important role, alongside more conventional statistics, in translating complex epidemiological data into a simple measure to guide pandemic behaviour. We describe how exposure risk can be calculated using existing data and infection prediction models, and use it to evaluate and compare the exposure risk associated with 39 European countries.


INTRODUCTION
Although multiple positive vaccine trial results created a strong sense of optimism toward the end of 2020, continued calls for people to observe recommended mitigation practices (social distancing, mask wearing, hand hygiene, ventilating enclosed spaces etc.) were a constant reminder that the SARS-COV2 virus remained a clear and present danger, as many countries continued to struggle to contain it (1)(2)(3)(4)(5)(6)(7)(8)(9). The potential for a new and even more severe wave of infections in the northern hemisphere was widely signalled, and the combination of colder weather, holiday celebrations, and the emergence of more transmissible variants of the virus (10)(11)(12) all conspired to create a perfect storm for virus transmission by the end of 2020.
Moreover, recent reports have pointed to increased mobility levels and a gradual decrease in adherence to regulations during the latest lockdowns, compared with the first wave during early 2020 (13,14). While some have been quick to cite "lockdown fatigue" as an explanation, the evidence for this has been lacking (15)(16)(17). For example, despite claims of lockdown fatigue in the UK, recent research has concluded that there is little evidence of a decreasing trend in compliance that could be framed as a form of behavioural fatigue; see (16). However, the same study does acknowledge substantial capability, opportunity, and motivational factors that could be contributing to lower levels of adherence. Whether this turns out to be a meaningful distinction remains to be seen. Either way, a recent report by the World Health Organization (18) suggests ways for governments to address such adherence problems (which, incidentally, it frames in terms of "pandemic fatigue"), highlighting the importance of allowing people to live their lives by enabling them to reduce their levels of personal risk by using clear and simple forms of communication to guide behaviour.
One of the key challenges facing health officials and governments, in communicating the current state of the virus, has been a reliance on a parade of complicated epidemiological statistics (normalised case counts, the R number, test positivity rates, doubling rates), which can be difficult for the public to digest and even harder to translate into a level of personal risk. Consequently, our aim in this paper is to propose a more direct measure of personal risk, which we call exposure risk, as an estimate of the likelihood that someone will be exposed to an infected individual on a per contact basis. This is effectively the probability that a single contact will turn out to be infected, while remaining silent on whether such a contact will result in actual transmission. We propose that this metric has a valuable role to play in helping the public to better interpret conventional pandemic statistics, because it relates to their personal level of risk. As such, this metric is one of a growing number of tools and techniques that have been designed and developed during the pandemic to help governments and health officials to monitor and manage outbreaks (19)(20)(21)(22)(23)(24)(25). In what follows we will formally define the exposure risk metric and describe how it can be calculated from existing data sources. We then go on to present the results of an evaluation of exposure risk across Europe, paying particular attention to the differences between the early and later periods of the pandemic.

MATERIALS AND METHODS
In this section we formally define our proposed estimate of exposure risk before describing how we evaluate it in the context of a comparative analysis of European countries using publicly available datasets.

Defining Exposure Risk
Exposure risk is the probability that a single contact will expose an individual to the virus. This can be estimated as a per capita measure of the number of infected individuals at large in the community at a given point in time; see Equation (1).
$$\text{exposure risk} = \frac{\text{number of non-isolating infectious individuals}}{\text{population size}} \tag{1}$$

An exposure risk of 0.01 means that 1 in 100 (1%) of the population are infected and at large in the community. This does not mean that there is a 1% chance of becoming infected from such a contact, as there are many other factors that determine whether transmission occurs (mask wearing, contact time, distance, environment, ventilation, variant transmissibility etc.). However, it does allow an individual to form a more intuitive understanding of the likelihood that they will come into contact with an infected individual during the course of a day or a week. And since exposure risk is approximately additive across contacts when the per-contact risk is small, a 1% exposure risk becomes a roughly 10% daily risk for an individual with 10 (independent) contacts per day, all other things being equal.

Estimating the exposure risk is not straightforward. For a start, reported case numbers do not provide an accurate account of true infections; many infections are mild or even asymptomatic (26,27) and, as such, they are less likely to present for, or be identified by, testing, especially when testing is limited or close to capacity. In response, a variety of models have been developed to predict the number of true infections associated with a given country or location (28)(29)(30). These models work by using data, such as confirmed cases and deaths, testing rates, epidemiological knowledge about SARS-COV2 etc., to estimate true infections and other important measures. Here we use 4 such models: the Imperial College model, the Institute for Health Metrics and Evaluation (IHME) model, the Youyang Gu model, and the London School of Hygiene and Tropical Medicine (LSHTM) model. They were chosen as representatives of the type of models that have been widely used during the pandemic and because their predictions are available for a wide range of countries; further details about these models and their data are available in (31).
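Returning to the per-contact interpretation of Equation (1), the rule of thumb that risk adds up across contacts can be checked against the exact probability of meeting at least one infected person. The sketch below assumes contacts are independent and share the same exposure risk; the function names are illustrative and not part of the original method.

```python
def additive_risk(per_contact_risk: float, contacts: int) -> float:
    """Rule of thumb: sum the per-contact exposure risk over all contacts."""
    return per_contact_risk * contacts


def at_least_one_exposure(per_contact_risk: float, contacts: int) -> float:
    """Probability that at least one of `contacts` independent contacts is infected."""
    return 1.0 - (1.0 - per_contact_risk) ** contacts


# Compare the two for a 1% exposure risk and a growing number of daily contacts.
for contacts in (1, 10, 50):
    approx = additive_risk(0.01, contacts)
    exact = at_least_one_exposure(0.01, contacts)
    print(f"{contacts:>3} contacts: additive={approx:.3f} exact={exact:.3f}")
```

For 10 contacts the two figures are close (about 10% versus about 9.6%), so the additive reading is a reasonable simplification at low exposure risk. The gap widens as the number of contacts or the per-contact risk grows.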
Each model, m ∈ M, produces an estimate for the number of infections, m(d, l), on date d in location l, and in this work we generate an ensemble prediction based on the average of the predictions of the individual models; see Equation (2).

$$i(d, l) = \frac{\sum_{m \in M} m(d, l)}{|M|} \tag{2}$$

The difference between this estimate of infections and the number of confirmed cases is the number of undetected infections, u(d, l), in location l on date d; see Equation (3).

$$u(d, l) = i(d, l) - c(d, l) \tag{3}$$

Note that the number of cases, c(d, l), is defined as the number of reported cases in l on date d. Since the reporting date of cases usually lags their infection date, it may be necessary to align infections and cases by shifting cases by an estimate of this lag. For this study we use a 10-day lag, which is unlikely to be correct in all situations, but the results are not sensitive to minor variations in this lag; see section 4.
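The ensemble average, the lag alignment, and the subtraction described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the model names and numbers are made up, the lag is applied by attributing the case reported on day d + lag to infection day d, and negative differences are floored at zero (a choice made here, not taken from the text).

```python
from statistics import mean


def ensemble_infections(model_estimates: dict[str, list[float]]) -> list[float]:
    """Average the per-day infection estimates m(d, l) over all models in M."""
    series = list(model_estimates.values())
    return [mean(day_values) for day_values in zip(*series)]


def align_cases(reported_cases: list[float], lag_days: int = 10) -> list[float]:
    """Shift reported cases back by `lag_days` to line up with infection dates.

    The case count reported on day d + lag_days is attributed to day d, so the
    aligned series is shorter than the input by lag_days entries.
    """
    return reported_cases[lag_days:]


def undetected_infections(infections: list[float], aligned_cases: list[float]) -> list[float]:
    """u(d, l) = i(d, l) - c(d, l), floored at zero (assumption)."""
    return [max(i - c, 0.0) for i, c in zip(infections, aligned_cases)]


# Hypothetical toy data: two models, three days, and a 2-day lag for brevity.
models = {"model_a": [100.0, 120.0, 140.0], "model_b": [80.0, 100.0, 160.0]}
reported = [5.0, 7.0, 20.0, 30.0, 40.0]

infections = ensemble_infections(models)
undetected = undetected_infections(infections, align_cases(reported, lag_days=2))
print(infections, undetected)
```

Because `zip` stops at the shorter series, days without a matching lag-shifted case count are simply left out of the result.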
Next, we need to calculate the number of undetected infections that are active on a particular date, given that infected individuals remain infected for a number of days; that is, we need to calculate the prevalence of undetected infections. For SARS-COV2, prevalence is usually calculated as the 14-day total of cases; this is also the recommended isolation time for (suspected) infected individuals. Thus, the prevalence of undetected infections, U(d, l), for location l on date d, is given by Equation (4).
In order to estimate the exposure risk we make two further assumptions: (1) that confirmed cases do not present an exposure risk, because they will be isolating; and (2) that undetected infections do present an exposure risk, because they may be circulating in the community, unaware they are infected. This is obviously a simplification of the reality (see section 4): cases will be contagious for a time before they are confirmed; not everyone will isolate as or when they should; many undetected cases may be asymptomatic or mild, thereby presenting a lesser risk; and some undetected cases may self-isolate if they are feeling unwell. Then, exposure risk is defined as the number of undetected infections as a fraction of population; see Equation (5).
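In computational terms, the prevalence and exposure risk steps amount to a 14-day rolling sum of undetected infections followed by a division by population. The sketch below assumes the window covers the 14 days ending on date d and that the first few days use whatever partial history is available; both are assumptions made for illustration, as are the function names and toy numbers.

```python
def rolling_prevalence(undetected: list[float], window: int = 14) -> list[float]:
    """U(d, l): total undetected infections over the `window` days ending on day d."""
    prevalence = []
    running = 0.0
    for day, value in enumerate(undetected):
        running += value
        if day >= window:
            # Drop the day that has just fallen out of the window.
            running -= undetected[day - window]
        prevalence.append(running)
    return prevalence


def exposure_risk(prevalence: list[float], population: int) -> list[float]:
    """Prevalence of undetected infections as a fraction of the population."""
    return [u / population for u in prevalence]


# Hypothetical example: 100 undetected infections per day in a population of 100,000.
daily_undetected = [100.0] * 20
prevalence = rolling_prevalence(daily_undetected)
risk = exposure_risk(prevalence, population=100_000)
print(prevalence[-1], risk[-1])
```

With a steady 100 undetected infections per day, the 14-day prevalence settles at 1,400, which corresponds to an exposure risk of 0.014 (1.4%) in this toy population.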
It is useful to consider a relative version of exposure risk too, by calculating the current exposure risk as a fraction of the peak exposure risk at some earlier point in time, such as during the first wave of infections when many countries locked down hard; see Equations (6)(7)(8). For example, the exposure risk in the Netherlands at the end of 2020 (0.005) was more than 3 times lower than its peak exposure risk in early 2020, whereas Austria's late-2020 exposure risk (0.027) was more than 6 times higher than it was in early 2020. This relative exposure risk may help people to calibrate their actions relative to their springtime behaviour, which defined the pandemic-level behaviour of many.
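As a rough illustration of the relative measure, the sketch below treats it as the simple ratio of a current exposure risk to a reference peak. This is one plausible reading of the description above rather than a reproduction of Equations (6)(7)(8), and the function name is hypothetical. The figures used are the Irish values quoted in the case study (0.026 in late January 2021 against a peak of 0.019 in April 2020).

```python
def relative_exposure_risk(current: float, reference_peak: float) -> float:
    """Current exposure risk expressed as a multiple of an earlier peak."""
    if reference_peak <= 0:
        raise ValueError("reference peak must be positive")
    return current / reference_peak


# Ireland: late-January 2021 exposure risk versus the April 2020 peak.
ratio = relative_exposure_risk(0.026, 0.019)
percent_change = (ratio - 1.0) * 100.0
print(f"ratio={ratio:.2f}, change={percent_change:.0f}%")
```

A ratio above 1 means the current risk exceeds the reference peak; here it is about 1.37, matching the approximately 37% increase reported for Ireland.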

Datasets and Approach
We evaluate exposure risk for 39 countries in Europe throughout the pandemic so far using public data covering the period from March 1, 2020 to February 28, 2021; confirmed cases and population data are available in (32) and infection prediction data are available in (31). We limit our analysis to European countries primarily because of their geographical proximity, which is a significant factor in the temporal pattern of the infection waves that have occurred so far. For each country and date we calculate its mean daily exposure risk and also its peak exposure risk during the early period (March 1-September 26 2020 inclusive) and the late period (October 2 2020-February 28 2021, inclusive); each period is 150 days in duration.
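The per-period summary can be sketched as a filter over a dated series followed by a mean and a maximum. The example below uses the late period dates given above, which span 150 days, with a made-up linearly rising risk series; the function name and the data are illustrative assumptions only.

```python
from datetime import date, timedelta


def period_summary(daily_risk: dict[date, float], start: date, end: date) -> tuple[float, float]:
    """Return (mean, peak) of daily exposure risk for start <= day <= end, inclusive."""
    values = [risk for day, risk in daily_risk.items() if start <= day <= end]
    if not values:
        raise ValueError("no observations fall inside the requested period")
    return sum(values) / len(values), max(values)


# Late period as defined in the text: October 2 2020 to February 28 2021, inclusive.
late_start, late_end = date(2020, 10, 2), date(2021, 2, 28)
n_days = (late_end - late_start).days + 1

# Hypothetical series: exposure risk rising by 0.0001 per day from 0.01.
toy_risk = {late_start + timedelta(days=k): 0.01 + 0.0001 * k for k in range(n_days)}
late_mean, late_peak = period_summary(toy_risk, late_start, late_end)
print(n_days, round(late_mean, 5), round(late_peak, 4))
```

Keeping the period boundaries as parameters makes it straightforward to apply the same summary to any other window of interest.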

RESULTS
We present the results of this analysis in two parts. First, we begin with a case-study of the daily exposure risk for Ireland and the Netherlands during the pandemic before presenting the results obtained for the full set of 39 European countries.

Exposure Risk Case-Study
Ireland is an interesting case-study, because it went from being one of the best performing countries in Europe, in terms of daily cases per capita, to one of the worst in the world, all in a matter of only a few short weeks. The country experienced an explosion of cases over the Christmas period after having one of the lowest case-counts in Europe just weeks before; if there was ever a need to communicate exposure risk more effectively to people, then Ireland needed it. Figure 1A shows the total number of daily cases and the estimated infections. The shaded region is the difference between these cases and infections (that is, the number of undetected infections) and it is striking to see just how large this difference has been at different points in the pandemic. For example, at the peak of the first wave in April, Ireland reported more than 900 cases per day, but there was an estimated peak of approximately 7,500 infections. Ireland's explosive recent wave saw cases soar to more than 5,000 cases per day, but with infections predicted to be almost 3 times this number. The dotted line in Figure 1A shows the number of undetected infections as a fraction of total infections and it is frequently above 0.8 for Ireland. This is consistent with similar data elsewhere (33) and helps to highlight the scale of the difference between confirmed cases and true infections, even in countries with mature testing infrastructures.
The estimated infections indicate that Ireland's October peak in cases, while higher than the corresponding peak in April, was associated with fewer infections than in April. This suggests Ireland's improved testing infrastructure helped it to identify a much greater fraction of infections (approximately 50-60%) in October compared to the 10-20% of infections that were identified in April. Thus, while Ireland's October case numbers led it into a second, strict lockdown in November, it is noteworthy that true infections in October reached lower levels than in April. The same cannot be said for its cases or infections during late December and January, however.

Figure 1B shows the corresponding 14-day prevalence, for cases and infections; the prevalence of undetected infections is represented by the shaded region. The corresponding estimated exposure risk is also shown. At the end of 2020, even though Ireland's recent infection prevalence was much higher than it was in April 2020, the corresponding exposure risk was only marginally higher than the peak exposure risk from April, in part due to an improved ability to find cases in late 2020. Unfortunately, even Ireland's enhanced testing infrastructure was unable to cope with the growth of cases at the end of 2020 and the exposure risk peaked at 0.026 (2.6%) in late January 2021 compared to 0.019 in April 2020; thus the exposure risk in Ireland was approximately 37% higher in January 2021 compared with the April 2020 wave.

Figure 2 shows the equivalent graphs for the Netherlands. As was the case in Ireland, the Netherlands suffered from a significant outbreak in April 2020, with a large sustained outbreak in late 2020 and early 2021. However, the testing infrastructure in the Netherlands was able to cope with this wave and, unlike Ireland, exposure risk remained low (< 1%) throughout.

Comparing Cases, Infections, and Exposure Risk in Europe
Broadly speaking many European countries have seen a similar pattern of cases and infections over the course of 2020. The bar charts in Figure 3 show (A) the total number of confirmed cases, per 100,000 of population, and (B) the estimated infections, per 100,000 of population, for each country in Europe during the early and late periods; countries are ordered, left to right, in descending order of population size.
It is clear in Figure 3A that the confirmed case counts associated with the late period are now significantly higher than the early period for every country. On average, European countries confirmed just under 350 cases per 100k in the early period, compared with just under 4,600 cases per 100k in the late period, a relative increase of more than 13x. In part this can be explained by significant improvements in testing capability (more testing means more confirmed cases), although this is far from a complete explanation since the estimated infections tell a similar story in Figure 3B, indicating that infection rates have also been higher in the late period.
In fact, there are no countries with fewer infections during the late period compared with the early period; on average the late period generated 15,193 infections per 100k compared with 3,394 infections per 100k in the early period, greater than a 4x increase. These differences in mean cases and infections, between the early and late periods, are statistically significant, based on a one-sided t test; t(76) = -11.36, p < 0.001 for cases and t(76) = -9.44, p < 0.001 for infections. Thus, we can state with some confidence that the late period has been more severe in Europe, because it has resulted in significantly more cases and more infections, even allowing for improvements in testing.
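The reported degrees of freedom (76 = 39 + 39 - 2) are consistent with a pooled two-sample t test over the 39 countries in each period, although the exact variant used is not stated. The sketch below computes such a statistic using only the standard library; the data are made-up toy values, not the study data, and the function name is hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(sample_a: list[float], sample_b: list[float]) -> tuple[float, int]:
    """Student's two-sample t statistic (equal variances assumed) and its degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / df
    t_stat = (mean(sample_a) - mean(sample_b)) / sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t_stat, df


# Toy example: a negative t means the first (early) sample has the lower mean.
early = [1.0, 2.0, 3.0]
late = [2.0, 4.0, 6.0]
t_stat, df = pooled_t(early, late)
print(round(t_stat, 3), df)

# With 39 countries per period the degrees of freedom come out at 76.
_, df_39 = pooled_t([float(k) for k in range(39)], [float(k + 1) for k in range(39)])
print(df_39)
```

A one-sided p-value would then be read from the t distribution with the returned degrees of freedom, for instance via a statistics library.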
Notably, this does not necessarily mean that the exposure risk is correspondingly greater in the late period, because exposure risk depends on the fraction of undetected infections, rather than the actual number of cases or infections. Since testing infrastructure has improved, we should expect fewer undetected infections, all other things being equal, and thus a relative improvement in the exposure risk. We can see this in Figure 4, which compares the peak exposure risk for countries between the early and late periods. There are nine countries (the UK, France, Italy, Spain, the Netherlands, Belgium, Sweden, Finland, and Norway) whose peak exposure risk was higher in the early period than in the late period. None had more infections in the early period, but they did have a greater proportion of undetected infections, and hence a higher exposure risk. On average, the peak exposure risk in the early period was 0.008 (0.8%), compared with 0.018 (1.8%) in the late period. That is more than twice the peak exposure risk in the late period, but from 4x the infections and 13x the confirmed cases. Figure 4 also indicates the (current) exposure risk for each country at the end of the late period (February 28, 2021), using a single filled dot marker, and only 4 countries (Slovakia, Moldova, Albania, and Montenegro) were still peaking at that time. For example, Slovakia, which managed to contain infections very well in the early period (with very low peak exposure risk levels of << 0.01), was suffering from a peak exposure risk > 3% at the end of the late period. For completeness, Figure 4 also shows the fraction of undetected infections for each country in the pandemic so far, which varies from <0.5 (Estonia and Iceland) to almost 0.9 (Bulgaria, Bosnia & Herzegovina, and North Macedonia); on average, the fraction of undetected infections overall was 0.70.

DISCUSSION
We have described an approach to estimating the risk of SARS-COV2 exposure (per contact) based on the prevalence of undetected infections per capita. It is important to underscore that this measure is developed to help communicate the level of risk to the general public, rather than as a forecasting tool. Consequently, the level of accuracy of this metric is less important than its explanatory value, although obviously it is important for it to present a reasonably accurate estimate of personal risk if it is to be accepted and trusted. In this section we discuss some of the factors that are important in understanding the likely accuracy of this exposure risk estimate, and some of the ways that the estimate might be improved in the future.
To calculate exposure risk we made a number of assumptions that are worth revisiting. One important assumption was the availability of sufficiently accurate estimates of true infections. We based our estimates on the predictions of four prominent infection models, which have been used and relied upon throughout this pandemic, and some recent studies have concluded that such models exhibit reasonable levels of prediction accuracy (34,35). Moreover, our ensemble approach, based on the average of these underlying models, can be expected to produce more robust estimates than if we relied upon any single model. While a detailed analysis of the accuracy of these models is beyond the scope of this paper, it is worth noting that the resulting fraction of undetected infections is in broad agreement with the seroprevalence data that does exist; see for example, (36).
Another assumption made was that our estimate of exposure risk should be based on the per capita, 14-day prevalence of undetected infections. The rationale for this is that confirmed cases will be isolating and therefore should not present a significant exposure risk, while undetected cases can be expected to be circulating in the community. There are a number of points worth noting regarding the validity of this assumption and the accuracy of the resulting estimate:
1. Most confirmed cases will have been contagious before they developed symptoms and, therefore, before they were confirmed (37), and many of these will have been circulating in the community. By excluding confirmed cases we are underestimating the exposure risk.
2. Recent studies suggest that about a fifth of infections are genuinely asymptomatic (37,38) and there is some evidence to support the idea that asymptomatic cases are likely to be less contagious than symptomatic ones (38). It is reasonable to assume that asymptomatic cases are more likely to be undetected and thus by including these undetected cases we may be overestimating the exposure risk. On the other hand, it is also worth remembering that exposure risk deals with exposure rather than transmission, so one could dismiss the lower contagiousness of asymptomatic cases as moot.
3. A majority of undetected cases must be symptomatic, however (if up to 80% of infections are undetected but only 20% are asymptomatic, then on average about 60% of infections are symptomatic and undetected), but presumably they are likely to be milder on average than the typical confirmed case and, again, there is some evidence that milder cases can be less contagious than more severe cases (39), because they shed the virus for a shorter period. Once again this may contribute to an over-estimation of transmission risk.
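The arithmetic behind the third point can be made explicit. The sketch below assumes, as the text implicitly does, that every asymptomatic infection is among the undetected ones; the function name is illustrative.

```python
def symptomatic_undetected(undetected_share: float, asymptomatic_share: float) -> float:
    """Share of all infections that are symptomatic yet undetected.

    Assumes every asymptomatic infection goes undetected, so the remainder of
    the undetected share must be symptomatic.
    """
    return undetected_share - asymptomatic_share


share_of_all = symptomatic_undetected(0.80, 0.20)
share_of_undetected = share_of_all / 0.80
print(f"{share_of_all:.2f} of all infections, {share_of_undetected:.2f} of undetected infections")
```

Under these figures about 60% of all infections, or three quarters of the undetected ones, are symptomatic, which is why a majority of undetected cases must be symptomatic.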
Undoubtedly, there is potential for error in our approach to estimating exposure risk, but the competing nature of these sources of error may limit its extent in practice. Indeed it may be possible to fine-tune the exposure estimate further too, for example by developing a weighted model which combines confirmed and undetected infections, using differently weighted exposure periods. Either way, the present model provides a useful and straightforward starting point that is likely to be sufficiently accurate and robust as a practical public-facing measure. After all, the primary objective is to help people to understand the level of risk (e.g., low, moderate, and high) so that they can calibrate their behaviour, and, as such, a high degree of precision is less important.
We assumed a fixed 10-day lag between case reporting and infection dates and this is another potential source of error. This could be addressed by more accurately accounting for this lag, which is likely to change from location to location depending on factors such as testing capacity. Indeed, it is likely that much more precise estimates of the infection dates of cases are readily known as a result of testing and therefore could be incorporated on a location-by-location basis. Related to this is the issue that not all countries report their cases in a consistent fashion, and the public case data occasionally includes adjustments to rectify such issues. In general these issues appear to be rare and mostly associated with the early months of the pandemic, although it is likely too that case reporting will have been less accurate during the peaks of the late periods when the testing infrastructure of many countries was under great stress.
The approach so far assumes exposure risk to be evenly distributed across a population. This is very unlikely to be the case and studies have highlighted significant variation in seroprevalence at a regional level (40,41) and among different ethnic or socio-demographic groups (42,43). However, this is not a limitation of the approach proposed per se, as much as it is an artefact of our choosing to work with country-level data. Certainly, by working with more fine-grained data (tests, cases, and deaths) it will be possible to generate better estimates of infections and exposure risks for different local regions or among different socio-demographic clusters or age groups. Such data will be available within the data repositories of most health systems and, as such, should be straightforward to use to generate these more precise estimates of exposure risk.
It is worth revisiting why we should go to the trouble of calculating the exposure risk metric in the first place. The contrary position might be that, while it is a meaningful metric, from the point of view of helping an individual to calibrate their level of personal risk, it would be easier to use cases or test positivity rates in much the same way. The point is that neither confirmed cases nor test positivity rates correlate very closely with the exposure risk estimate. For example, the r² between confirmed cases (per 100k) and exposure risk during the pandemic, averaged across the 39 countries, is just 0.44 and the corresponding r² value for test positivity rates is 0.55. In other words, neither of these metrics on their own is sufficient to accurately estimate exposure risk. Not surprisingly, the estimate of the number of true infections is much more closely correlated with exposure risk (r² = 0.88) but even then it is not perfect, and after all, it is a simple calculation to transform infections into exposure risk to produce a far more relatable measure for the general public.
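For readers who want to reproduce this kind of comparison, the sketch below computes r² as the squared Pearson correlation between two daily series and then averages it across countries. Treating r² this way, and averaging per-country values, is an assumption about how the figures above were obtained; the country labels and numbers are toy values.

```python
def r_squared(x: list[float], y: list[float]) -> float:
    """Squared Pearson correlation between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r squared is undefined for a constant series")
    return (s_xy * s_xy) / (s_xx * s_yy)


def mean_r_squared(per_country: dict[str, tuple[list[float], list[float]]]) -> float:
    """Average the per-country r squared values."""
    scores = [r_squared(x, y) for x, y in per_country.values()]
    return sum(scores) / len(scores)


# Hypothetical data: (cases per 100k, exposure risk) series for two made-up countries.
toy = {
    "country_a": ([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]),
    "country_b": ([1.0, 2.0, 3.0], [1.0, 3.0, 2.0]),
}
print(mean_r_squared(toy))
```

In this toy case the first country is perfectly correlated (r² = 1) and the second only weakly (r² = 0.25), giving an average of 0.625.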
Finally, it is important to recognise that this estimate of exposure risk is not designed to predict the risk of transmission per se, which will depend on other factors and behaviours, such as mask wearing or the prevalence and transmissibility of new virus variants. It may be possible to estimate the likelihood of transmission, given exposure, based on an analysis of the effectiveness of masks, social distancing, and variant transmissibility, but this is beyond the scope of this work.

CONCLUSIONS
As many countries, in Europe and elsewhere, continue to impose restrictions to control the recent wave of infections, it is becoming increasingly difficult to contain this virus, especially in the face of emerging variants that are more transmissible. The World Health Organization has highlighted the importance of enabling people to live their lives safely by empowering them to reduce their levels of personal risk using clear communication messages based upon simple and intuitive metrics to guide their behaviour. We propose exposure risk as one such metric. We describe how it can be readily calculated from existing sources of public pandemic data, and compare the exposure risk of 39 European countries during the pandemic so far. By directly estimating how likely an individual is to be exposed to the virus, it can be argued that exposure risk provides a much more meaningful measure with which to guide behaviour.

DATA AVAILABILITY STATEMENT
All data used in this study are publicly available and cited in the main text. Further questions can be directed to the author.

AUTHOR CONTRIBUTIONS
BS is the originator of the ideas presented in this manuscript and has carried out all of the work associated with it.

FUNDING
Supported by Science Foundation Ireland through the Insight Centre for Data Analytics under grant 12/RC/2289P2. Funding has been provided to support the research of the Insight Centre for Data Analytics but Science Foundation Ireland has not been involved in this research in any way. Funding is available to support open access publications.