SARS-CoV-2 Infections in the World: An Estimation of the Infected Population and a Measure of How Higher Detection Rates Save Lives

This paper provides an estimation of the accumulated detection rates and the accumulated number of individuals infected by the novel severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Worldwide, as of July 20, we estimate that more than 160 million individuals have been infected by SARS-CoV-2. Moreover, we find that only about 1 out of 11 infected individuals is detected. In a context in which population-based seroepidemiological studies are not frequently available, this study offers a parsimonious alternative for estimating the number of SARS-CoV-2 infected individuals. By comparing our estimates with those provided by the population-based seroepidemiological ENE-COVID study in Spain, we confirm the utility of our approach. Then, using a cross-country regression, we investigate whether differences in detection rates are associated with differences in the cumulative number of deaths. The hypothesis investigated in this study is that higher levels of detection of SARS-CoV-2 infections can reduce the risk exposure of the susceptible population with a relatively higher risk of death. Our results show that, on average, detecting 5 instead of 35 percent of the infections is associated with multiplying the number of deaths by a factor of about 6. Using this result, we estimate that 120 days after the pandemic outbreak, if the US had tested with the same intensity as South Korea, about 85,000 out of its 126,000 reported deaths could have been avoided.


INTRODUCTION
Governments and policymakers dealing with the COVID-19 pandemic will fail in their objectives if their actions are guided by misleading data or subsequent misinformation. The authorities should have reliable estimations of the number of SARS-CoV-2 infected individuals. However, there have been few attempts to estimate the total number of infections (1)(2)(3)(4)(5). Consequently, health systems face enormous challenges since an unknown, and probably high, proportion of all SARS-CoV-2 infections remains undetected. Moreover, data suggest that infected individuals can be highly contagious before the onset of symptoms and that SARS-CoV-2 can also be highly contagious in individuals who will never develop any symptoms (6)(7)(8)(9)(10).
Undetected infections are dangerous because infectious individuals spread the coronavirus in unpredictable ways. Undetected infections consist of non-PCR-tested individuals with symptoms and asymptomatic individuals (non-COVID-19 patients) who are likely to remain undetected over all phases of the infection. However, non-PCR-tested individuals with symptoms would tend to self-select, depending on the severity of their symptoms (from mild to severe), toward treatment and late detection. For this reason, it is important to know the proportion of the infected population that is asymptomatic or has symptoms so mild that they self-select into the group of non-PCR-tested individuals (11)(12)(13)(14)(15). Here, regarding the estimation of the number of infections, and for purposes of public health, I advocate the view by Amartya Sen and Martha Nussbaum that it is preferable to be vaguely right than precisely wrong.
The public health problem is that undetected asymptomatic individuals, as well as late-detected SARS-CoV-2 infected individuals, increase the risk for vulnerable groups 1 . Since there is a transmission channel between the level of detection and the number of deaths, the early detection of asymptomatic infections, pre-symptomatic, and mild COVID-19 cases is a public health concern.
Moreover, undetected cases are also responsible for the collapse of the health system, as numerous aggravated and sometimes unexpected COVID-19 patients require treatment within a short period. Overwhelmed health care systems reduce the recovery prospects of patients through lack of treatment, undertreatment, and an increased risk of mistreatment of all patients, including those with COVID-19, and also put the health workforce at unnecessary risk (21,22).
The problem is that many governments formulate their strategies and responses to the pandemic based on figures that they can control. This problem of reverse causality produces counterproductive incentives for governments, since public opinion tends to react negatively to reports of the cumulative and the marginal numbers of detected (reported) cases. The contradiction is that something good, such as an increase in the testing efforts by governments, can be perceived by public opinion as something bad (due to the increase in detections). Worldwide, the media communicates confirmed cases and deaths as the relevant parameters to take into consideration when assessing the evolution of the pandemic. This is a mistake, since this emphasis discourages governments from decisively pushing for mass testing, whose obvious consequence is an increased number of detected cases (although, as shown in this paper, there is a theoretical mechanism relating more testing with saving lives). More sophisticated observers would use the crude and adjusted case fatality ratios to assess the evolution of the pandemic. However, international comparisons show that crude and adjusted case fatality ratios are highly heterogeneous and their use can be misleading (23,24). For instance, the simple division of the cumulative number of deaths by the cumulative number of confirmed cases underestimated the true case fatality ratio in past epidemics (24,25). Although many case fatality ratios estimated in this pandemic correct many of the biases observed in the past (26)(27)(28), they still depend on the testing efforts made by countries.
The problem with heterogeneous case fatality ratios (different proportions of all cases that will end in death due to methodological differences in the denominator) is that they are not anchored to any exogenous information that allows researchers to perform international or territorial comparisons based on credible and transparent assumptions. Consequently, relying on the number of confirmed cases makes international comparisons impossible, since governments have implemented highly heterogeneous SARS-CoV-2 testing strategies that end up in different levels of location-based under-ascertainment.
In an attempt to solve the mentioned problem, we anchor our analysis in the cumulative number of deaths, which is a statistic much more difficult to alter, in free societies, than the number of SARS-CoV-2 tests 2 .
We use this information together with the newest and soundest estimates of the age-stratified infection fatality ratios (IFRs) provided in the recent SARS-CoV-2 related literature. In particular, we base our analysis on the IFR of 0.657% reported in Verity et al. (26). This IFR is very close to the 0.75% reported in a meta-analysis of 13 IFR estimates from a wide range of countries that were published between February and April of 2020 (30). We also assume orthogonal attack rates of the infection, which is also supported by recent literature (16). By weighting the age-stratified IFRs by the population shares of the age groups in each country, it is possible to obtain country-specific IFRs.
The relevance of this study is 3-fold. Firstly, the estimation of the true number of infections includes not only confirmed cases but also undetected COVID-19 cases, as well as SARS-CoV-2-infected individuals without the disease or in a pre-symptomatic stage. Therefore, an estimation of the true number of SARS-CoV-2 infections is of more utility than information about the number of confirmed infections alone. This is because confirmed cases depend on testing efforts that can be altered or even manipulated by governments. Moreover, one can compare the estimate of true infections with the number of COVID-19 patients that require hospitalization. Such ratios can contribute to predicting, with exogenous-to-government information, shortages in the health systems.
Secondly, the estimation of the true number of SARS-CoV-2 infections allows us to estimate the detection rate of the infection, which is a measure of the performance of health systems and governments while facing the pandemic. One can expect that higher levels of detection of SARS-CoV-2 infections, which include the asymptomatic population and those in the early stages of the infection (who are more infectious), can reduce the risk exposure of the susceptible population with a relatively high risk of death, that is, the elderly and those individuals with preexisting conditions (17). Accordingly, a highly neglected statistic such as the detection rate should be considered highly relevant from the public health point of view. Thirdly, in this paper, we test the hypothesis that higher detection rates can save lives while providing a measure of this impact (bearing in mind that it is preferable to be vaguely right than precisely wrong). Thus, this study aims to quantify the importance of testing while providing empirical support for the utility of implementing massive SARS-CoV-2 testing.
Overall, this study argues that it is crucial to compute the evolution of the cumulative number of estimated SARS-CoV-2 infected individuals and, subsequently, the cumulative detection rates. This information would provide public health managers and governments with the incentives to improve detection rates, rather than the opposite. Moreover, the identification strategy can be used at lower levels of aggregation, such as regions, provinces, and municipalities, to improve responses to the pandemic, including the planning of selective lockdowns or spatially selective enhancements of the installed critical care units.
In summary, this study proposes a baseline estimation of the number of SARS-CoV-2 infections and detection rates based on current information and transparent assumptions. However, the assumptions discussed later in this paper can be modified to match the currently available scientific evidence and country-specific developments and contexts.

Data
For this research, we use the cumulative number of deaths and confirmed cases in the world and by country, published by OurWorldInData.org, a project of the Global Change Data Lab with the collaboration of the Oxford Martin Programme on Global Development at the University of Oxford 3 . Age-stratified demographic proportions of the population were obtained from the UN population data 4 . Age-stratified infection fatality ratios were taken from Verity et al. (26); these estimated IFRs correct for many types of bias. The infection fatality ratios were obtained after combining adjusted case fatality ratios with data on infection prevalence amongst individuals returning home from Wuhan on repatriation flights. The estimation also requires the number of days between infection and death. Since this number is unknown, we approximate it using the sum of the median incubation period as reported in Lauer et al. (31) and the mean number of days between the onset of symptoms and death as reported in Verity et al. (26). For our empirical exercise, we rely on World Development data by the World Bank (GDP per capita and health expenditure as a share of the GDP) 6 and on World Health Organization data for BCG vaccination 7 .
In this study, our regression analysis relies on data for 91 countries covering more than 86% of the world population. The remaining countries were excluded because they either do not have significant mortality figures (for instance Uruguay, Monaco, Bermuda, etc.) or lack full data.

Estimation Strategy
In this study, we rely on a very simple rationale. At a given point in time, the cumulative number of deaths should be a proportion of the cumulative number of infections at some point in the past. But how many days in the past? The answer lies in the sum of the number of days of incubation and the number of days between the onset of symptoms and death. This rationale follows a report focusing on the 40 countries most affected by the pandemic (32). However, in this paper, we deviate from the mentioned report by using the key parameters in a different way, which translates into a different estimation of the number of infected individuals.
On average, deaths occur ∼18 days (17.8 days with 95% credible interval [CrI] 16.9-19.2) after the onset of COVID-19 symptoms (26), while the incubation period of COVID-19 has been estimated at about 5 days (5.1 days with 95% CI, 4.5-5.8) as reported in Lauer et al. (31). Thus, by comparing the cumulative number of deaths at time $t$ in country $i$ ($\text{cdeaths}_{i,t}$) with the country-specific infection fatality ratio ($\text{ifr}_i$), which is assumed constant over time, it is possible to obtain a rough approximation of the cumulative number of SARS-CoV-2 infections 23 days (18 days + 5 days) in the past ($\text{cinfected}_{i,t-23}$) 8 :
$$\text{cinfected}_{i,t-23} = \frac{\text{cdeaths}_{i,t}}{\text{ifr}_i} \qquad (1)$$
6 https://data.worldbank.org/indicator/ (accessed April 24, 2020).
7 https://apps.who.int/gho/data/node.main.A830?lang=en (accessed April 24, 2020).
8 Differently from Bommer and Vollmer (32), we include the incubation period while avoiding the subtraction of the number of days between the onset of symptoms and detection from the relevant lag period. These differences explain the discrepancies between both sets of estimates. Moreover, by combining the cumulative distribution function of the SARS-CoV-2 incubation period as reported in Lauer et al. (31) and an approximation of the Gamma distribution with correction for epidemic growth of the days between the onset of symptoms and death as reported in Verity et al. (26), one can calculate a vector of probabilities to weight the cumulative number of deaths required in equation 1. The weighting vector goes from $t-2$ (representing the proportion of deaths of those who experienced 1 day between infection and the onset of symptoms, plus 1 day from the onset of symptoms to death) to $t-72$ (representing the proportion of deaths of those who experienced 12 days between infection and the onset of symptoms and 60 days between the onset of symptoms and death). The smoothed approach produces an almost identical estimation of the cumulative number of infected individuals. Given that, and for the sake of simplicity, we prefer to use the non-smoothed approach.
Additionally, we use the ratio between the cumulative number of confirmed (detected) cases at time $t-23$ in country $i$ ($\text{cconfirmed}_{i,t-23}$) and the cumulative number of infected individuals at time $t-23$ in country $i$ ($\text{cinfected}_{i,t-23}$) as a rough measure of the cumulative rate of detection of SARS-CoV-2 infections at time $t-23$.
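The lagged estimate and the detection rate described above can be sketched in a few lines of Python. This is a minimal illustration, not the author's code: the function names, the list-based daily series, and the toy numbers are all assumptions introduced here, and only the 23-day lag and the deaths-over-IFR logic come from the text.

```python
LAG_DAYS = 23  # 5 days of incubation + 18 days from symptom onset to death


def estimate_infections(cum_deaths, ifr, lag=LAG_DAYS):
    """Estimate cumulative infections `lag` days before each death count.

    cum_deaths[t] is the cumulative number of deaths on day t. The returned
    list is indexed by the lagged day s = t - lag, so result[s] is the
    estimate of cumulative infections on day s.
    """
    if not 0 < ifr < 1:
        raise ValueError("ifr must be a fraction strictly between 0 and 1")
    return [cum_deaths[t] / ifr for t in range(lag, len(cum_deaths))]


def detection_rates(cum_confirmed, cum_infected):
    """Share of estimated infections that were confirmed, day by day.

    Both series are indexed by the same (lagged) day; zip() stops at the
    shorter one. Days with zero estimated infections yield None.
    """
    return [c / i if i > 0 else None
            for c, i in zip(cum_confirmed, cum_infected)]


# Toy series (made-up numbers): 25 days of data, first deaths on day 23.
toy_deaths = [0] * 23 + [10, 20]
toy_confirmed = [100, 300] + [500] * 23
toy_infected = estimate_infections(toy_deaths, ifr=0.01)
toy_rates = detection_rates(toy_confirmed, toy_infected)
print(toy_infected, toy_rates)
```

Keeping the lag as a parameter makes it easy to check how sensitive the estimates are to the assumed 23 days.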

Infection Fatality Ratio
In order to estimate the country-specific infection fatality ratio for country i used in equation 1, we weight the age-stratified infection fatality ratios reported in Verity et al. (26) by the age-group population shares of country i. The calculation of the age-stratified infection fatality ratios relies on two assumptions that can be modified when producing point estimates of the number of individuals affected by a SARS-CoV-2 infection. Firstly, it assumes that there are no cross-country differences in the average overall health status of the population, comorbidity, or the soundness of the different health systems. In the absence of standardized country-specific information on these variables, this assumption is convenient although, at first sight, it can be considered a restrictive one. However, it is quite the opposite since, in richer countries with higher proportions of elderly populations, the estimated infection fatality ratios are likely to be overestimated. If so, our estimates of the infected population represent a lower limit of the true number of infections. The second assumption is that the attack rate of the coronavirus is unrelated to the age and sex of susceptible individuals. This is in concordance with the evidence on respiratory infections in previous pandemic processes (26,33). Then, the distribution of IFRs across countries reflects the "fixed" lethality of the virus associated with a varying demographic structure of the population across the world. Figure 1 presents the calculated infection fatality ratios for the world and for the 50 countries in which the lethality of the pandemic has been most significant.
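The weighting step can be illustrated with a short Python sketch. The age bands, IFR values, and population shares below are made-up placeholders chosen only to show the mechanics; they are not the Verity et al. estimates or UN data, and the function name is hypothetical.

```python
def country_ifr(age_ifr, pop_shares, tol=1e-6):
    """Population-weighted average of age-stratified IFRs.

    age_ifr and pop_shares map the same age-band labels to, respectively,
    the IFR of that band (as a fraction) and the share of the country's
    population in that band (shares must sum to 1).
    """
    if age_ifr.keys() != pop_shares.keys():
        raise ValueError("age bands of the two inputs must match")
    if abs(sum(pop_shares.values()) - 1.0) > tol:
        raise ValueError("population shares must sum to 1")
    return sum(age_ifr[band] * pop_shares[band] for band in age_ifr)


# Illustrative placeholder values (NOT taken from the literature).
ILLUSTRATIVE_AGE_IFR = {"0-39": 0.0005, "40-69": 0.005, "70+": 0.05}
young_country = {"0-39": 0.70, "40-69": 0.25, "70+": 0.05}
old_country = {"0-39": 0.45, "40-69": 0.40, "70+": 0.15}

ifr_young = country_ifr(ILLUSTRATIVE_AGE_IFR, young_country)
ifr_old = country_ifr(ILLUSTRATIVE_AGE_IFR, old_country)
print(ifr_young, ifr_old)
```

With the same age-specific lethality, the country with the older population ends up with the higher aggregate IFR, which is the only source of cross-country variation under the two assumptions stated above.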
Recently, a cross-sectional epidemiological study of a super-spreading event in the county of Heinsberg in Germany offered the opportunity to estimate the infection fatality ratio in the community (34). The estimated infection fatality ratio was 0.36%. Although this number is surprisingly low when compared with other estimations, for instance, the one used in this study for Germany (1.3%), it is not evident that the true infection fatality ratio is closer to 0.36% than to 1.3%. This is because there can be local factors that explain the discrepancy, as pointed out in the Heinsberg study. Amongst these factors, one might mention comorbidity gaps, ethnic differences, the quality and coverage of the health systems, climatic differences, immunization levels, etc. 9 .
Consequently, it might be necessary to assess the consequences of using an overestimated infection fatality ratio (that is, of the true IFR being closer to the one reported in the Heinsberg study, or to others inferred from seroprevalence data) (36). The answer is that the number of infections would be underestimated and the detection rates would be overestimated (since the infection fatality ratio is in the denominator of the estimated number of infections). An overestimation of the detection rates reduces the validity of international rankings based on this figure. However, from the public health point of view, this would be irrelevant since, as discussed later, all countries should increase their detection rates of SARS-CoV-2 infections as much as possible.
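The direction and size of this bias can be written out explicitly. The expressions below are a sketch that follows from the definitions used in this paper (the hat notation and the factor $k$ are introduced here for illustration only):

```latex
\widehat{\text{cinfected}}_{i,t-23} = \frac{\text{cdeaths}_{i,t}}{\text{ifr}_i},
\qquad
\widehat{DR}_{i,t-23}
  = \frac{\text{cconfirmed}_{i,t-23}}{\widehat{\text{cinfected}}_{i,t-23}}
  = \frac{\text{cconfirmed}_{i,t-23}\,\text{ifr}_i}{\text{cdeaths}_{i,t}}
% If the assumed ifr is k times the true one, estimated infections are
% understated by the factor k and the detection rate is overstated by k.
% With the two values quoted above for Germany: k = 1.3 / 0.36 \approx 3.6
```

Under this reading, if the Heinsberg figure were the true one for Germany, the estimated number of infections would be roughly 3.6 times too low and the detection rate roughly 3.6 times too high.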

Regression Analysis
To investigate whether improving the detection rates of SARS-CoV-2 infections is associated with saving lives, we use a parsimonious synchronic cross-country multiple linear regression 10 . That is, we use the information reported 15, 60, and 105 days after the confirmation of the first 100 SARS-CoV-2 infections, which corresponds to the pandemic outbreak (PO). At a given pandemic phase, we regress the natural logarithm of the cumulative number of deaths in country $i$, $\ln(\text{deaths}_i)$, on the estimated detection rate ($DR_i$) and its square to assess whether this conditional correlation is non-linear 11 .
The four parsimonious regressions have a demographic control that corresponds to the estimated country-specific infection fatality ratio ($\text{ifr}_i$). This is a non-endogenous control since it only captures the impact of demography (population shares by age group) on the number of deaths and not the reverse. The regressions control for the population size of country $i$ in its natural logarithmic form, $\ln(\text{pop}_i)$. This control is necessary because the share of the susceptible population remains persistently at relatively higher levels in more populated countries when compared with the less populated ones. We also include the natural logarithm of the number of confirmed SARS-CoV-2 infections in each country, $\ln(\text{confirmed}_i)$. This is a measure of the persistence of the mortality process while controlling for cross-country differences in their absolute testing performances. The regressions also control for the economic performance of a country by means of the natural logarithm of the per capita gross domestic product, $\ln(\text{gdppc}_i)$ 12 . We also include the current health expenditure as a share of GDP in 2017 ($\text{healthshare}_i$). This control is needed to account for relative resource-dependent differences in the coverage/quality of the health systems around the globe. Finally, we use available data to explore a possible association between BCG vaccination and aggravated cases of COVID-19 and deaths [a relationship that is being investigated in some clinical trials (37)] 13 . The evidence is still inconclusive because of the argued existence of uncontrolled confounders (38)(39)(40)(41)(42).
10 A synchronic estimation compares countries at the same number of days since the pandemic outbreak. On the contrary, a non-synchronic estimation neglects the pandemic phases and takes the calendar day as the reference period.
11 Output tables without the square of the detection rates are available in the Supplementary Material.
12 In constant 2017 international dollars with the same purchasing power.
13 https://apps.who.int/gho/data/node.main.A830?lang=en (accessed April 24, 2020).
However, if these confounders exist, they can bias the relationship between SARS-CoV-2 detection rates and the cumulative number of deaths. Based on this argument, we include a set of dummies capturing the degree of BCG vaccination coverage as follows: BCG group 1: no mandatory vaccination (up to 49.9% coverage), BCG group 2: 50 to 79.9% coverage, BCG group 3: 80 to 89.9%, BCG group 4: 90 to 98.9%, and BCG group 5: 99 to 100%. The reference category is BCG group 1.
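Putting the variables described in this section together, the main specification can be sketched as the following equation. This is a reconstruction from the verbal description, assuming that all controls enter linearly; the coefficient symbols $\beta$ and $\gamma$ and the error term $\varepsilon_i$ are labels introduced here and may not match the notation of the published equation.

```latex
\ln(\text{deaths}_i) = \beta_0 + \beta_1\,DR_i + \beta_2\,DR_i^{2}
  + \beta_3\,\text{ifr}_i + \beta_4 \ln(\text{pop}_i)
  + \beta_5 \ln(\text{confirmed}_i) + \beta_6 \ln(\text{gdppc}_i)
  + \beta_7\,\text{healthshare}_i
  + \sum_{g=2}^{5} \gamma_g\,\text{BCG}_{g,i} + \varepsilon_i
```

The sum over $g = 2, \dots, 5$ reflects the choice of BCG group 1 as the reference category.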

Robustness
An alternative approach is used to indirectly investigate the conditional association between detection rates and SARS-CoV-2 related deaths. Instead of using the detection rate and its square, we use the natural logarithm of the estimated number of infections, $\ln(\text{infections}_i)$, while dropping from the equation the natural logarithm of the number of confirmed (detected) SARS-CoV-2 infections.
Regarding the statistical inference, significance tests rely on a heteroscedasticity-consistent covariance matrix (HCCM) of type HC3, which is suitable when the number of observations is small (43). Although Ordinary Least Squares estimates are unbiased in the presence of heteroscedasticity of unknown form, the inference can be misleading because the usual tests of significance are generally inappropriate (43). Additionally, we estimate the same set of equations (the main specification and the robustness specification 15, 60, and 105 days after the pandemic outbreak) using robust regressions. We do this because we are concerned that parameter estimates may be biased if, in some countries (outliers), the report of the cumulative number of deaths has been involuntarily altered or even manipulated. Robust regression resists the effect of such outliers, providing better-than-OLS efficiency when heavy-tailed error distributions exist, as is likely the case (44).
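To make the HC3 correction concrete, the sketch below computes the HC3 standard error of the slope in a regression with a single regressor, using only the Python standard library. It is a deliberately simplified illustration: the regressions in this paper include several controls, and the function name and toy data are assumptions made here. For one regressor, the HC3 variance of the slope reduces to the sum of (x_i - mean(x))^2 e_i^2 / (1 - h_i)^2 divided by Sxx^2, where e_i is the OLS residual and h_i the leverage.

```python
import math


def ols_slope_hc3(x, y):
    """OLS of y on a constant and x; returns (slope, intercept, hc3_se).

    hc3_se is the HC3 heteroscedasticity-consistent standard error of the
    slope: each squared residual is inflated by 1 / (1 - leverage)^2.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two series of equal length with n >= 3")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar

    meat = 0.0
    for xi, yi in zip(x, y):
        residual = yi - intercept - slope * xi
        leverage = 1.0 / n + (xi - x_bar) ** 2 / sxx
        if leverage >= 1.0:
            raise ValueError("an observation has leverage 1; HC3 undefined")
        meat += (xi - x_bar) ** 2 * residual ** 2 / (1.0 - leverage) ** 2
    return slope, intercept, math.sqrt(meat) / sxx


# Toy data (made up): log deaths against detection rates for five countries.
toy_dr = [0.02, 0.05, 0.10, 0.20, 0.35]
toy_ln_deaths = [9.1, 8.7, 8.0, 7.2, 6.5]
print(ols_slope_hc3(toy_dr, toy_ln_deaths))
```

Because HC3 divides by (1 - h_i)^2, observations with high leverage (for example, a country with an unusually high detection rate) receive a larger weight in the variance estimate, which is why this variant is often preferred in small samples.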

Descriptive Analysis
On July 20, the estimated infected population reaches about 160 million individuals (Figure 2A). This number is about 19 times larger than the reported number of confirmed cases (about 8.6 million, represented by the dashed line). Note that the number of infections is estimated based on detection rates calculated 23 days in the past. Thus, for the period $t-23$ to $t$, the number of SARS-CoV-2 infected individuals is estimated using the detection rate as of $t-23$. Therefore, the estimation of SARS-CoV-2 infected individuals can be biased if detection rates deteriorate or improve considerably within this time span.
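The extrapolation over the last 23 days can be sketched as follows. This is an illustration under the assumption that the most recent detection rate is simply carried forward, which is how the paragraph above reads; the function name and the toy numbers are hypothetical.

```python
def nowcast_infections(confirmed_today, confirmed_lagged, infected_lagged):
    """Carry the detection rate observed 23 days ago forward to today.

    confirmed_lagged and infected_lagged refer to the same lagged day
    (t - 23); confirmed_today is the cumulative confirmed count on day t.
    """
    if confirmed_lagged <= 0 or infected_lagged <= 0:
        raise ValueError("lagged counts must be positive")
    lagged_detection_rate = confirmed_lagged / infected_lagged
    return confirmed_today / lagged_detection_rate


# Toy numbers (made up): 10% detection 23 days ago, 300 confirmed today.
estimate = nowcast_infections(confirmed_today=300,
                              confirmed_lagged=100,
                              infected_lagged=1000)
print(estimate)
```

If the true detection rate has risen since day t - 23, this nowcast overstates current infections, and if it has fallen, the nowcast understates them, which is the bias flagged in the text.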
The accuracy of our estimations can be assessed by contrasting them against those provided by population-based seroepidemiological studies. There are some studies of this type focusing on restricted geographical areas, for instance, in Germany and Switzerland (34,45). However, to the best of our knowledge, there is only one country-level, large-scale population-based seroepidemiological study, performed in Spain (46). The ENE-COVID study in Spain finds that, on 11 May, 5% of the population would test IgG positive against SARS-CoV-2. This implies that about 2.35 million individuals were infected by SARS-CoV-2. Similarly, in our study we estimated an infected population of about 2.25 million individuals on 11 May. This evidence suggests that our method can be a suitable alternative when population-based seroepidemiological studies are not available, which is frequently the case. Here, it is important to recognize that, from the public health point of view, it is preferable to be vaguely right than precisely wrong. On 11 May, Spain confirmed only 246,504 cases (about 10% of all estimated infections). At that time, it would have been useful for public health authorities and the public to know that, for each confirmed case, there were many more individuals spreading the infection in unpredictable ways.
Back to the global estimates, by comparing the cumulative number of estimated infections with the cumulative number of confirmed (detected) cases, we obtain, at the end of June 2020, a global detection rate of about 9% (Figure 2B). The global detection rate curve shows a U-shape with a minimum of only 1.1% at the beginning of the third week of March. The latest data suggest that detection rates are steadily increasing. Moreover, the semi-logarithmic plot in Figure 2A suggests that the infection stopped spreading at its maximum pace approximately during the third week of March but, unfortunately, it increased its speed again around the last week of June.
The world distribution of the number of deaths, the estimated number of SARS-CoV-2 infections, and the detection rates of SARS-CoV-2 infections across the world are displayed in Figures 3-5, respectively.
Since the global estimates are no more than an aggregation of the trajectories of the different countries in the world, we investigate how heterogeneous the detection rates across countries are. Table 1 presents this information in a synchronic way. The rankings compare countries in the same phase of their respective pandemic processes, that is, 15, 30, 45, 60, 75, and 90 days after the confirmation of the first 100 SARS-CoV-2 infections (pandemic outbreak). This approach allows us to perform such an international comparison.
At first sight, it is noteworthy that none of the first 24 countries ranked at the top by the initial detection rate (15 days after the beginning of the pandemic outbreak) accumulated more than 500 deaths 45 days after initiating their pandemic processes. Thus, there seems to be a strong correlation between detection rates and the cumulative number of deaths for a given stage of the pandemic process. Countries with high counts of deaths ranked very badly in their initial detection rates. For example, the US, Spain, Italy, UK, France, and Belgium ranked 90th, 82nd, 81st, 89th, 87th, and 85th, respectively, out of the 91 countries listed in the ranking.
A second conclusion is that the relative improvement of detection rates over time, that is, 30, 45, 60, 75, and 90 days after the beginning of the pandemic processes, does not alter the fact that those countries are still ranked the worst in terms of deaths. That is, improving detection over time has declining returns when it comes to saving lives.
The depicted relationship between detection rates and the cumulative number of deaths remains almost unchanged when using non-synchronic data as of 20 May in Table 2. This table mixes information from countries at different stages of their pandemic processes, so it must be interpreted with caution. Although efforts to increase detection have been significant in …
In Table 3, we present the non-synchronic ranking as of 22 June. The US is in place 35, Spain 49, Italy 53, Belgium 63, UK 61, and France 67. It is noteworthy that, except for Russia, none of the first 16 countries in this ranking had accumulated more than 2,000 fatalities by 22 June. More importantly, and despite the remarkable efforts to increase testing amongst the more developed countries, none of them was able to detect more than 16% of the estimated infections (the US detected 15.7% on 22 June). This implies that testing efforts need to be deployed at the first stages of the pandemic process due to its cumulative nature.
Countries are ranked by the detection rates of SARS-CoV-2 infections as of 20 May. Source: Own elaboration.
Frontiers in Public Health | www.frontiersin.org Countries are ranked by the detection rates of SARS-CoV-2 infections as of 22 June. Source: Own elaboration.
Table 1A). (B) contains all 61 countries (in Table 1A) whose pandemic processes have more than 120 days since the PO. The dashed fitted line excludes South Korea (KR). Source: Own elaboration.

Tables 2 and 3 show that moving over time from relatively low to relatively high cumulative detection rates is unlikely and probably very expensive. This is due to the disproportionate efforts needed to expand testing relative to the exponentially growing infections at the early stages of the pandemic. Consequently, from the public health point of view, it is much more advantageous, and technically and economically feasible, to implement mass testing from the very beginning of the pandemic process. To achieve this goal, health authorities and governments would need to understand the linkages between the cumulative detection rates and the minimization of pandemic-related fatalities and economic damage.

Unconditional Analysis
In this analysis, we show the unconditional relationship between detection rates and deaths. The fitted lines in Figure 6 are obtained after regressing the natural logarithm of the cumulative number of deaths in country $i$ on its estimated cumulative detection rate ($DR_i$). The results strongly suggest a negative relationship between detection rates and the cumulative number of deaths. This strong negative slope is in concordance with the hypothesis that, by detecting a higher proportion of the SARS-CoV-2 infected population, many lives can be saved, in particular the lives of the elderly and of those individuals with preexisting conditions. The strong association between the number of deaths and the estimated cumulative detection rates remains significant 15 and 120 days after the PO. These associations are shown in Figures 6A,B, respectively. Figure 7 shows the relationship between detection rates (15 and 120 days after the PO) and deaths 120 days after the PO. This descriptive result is of interest since it suggests that, unconditionally, early detection is associated with death outcomes 120 days after the PO to a greater extent than the contemporary detection rates, that is, those measured 120 days after the PO.
Although this information suggests the existence of a strong relationship between detection rates and the cumulative number of deaths, this slope may be confounded by the variables mentioned before. Thus, in the next section, we show the results of our conditional analysis as described earlier.

Multivariate Regression Analysis
Our results in Table 4 show that higher detection rates are associated with a reduction in the number of deaths after controlling for demography (age structure of the population and population size), economic performance (GDP per capita), and the relative resources that the economies devote to their health systems. Over time, the cross-sectional regressions increase in explanatory power, from an R-squared of 0.71 in model 2 to 0.95 in model 8.
[Figure note, truncated in the source: "... Table 1A) whose pandemic processes have more than 120 days since the pandemic outbreak. Source: Own elaboration."]
Based on these results, Figure 8 shows a strong conditional gradient between detection rates and the cumulative number of deaths. For instance, for a hypothetical country with average and constant endowments, the cost in terms of deaths of detecting 5% rather than 35% of infections is about 1.81 natural logarithm points, which corresponds to a factor of exp(1.81) ≈ 6.13. That is, the average country detecting 5% is associated with a number of deaths about 6.1 times higher than the same country detecting 35% of all SARS-CoV-2 infections.
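The conversion from a gap in natural-log points to a multiplicative factor can be checked directly. The sketch below only exponentiates the rounded gap quoted in the text; the helper name `death_ratio` is hypothetical. With the rounded value 1.81 the factor comes out near 6.11, so the 6.13 reported above presumably reflects the unrounded coefficient (an assumption, since the unrounded value is not given here).

```python
import math

def death_ratio(log_point_gap):
    """Convert a difference in ln(deaths) into a ratio of deaths."""
    return math.exp(log_point_gap)

gap_5_vs_35 = 1.81  # rounded gap quoted in the text
ratio = death_ratio(gap_5_vs_35)
print(f"deaths ratio = {ratio:.2f}")
```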
To put this result in perspective, let us simulate what the number of deaths in the U.S. would have been if, instead of detecting 16.02% of infections 120 days after the pandemic outbreak, the country had detected infections with the same intensity as South Korea (41.01%). Evaluating the number of deaths at the endowments of the U.S., the country would have had fewer deaths by 1.14 natural logarithm points. This means that U.S. deaths are 3.13 times higher than they would have been had the country tested with an intensity similar to that of South Korea. Since the number of deaths 120 days after the pandemic outbreak reached 126,140, detecting at the rate of South Korea would have saved about 85,794 lives in the U.S. at that time.
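The arithmetic of this counterfactual can be reproduced from the two figures quoted in the text, the observed death count and the 1.14 log-point gap. The sketch below is illustrative only; the function name `deaths_avoided` is hypothetical, and the gap is the rounded value, so the result matches the reported figure only approximately.

```python
import math

def deaths_avoided(observed_deaths, log_point_gap):
    """Deaths avoided if ln(deaths) were lower by log_point_gap."""
    counterfactual = observed_deaths * math.exp(-log_point_gap)
    return observed_deaths - counterfactual

us_deaths = 126_140  # reported U.S. deaths 120 days after the PO
gap = 1.14           # rounded gap in natural-log points from the text

ratio = math.exp(gap)
avoided = deaths_avoided(us_deaths, gap)
print(f"ratio = {ratio:.2f}, deaths avoided = {avoided:,.0f}")
```

The ratio rounds to 3.13 and the avoided deaths come out at roughly 85,800, in line with the figure of about 85,794 given above once rounding of the gap is taken into account.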
Finally, looking at the regression coefficients in Table 3, it is noteworthy that during the pandemic outbreak, a 1% higher detection rate is associated with more lives saved than a 1% increase in health expenditure as a share of GDP. Our results also suggest that the number of deaths, rather than depending on the relative solvency of the health system, could depend to a greater extent on the size and timeliness of the testing efforts.
The conclusion is that the more tests, the better. Although in this study we employed an economics-inspired approach to assess the importance of testing, our findings are also supported by recent medical literature on the coronavirus, as well as by other economics-inspired models that provide support for a causal relationship between detection and saving lives (47-50).

Robustness of the Results
Robust regressions provide estimates that are close to the ones reported in Table 4. Consequently, it is unlikely that the results reported in this study are driven by outliers. Additionally, the results are robust to heteroscedasticity of unknown form in small samples. Nevertheless, the results should be interpreted with caution. The small number of observations available for the regressions and the lack of data do not allow us to rule out the possibility that omitted variables bias the results.
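As an illustration of what a small-sample heteroscedasticity-robust standard error looks like, the sketch below computes the HC3 standard error of the slope in a simple regression. The text does not name the estimator that was used, so HC3 is an assumption chosen here because it is a common small-sample correction; the function name `slope_with_hc3_se` and the data are hypothetical.

```python
import math

def slope_with_hc3_se(x, y):
    """OLS slope of y on x (with intercept) and its HC3 standard error.

    Needs at least three observations with distinct x values so that
    no leverage equals one.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar)
                for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar

    meat = 0.0
    for xi, yi in zip(x, y):
        resid = yi - intercept - slope * xi
        # Leverage of observation i in a regression with an intercept.
        leverage = 1.0 / n + (xi - x_bar) ** 2 / sxx
        # HC3 inflates each residual by 1 / (1 - leverage).
        meat += (xi - x_bar) ** 2 * (resid / (1.0 - leverage)) ** 2
    return slope, math.sqrt(meat / sxx ** 2)

# Hypothetical data, NOT taken from the paper.
dr = [5.0, 10.0, 20.0, 30.0, 40.0]       # detection rates in percent
log_deaths = [10.6, 10.3, 9.4, 8.7, 7.8] # ln(cumulative deaths)
slope, se = slope_with_hc3_se(dr, log_deaths)
print(f"slope = {slope:.4f}, HC3 s.e. = {se:.4f}")
```

Because each residual is scaled up according to its leverage, high-leverage countries cannot make the slope look more precisely estimated than it is, which is the concern in a cross-section with few observations.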
It is important to keep in mind that the results can be biased if an omitted variable problem exists, that is, if there are variables that are correlated with the explained outcome and, at the same time, with the explanatory variables of interest. For instance, one can think of countries implementing lockdowns because of lower detection rates (Argentina), or relaxing social distancing rules because of higher detection rates (Australia). Nevertheless, such unobserved variables would lead to an underestimation of the true association between detection rates and the cumulative number of deaths. Thus, detection matters.
[Table note: Standard errors in parentheses. Significance levels: ***p < 0.01, **p < 0.05, *p < 0.1. Source: Own elaboration.]
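The direction of this bias can be made explicit with the textbook omitted-variable formula. The sketch below assumes a linear model with a single omitted policy variable Z_i (for example, the stringency of lockdown measures) and ignores the other controls for simplicity; the symbols \alpha, \beta, \gamma, and \delta are introduced here for illustration and are not part of the original specification.

```latex
\ln D_i = \alpha + \beta\, DR_i + \gamma\, Z_i + \varepsilon_i,
\qquad
\operatorname{plim}\, \hat{\beta}_{\text{short}} = \beta + \gamma\,\delta,
\qquad
\delta = \frac{\operatorname{Cov}(DR_i, Z_i)}{\operatorname{Var}(DR_i)}
```

If stricter measures reduce deaths (\gamma < 0) and are adopted more often where detection is low (\delta < 0), then \gamma\delta > 0. The estimated coefficient is therefore pushed toward zero, and the true \beta is more negative than the estimate, which is the sense in which the association is underestimated.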

DISCUSSION
In this study, we have proposed a method to estimate the number of SARS-CoV-2 infections for the world as a whole and for 91 major countries covering more than 86% of the world population. We find that, as of June 22, about 160 million individuals worldwide had been infected by SARS-CoV-2. Moreover, only about 1 out of 11 of these infections had been detected. We find that detection rates are very unequally distributed across the globe and that they increased over time, from about 1% during the second and third weeks of March to about 9% on June 22. In an information context in which population-based seroepidemiological studies are not available, this study shows a parsimonious alternative for estimating the number of SARS-CoV-2 infected individuals. By comparing our estimates with those provided by the ENE-COVID study in Spain, we confirm the utility of our approach, keeping in mind that, from the public health point of view, it is preferable to be vaguely right than precisely wrong. In order to provide reliable estimates of the number of SARS-CoV-2 infections and of the cumulative detection rates, governments need to provide real-time information about the number of COVID-19 deaths. This study supports the view that accurate reporting of fatalities can have consequences for the development of the pandemic itself. Thus, it is also a call to enable international comparison by following WHO international norms and standards for medical certificates of COVID-19 cause of death and International Classification of Diseases (ICD) mortality coding.
Additionally, in our empirical analysis, we have presented parsimonious evidence that higher detection rates are associated with saving lives. Our conditional analysis shows, for example, that if the US had followed the same detection rate trajectory as South Korea, about two-thirds of the reported deaths could have been avoided (about 85,000 lives).
We find that detection rates at the very early stages of the pandemic seem to explain the great divergence in deaths between countries. Moreover, we showed evidence that moving from relatively low to relatively high cumulative detection rates (and thus saving lives) is unlikely and difficult. This is probably due to the high level of effort needed to expand testing relative to the exponentially growing number of infections at the early and middle stages of the pandemic. Thus, from the public health point of view, it is better to deploy testing efforts at the first stages of the pandemic process. Doing so would be much more advantageous in terms of lives saved, and it would also be technically and economically feasible.
Even so, many developed countries with well-developed health sectors were unable to avoid unnecessary deaths because of their inaction in promoting mass testing to counter the pandemic outbreak at its early stages.
To achieve the goal of implementing mass testing from the very beginning of the pandemic outbreak, governments need to understand the consequences of not doing so. Thus, the evidence presented in this paper offers a rigorous macro-level linkage between detection rates and the cumulative number of deaths, which may be useful in future pandemics. This evidence also supports the implementation of mass testing in the likely coming secondary pandemic outbreaks (so-called second waves).
Further research should be devoted to understanding why detection capacity in many advanced countries was so weak and late, and so weakly correlated (if at all) with income levels. In this paper, we claim that governments have incentives against testing because public opinion tends to react primarily to the reported cumulative and marginal numbers of detected cases. The contradiction is that something good, such as an increase in testing efforts by governments, can be perceived by the general public as something negative (due to the increase in detections). Consequently, are low detection rates in developed countries simply a management failure, or are there long-run incentives that promoted this behavior among many rich countries? It is clear that during the ongoing pandemic, improving detection rates is a race against time, but are there institutional and/or technological constraints that hamper improvements in detection that could save lives? All these questions are relevant for this and future pandemics. This study claims that all countries in the world should be able to respond to a pandemic outbreak with massive testing in the very short run. This would be an efficient approach, since it is also likely that higher detection rates are associated with a smaller impact of the pandemic on the economy.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

AUTHOR CONTRIBUTIONS
CV conceived this research, performed the background work, collected the data, performed all statistical analyses, and wrote the paper.