Effects of demographic and weather parameters on COVID-19 basic reproduction number

Timely prediction of the COVID-19 progression is not possible without a comprehensive understanding of environmental factors that may affect the infection transmissibility. Studies addressing parameters that may influence COVID-19 progression have relied on the total numbers of detected cases or similar proxies, and/or on a small number of analyzed factors, including analyses of regions that display a narrow range of these parameters. We here apply a novel approach, exploiting widespread growth regimes in COVID-19 detected case counts. By applying nonlinear dynamics methods to the exponential regime, we extract the basic reproduction number R0 (i.e., the measure of COVID-19 inherent biological transmissibility), applying to a completely naive population in the absence of social distancing, for 118 different countries. We then use bioinformatics methods to systematically collect data on a large number of demographic and weather parameters from these countries, and seek their correlations with the rate of COVID-19 spread. In addition to some of the already reported tendencies, we show a number of both novel results and those that help settle existing disputes: the absence of dependence on wind speed and air pressure; a negative correlation with precipitation; a significant positive correlation with the society development level (human development index), irrespective of testing policies, and with the percent of the urban population, but an absence of correlation with population density per se. We find a strong positive correlation of transmissibility with alcohol consumption, and the absence of correlation with refugee numbers, contrary to some widespread beliefs. Significant tendencies with health-related factors are reported, including a detailed analysis of the blood type groups showing consistent tendencies with the Rh factor, and a strong positive correlation of transmissibility with cholesterol levels.


Introduction
Ancient wisdom teaches us that "knowing your adversary" is essential in every battle, and this equally applies to the current global struggle against the COVID-19 pandemic. Understanding the parameters that influence the course of the pandemic is of paramount importance in the ongoing worldwide attempts to minimize the devastating effects of the virus which, to the present moment, has already taken a toll of more than a million lives [1] and resulted in a double-digit recession in some of the major world economies [2]. Of all such factors, the ecological ones (both abiotic, such as meteorological factors, and biotic, such as demographic and health-related population properties) likely play a prominent role in determining the dynamics of disease progression [3].
However, making good estimates of the effects that general demographic, health-related, and weather conditions have on the spread of COVID-19 infection is beset by many difficulties. First of all, these dependencies are subtle and easily overshadowed by larger-scale effects. Furthermore, as the effective rate of disease spread is an interplay of numerous biological, medical, social, and physical factors, a particular challenge is to separate out the dominating effects of local COVID-19-related policies, which are both highly heterogeneous and time-varying, often in an inconsistent manner. This is precisely where, in our view, much of the previous research on this subject falls short.
There are not many directly observable variables that can be used to trace the progression of the epidemic on a global scale (i.e. for a large number of diverse countries). The most obvious one, the number of detected cases, is heavily influenced both by the extent of testing (which, in turn, depends on non-uniform medical guidelines, variable availability of testing kits, etc.) and by the introduced infection suppression measures (where the latter are not only non-homogeneous but are also erratically observed [4]). Nevertheless, the majority of the research aimed at establishing connections of the weather and/or demographic parameters with the spread of COVID-19 seeks correlations exactly with the raw number of detected cases [5][6][7][8][9][10][11][12][13]. For the aforementioned reasons, the conclusions reached in this way are questionable. Other variables that can be directly measured, such as the number of hospitalized patients or the number of COVID-19 induced deaths [14][15][16], again depend on many additional parameters which are difficult to take into account: the level of medical care and current hospital capacity, advancements and changing practices in treating COVID-19 patients, the prevalence of risk groups, and even the diverging definitions of when hospitalization or death should be attributed to the COVID-19 infection. As such, these variables are certainly not suitable as proxies of the SARS-CoV-2 transmissibility per se.
On the other hand, as we here empirically find (and as theoretically expected [17,18]), the initial stage of the COVID-19 epidemic (in a given country or area) is marked by a period of nearly perfect exponential growth in the number of cases, which typically lasts for about two weeks (based on our analysis of the available data). One can observe widespread dynamical growth patterns for many countries, with a sharp transition between exponential, superlinear (growth faster than linear), and sublinear (growth slower than linear) regimes (see Fig. 2 below), the last two representing subexponential growth. We here concentrate on the initial exponential period of the detected-case data, characterizing the period before the control measures took effect, from which we can deduce the basic reproduction number R0 [19], following a simple and robust mathematical (dynamical) model presented here. R0 is a straightforward and important epidemiological parameter characterizing the inherent biological transmissibility of the virus in a completely naïve population and in the absence of social distancing measures [20,21]. To emphasize the absence of social distancing, the term R0,free is also used; for simplicity, we further denote R0 ≡ R0,free. R0 is largely independent of the implemented COVID-19 policies and thus truly reflects the characteristics of the disease itself, as it starts to spread unhampered through the given (social and meteorological) settings. Namely, the exponential period

COVID-19 Environmental Dependence
ends precisely when the effects of the control measures kick in, which happens with a delay of ~10 days after their introduction [22], corresponding to the disease latent period and to the time between the symptom onset and the disease confirmation. Not only had very few governments enacted any social measures before the occurrence of a substantial number of cases [4], but the length of the incubation period also makes it likely that the infection had already been circulating for some time through the community even before the first detected case (and that the effects of the measures are inescapably delayed in general). Also, the transition from the exponential to the subsequent subexponential phase of the epidemic is readily visible in the COVID-19 data (see Fig. 2). Furthermore, R0 is invariant to the particular testing guidelines, as long as these do not significantly vary over the (here relatively short) studied period.
In the analysis presented here, we consider 42 different weather, demographic, and health-related population factors, whose analyzed ranges correspond to their variations exhibited in 118 world countries. While some authors prefer more coherent data samples to avoid confusing effects of too many different factors [5-8, 11, 16], this consideration is outweighed by the fact that large ranges of the analyzed parameters serve to amplify the effects we are seeking to recognize and to more reliably determine the underlying correlations. For example, while the value of the Human Development Index (HDI, a composite index of life expectancy, education, and per capita income indicators) varies from 0.36 to 0.96 over the set of analyzed countries, this range would drop by an order of magnitude [23] if the US was chosen as the scope of the study (other demographic parameters exhibit similar behavior). The input parameters must take values in some substantial ranges to have measurable effects on R0 (i.e. small variations may lead to effects which are easily lost in statistical fluctuations).
The number of considered parameters is also significant, especially when compared to other similar studies [6-9, 11, 16, 24-26]. In a model where a large number of factors are analyzed under the same framework, consistency of the obtained results, in terms of agreement with other studies, commonsense expectations, and their self-consistency, becomes an important check of applied methodology and analysis. Furthermore, a comprehensive and robust analysis is expected to generate new findings and lead to novel hypotheses on how environmental factors influence COVID-19 spread. Overall, we expect that the understanding achieved here will contribute to the ability to understand the behavior of the pandemics in the future and, by the same token, to timely and properly take measures in an attempt to ameliorate the disease effects.

Modified SEIR model and relevant approximations
There are various theoretical models and tools used to investigate and predict the progress of an epidemic [18,19]. We here opted for the SEIR compartmental model, which has so far been used to predict or explain different features of COVID-19 infection dynamics [27][28][29][30][31]. The model is sufficiently simple to be applied to a wide range of countries while capturing all the features of COVID-19 progression relevant for extracting the R0 values. The model divides the entire population into four (mutually exclusive) compartments with labels: (S)usceptible, (E)xposed, (I)nfected, and (R)ecovered.
The dynamics of the model (which considers gradual transitions of the population from one compartment to the other) directly reflects the disease progression. Initially, a healthy individual has no developed SARS-CoV-2 virus immunity and is considered as "susceptible". Through contact with another infected individual, this person may become "exposed", denoting that the transmission of the virus has occurred, but that the newly infected person at this point neither has symptoms nor can yet transmit the disease. An exposed person becomes "infected" (in the sense of becoming contagious) on average after the so-called "latent" period which is, in the case of COVID-19, approximately 3 days. After a certain period of the disease, this person ceases to be contagious and is then considered as "recovered" (from the mathematical perspective of the model, "recovered" are all individuals who are no longer contagious, which therefore also includes deceased persons). In the present model, the recovered individuals are taken to be no longer susceptible to new infections (irrespective of whether the COVID-19 immunity is permanent or not, it is certainly sufficiently long in the context of our analysis).
This is a provisional file, not the final typeset article

Accordingly, almost the entire population initially belongs to the susceptible class. Subsequently, parts of the population become exposed, then infected, and finally recovered. The SARS-CoV-2 epidemic is characterized by a large proportion of asymptomatic cases (or cases with very mild symptoms) [32], which leads to a large number of cases that remain undiagnosed. For this reason, only a portion of the infected will be identified (diagnosed) in the population, and we classify them as "detected". This number is important since it is the only direct observable in our model, i.e. the only number that can be directly related to the actual COVID-19 data. This dynamic is schematically represented in Fig. 1, and is governed by a set of differential equations for the compartments.
In these equations, S, E, I, and R denote the numbers of individuals belonging to, respectively, the susceptible, exposed, infected, and recovered compartments, D is the number of detected cases, and N is the total population. Parameter β denotes the transmission rate, which is proportional to the probability of disease transmission in a contact between a susceptible and an infectious subject. Incubation rate σ determines the rate at which exposed individuals become infected and corresponds to the inverse of the average incubation period. Recovery rate γ determines the transition rate between the infected and recovered parts of the population (i.e. 1/γ is the average period during which an individual is infectious). Finally, two further parameters describe the detection efficiency and the detection rate. All these rate parameters are considered constant during the analyzed (brief) period.
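For orientation, a standard SEIR system extended with a detected-case compartment, consistent with the parameter definitions given here, can be written as below. This is a sketch rather than the authors' exact formulation: the symbols ε and δ for the detection efficiency and detection rate, and the product form of the last equation, are assumptions, and no equation numbers are attached.

```latex
\begin{aligned}
\frac{dS}{dt} &= -\beta \frac{S I}{N}, \\
\frac{dE}{dt} &= \beta \frac{S I}{N} - \sigma E, \\
\frac{dI}{dt} &= \sigma E - \gamma I, \\
\frac{dR}{dt} &= \gamma I, \\
\frac{dD}{dt} &= \varepsilon\, \delta\, I \qquad \text{(assumed form and symbols)}.
\end{aligned}
```

In such a system the detected count D is driven by I alone, so it inherits the growth rate of the infected compartment once transients have decayed.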

In the first stage of the epidemic, when essentially the entire population is susceptible (i.e. S/N ≈ 1) and no distancing measures are enforced, the average number of secondary infections, caused directly by primary infected individuals, corresponds to the basic reproduction number R0. The infectious disease can spread through the population only when R0 > 1, and in these cases, the initial growth of the infected cases is exponential. Though R0 is a characteristic of the pathogen, it also depends on environmental abiotic (e.g. local weather conditions), as well as biotic factors (e.g. prevalence of health conditions, and population mobility tightly related to the social development level). In addition to R0, the transmissibility is also expressed in terms of the effective reproduction number Re that also takes into account the effects of the introduced social measures. Re is not considered in this work, as we are concerned with the factors that affect the inherent biological transmissibility of the virus, independently from the applied measures.
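The exponential character of the first stage can be illustrated with a small simulation. The following is a minimal sketch, not the authors' code: it integrates a standard SEIR model with a forward-Euler scheme, uses the textbook SEIR relation R0 = β/γ, and adds a hypothetical combined detection coefficient `detect`. The parameter values and the population size are illustrative assumptions.

```python
import math

def simulate_seir(beta, sigma, gamma, days, n=1e7, dt=0.01, detect=0.1):
    """Forward-Euler integration of an SEIR model with a detected-case counter.

    `detect` is a hypothetical combined detection coefficient (dD/dt = detect * I).
    Returns a list of daily snapshots (I, D), starting at day 0.
    """
    s, e, i, r, d = n - 1.0, 0.0, 1.0, 0.0, 0.0  # one initial infectious case
    steps_per_day = round(1 / dt)
    daily = [(i, d)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / n * dt
            new_infectious = sigma * e * dt
            new_recovered = gamma * i * dt
            d += detect * i * dt
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_recovered
            r += new_recovered
        daily.append((i, d))
    return daily

def growth_rate(series, start, end):
    """Average exponential growth rate (per day) between two day indices."""
    return math.log(series[end] / series[start]) / (end - start)

# Illustrative parameters (assumed): R0 = beta / gamma = 2.88 > 1.
beta, sigma, gamma = 0.72, 1 / 3, 1 / 4
daily = simulate_seir(beta, sigma, gamma, days=40)
infectious = [row[0] for row in daily]
detected = [row[1] for row in daily]
print(round(growth_rate(infectious, 20, 40), 3))
print(round(growth_rate(detected, 20, 40), 3))
```

With S/N still close to 1 over the simulated window, both the infectious and the detected counts grow at nearly the same constant rate, which is the behavior the text describes for R0 > 1.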

COVID-19 growth regimes
If we observe the total number of COVID-19 cases (e.g. in a given country) as a function of time, a regular pattern emerges: the growth of the detected COVID-19 cases is initially exponential but slows down after some time, when we say it enters the subexponential regime. The subexponential regime can be further divided into the superlinear regime (growing asymptotically faster than a linear function) and the sublinear regime (growing asymptotically slower than a linear function). This typical behavior is illustrated, in the case of Italy, in Fig. 2. The transition to the subexponential regime occurs relatively soon, long before a significant portion of the population gains immunity, and is a consequence of the introduction of the infection suppression measures.
Figure 2. Transitions of the growth patterns (here shown for Italy) from the exponential (red) to the superlinear (blue) and sublinear (green) regimes. The three insets correspond to the log-linear scale (exponential), log-log scale (superlinear), and linear-log scale (sublinear). Dots correspond to detected infections, starting from 20/02/2020.

Inference of the basic reproduction number R0
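In the initial exponential regime, a linear approximation to the model can be applied. Namely, in this stage, almost the entire population is susceptible to the virus, i.e. S/N ≈ 1, which simplifies equation (1.2) to:

$$\frac{dE}{dt} = \beta I - \sigma E. \tag{3.1}$$

By combining expressions (1.3) and (3.1) one obtains:

$$\frac{d}{dt}\begin{pmatrix} E \\ I \end{pmatrix} = A \begin{pmatrix} E \\ I \end{pmatrix},$$

where we have introduced a two-by-two matrix:

$$A = \begin{pmatrix} -\sigma & \beta \\ \sigma & -\gamma \end{pmatrix}.$$

The solution for the number of infected individuals can now be written:

$$I(t) = C_1 e^{\lambda_+ t} + C_2 e^{\lambda_- t}, \tag{3.4}$$

where C1 and C2 are constants, and λ+ and λ− denote the eigenvalues of the matrix A, i.e. the solutions of the equation:

$$\det\left(A - \lambda \mathbb{1}\right) = 0.$$

The eigenvalues must satisfy:

$$\begin{vmatrix} -\sigma - \lambda & \beta \\ \sigma & -\gamma - \lambda \end{vmatrix} = 0,$$

leading to:

$$(\lambda + \sigma)(\lambda + \gamma) - \beta\sigma = 0. \tag{3.6}$$

The solutions of (3.6) are:

$$\lambda_\pm = \frac{-(\gamma + \sigma) \pm \sqrt{(\gamma - \sigma)^2 + 4\beta\sigma}}{2}. \tag{3.7}$$

Since λ− < 0, the second term in (3.4) can be neglected for sufficiently large t. More precisely, numerical analysis shows that this approximation is valid already after the second day, while for the extraction of the R0 value we will in any case ignore all data before the fifth day (for the analyzed countries, the numbers of cases before the fifth day were generally too low, hence these early data are dominated by stochastic effects/fluctuations). Hence, D(t) is proportional to exp(λ+ t), i.e.:

$$D(t) \propto e^{\lambda_+ t}.$$

By expressing β from (3.7) and inserting the formula into (1.6), we obtain:

$$R_0 = 1 + \frac{\lambda_+ (\gamma + \sigma)}{\gamma\sigma} + \frac{\lambda_+^2}{\gamma\sigma}. \tag{3.9}$$

Taking the logarithm of the expression for D(t) leads to:

$$\log D(t) = \lambda_+ t + \text{const}, \tag{3.12}$$

from which λ+ can be obtained as the slope of the log(D(t)) function. From equation (3.9), we thus obtain the R0 value as a function of the slope of log(D(t)), where the latter can be efficiently inferred from the plot of the number of detected COVID-19 cases for a large set of countries.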
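The inference step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the closed form for R0 follows from the characteristic equation (3.6) combined with the standard SEIR relation R0 = β/γ, the default rates are the σ = 1/3 and γ = 1/4 per day quoted in the text, and the case counts are synthetic. The helper names `fit_slope` and `r0_from_growth_rate` are hypothetical.

```python
import math

def fit_slope(times, log_counts):
    """Ordinary least-squares slope of log_counts versus times."""
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(log_counts) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, log_counts))
    sxx = sum((t - mean_t) ** 2 for t in times)
    return sxy / sxx

def r0_from_growth_rate(lam, sigma=1 / 3, gamma=1 / 4):
    """R0 = 1 + lam*(gamma+sigma)/(gamma*sigma) + lam**2/(gamma*sigma).

    Obtained by solving (lam + sigma)*(lam + gamma) = beta*sigma for beta
    and inserting it into R0 = beta / gamma (standard SEIR relation, assumed).
    """
    return 1 + lam * (gamma + sigma) / (gamma * sigma) + lam ** 2 / (gamma * sigma)

# Synthetic detected-case counts growing at 0.2 per day (not real data);
# days before the fifth are skipped, mirroring the text.
days = list(range(5, 15))
cases = [50 * math.exp(0.2 * t) for t in days]
lam_plus = fit_slope(days, [math.log(c) for c in cases])
r0 = r0_from_growth_rate(lam_plus)
print(round(lam_plus, 3), round(r0, 2))
```

A zero slope returns R0 = 1, the threshold between growth and decline, which is a quick sanity check on the formula.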

The SEIR model and the above derivation of R0 assume that the population belonging to different compartments is uniformly mixed. Possible heterogeneities may tend to increase R0 values [18]. However, this would not influence the results obtained below, as our R0 values are consistently inferred for all analyzed countries by using the same model, methodology, and parameter set. Moreover, our R0 values are in agreement with the prevailing estimates in the literature [33].

Demographic and weather data acquisition
For the countries for which R0 was determined through the procedure above, we also collect a broad spectrum of meteorological and demographic parameter values. Overall, 118 countries were selected for our analysis, based on the relevance of the COVID-19 epidemiological data. Namely, a country was considered as relevant for the analysis if the number of detected cases on June 15th was higher than a threshold value of 1000. A few countries were then discarded from this initial set, where the case count growth was too irregular to extract any results, possibly due to inconsistent or irregular testing policies. As a source for detected cases, we used [34].
In the search for factors correlated with COVID-19 transmissibility, we have analyzed overall 42 parameters, 11 of which are related to weather conditions, 30 to demographics or health-related population characteristics, and one parameter quantifying a delay in the epidemic's onset.
Our main source of weather data was the NASA POWER (Prediction Of Worldwide Energy Resources) project [35]. A dedicated Python script was written and used to acquire weather data via the provided API (Application Programming Interface). The NASA POWER API allows a large set of weather parameters to be obtained for any given location (specified by latitude and longitude) and any given date/time combination. Since we needed to assign a single value to each country (for each analyzed parameter), the following method was used for averaging meteorological data. In each country, a number of the largest cities were selected and weather data were taken for the corresponding locations. These data were then averaged, weighted by the population of each city. The data were also averaged over the period used for R0 estimation. (More precisely, to account for the time between disease transmission and case confirmation, we shifted this period 12 days into the past.) The only meteorological parameter not available from the NASA POWER project was the UV (ultraviolet) radiation, which was obtained from the OpenUV source [www.openuv.io], using the same averaging methodology.
The log(D(t)) function, for a subset of selected countries, is shown in Fig. 3. The evident linear dependence confirms that the progression of the epidemic in this stage is almost perfectly exponential. For each country, the parameter λ+ is directly obtained as the slope of the corresponding linear fit of log(D(t)), and the basic reproduction number R0 is then calculated from Equation (3.9).
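The averaging procedure described for the weather data (population-weighted mean over a country's largest cities, taken over the R0 estimation period shifted 12 days into the past) can be sketched as follows. This is only an illustration: the function names, the city values, and the dates are hypothetical, and no call to the NASA POWER or OpenUV APIs is shown.

```python
from datetime import date, timedelta

def weather_window(r0_start, r0_end, shift_days=12):
    """Shift the R0 estimation period into the past by `shift_days`."""
    shift = timedelta(days=shift_days)
    return r0_start - shift, r0_end - shift

def population_weighted_mean(city_values, city_populations):
    """Average per-city values, weighting each city by its population."""
    total = sum(city_populations)
    return sum(v * p for v, p in zip(city_values, city_populations)) / total

# Hypothetical inputs: mean temperature (deg C) over the window for three cities.
temps = [10.0, 14.0, 18.0]
pops = [3_000_000, 1_000_000, 1_000_000]
country_temp = population_weighted_mean(temps, pops)

# Hypothetical R0 estimation period for one country.
start, end = weather_window(date(2020, 3, 1), date(2020, 3, 14))
print(country_temp, start, end)
```

Weighting by city population makes the country-level value reflect the conditions under which most people live, rather than the geographic average.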
Here, we used the following values for the incubation rate, σ = 1/3 day⁻¹, and for the recovery rate, γ = 1/4 day⁻¹, per the commonly accepted values in the literature [20]. Note that possible variations in these two experimental values would not significantly affect any conclusions about R0 correlations, due to the mathematical properties of the relation (3.9): it is a strictly monotonic function of λ+, and the linear term λ+ (γ + σ)/(γσ) dominantly determines the value of R0.
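As a worked illustration of the relation (3.9) with these rates, consider a slope of λ+ = 0.2 day⁻¹. This value is an assumption chosen for the arithmetic, not one reported in the text, and the expanded form of (3.9) used below follows from the characteristic equation (3.6) together with R0 = β/γ.

```latex
\begin{aligned}
\frac{\gamma + \sigma}{\gamma\sigma} &= \frac{7/12}{1/12} = 7\ \text{days}, \qquad
\frac{1}{\gamma\sigma} = 12\ \text{days}^2, \\
R_0 &= 1 + \lambda_+ \frac{\gamma + \sigma}{\gamma\sigma} + \frac{\lambda_+^2}{\gamma\sigma}
     = 1 + 0.2 \times 7 + 0.04 \times 12 = 1 + 1.4 + 0.48 = 2.88 .
\end{aligned}
```

In this example the linear term (1.4) is about three times the quadratic one (0.48), in line with the statement that the linear term dominates.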
Supplementary tables contain the values of the 42 variables for all countries. Correlations of each of the variables with R0 are given in Table 2. Values of the Pearson correlation coefficient are shown below, though consistent conclusions are also obtained with the Kendall and Spearman correlation coefficients (which do not assume a linear relationship between variables).
Figure 4 (caption fragment): ... and percentage of refugee population by country or territory of asylum (RE). The statistical significance of each correlation is indicated in the legend, while "ns" stands for "no significance".
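The difference between the Pearson coefficient and the rank-based Spearman and Kendall coefficients mentioned here can be illustrated with a short, dependency-free sketch. The data are synthetic, the text does not state which Kendall variant was used (tau-a, without tie correction, is assumed below), and p-values are not computed.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = avg
        pos = end + 1
    return result

def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def kendall_tau_a(x, y):
    """Kendall tau-a: (concordant - discordant) pairs over all pairs."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            score += (prod > 0) - (prod < 0)
    return score / (n * (n - 1) / 2)

# Synthetic monotonic but nonlinear relationship.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [v ** 3 for v in x]
print(pearson(x, y), spearman(x, y), kendall_tau_a(x, y))
```

For a monotonic but nonlinear relationship the rank-based coefficients reach their maximum while the Pearson coefficient stays below 1, which is why agreement among all three supports the robustness of a reported correlation.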
The first set of results, corresponding roughly to general demographic data, is presented in Fig. 4. The plot in panel A shows the distribution of R0 vs. HDI values for all countries, where a higher HDI score indicates a more prosperous country in terms of life expectancy, education, and per capita income [40]. This parameter was included in the study due to a reasonable expectation that a higher level of social development also implies a higher level of population interconnectedness and mixing (stronger business and social activity, more travelers, more frequent contacts, etc.), and hence that HDI could be related to the SARS-CoV-2 transmissibility. Indeed, we note a strong, statistically highly significant correlation between the HDI and the R0 value, with R = 0.37 and p = 4×10⁻⁵, demonstrating that the initial expansion of COVID-19 was faster in more developed societies.
The social security and health insurance coverage (INS) "shows the percentage of population participating in programs that provide old age contributory pensions (including survivors and disability) and social security and health insurance benefits (including occupational injury benefits, paid sick leave, maternity, and other social insurance)" [36]. Reflecting the percentage of the population covered by medical insurance and likely feeling more protected from the financial effects of the epidemic, this indicator shows a strong (R = 0.4) and highly significant (p = 4×10⁻⁴) positive correlation with R0. The percentage of urban population (UP) and BUCAP density (BAP) are both included as measures of how concentrated the population of the country is. While the UP value simply shows what percentage of the population lives in cities, the BUCAP parameter denotes the amount of built-up area per person. Of the two, the former shows a highly significant positive correlation with the COVID-19 basic reproduction number, whereas the latter shows no correlation. Median age (MA) is of obvious potential relevance in COVID-19 studies since it is well known that the disease more severely affects the older population [41]. Thus we also wanted to investigate whether there is any connection of age with the virus transmissibility. Our results are suggestive of such a connection, since we obtained a strong positive correlation of age with R0, with very high statistical confidence. Infant mortality (IM) is defined as the number of infants dying before reaching one year of age, per 1,000 live births. Lower IM rates can serve as another indicator of the prosperity of a society, and it turns out that this measure is also strongly, but negatively, correlated with R0, with p = 8×10⁻⁵ (showing again that more developed countries, i.e. those with lower IM rates, have experienced a more rapid spread of the virus infection).
Net migration (I-E) represents the five-year estimates of the total number of immigrants less the annual number of emigrants, including both citizens and noncitizens. This number, related to the net influx of foreigners, turns out to be positively correlated, in a statistically significant way, with R0. However, according to our data, the percentage of refugees, defined as the percentage of the people in the country who are legally recognized as refugees and were granted asylum in that country, is not correlated with R0 at all.
Another set of parameters corresponds to medically related demographic parameters and is shown in the upper part of Fig. 5. The plot in panel A represents the average blood cholesterol level (in mmol/L) in the population of various countries, plotted against the value of R0. The two parameters are strongly correlated, with R = 0.4 and p = 6×10⁻⁶. Another demographic parameter with clear medical relevance that has a comparatively strong and significant positive correlation with R0 is the alcohol consumption per capita (ALC), as shown in panel B of Fig. 5. Our data show that R0 is also positively correlated, with high statistical significance, with the prevalence of obesity and, to a somewhat smaller extent, with the percentage of smokers. Here, obesity is defined as having a body-mass index over 30. A medical parameter which is strongly, but negatively, correlated with R0 is a measure of the prevalence and severity of COVID-19 relevant chronic diseases in the population (CD). This parameter is defined as "the percent of 30-year-old-people who would die before their 70th birthday from any of cardiovascular disease, cancer, diabetes, or chronic respiratory disease, assuming that s/he would experience current mortality rates at every age and s/he would not die from any other cause of death" [36]. The percentage of people with raised blood pressure (RBP) is also negatively correlated with R0, though this correlation is not as strong or as statistically significant as in the case of the CD parameter. Here, raised blood pressure is defined as systolic blood pressure over 140 or diastolic blood pressure over 90, in the population older than 18. The percentage of smokers exhibits a statistically significant (though not large) positive correlation. Two medical-demographic parameters that show no correlation with R0 in our data are the prevalence of insufficient physical activity among adults aged over 18 (IN) and BCG immunization coverage among 1-year-olds (BCG).
In Fig. 5C we see that blood types are, in general, strongly correlated with R0. The highest positive correlation is exhibited by the A− and O− types, with Pearson correlations of 0.4 and 0.39, and a very high statistical significance of p = 10⁻⁴ and p = 2×10⁻⁴, respectively. Taken as a whole, group A is still strongly and positively correlated with R0, albeit with a somewhat lower statistical significance (the A+ type correlation has a p-value two orders of magnitude higher than that of A−). This is not so for group O which, overall, does not seem to be correlated with R0 (O+ even shows a certain negative correlation, but without statistical significance). Our data reveal a highly significant positive correlation also for the AB− subtype (R = 0.31, p = 0.003), while neither the AB+ subtype nor the overall AB group is significantly correlated with the basic reproduction number. A clear negative correlation is exhibited only by the B blood group (R = -0.31, p = 0.004), mostly due to the negative correlation of its B+ subtype (R = -0.34, p = 0.001), whereas the B− subtype is not significantly correlated with R0 in our data. If we consider the rhesus factor alone, we again observe very strong correlations with R0: Rh− and Rh+ correlate positively (R = 0.4) and negatively (R = -0.4), respectively, with very high statistical significance (p = 2×10⁻⁴). The tendency of Rh− to increase the transmissibility, and of Rh+ to decrease it, is therefore consistent with the results obtained for all four individual blood groups.
Figure 5 (caption fragment): ... the overall value for that group, the correlation only for the Rh+ subtype of the group, and the correlation for the Rh− subtype are shown. The two rightmost bars correspond to the overall correlation of the Rh+ and the overall correlation of the Rh− blood type with R0. The convention for representing the statistical significance of each correlation is the same as in Fig. 4.
In Fig. 6, the onset represents the delay of the exponential phase and is defined, for each country, as the number of days from February 15 to the start of the exponential growth of detected cases. The motivation was to check for a possible correlation between the delay in the onset of the epidemic and the rate at which it spreads. Indeed, our data show that such a correlation exists and that it is very strong and statistically significant: R = -0.48 and p = 4×10⁻⁸. In other words, the later the epidemic started, the lower (on average) is the basic reproduction number.
Figure 6 (caption fragment): The convention for representing the statistical significance of each correlation is the same as before.
Panel B of Fig. 6 shows the correlation of R0 with some of the commonly considered air pollutants. Our data reveal a statistically significant positive correlation of R0 with NO2 and SO2 concentrations.
Other pollutants, namely CO, PM2.5 (fine inhalable particles, with diameters that are generally 2.5 micrometers and smaller), and PM10 (inhalable particles, up to 10 micrometers in diameter), show no statistically significant correlation with R0.
Next, we consider weather factors. Panels C and D of Fig. 6 show correlations of precipitation, temperature, specific humidity, UV index, air pressure, and wind speed with the reproduction number R0. Of these, precipitation, temperature, specific humidity, and UV index show a very strong negative correlation, at a very high level of statistical significance. Of the other two parameters, air pressure also shows signs of negative correlation but with no statistical significance, while the wind speed is not correlated at all with R0 in our data.

Discussion
The present paper aimed to establish relations between the COVID-19 transmissibility and a large number of demographic and weather parameters. As a measure of COVID-19 transmissibility, we have chosen the basic reproduction number R0, a quantity that is essentially independent of the variations in both the testing policies and the introduced social measures (as discussed in the Introduction), in contrast to many studies on transmissibility that relied on the total number of detected cases (see e.g. [5][6][7][8][9][10][11][12][13]). We have covered a substantial number of demographic and weather parameters, and included in our analysis all world countries that were significantly affected by the COVID-19 pandemic (and had a reasonable consistency in tracking the early phase of infection progression). While a number of manuscripts have been devoted to factors that may influence COVID-19 progression, only a few used an estimate of R0 or some of its proxies [42-44]; these studies were, however, limited to China and included a small set of meteorological variables, with conflicting results obtained for their influence on R0. Therefore, a combination of i) using a reliable and robust measure of COVID-19 transmissibility, and ii) considering a large number of factors that may influence this transmissibility within the same study/framework, distinguishes our study from prior work. We must, however, be cautious when it comes to further interpretation of the obtained data. As always, we must keep in mind that "correlation does not imply causation" and that further research is necessary to identify possible confounding factors and establish which of these parameters truly affect the COVID-19 transmissibility.
Due to the sheer number of studied variables, an even larger number of parameters that might be relevant but are inaccessible to study (or even impossible to quantify), as well as due to possible intricate mutual relations of the factors that may influence COVID-19 transmission, this is a highly nontrivial task. While we postpone any further analysis in this direction to future studies, we will, nevertheless, consider here the possible interpretations of the obtained correlations, assuming that they also probably indicate the existence of at least some causation.
We will first consider the demographic variables presented in Fig. 4. The obtained correlation of the human development index (HDI) with the basic reproduction number is both strong and hardly surprising. The level of prosperity and overall development of a society is necessarily tied with the degree of population mobility and mixing, traffic intensity (in particular air traffic), business and social activity, higher local concentrations of people, and other factors that directly or indirectly increase the frequency and range of personal contacts [45], rendering the entire society more vulnerable to the spread of viruses. In this light, it is reasonably safe to assume that the obtained strong and highly statistically significant correlation of HDI with R0 reflects a truly causal connection. However, some authors also offer a different explanation: that higher virus transmission in more developed countries is a consequence of more efficient detection of COVID-19 cases due to the better-organized health system [45]. Since our R0 measure does not depend on detection efficiency, the presented results can be taken as evidence against such a hypothesis.
The interpretation is less clear for other demographic parameters, for example, the percentage of the population covered by medical and social insurance programs (INS). Besides, we are not aware of any previous study of such a variable in the context of COVID-19. One possibility is to attribute the statistically significant strong positive correlation of this parameter with R0 to a hypothetical tendency of the population to more easily indulge in epidemiologically risky behavior if they feel well-protected, both medically and financially, from the risks posed by the virus; conversely, a population that cannot rely on professional medical care in the case of illness is likely to be more cautious not to contract the virus. Or, having in mind that the proportion of people covered by medical and social insurance is generally (though not necessarily) associated with the level of social development, this correlation may be non-causal and simply reflect the already established connection of the HDI with the dynamics of the disease progression. The latter is most probably the case with the infant mortality (IM), where low mortality ratios point to a better medical system, which goes hand in hand with the overall prosperity and development of the country [46], explaining in this way the observed negative correlation with R0.
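The possibility that the INS correlation merely reflects HDI can be probed with a partial correlation, which measures the association between two variables after the linear effect of a third is removed. The following is a minimal sketch on synthetic data, not the analysis performed in this study: it assumes Pearson coefficients, and the variable names, sample values, and noise levels are hypothetical illustrations. In the synthetic data both INS and R0 depend only on HDI, so the raw correlation is strong while the partial correlation is close to zero.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Synthetic, hypothetical data: INS and R0 are both driven by HDI alone.
rng = random.Random(0)
hdi = [rng.uniform(0.4, 0.95) for _ in range(118)]
ins = [h + rng.gauss(0.0, 0.05) for h in hdi]
r0 = [2.0 + 3.0 * h + rng.gauss(0.0, 0.3) for h in hdi]

raw = pearson(ins, r0)                 # strong, but entirely inherited from HDI
partial = partial_corr(ins, r0, hdi)   # much weaker once HDI is controlled for
print(f"raw r = {raw:.2f}, partial r (given HDI) = {partial:.2f}")
```

A partial correlation that stays large after controlling for HDI would argue against the purely non-causal reading, although it still would not establish causation.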

We are faced with a similar dilemma in the case of median age (MA). The strong positive correlation of virus transmission with the median age of the population can again be interpreted as an indirect consequence of the connection of this parameter with the level of development of the country, since the higher standard of living positively affects the life expectancy [45]. However, the same correlation can be considered in the light of the fact that clinical and epidemiological studies have unanimously shown that the elderly are at higher risk of developing a more severe clinical picture, and our result may indicate that the virus also spreads more efficiently in the elderly population. A possible mechanism responsible for higher susceptibility to SARS-CoV-2 virus in the elderly population would be the far greater number of chronic patients in this population who are prescribed certain drugs (ACE inhibitors and angiotensin-receptor blockers) that lead to increased levels of ACE2 receptors through which the SARS-CoV-2 virus can enter cells [47]. However, the weak correlation of virus transmission with the percentage of smokers (even though tobacco users also have higher numbers of these receptors), as well as the negative correlation with the prevalence of chronic diseases and high blood pressure, do not speak in favor of such a conclusion (see below). A general weakening of the immune system with age may also be the reason for greater susceptibility to viral infections [48], and the positive correlation we obtained could be partly due to a large number of elderly people grouped in nursing homes, where the virus can expand very quickly [49].
The correlation of population density with R0, or the lack thereof, is more challenging to explain. Naively, one could expect that COVID-19 spreads much more rapidly in areas with a large concentration of people, but, if it exists, this effect is not that easily captured numerically. As the standard population density did not show any correlation with the reproduction number R0 (not shown), we explored some more subtle variants. Namely, the simplest reason why the data shows no correlation of R0 with population density would be that the density, calculated in the usual way, is too averaged out: the most densely populated country on our list, Monaco, has roughly 10,000 times more people per square kilometer than the least densely populated, Australia. However, downtown Melbourne has a similar population density to Monaco and far more people, so there is no a priori reason to expect that its infection progression would be slower (and the R0 rate for Australia as a whole will be dominantly determined by the fastest exponential expansion occurring anywhere on its territory). For this reason, we included the BUCAP parameter into the analysis, which takes into account only population density in built-up areas. Surprisingly, even this parameter did not exhibit any statistically significant correlation. Several studies may serve as examples showing that the correlation of population density with the rate of COVID-19 expansion can be expected only under certain conditions, since the frequency of contacts between people is to a large extent modulated by additional geographical, economic, and sociological factors [50-53].
The observed absence of a correlation of the average population density of the state, even in the case when only the inhabited part of its territory is taken into consideration as with the BUCAP parameter, could therefore be expected, and possibly indicates that such a correlation should be sought at the level of smaller populated areas, for example, individual cities [54]. This conclusion is somewhat supported by the obtained highly significant and strong positive correlation of R0 with the percentage of the population living in cities (UP), which probably reflects the higher number of encounters between people in a more densely populated, urban environment [10]. It is also possible that virus spread might have a highly non-linear dependence on the population density: namely, that an outbreak in a susceptible population requires a certain threshold value of its density, while below that value population density ceases to be a significant factor influencing virus transmission [53, 55, 56].
Another demographic parameter that exhibits a significant correlation with R0 in our data is the net migration (I-E), denoting the number of immigrants less the number of emigrants. Unlike this number, which shows a positive correlation, the number of refugees (RE) seems not to be correlated at all. By definition, migrants deliberately choose to move to improve their prospects, while refugees have to move to save their lives or preserve their freedom. Migrants, arguably, tend to stay in closer contact with the country of their origin and have more financial means for that, which likely contributes to more frequent border crossings and more intensive passenger traffic [57], thereby promoting the infection spread. Alternatively, since the migration is generally directed towards countries with higher living standards, this correlation could be also seen as a consequence of the correlation of HDI. However, any similar correlation is curiously absent in the case of refugees, who generally also migrate towards developed countries. A possible interpretation is that since refugees are mostly stationed in migrant camps, there is less possibility of spreading the virus outside through contacts with residents, but there is a high possibility of escalation of the epidemic within camps with a high concentration of people [58]. Additionally, since refugees usually migrate in large groups, there is less need for frequent travel abroad to stay in contact with relatives, than in the case of (economic or academic) migrants. We did not find any other attempt in the literature to examine this issue. In any case, our results demonstrate that refugees are certainly not a primary cause of concern in the pandemics, contrary to fears expressed in some media.
Of the medical factors, the strongest correlation of R0 is established with elevated cholesterol levels, as shown in Fig. 5. Cholesterol may be associated with a viral infection and further disease development through a complex network of direct and indirect effects. In vitro studies of the role of cholesterol in virus penetration into the host body, done on several coronaviruses, indicate that its presence in the lipid rafts of the cell membrane is essential for the interaction of the virus with the ACE2 receptor, and also for the later endocytosis of the virus [59]. Besides, it has been shown that lowering cholesterol levels significantly reduces the ability of the virus to enter the cell. On the other hand, elevated cholesterol is often related to obesity (OB), another medical parameter that exhibits a highly significant, strong correlation with R0. However, based on the levels of correlation alone (which is stronger for elevated cholesterol than for the prevalence of obesity), it is not likely that the explanation of the cholesterol correlation with R0 should be sought in its relation with obesity, though the interpretation in the opposite direction is not inconceivable: in principle, obesity might be a relevant factor in the COVID-19 epidemic exactly due to the effects of cholesterol on SARS-CoV-2 susceptibility. Of course, other mechanisms of influence are also possible: it is known that obesity adversely affects a person's immune system, among other things by the fact that the adipose tissue of obese people excessively produces pro-inflammatory cytokines [60]. It is worth noting that, in developing countries, an increase in population obesity comes as a consequence of the adoption of a lifestyle characteristic of developed western countries, but obesity does not show a simple correlation with the state's development [61], precluding a simple explanation of the correlation via HDI as the confounding variable.
Overall, while the correlation of obesity with more severe prognosis in COVID-19 is well established in the literature, its relation to COVID-19 transmissibility is only mentioned in [10] and hitherto not explained.
Often related to obesity is also raised blood pressure (RBP), and we have discovered that this factor is also correlated, at high statistical significance, with R0, but this correlation turns out to be negative. It is interesting to note that this correlation supports the existing hypothesis about the beneficial effect of ACE inhibitors and ARBs [62] (which are standardly used in the treatment of hypertension) in the dispute about their potential use in the fight against COVID-19. Namely, on the one hand, the use of ACE inhibitors and ARBs upregulates ACE2 expression, which could hypothetically help the infection of SARS-CoV-2 since the virus uses these receptors to enter cells. On the other hand, thanks to its anti-inflammatory activity, ACE2 can alleviate tissue damage. Also, soluble ACE2 at increased concentrations could bind SARS-CoV-2 particles and thereby reduce the titer of viruses available for binding to tissue ACE2. These hypotheses have not been directly confirmed in COVID-19 patients, but the harmful effect of ACE inhibitors and ARBs has not been proven so far [62, 63]. To our knowledge, no previous findings identify high blood pressure as a predisposition factor for getting infected with the SARS-CoV-2 virus; it is only known that, based on clinical studies, RBP appears to be a risk factor for hospitalization and death due to COVID-19 [62, 63]. An explanation of the observed negative correlation might be sought in the relation of raised blood pressure with chronic diseases that are known to be relevant for the COVID-19 outcome. Namely, our data show, at very high statistical significance, a strong negative correlation of R0 with the risk of death from a batch of chronic diseases (cardiovascular disease, cancer, diabetes, and chronic respiratory disease), agreeing in this regard with some recent research [10, 64].
These diseases are identified as relevant comorbidities in the context of COVID-19, leading to a huge increase in the severity of the infection and poorer prognosis [65, 66] and, therefore, the discovered negative correlation comes as a surprise, particularly when contrasted with the positive correlation of obesity (where both are recognized risk factors in COVID-19 illness). One possible explanation is that the correlation may be due to potentially lower mobility of people with chronic diseases compared to the general mobility of the population. Additionally, it is possible that these people, being aware that they belong to a high-risk group, behaved more cautiously even before the official introduction of social distancing measures.
According to our analysis, the prevalence of certain health-hazard habits is also significantly correlated to COVID-19 transmissibility. Chronic excessive alcohol consumption has, in general, a detrimental effect on immunity to viral and bacterial infections, which, judging by the strong positive correlation we obtained, most likely applies also to SARS-CoV-2 virus infection. This correlation contradicts the belief that alcohol can be used as a protective nostrum against COVID-19, which has spread in some countries and even led to cases of alcohol poisoning [67].
Regarding the impact of smoking on SARS-CoV-2 virus infection, the results are controversial [68].
Smokers are at higher risk of developing a severe clinical picture of diseases caused by influenza viruses and MERS, as well as other respiratory viral infections, so it can be assumed that this also applies to COVID-19. This assumption is supported by the fact that the SARS-CoV-2 virus enters cells by binding to angiotensin-converting enzyme 2 (ACE2) receptors, the number of which is significantly increased in the lungs of smokers [69, 70]. On the other hand, there is a hypothesis that a weakened immune response of smokers to virus infection may prove beneficial, as they would suffer less damage to lung tissue due to inflammation caused by intense cytokine release that often accompanies a normal immune response [71]. Statistical analyses of clinical and epidemiological data have yielded contradictory results, so it cannot be said which assumption is more robust [68]. The positive correlation of smoking with COVID-19 transmissibility seems to be a novel result, and it supports the assumption that the virus is transmitted more efficiently in populations with a higher percentage of smokers.
Another result that addresses the association of unhealthy lifestyle with greater susceptibility to SARS-CoV-2 infection is the slight positive correlation we obtained for the prevalence of insufficient physical activity (IN) in adults, which is however not statistically significant. In this sense, in the case of COVID-19, we could not fully confirm the findings from [72], who found that physical activity significantly reduces the risk of viral infections.
Despite the recent media interest [73], our findings also could not confirm that BCG immunization has any beneficial effect in the case of COVID-19, at least as far as reducing the risk of contracting and transmitting the disease is concerned. While it is known that the BCG vaccine provides some protection against various infectious agents, unfortunately, there is no clear evidence for such an effect against SARS-CoV-2 [74]. Our analysis suggests that BCG immunization simply does not correlate with SARS-CoV-2 virus transmission.
Some of the previous research has indicated that the blood type may influence the course of COVID-19 illness. SARS-CoV-2 target cells are typically capable of synthesizing ABH antigens, which is why the proteins from the viruses replicated in these cells can be tagged with an antigen of the appropriate blood group. Interference of anti-A antibodies with the binding of a SARS-CoV-2 virus synthesized in a cell expressing A antigen to the ACE2 receptor has been shown experimentally [75]. This can explain the possible higher frequency of severe clinical picture in patients with A blood group [76], since in these patients, on the one hand, viruses enter cells freely by binding to ACE2 receptors, and on the other hand, these patients are generally characterized by the increased inflammatory activity of ACE [77]. It is assumed that the same mechanism applies to anti-B antibodies, i.e. B antigens, which would make people with blood type O most protected from exacerbation of COVID-19 disease [75]. While the results of epidemiological studies on COVID-19 patients mostly support the proposed effect of blood groups on the development of COVID-19 disease, the relationship between virus transmission and blood group prevalence and Rh phenotype has been significantly less studied. Our analysis showed strong positive correlations of virus transmission with the presence of A blood group and Rh- phenotype, as well as strong negative correlations for B blood group and Rh+ phenotype, while for AB and O blood group no significant correlations were obtained (Fig. 5C). This result coincides significantly with the correlations obtained in a study conducted for 86 countries [78]. However, another study focused on hospitalized patients in Turkey reported that the Rh+ phenotype represents a predisposition to infection [79], contradicting our findings.
Similar results regarding the Rh factor were obtained in a study [80] on hospitalized patients in the US (this study further reported no correlation of blood types with the severity of the disease). One way to reconcile these results with ours would be to speculate that the virus is more efficiently transmitted in a population with a higher proportion of Rh- phenotype because these people show a milder clinical picture compared to Rh+, so their movement is not equally limited, which is why they have more ability to pass on the infection. It should be noted that the average blood type in the country, similar to the average population density, may not accurately reflect the population status in a particular region where the epidemic has spread, so additional research is needed to more clearly establish the link between blood type and virus transmission.
Our data (Fig. 6A) show a strong negative correlation with the date of the epidemic onset. Curiously, it seems that the later the epidemic started in a given country, the slower the disease expansion tends to be. Instead of interpreting this result as an indication that the virus has mutated and changed its properties in such a short period, we offer the following simpler explanation: the pandemic first reached those countries that are most interconnected with the rest of the world (at the same time, those are the countries characterized by great mobility of people overall), so it is expected that the progression of the local epidemics in these countries is also more rapid.
Another segment of our interest was air pollutants, shown in Fig. 6B. Air pollution can have a detrimental effect on the human immune system and lead to the development (or to worsening) of respiratory diseases, including those caused by respiratory viral infections, as shown by numerous previous studies, some of which were also experimental [81, 82]. Several papers have already investigated air pollution in the context of COVID-19 and reported a positive correlation between the death rate due to COVID-19 and the concentration of PM2.5 in the environment [83, 84]. Positive correlations were also found for the spread of the SARS-CoV-2 virus, but mainly by considering daily numbers of newly discovered cases, a method that, as we have already argued, may strongly depend on testing policies, as well as on state measures to combat the epidemic [81]. As for the proposed mechanisms of interaction, it has been shown that SARS-CoV-2 virus RNA can be adsorbed to airborne particles and it has been suggested that in this state, under stable windless conditions, viruses could spread over greater distances [85, 86]. However, an examination of air samples in Wuhan has indicated little importance for this mechanism of virus transmission in the open, while the very opposite effects should be expected indoors [87, 88]. This conclusion concurs with the results of a study in which no correlation was obtained between the basic reproductive number of SARS-CoV-2 infection for 154 Chinese cities and the concentration of PM2.5 and PM10 particles, while the correlation of these factors with the death rate (CFR) was shown [89]. The statistically insignificant and relatively weak correlations we obtained for PM2.5 and PM10 pollutants also do not support the hypothesis of a potentially significant role of these particles in the transmission of this virus.
In contrast, significant positive correlations were shown by our analysis for concentrations of NO2, SO2, and CO in the air (although the correlation for CO is not statistically significant), which is generally supported by the results of other studies. For example, a positive correlation of NO2 levels with the basic reproductive number of infection was obtained from data for 63 Chinese cities [90]. Also, it has been shown that the number of detected cases of COVID-19 in China is strongly positively correlated with the level of CO, while in Italy and the USA such correlation exists with NO2 [91]. The mentioned study failed to establish a clear correlation with the level of SO2. These pollutants can cause and intensify inflammatory processes, as well as weaken the human immune system and make it more susceptible to various pathogens [92]. High NO2 air pollution can suppress the production of endogenous nitric oxide, which prevents the replication of the respiratory syncytial virus in the lungs. Also, it is important to emphasize that the atmospheric concentration of NO2 strongly depends on the levels of local exhaust emissions, so its correlation with virus transmission can be interpreted by the connection with the urban environment, characterized by more intensive traffic [93].
Finally, we have also obtained some interesting correlations of the meteorological parameters with R0, shown in Figs. 6C and 6D. The most basic and yet potentially the most significant weather factor that is widely investigated in the context of the COVID-19 pandemic is, of course, temperature. Humidity also seems to have important effects on the spread of COVID-19, on par with the temperature. The statistically very highly significant negative correlation of the basic reproductive number of SARS-CoV-2 virus infection with both the mean temperature and humidity obtained in our research (Fig. 6D) is consistent with the results of other relevant papers, e.g. [94]. For example, a similar correlation was obtained in a study that analyzed the COVID-19 outbreak in the cities of Chile, a country that covers several climate zones, but where it is still safe to assume that social patterns of behavior and introduced epidemic control measures do not drastically differ throughout the country [5]. Effectively the same conclusion (that fewer COVID-19 cases were reported in countries with higher temperatures and humidities) was reached in a study covering over 200 countries in the world [95].
While an established correlation between virus transmission and a certain factor is not, in general, a telltale sign of a direct causal relationship between them, in the case of temperature and humidity such connection is firmly indicated also by results of experimental research. For example, the spread of the influenza virus, which occurs similarly to the spread of SARS-CoV-2 [96], is boosted by low temperature and low humidity, probably due to greater virion stability and easier spread of aerosols under these conditions [97]. Similarly, another virus in the coronavirus family, SARS CoV, has been shown to lose viability at high temperatures and humidity [98], and these findings are supported by experiments on similar viruses, e.g. on transmissible gastroenteritis virus (TGEV) and mouse hepatitis virus (MHV) [99].
Therefore, the conclusion about negative correlations of air temperature and humidity with the virus transmissibility stands out as a robust result in the literature, as it is supported by findings of numerous theoretical analyses of COVID-19 pandemic data, and experimental results on viruses that share similarities in structure and/or pathogenesis with SARS-CoV-2. However, we still note that studies of correlations of these factors with the basic reproductive number in several cases have yielded different conclusions, most likely due to the method of calculating R0 or due to choosing a small/uninformative sample of populations in which the number of infected cases was monitored [100][101][102]. For example, a study focused on the suburbs of New York, Queens, obtained a positive correlation between virus transmission and temperature, which seems unexpected given the prevailing observations of other studies [103]. This result is most likely a consequence of analyzing data for a small area (Queens only) where the temperature varies in a relatively narrow range of values, as well as correlating the number of detected cases, which may be sensitive to variations in the testing procedure.
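The point about narrow parameter ranges can be illustrated numerically. The sketch below uses purely synthetic data with a hypothetical linear temperature dependence and noise level (none of the numbers come from this study or from the cited papers) and assumes a Pearson coefficient. It shows that a clear negative correlation over a wide temperature range is strongly attenuated when only a narrow temperature window is sampled, so that in a small sample its estimated sign becomes unreliable.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical model: transmissibility falls linearly with temperature, plus noise.
rng = random.Random(1)
temps = [rng.uniform(-5.0, 30.0) for _ in range(2000)]       # wide range, deg C
r0 = [4.0 - 0.05 * t + rng.gauss(0.0, 0.5) for t in temps]

r_full = pearson(temps, r0)

# Restrict the same data to a narrow 4-degree window, mimicking a single small region.
narrow = [(t, r) for t, r in zip(temps, r0) if 10.0 <= t <= 14.0]
r_narrow = pearson([t for t, _ in narrow], [r for _, r in narrow])

print(f"wide range:   r = {r_full:.2f} (n = {len(temps)})")
print(f"narrow range: r = {r_narrow:.2f} (n = {len(narrow)})")
```

The underlying dependence is identical in both subsets; only the sampled range differs, which is why a wide geographical scope matters for this kind of analysis.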
Another environmental agent that can destroy or inactivate viruses is UV radiation from sunlight, and the properties of a particular virus determine how long it can remain infectious when exposed to radiation. For example, epidemics of influenza have a seasonal character precisely due to the susceptibility of influenza viruses to UV radiation [104]. Our analysis found, at very high statistical significance, a strong negative correlation between the transmission of SARS-CoV-2 virus and the intensity of UV radiation, which is consistent with the results of other studies obtained for the cities of Brazil and the provinces of Iran [105,106]. It is worth mentioning that lower temperatures, humidity, and sunlight levels usually occur in combination and directly affect not only the virus but also the human behavior, so the observed higher transmission of the virus in such conditions can alternatively be interpreted by indirect effects of other factors that act together in cold weather, such as more time spent indoors where the virus spreads more easily, or weakening of the immune system that increases susceptibility to infections [107].
While the results related to COVID-19 correlations with temperature, humidity, and UV radiation are fairly frequent in the literature, this is less so for the results on the precipitation levels. Very few other studies have examined the association of precipitation with SARS-CoV-2 transmission, with either no correlation found [52], or looking at precipitation as a surrogate for humidity and generally receiving a negative correlation with infection rate [56,108]. Our results, however, shown in Fig. 6D, confirm natural expectations: the data reveals a strong negative correlation of the total precipitation with R0, only slightly lower than in the case of T, H, and UV, at a very high level of statistical significance. Such results also concur with some general conclusions about the behavior of similar viruses: a negative correlation between the rate of respiratory syncytial virus infection and the amount of precipitation expressed in millimeters was shown in a one-year study on over 1000 infected children in India [109,110].
Our analysis did not reveal any statistically significant correlation of SARS-CoV-2 transmissibility with either wind speed or air pressure. In the case of wind speed, this result agrees with the findings in some other papers [111, 112]. A positive correlation of wind speed with COVID-19 transmissibility was obtained in a study in Chilean cities, but, as the authors themselves note, the interpretation of the effect of this factor is complicated by its observed significant interaction with temperature [5]. The role of wind in transmitting the virus to neighboring buildings is predicted by the SARS virus spread model within the Amoy Gardens residential complex in Hong Kong, but such an effect may relate to local air currents and virus transmission over relatively short distances and does not imply a correlation of mean wind speeds in the area with virus transmission [110, 113]. As for the air pressure, the potential connection has hardly been investigated in the literature. An exception is a study [114] reporting a positive correlation of air pressure with the number of COVID-19 cases in parts of Mozambique, but our results do not confirm such a conclusion (on the contrary, our results rather speak in favor of a negative correlation, though they lack statistical significance).

Conclusion
While there is by now a significant amount of research on the crucial problem of how environmental factors affect COVID-19 spread, several features set this analysis apart from the existing research. First is the applied methodology: instead of basing analysis directly on the number of detected COVID-19 cases (or some of its simple derivatives), we employ an adapted SEIR model to extract the basic reproduction number R0 from the initial stage of the epidemic. By taking into account only data in the exponential growth regime, i.e. before the social measures took effect (as explained in the Methods section), we ensured that the correlations we have later identified were not confounded with the effects of local COVID-19 policies. Even more importantly, our method is also invariant to variations in COVID-19 testing practices, which, as is well known, used to vary in quite an unpredictable manner between different countries. Another important factor is the large geographical scope of our research: we collected data from 118 countries worldwide, more precisely, from all the countries that were above a certain threshold for the number of confirmed COVID-19 cases (except for several countries with clearly irregular early growth data). The third factor was the number of analyzed parameters: we calculated correlations for the selected 42 different variables (of more than a hundred that we initially considered overall) and looked for viable interpretations of the obtained results. It is the combination of all three factors that we believe makes this study unique, comprehensive, and reliable.
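The logic of extracting R0 from the exponential regime can be sketched as follows. This is a simplified illustration, not the adapted SEIR model used in this study: it assumes the textbook SEIR model, for which the early exponential growth rate λ satisfies R0 = (1 + λ/σ)(1 + λ/γ), with σ and γ the inverse latent and infectious periods. The latent and infectious periods, the synthetic case counts, and the function names are hypothetical placeholders. The example also shows why a constant detection fraction does not affect the estimate: rescaling the case counts shifts the intercept of the log-linear fit but not its slope.

```python
import math


def fit_growth_rate(days, cases):
    """Least-squares slope of log(cases) versus time, in units of 1/day."""
    logs = [math.log(c) for c in cases]
    n = len(days)
    mean_t = sum(days) / n
    mean_y = sum(logs) / n
    num = sum((t - mean_t) * (y - mean_y) for t, y in zip(days, logs))
    den = sum((t - mean_t) ** 2 for t in days)
    return num / den


def seir_r0(growth_rate, latent_days, infectious_days):
    """R0 of a standard SEIR model implied by the early exponential growth rate."""
    sigma = 1.0 / latent_days       # rate of leaving the exposed class
    gamma = 1.0 / infectious_days   # rate of leaving the infectious class
    return (1.0 + growth_rate / sigma) * (1.0 + growth_rate / gamma)


# Synthetic exponential regime: hypothetical growth rate of 0.2 per day.
days = list(range(15))
true_cases = [10.0 * math.exp(0.2 * t) for t in days]
detected_cases = [0.3 * c for c in true_cases]   # only 30% of cases detected

lam_true = fit_growth_rate(days, true_cases)
lam_detected = fit_growth_rate(days, detected_cases)

# Illustrative (assumed) periods: 3 days latent, 4 days infectious.
r0_estimate = seir_r0(lam_detected, latent_days=3.0, infectious_days=4.0)
print(f"growth rate: {lam_detected:.3f} per day, R0 estimate: {r0_estimate:.2f}")
```

The invariance shown here holds only while the detection fraction stays constant during the fitted window; a testing policy that changes within the exponential regime would bias the slope.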
These results should also help in resolving some of the existing disputes in the literature. For example, our findings indicate that the correlation of HDI with R0 is not a consequence of the COVID-19 testing bias, as was occasionally argued. Of the opposing opinions, our data seem to support assertions that blood types are indeed related to COVID-19 transmissibility, as well as arguments that the higher prevalence of smoking does increase the virus transmissibility (though weakly). On the other hand, in the dispute about the effects of pollution, our correlations give an edge to claims that there is no correlation between PM2.5 and PM10 particles and transmissibility (whereas we agree with the prevailing conclusions about the positive correlation of other considered pollutants). In the case of the effects of the wind, based on the obtained results we tend to side with those denying any connection. In certain cases our findings contradict popular narratives: there are no clear indications that either the number of refugees or physical inactivity intensifies the spread of COVID-19. Unfortunately, our data also diminish hopes that BCG immunization could help in subduing the epidemic. Additionally, the obtained correlations hint at possible new avenues of research, e.g. those that would help us understand the connection between cholesterol levels and SARS-CoV-2 transmissibility.
Overall, we strongly believe that the presented results can be a significant contribution to the ongoing attempts to better understand the first pandemic of the 21st century, and the better we understand it, the sooner we may hope to overcome it.

Conflict of Interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Table S1. Demographic factors.

Column explanations: CH - cholesterol level; ALC - alcohol consumption per capita; OB - prevalence of obesity; RBP - percentage of people with raised blood pressure; SM - percentage of smokers; CD - severity of COVID-19 relevant chronic diseases in the population; IN - prevalence of insufficient physical activity among adults; BCG - BCG immunization coverage among 1-year-olds.