Spatial autocorrelation and the dynamics of the mean center of COVID-19 infections in Lebanon

In this paper we study the spatial spread of COVID-19 infections in Lebanon. We inspect the spread of daily new infections across the 26 administrative districts of the country and apply Moran's $I$ statistic to analyze the spatio-temporal clustering of the infection under six parameterizations: adjacency, proximity, population, population density, poverty rate and poverty density. We find that, except for the poverty rate, the spread of the infection is clustered and associated with these parameters with varying magnitude, from July (geographic adjacency and proximity) or August (population, population density and poverty density) through October. We also determine the temporal dynamics of the geographic location of the mean center of new and cumulative infections since late March. The results allow for regionally and locally adjusted health policies and measures that would provide higher levels of public health safety in the country.


Introduction
The COVID-19 pandemic has affected practically the entire planet and created enormous challenges in every aspect of human life and organization, starting with the health sector and with far-reaching consequences for the economy, education, sports, transportation and politics. Since the first cases were registered in Wuhan, China in December 2019 [1], the global spatial dynamics of the infection have been changing as the disease swiftly moved towards the West [2] into Europe, then into the United States, South America, and eventually the whole world, with nearly 38.1 million cases and 1.1 million deaths registered by October 12, 2020 [3].
Given the global geographic spread of the virus, its wide local spread in many countries, and the nature of its transmission, it is important to understand the spatial mechanisms of this spread and its dependence on proximity, demographics and social characteristics of infected areas. Spatial analysis provides a better understanding of the routes of transmission of infections [4]; consequently, it allows decision-makers to draft and implement effective health and mitigation measures to reduce the risks associated with the pandemic.
In Lebanon, the first case was registered on February 21, 2020 [5], and by October 12, 54624 cases and 466 deaths had been registered [6]. The first few weeks witnessed a relatively rapid increase, which then declined sharply as a result of the strong mitigation measures enforced by the beginning of March. The lifting of the international travel ban and the partial easing of measures led to a revival of higher spread rates from July onward. Only 1788 cases had been registered by July 1, 2020, before a sharp rise from July through October. The cases were mainly concentrated in Beirut, its suburbs and its neighboring areas in Mount Lebanon. On August 4, a huge explosion rattled the port of Beirut and destroyed thousands of houses and buildings in the surrounding areas. People were rushed to hospitals, with thousands of injuries recorded on that day [7]. In the aftermath of this incident, hundreds of volunteers and civil defense teams were involved in rescue work for several days, and social distancing measures were largely neglected in the emergency. The spread accelerated in the following weeks, with a sharp rise in Beirut and its surroundings and a nationwide spread reaching all regions and major towns and cities [8].
Related Literature: Spatial autocorrelation is the statistical analysis of data studied in space or in space-time, aiming at the identification and estimation of spatial processes [9,10]. It has been used to study and analyze the spread of various diseases and infections, including cancer, diabetes, SARS, influenza and COVID-19 [11,12,13,14,15]. Recent studies have also inspected the effect of city size, population, transportation systems and demographics on disease spread and mortality rate [16,17,18,19,20]. The determination of the mean center of a population (centroid) was discussed in [21,22,23], and extending the concept to the mean center of wealth and of infections allowed for a spatial analysis of the temporal dynamics of wealth distribution, economic growth and infectious diseases [24]. The dynamics of the COVID-19 outbreak in Lebanon and of its reproduction number were studied in [25,26,27].
In this paper, we study the clustering and spatial progression of new infections in Lebanon by applying the methods of spatial autocorrelation with different model parameterizations of geographic, demographic and social variables including adjacency, proximity, population, population density, poverty rate and poverty density. Locating the mean center of the epidemic spread as a function of time is used to analyze the temporal geographic development of the spread. The obtained results provide a solid basis for the concerned policy makers to draw well-grounded and scientifically based local and regional measures that would contribute to controlling the infection spread.
The paper is organized as follows: in section 2 we introduce the implemented analytic mathematical and statistical methods and tools. Results are presented and discussed in section 3, and section 4 concludes the paper.

Moran's I index
Moran's I index is an inferential statistic used to measure spatial autocorrelation based on both locations and feature values simultaneously. It is defined as [9]:
$$I = \frac{N}{\sum_{i}\sum_{j} W_{ij}} \, \frac{\sum_{i}\sum_{j} W_{ij}\,(X_i - \bar{X})(X_j - \bar{X})}{\sum_{i} (X_i - \bar{X})^2} \tag{1}$$
where $W_{ij}$ represents different types of adjacency between region $i$ and region $j$, corresponding to different models of infectious spread. $N$ is the number of regions under consideration and $X_i$ represents the number of new daily infections in district $i$. $\bar{X}$ is the average number of new daily infections per region, given by $\bar{X} = \frac{\sum_i X_i}{N}$. The numerical outcome of $I$ falls between $-1$ and $1$ and indicates whether a distribution is dispersed, random or clustered. A value of $I$ close to 0 indicates a random distribution, while positive values indicate a clustered spatial distribution and negative values indicate dispersion. Values of $|I|$ nearer to 1 mean stronger clustering (positive $I$) or stronger dispersion (negative $I$).
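As an illustration of the definition above, the following minimal Python sketch evaluates Moran's I for a vector of counts and a weight matrix. The function name `morans_i` and the four-region toy data are hypothetical and are not taken from the Lebanese data set; NumPy is assumed to be available.

```python
import numpy as np

def morans_i(x, w):
    """Moran's I for values x (length N) and an N x N weight matrix w."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    n = x.size
    dev = x - x.mean()          # deviations X_i - X_bar
    num = dev @ w @ dev         # sum_i sum_j W_ij (X_i - X_bar)(X_j - X_bar)
    den = (dev ** 2).sum()      # sum_i (X_i - X_bar)^2
    return (n / w.sum()) * num / den

# Hypothetical toy example: four regions on a line, neighbours share a border.
w_line = np.array([[0, 1, 0, 0],
                   [1, 0, 1, 0],
                   [0, 1, 0, 1],
                   [0, 0, 1, 0]])
clustered = morans_i([10, 9, 1, 0], w_line)     # similar values side by side
alternating = morans_i([10, 0, 10, 0], w_line)  # high and low values alternate
```

In this toy setting the first configuration gives a positive index (clustering) and the second a negative one (dispersion), in line with the interpretation of the sign of $I$ given above.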
The $z_I$-score associated with this statistic is defined by:
$$z_I = \frac{I - E[I]}{\sqrt{V[I]}} \tag{2}$$
where the expected value $E[I]$ and the variance $V[I]$ are defined in the Appendix. The z-score or the corresponding p-value of the statistic is used to test the null hypothesis of a random spatial pattern: a sufficiently large $|z_I|$ (small p-value) rejects the null hypothesis and rules out a random pattern as the source of the obtained value of Moran's I.
In this paper, we take a 95% confidence level, corresponding to $|z_I| > 1.96$ or equivalently to $p < 0.05$, in order to confirm the clustering or dispersion of our spatial data indicated by $I$. In this case we say that the result is statistically significant, and based on the value of $I$ we can determine the pattern of the distribution. We consider a model with six different parameterizations of the adjacency matrix $W_{ij}$ corresponding to geographic adjacency (case I), proximity (case II), population (case III), population density (case IV), poverty rate (case V) and poverty density (case VI). Table 1 summarizes relevant data from the Lebanese districts.
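The significance test described above can be sketched in Python as follows. Since the Appendix is not reproduced here, this sketch assumes the textbook expectation $E[I] = -1/(N-1)$ and the variance of $I$ under the normality assumption; the paper's Appendix may use a different variance (for example the randomization form), so the numbers are illustrative only. The function name `morans_i_z_test` and the ring-shaped toy layout of 26 regions are hypothetical.

```python
import math
import numpy as np

def morans_i_z_test(x, w):
    """Return (I, z, two-sided p) assuming the normality-based variance of I."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    n = x.size
    dev = x - x.mean()
    s0 = w.sum()
    i_obs = (n / s0) * (dev @ w @ dev) / (dev ** 2).sum()
    e_i = -1.0 / (n - 1)                               # assumed E[I]
    s1 = 0.5 * ((w + w.T) ** 2).sum()
    s2 = ((w.sum(axis=0) + w.sum(axis=1)) ** 2).sum()
    # Assumed V[I] under normality: E[I^2] - E[I]^2
    var_i = (n**2 * s1 - n * s2 + 3 * s0**2) / ((n**2 - 1) * s0**2) - e_i**2
    z = (i_obs - e_i) / math.sqrt(var_i)
    p = math.erfc(abs(z) / math.sqrt(2))               # two-sided normal p-value
    return i_obs, z, p

# Hypothetical layout: 26 regions on a ring, each bordering two neighbours.
n = 26
ring = np.zeros((n, n))
for k in range(n):
    ring[k, (k + 1) % n] = ring[(k + 1) % n, k] = 1
cases = [10] * 13 + [0] * 13        # made-up counts: one high block, one low block
i_obs, z, p = morans_i_z_test(cases, ring)
significant = abs(z) > 1.96         # equivalent to p < 0.05
```

The two-sided p-value is obtained from the standard normal distribution, so the condition $|z_I| > 1.96$ and the condition $p < 0.05$ select the same days.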
In case I, we take $W_{ij} = 1$ for districts sharing common borders and $W_{ij} = 0$ otherwise, while in case II we set $W_{ij} = 1/d_{ij}$, where $d_{ij}$ is the driving distance between the administrative centers of regions $i$ and $j$. These two cases study the effect of administrative adjacency and of distance proximity between districts on the geographic clustering of new infections in Lebanon.
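The two weight matrices of cases I and II can be built along the following lines. This is a minimal sketch: the helper names `adjacency_weights` and `inverse_distance_weights`, the three-district example, and the distances (in km) are all made up for illustration, and setting the diagonal to zero is an assumption, since a district is not treated as its own neighbour.

```python
import numpy as np

def adjacency_weights(n, borders):
    """Case I: W_ij = 1 if districts i and j share a border, else 0."""
    w = np.zeros((n, n))
    for i, j in borders:
        w[i, j] = w[j, i] = 1.0
    return w

def inverse_distance_weights(d):
    """Case II: W_ij = 1 / d_ij for i != j, with a zero diagonal (assumed)."""
    d = np.asarray(d, dtype=float)
    w = np.zeros_like(d)
    off_diag = ~np.eye(d.shape[0], dtype=bool)
    w[off_diag] = 1.0 / d[off_diag]
    return w

# Hypothetical example with three districts (indices 0, 1, 2).
w_adj = adjacency_weights(3, [(0, 1), (1, 2)])
driving_km = [[0, 20, 50],
              [20, 0, 40],
              [50, 40, 0]]          # made-up driving distances
w_prox = inverse_distance_weights(driving_km)
```

With inverse-distance weights every pair of districts contributes to $I$, but nearby pairs contribute more, whereas the border-sharing weights only count direct neighbours.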
In case III and case IV, we apply the methods used in [4,28] to analyze the effects of population and population density on the spread of the disease, since the virus is carried by people and its spread is expected to be related to their number and interaction. We sort the districts by the number of their residents (obtained from [29]) and then by the density of residents relative to district area. In these two cases, districts that are consecutive in the population ranking or in the population density ranking are assigned $W_{ij} = 1$, and $W_{ij} = 0$ otherwise. This provides a statistic about the clustering of infections according to population and population density respectively.
Lastly, in cases V and VI, we introduce new parameters, namely the poverty rate and the poverty density of the districts, and we analyze their effect on infection clustering. We sort the districts by their poverty rate and poverty density [29] and assign $W_{ij} = 1$ for regions that are consecutive in the poverty rate or poverty density ranking, and $W_{ij} = 0$ otherwise, following the same methodology as in cases III and IV, in order to infer the effect of poverty rate and poverty density on the geographical patterns of infection spread.
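The rank-based weights used in cases III to VI can be sketched as follows. The helper name `rank_adjacency_weights` and the four population values are hypothetical, and breaking ties by district index (stable sort) is an assumption that the text does not specify.

```python
import numpy as np

def rank_adjacency_weights(values):
    """W_ij = 1 if districts i and j are consecutive when sorted by `values`."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="stable")  # district indices in rank order
    n = values.size
    w = np.zeros((n, n))
    for a, b in zip(order[:-1], order[1:]):    # each pair of neighbouring ranks
        w[a, b] = w[b, a] = 1.0
    return w

# Made-up populations for four districts (indices 0..3).
population = [300, 100, 400, 200]
w_rank = rank_adjacency_weights(population)
# Rank order is district 1, 3, 0, 2, so the linked pairs are (1,3), (3,0), (0,2).
```

The same function applies to population density, poverty rate or poverty density by passing the corresponding district values, which is why the four cases differ only in the sorting variable.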

Mean center of infection
The mean center of infection (henceforth MCI) is a geographic location that represents the weighted mean of the positions of infected individuals on the surface of Earth, assumed to be spherical. Setting Earth's radius to unity, the two spherical coordinates that determine the unique position of a point are its latitude $\lambda_i$ and longitude $\varphi_i$. The latitude is a measurement of location north or south of the equator, while the longitude is a measurement of location east or west of the prime meridian at Greenwich, UK.
The Cartesian position vector $\vec{r}_i = (x_i, y_i, z_i)$ is related to the spherical coordinates with unit radius by the relations [30]:
$$x_i = \cos\lambda_i \cos\varphi_i, \qquad y_i = \cos\lambda_i \sin\varphi_i, \qquad z_i = \sin\lambda_i \tag{3}$$
We denote the number of infections (new or cumulative) in district $i$ by $X_i$ as defined above, and the Cartesian positions of the administrative centers by $(x_i, y_i, z_i)$. Then, the Cartesian position of the weighted mean of infections $\bar{\vec{r}} = (\bar{x}, \bar{y}, \bar{z})$ is given by:
$$\bar{\vec{r}} = \frac{\sum_i X_i \, \vec{r}_i}{\sum_i X_i} \tag{4}$$
As suggested by [21], the precise position on the surface of a sphere can be determined from the normalized position vector defined by
$$\hat{r} = \frac{\bar{\vec{r}}}{|\bar{\vec{r}}|} = (\hat{x}, \hat{y}, \hat{z}) \tag{5}$$
Consequently, we can recover the spherical position of the mean center of infections by calculating the mean latitude and longitude as:
$$\bar{\lambda} = \arcsin \hat{z}, \qquad \bar{\varphi} = \arctan\frac{\hat{y}}{\hat{x}} \tag{6}$$
The latitude and the longitude can be located and plotted on maps and geographic information systems. We employ the spherical coordinates of the geographic locations of the administrative centers of the districts. This provides a tool to analyze the temporal dynamics of the mean geographic spread of the disease.
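The steps above (spherical to Cartesian coordinates, weighted mean, normalization, and conversion back to latitude and longitude) can be sketched in Python as follows. The function name `mean_center` and the coordinates in the example are hypothetical and do not correspond to actual district centers. The sketch uses `arctan2` for the longitude, which is an implementation choice that keeps the correct quadrant for any location.

```python
import numpy as np

def mean_center(lat_deg, lon_deg, weights):
    """Weighted mean center on the unit sphere; returns (lat, lon) in degrees."""
    lat = np.radians(lat_deg)
    lon = np.radians(lon_deg)
    wts = np.asarray(weights, dtype=float)
    # Cartesian coordinates of each point, shape (3, N)
    xyz = np.stack([np.cos(lat) * np.cos(lon),
                    np.cos(lat) * np.sin(lon),
                    np.sin(lat)])
    mean = (xyz * wts).sum(axis=1) / wts.sum()   # weighted mean position vector
    unit = mean / np.linalg.norm(mean)           # project back onto the sphere
    lat_c = np.degrees(np.arcsin(unit[2]))
    lon_c = np.degrees(np.arctan2(unit[1], unit[0]))
    return lat_c, lon_c

# Hypothetical check: two equally weighted points on the equator.
lat_mid, lon_mid = mean_center([0.0, 0.0], [10.0, 30.0], [5, 5])
# Hypothetical check: all infections in the first location.
lat_one, lon_one = mean_center([34.0, 33.0], [35.5, 36.0], [7, 0])
```

Calling the function once per day with the daily (or cumulative) district counts as weights yields a time series of mean centers that can be plotted on a map.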

Results and discussions
The Moran's I index and its corresponding p-value for cases I and II (adjacency and proximity), shown in Figure (2), lead to the conclusion that since July 2020 there has been strong clustering of daily new COVID-19 infections in regions sharing common borders and among nearer regions. There were only a few days when new infections were not clustered in adjacent regions, and only one day on which distance was not shown to be a determining factor in the spatial spread of new cases. The maximum value of Moran's I reached 0.660 for case I and 0.380 for case II, indicating a high level of geographic clustering of the disease spread since July. The infections before July had high p-values, indicating a high probability of random geographic spread.
Figure 4: Moran's I index and its corresponding p-value for cases V and VI accounting for regional poverty rate and poverty density. The poverty rate is not a decisive factor for spatial spread but poverty density contributes to spatial clustering since the end of August 2020.
The results for the spatial spread in relation to population and population density adjacency, given by Moran's I and the p-value of cases III and IV in Figure (3), reveal that the spread was not clustered with respect to the regional population until late August 2020, when it started attaining positive values of $I$ with $p < 0.05$, indicating spatial clustering between regions of adjacent population rank, with several days still compatible with random spread. The maximum attained $I$ was 0.666. However, the statistics for districts with adjacent rank of population density show very strong spatial clustering since the middle of August, with $I$ attaining a maximum value of 0.832, which is the highest among all six studied cases.
The results of case V (Figure 4) show that the spatial spread cannot be attributed to adjacent ranking of poverty rates among the districts, since the p-values remain above the 0.05 significance threshold up until October 2020; hence no statistically significant spatial clustering is found. But when we consider the poverty density in case VI, we obtain positive values for Moran's I since the end of August, with $p < 0.05$ except for five days. Hence, spatial clustering among regions with adjacent ranking of poverty density occurs. The maximum attained $I$ in this case is 0.666.
In comparison, we find that clustering of new infections sets in on different dates between July and August for all considered cases except for case V, corresponding to the poverty rate. The strongest level of spatial clustering (highest $I$) occurs for case IV (population density) after mid-August, while clustering associated with geographic adjacency and proximity (cases I and II) has the longest time span (since early July) and the highest levels of confidence.
The location of the MCI was determined as a function of time as shown in Figure (5). The mean latitude and longitude of the infection were determined according to the methods described in equation (6). The location of the cumulative MCI is plotted on the geographic map of Lebanon during the same period in Figure (6), together with the mean center of population of the country. It started near the city of Jounieh, north-west of the mean center of population, moved southward from May through August, and then started moving northward again. The location of the MCI of new infections was widely scattered geographically before July, as the lower plot of Figure (5) shows, before becoming more homogeneous afterwards.
The reproduction number $R$ and the rate of the infection spread correlate with people's mobility [31]. Geographic clustering occurs because people's movement and local travel are higher in their close neighborhoods, especially in a country like Lebanon, where the absence of national public transportation [32] diminishes nationwide mobility. Higher levels of social interaction among people in dense regions also contribute to the spread of the disease, and population density has shown the strongest clustering effect. The study of the spread of the infection allows the relevant authorities to draw appropriate country-specific and regional measures to curb the spread.

Conclusion
In this paper we used Moran's I index with its associated z-score and p-value to study the spatial autocorrelation of registered new infections of COVID-19 in Lebanon. We introduced six different parameterizations of the spread related to adjacency, proximity, population, population density, poverty rate and poverty density. We found that the poverty rate is not statistically relevant to the spatial spread of the disease, while geographic bordering, distance between district centers, number and density of residents and poverty density are associated with clustering of the disease, with varying strengths and levels of confidence, from July or August through October. We also introduced methods to determine the geographic coordinates of the mean center of the infection, determined this center since April 2020, and plotted its variations over time up until October. Understanding the spatial, demographic and geographic aspects of the disease spread over time provides an essential basis for the relevant authorities to take more effective decisions on local, inter-regional and intra-regional measures, thus contributing to increased social and health safety and security in the fight against the pandemic.