Risk Assessment and Prediction of COVID-19 Based on Epidemiological Data From Spatiotemporal Geography

COVID-19 is a highly infectious disease and public health hazard that has been wreaking havoc around the world; thus, assessing and simulating the risk of the current pandemic is crucial to its management and prevention. The severe situation of COVID-19 around the world cannot be ignored, and there are signs of a second outbreak; therefore, the accurate assessment and prediction of COVID-19 risks, as well as the prevention and control of COVID-19, will remain the top priority of major public health agencies for the foreseeable future. In this study, the risk of the epidemic in Guangzhou was first assessed through logistic regression (LR) on the basis of Tencent-migration data and urban point of interest (POI) data, and then the regional distribution of high- and low-risk epidemic outbreaks in Guangzhou in February 2021 was predicted. The main factors affecting the distribution of the epidemic were also analyzed by using geographical detectors. The results show that the number of cases mainly exhibited a declining and then increasing trend in 2020, and the high-risk areas were concentrated in areas with resident populations and floating populations. In addition, in February 2021, the “Spring Festival travel rush” in China was predicted to be the peak period of population movement. The epidemic risk value was also predicted to reach its highest level at external transportation stations, such as Baiyun Airport and Guangzhou South Railway Station. The accuracy verification showed that the prediction accuracy exceeded 99%. Finally, the interaction between the resident population and floating population could explain the risk of COVID-19 to the highest degree, which indicates that the effective control of population agglomeration and interaction is conducive to the prevention and control of COVID-19. 
This study identifies and predicts high-risk areas of the epidemic, which has important practical value for urban public health prevention and control and containment of the second outbreak of COVID-19.


INTRODUCTION
As of October 15, 2020, there were 38,599,508 confirmed cases of COVID-19 and 1,093,548 deaths worldwide (Fan et al., 2021). The World Health Organization has classified the outbreak as a "global pandemic". The rapid and extensive spread of COVID-19 requires the consideration of as many factors as possible, and quickly responding to this major public health event poses a great challenge to the scientific community. Therefore, at the intersection of medicine, virology, geography, public administration and other disciplines, there is an urgent need to formulate accurate epidemic prevention policies (Yu et al., 2020).
Although China's COVID-19 epidemic has been effectively controlled with the joint efforts of the Chinese government and the Chinese people (Zhang et al., 2020a), the number of COVID-19 patients continues to show an upward trend. As the weather becomes cooler and virus activity increases, there are already signs of a second outbreak of COVID-19 (Gosavi and Marley, 2020). Therefore, assessing the risk of COVID-19 and simulating the areas at high risk of future COVID-19 outbreaks can contribute to early prevention and effective containment of a second outbreak of COVID-19 in advance (Thomas et al., 2020).
Since the outbreak of COVID-19, scholars have conducted numerous studies from the perspectives of pathological diagnosis (Xie and Zhu, 2020), drugs and vaccines, transmission relationships (Heidari et al., 2020), spatiotemporal models (Babac and Mornar, 2020), epidemic prediction (Wang et al., 2020a), transmission simulation (Werth et al., 2021), risk assessment (Jia et al., 2020), and epidemic impact (Du et al., 2020), and all of these studies have played a positive role in the prevention and treatment of COVID-19. In terms of epidemic risk assessment, Jia proposed a risk model of population mobility and conducted risk assessment of an epidemic by analyzing population mobility data (Jia et al., 2020). Du coupled a population mobility accumulation model and an exponential growth model (Xu et al., 2020a) to construct an epidemic model and assessed the epidemic risk using Tencent positioning data. Moreover, Pan divided the infection risk of COVID-19 in various states in the United States on the basis of mobile phone positioning data (Hâncean et al., 2020). Other scholars have evaluated the risk of COVID-19 in different countries and regions based on natural and social environmental factors (Chatterjee et al., 2020), and these evaluations based on the vulnerability of the region itself could also play a positive role in assessing the risk of the COVID-19 epidemic (Xu et al., 2020a). The abovementioned studies assess the risk of the COVID-19 epidemic mainly on the basis of population mobility; however, the risk distribution of COVID-19 is determined by multiple urban spatial factors (Ribeiro et al., 2020).
In terms of epidemic prediction, statistical and dynamic models are often used to estimate future cases and infection trends. Statistical models include methods such as linear regression analysis (Chatterjee et al., 2020; Piovella, 2020; Cartenì et al., 2020), time series analysis and statistical process control (Feroze, 2020; Zhang et al., 2020b). Statistical models are generally applied to detect and provide an early warning of COVID-19 outbreaks. Since infectious disease theory is not involved here, only short-term predictive analysis can be performed (Polo et al., 2020). Dynamic models can be divided into several basic types, such as susceptible-infected (SI), susceptible-infected-susceptible (SIS), susceptible-infected-recovered (SIR), and susceptible-exposed-infected-recovered (SEIR) models, based on the characteristics of pathogens, infectious agents, post-infection immunity, the source of infection, the route of transmission, and susceptible populations (Yawney and Gadsden, 2020). Moreover, as dynamic models take into account the factors influencing disease transmission and related social factors, they can effectively reveal the trends of the epidemic and the changing course of the disease. However, these basic dynamic models hardly consider the significant differences among geographical units and dynamic changes in populations, which makes it difficult for these models to support refined risk assessment and simulation by epidemic prevention departments at all levels from single-scale to multiscale coordination (Liu and Mesch, 2020).
In-depth studies have been carried out in different countries and regions on the global spread, modeling and understanding of COVID-19, among which studies from Italy and Romania have demonstrated the necessity to develop new routes between EU countries to contain the spread of the epidemic in the early stages of the outbreak (Hâncean et al., 2020). Studies from Brazil have shown that there are differences in morbidity and mortality between large and small cities and that different age compositions and distributions of health infrastructure all have important effects on COVID-19 (Ribeiro et al., 2020). In Kenya, studies have taken the perspective of household energy and food security during the COVID-19 period, and a sustainable development model during the COVID-19 period has been obtained (Shupler et al., 2021). Norway, on the other hand, has determined national containment strategies depending on the characteristics of a given city during similar crises by analyzing its urban working environment and migration patterns (Venter et al., 2020). India, currently the country with the highest risk of COVID-19, has analyzed the impact of a national lockdown on the urban air quality during COVID-19 (Navinya et al., 2020). Some scholars in the United States have established an early warning and evaluation model based on the responsibility system by using city-related indicators of COVID-19 and performed experimental verification of the epidemic in 17 major cities in the country.
Based on the existing models and understanding of the spread of the pandemic in different countries and regions, it can be concluded that developed countries and regions such as the United States and Europe are more concerned about the impact of COVID-19 on the existing urban living environment (Kan et al., 2021), while developing countries and regions such as Southeast Asia and Africa are more concerned about the impact of the pandemic on urban public health resources (Zvobgo and Do, 2020), which illustrates the differences in the level of development among these different countries and regions. Therefore, as China is the largest developing country in the world, studies on the transmission, modeling, and understanding of COVID-19 in China should explore the urban environmental factors that influence the distribution and transmission of COVID-19, taking into account urban public health resources. Such research is likely to be of great regional value (Hou et al., 2021).
In studies on the early outbreak of COVID-19 and its cross-regional transmission, the location characteristics of geographical space occupy a large proportion, mainly because there are huge differences in the spatial variability and degree of aggregation of COVID-19 infection and mortality rates among countries (Khavarian-Garmsir et al., 2021). However, although some studies have analyzed the heterogeneity of the geospatial distribution of patients with COVID-19, few studies have considered the spatiotemporal variation of confirmed COVID-19 patients in geographic space (DuPre et al., 2021). Previous studies on spatial epidemiology have found that urban geospatial factors have a strong spatiotemporal effect on the transmission of viruses, including analyses of the possibility of infectious epidemics from the perspective of the degree of population aggregation in geographic space (Hasselwander et al., 2021). Therefore, studies on the risk distribution of COVID-19 in urban space should carefully consider its spatial and temporal characteristics (Mansour et al., 2021) and analyze the geospatial relationship between communities with different levels of infection and population agglomeration (Hassan et al., 2021), so as to reveal the spatiotemporal changes of COVID-19 in geographical space (Kwok et al., 2021). Spatiotemporal geographic epidemiological data include cellular signaling data (Xiao et al., 2019; Zhan et al., 2021), population flow data (He et al., 2020; Zhang and Yuan, 2021), and urban point of interest (POI) data (Mahajan et al., 2021). In short, these spatiotemporal geographic data can represent the characteristics of epidemic risk in urban space, providing a new research perspective and solution to problems related to epidemic risks in relation to urban geography (Bachir et al., 2019; Sharifi and Khavarian-Garmsir, 2020).
Compared with statistical survey data about the epidemic, spatiotemporal geographic epidemiological data have spatiotemporal continuity, and their large data volume, analysis and processing modes, display capability and other advantages greatly compensate for the insufficient amount of statistical survey data in research on epidemic analysis (Silva et al., 2018; Alsunaidi et al., 2021). Therefore, spatiotemporal geographic epidemiological data can play an important auxiliary role in assessing and simulating COVID-19 risk.
In recent years, with the development of computer technology, machine learning and deep learning have gradually been applied to relevant research on cities and have achieved good results (Milojevic-Dupont and Creutzig, 2021; Wang et al., 2020b). The goal of machine learning is to obtain patterns from existing data samples and to then analyze and predict based on the patterns obtained. Logistic regression (LR) models are among the classic models of machine learning (Cao et al., 2020), and they have advantages related to the objective methods and rigorous calculations involved. Compared with linear regression (Yuchi et al., 2019; Sharifi and Khavarian-Garmsir, 2020), gradient neural network-convergence analysis (GNN-CA) (Aarthi and Gnanappazham, 2018), cellular automata and other simulation algorithms, LR is simpler and more efficient in terms of the variables and normality assumptions, and it provides a new solution path for studies on urban decision-making and simulation (Siddiqui et al., 2018).
Accurately assessing and predicting the distribution of high and low risks of COVID-19 is crucial for epidemic prevention and the control of a second outbreak of the epidemic in Guangzhou, which is one of the cities with the largest permanent population and floating population in China (Granella et al., 2021). Taking Guangzhou as an example, this study assesses and simulates the COVID-19 risk from the perspective of geography using machine learning and spatiotemporal geographic epidemiological data. The mechanism and impact of various spatial factors on COVID-19 are discussed, and the assessment and simulation results are verified (Yorio and Moore, 2018). Compared with existing studies on epidemic risk, this study works at a finer spatial scale: the high- and low-risk distribution of the epidemic in Guangzhou is evaluated and simulated through machine learning, so that the risk distribution attributed to each geographical unit is more refined, and the epidemic risk analysis based on urban geospatial factors can be greatly conducive to epidemic prevention and control in urban space. At the same time, using geographic detectors to analyze the primary and secondary factors affecting the distribution of epidemic risk levels in urban space has important practical significance for the formulation of epidemic prevention and control policies and urban public security.

Study Area
The research area is Guangzhou, Guangdong Province, China (Figure 1). As one of the most urbanized and modernized cities in China, Guangzhou has 11 districts with a total area of 7,434.4 square kilometers. According to the Statistical Bulletin of The National Economic and Social Development of Guangzhou 2019 released by the Bureau of Statistics of Guangzhou Municipality on March 6, 2020, the permanent resident population of Guangzhou reached 15.3059 million in 2019. Guangzhou is one of the cities with the largest permanent population and floating population in China; thus, the assessment of the risk of COVID-19 conducted in this study can not only help to understand the areas in Guangzhou at high risk of a COVID-19 epidemic but also provide a decision-making basis for COVID-19 prevention and control nationwide.

Data Introduction
Spatiotemporal geographic epidemiological data about the epidemic should be directly or indirectly used to monitor and analyze this disease. According to the "triangle" theoretical model of public security (Wang et al., 2020c), public security consists of four parts: the emergency, the disaster carrier, emergency management and disaster factors. The emergency is the disaster itself, the disaster carrier refers to the people and things affected by the emergency when it occurs, and the disaster factors are the factors inducing the occurrence of the emergency. Based on the analysis of the emergency, the disaster carrier and the disaster factors, the whole process of an emergency, from occurrence and development to disaster formation and emergency measures, can be controlled.
In this study, the emergency is COVID-19, and the data on COVID-19 come from the National Health Commission of the People's Republic of China. The data mainly include the number of people infected with COVID-19 in Guangzhou in 2020 and the geographic location of the disease as announced by the commission. The disaster carrier is the entire population of Guangzhou affected by COVID-19, including the floating population and permanent population. Here, the data on the floating population are derived from Guangzhou population heat map data in January, February, and August 2020, combined with the average monthly data from January to August obtained from Tencent-migration data, while the permanent resident data are obtained from the 2019 Statistical Yearbook of Guangdong Province. Disaster factors mainly refer to factors that induce and spread COVID-19, including the main public places where people communicate and gather in cities, such as hospitals, fever clinics, life markets, supermarkets, hotels, restaurants, schools, administrative centers, cultural exchange places, etc. (Stevens et al., 2021). These places play an important role in the flow of urban elements, so they have also become the main places for COVID-19 transmission within cities (Rousseau and Deschacht, 2020). After the outbreak of COVID-19, the Chinese government implemented a strict isolation policy by closing schools, administrative units, public services and other places, which restricted the communication and interaction of people in these public places. In addition, the development of a series of online remote interaction modes such as online teaching and online office work has further reduced the level of epidemic risk in these areas (Wu et al., 2021a).
Therefore, combining the existing literature and China's current epidemic prevention policy, this study selects the distance from fever clinics, the distance from living markets, the distribution density of supermarkets, the density of isolation hotels, the distribution density of catering, and the distance from traffic stations as the disaster factors, all of which were screened and obtained from Guangzhou POIs in 2020.
Based on the "triangle" theoretical model of public security, the following spatiotemporal geographic epidemiological data related to COVID-19 are determined in this study: the fever clinic distance, population flow, supermarket distance, COVID-19 distribution, population density, shopping mall density, restaurant density, public transit station density, and hotel density. Since this study analyzes the risk distribution level of the epidemic based on urban geographic space, the spatial resolution of the data serves as the study scale unit (the spatial resolution of all data used in this study is unified as 25 × 25 m). The high-precision research unit scale also makes the simulated epidemic risk distribution more refined.

FIGURE 1 | Study area (the study area is Guangzhou City, Guangdong Province, China, which is located on the southern coast of China and is one of the cities with the highest level of urbanization and modernization in China).

Frontiers in Environmental Science | www.frontiersin.org July 2021 | Volume 9 | Article 634156

1) After cleaning and duplicate checking of the POI data of Guangzhou obtained from the AMap application programming interface (API), it is found that the total numbers of supermarkets, hotels, shopping malls, public transit stations and restaurants in Guangzhou in 2020 are 27,738, 16,134, 24,686, 57,882 and 15,009, respectively. There are 102 fever clinics announced by the government. The Euclidean distances to fever clinics (Wu et al., 2021b), public transit stations, and shopping malls and the densities of supermarkets, hotels and restaurants are calculated, and the results are shown in Figure 2.
2) Population data preprocessing: The population data are divided into resident population data and floating population data. The floating population data comprise Tencent-migration data as population flow change data. Tencent-migration data can be obtained from Tencent's positioning big data service window (http://heat.qq.com/index.php). Based on the user location information collected by Tencent's multiple apps, Tencent-migration data with a spatial resolution of 25 m × 25 m are obtained. The average monthly Tencent-migration data for January, February and August 2020 are obtained from the Tencent API (Figure 3B-D). The permanent population data come from the 2019 Guangzhou Statistical Yearbook. In 2019, the permanent population of Guangzhou was 15.3059 million; these data are resampled so that their spatial resolution is consistent with that of the floating population data (Figure 3A).
3) COVID-19 data: The COVID-19 data come from the National Health Commission of the People's Republic of China (http://www.nhc.gov.cn/). As of the end of February 2020, there were no significant cumulative new COVID-19 infections in Guangzhou. The cumulative number of COVID-19 infections in January and February 2020 was 137 and 209 cases, respectively, and the incidence locations were calibrated and sampled so that the spatial resolution is consistent with that of the floating population data; Figure 4 illustrates the results.
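The distance and density layers described in step 1 can be illustrated with a minimal sketch. It assumes POIs are already projected to planar coordinates in metres and evaluates each layer at the centre of a 25 m grid cell; the function names, the circular search radius used for density, and the sample coordinates are hypothetical and are not taken from the study.

```python
import math

CELL_SIZE = 25.0  # metres; matches the 25 m x 25 m resolution stated in the text


def cell_center(row, col, cell_size=CELL_SIZE):
    """Return the (x, y) centre of a grid cell in projected metres."""
    return ((col + 0.5) * cell_size, (row + 0.5) * cell_size)


def nearest_distance(point, pois):
    """Euclidean distance from a point to the closest POI."""
    px, py = point
    return min(math.hypot(px - x, py - y) for x, y in pois)


def poi_density(point, pois, radius):
    """Number of POIs per square kilometre within `radius` metres of a point."""
    px, py = point
    count = sum(1 for x, y in pois if math.hypot(px - x, py - y) <= radius)
    return count / (math.pi * (radius / 1000.0) ** 2)


# Hypothetical coordinates, not taken from the study data.
clinics = [(12.5, 12.5), (112.5, 12.5)]
distance_grid = [
    [nearest_distance(cell_center(r, c), clinics) for c in range(4)]
    for r in range(2)
]
```

A brute-force scan over all POIs is adequate for a sketch; a city-wide raster would normally rely on a spatial index or a GIS distance tool instead.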

Logistic Regression
As one of the classic methods of machine learning (Lai et al., 2021), LR builds a regression model based on the sigmoid function, and with the help of an LR model, it is possible to further explore the relation between independent and dependent variables and to quantitatively analyze the probability of disaster events. Compared with models such as support vector machines (SVMs) and neural networks, LR models have great advantages in training and recognition time, with probability results ranging from 0 to 1, which are easier to interpret (Cheng and Masser, 2003). An LR model is meaningful only when the independent variable is significant. Therefore, the relationship between the occurrence probability of COVID-19 and explanatory factors can be expressed as follows:

$$P = \frac{1}{1 + e^{-Z}}$$

where P represents the occurrence probability of COVID-19 on a spatiotemporal geographic scale, which is in the range of [0,1]. The closer the value of P is to 1, the higher the probability of COVID-19 occurring in the area; the closer the value of P is to 0, the lower the probability of COVID-19 occurring in the area. Z stands for a linear combination of the explanatory factors. Therefore, the fitting equation involved in LR is as follows:

$$Z = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n$$

where $x_1, \ldots, x_n$ denote the explanatory factors, $\beta_0$ is the intercept, and $\beta_1, \ldots, \beta_n$ are the regression coefficients. The technical route of LR model evaluation is shown in Figure 5.

FIGURE 2 | POI data preprocessing results [(A-F) are Catering density, Market distance, Hotel density, Supermarket density, Traffic density, and Fever outpatient distance of the spatiotemporal geographic epidemiological data of Guangzhou; the color in the figure ranges from blue to red, indicating density and distance value ranges from low to high].
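To make the mapping from a linear combination Z to a probability P concrete, the following is a minimal sketch of an LR model fitted by batch gradient descent. It is an illustration under stated assumptions rather than the study's implementation: the function names, learning rate, number of epochs, and the single-factor toy data are all hypothetical.

```python
import math


def sigmoid(z):
    """Map a linear combination z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def risk_probability(intercept, coefficients, factors):
    """P = sigmoid(Z), where Z = intercept + sum of coefficient * factor."""
    z = intercept + sum(b * x for b, x in zip(coefficients, factors))
    return sigmoid(z)


def fit(X, y, learning_rate=0.1, epochs=2000):
    """Fit the intercept and coefficients by batch gradient descent on the log loss."""
    n_features = len(X[0])
    b0, b = 0.0, [0.0] * n_features
    for _ in range(epochs):
        grad0, grad = 0.0, [0.0] * n_features
        for xi, yi in zip(X, y):
            error = risk_probability(b0, b, xi) - yi
            grad0 += error
            for j in range(n_features):
                grad[j] += error * xi[j]
        b0 -= learning_rate * grad0 / len(X)
        b = [bj - learning_rate * gj / len(X) for bj, gj in zip(b, grad)]
    return b0, b


# Hypothetical toy data: one normalised factor, 1 = risk cell, 0 = risk-free cell.
X = [[0.1], [0.2], [0.8], [0.9]]
y = [0, 0, 1, 1]
b0, b = fit(X, y)
p_low = risk_probability(b0, b, [0.1])
p_high = risk_probability(b0, b, [0.9])
```

In practice a library implementation with regularisation and convergence checks would be preferable; the loop above only shows how the coefficients of Z are adjusted so that P approaches the observed labels.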

Geographic Detectors
According to the first law of geography, everything is interrelated, and the degree of correlation changes with the change in distance (Luo et al., 2019). In geographic space, it can be assumed that if an independent variable has a significant influence on the dependent variable, then the spatial distributions of the independent variable and the dependent variable should be similar in geographic space. A geographic detector is a statistical method based on the spatial variance analysis theory proposed by Wang Jinfeng et al. (Fan et al., 2020). The detector can be used to detect the degree of spatial differentiation of different impact factors in geographic space and to verify the coupling of the spatial distribution of two variables as well as the possible causal relationship between the variables (Li et al., 2017).

1) Factor detector
The degree of spatial differentiation of COVID-19 and the extent to which risk factors explain this spatial differentiation can be represented by q, and the factor detector can be expressed as follows:

$$q = 1 - \frac{\sum_{h=1}^{L} N_h \sigma_h^2}{N \sigma^2} = 1 - \frac{SSW}{SST}$$

$$SSW = \sum_{h=1}^{L} N_h \sigma_h^2, \quad SST = N \sigma^2$$

where $h = 1, \ldots, L$ stands for the state (layer) of the risk factors for COVID-19, while $N_h$ and $N$ stand for the number of units in layer h and the whole study area, respectively. $\sigma_h^2$ and $\sigma^2$ represent the variances in layer h and in the whole study area, respectively. SSW and SST represent the within-sum of squares and the total sum of squares, respectively. The value range of q is [0,1], and the larger the value is, the more obvious the spatial differentiation of COVID-19 in geographic space. In addition, the larger the value of q is, the stronger the explanatory power of the risk factor for COVID-19 in geographic space, and vice versa.
A simple transformation of the q value satisfies the noncentral F distribution:

$$F = \frac{N - L}{L - 1} \cdot \frac{q}{1 - q} \sim F(L - 1, N - L; \lambda)$$

$$\lambda = \frac{1}{\sigma^2} \left[ \sum_{h=1}^{L} \overline{Y}_h^2 - \frac{1}{N} \left( \sum_{h=1}^{L} \sqrt{N_h}\, \overline{Y}_h \right)^2 \right]$$

where $\lambda$ stands for the noncentral parameter and $\overline{Y}_h$ stands for the mean value of layer h. Eq. 5 can be used to determine whether the q value is significant.
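As an illustration of the factor detector, the sketch below computes q as one minus the ratio of the within-layer sum of squares (SSW) to the total sum of squares (SST), together with the transformed F value. It assumes population variances (division by the number of units) and uses hypothetical function names and toy values that are not drawn from the study.

```python
def variance(values):
    """Population variance (divides by the number of units)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def q_statistic(y, strata):
    """q = 1 - SSW / SST for attribute values y grouped by layer labels."""
    groups = {}
    for value, label in zip(y, strata):
        groups.setdefault(label, []).append(value)
    ssw = sum(len(g) * variance(g) for g in groups.values())
    sst = len(y) * variance(y)
    return 1.0 - ssw / sst


def f_value(q, n_units, n_layers):
    """Transformed q value that is compared against a noncentral F distribution."""
    return (n_units - n_layers) / (n_layers - 1) * q / (1.0 - q)


# Hypothetical toy values: four units split into two layers of one risk factor.
y = [1.0, 2.0, 5.0, 6.0]
layers = ["a", "a", "b", "b"]
q = q_statistic(y, layers)
f = f_value(q, n_units=len(y), n_layers=2)
```

Here the layers separate low from high values almost perfectly, so q is close to 1; shuffling the labels so that each layer mixes low and high values drives q towards 0.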

2) Interaction detector
To identify the interactions between different risk factors $X_n$, the interaction detector assesses whether the explanatory power for the spatial distribution of COVID-19 is strengthened or weakened when factors $X_1$ and $X_2$ work together; that is, it assesses whether the impacts of these risk factors on COVID-19 are independent of each other. First, $q(X_1)$ and $q(X_2)$ are calculated; then, $q(X_1 \cap X_2)$ is calculated for the two factors combined and compared with $q(X_1)$ and $q(X_2)$. On this basis, the relationship between the two risk factors can be divided into the following categories (Table 1).
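The comparison of the single-factor q values with the q value of the combined factor can be sketched as follows. The overlay of two factors is represented by pairing their layer labels, and the category names follow the classification commonly used with geographic detectors; since Table 1 is not reproduced here, both the labels and the function names should be read as assumptions.

```python
def variance(values):
    """Population variance (divides by the number of units)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def q_statistic(y, strata):
    """q = 1 - SSW / SST for attribute values y grouped by layer labels."""
    groups = {}
    for value, label in zip(y, strata):
        groups.setdefault(label, []).append(value)
    ssw = sum(len(g) * variance(g) for g in groups.values())
    return 1.0 - ssw / (len(y) * variance(y))


def interaction_q(y, strata_1, strata_2):
    """q of the overlay of two factors: each unit is labelled by its pair of layers."""
    return q_statistic(y, list(zip(strata_1, strata_2)))


def interaction_type(q1, q2, q12, tol=1e-9):
    """Classify the interaction (assumed categories, see the lead-in)."""
    low, high = min(q1, q2), max(q1, q2)
    if q12 > q1 + q2 + tol:
        return "nonlinear enhancement"
    if abs(q12 - (q1 + q2)) <= tol:
        return "independent"
    if q12 > high:
        return "bivariate enhancement"
    if q12 > low:
        return "single-factor nonlinear weakening"
    return "nonlinear weakening"


# Hypothetical toy values in which neither factor explains y alone, but the pair does.
y = [0.0, 1.0, 1.0, 0.0]
x1 = ["a", "a", "b", "b"]
x2 = ["c", "d", "c", "d"]
q1, q2 = q_statistic(y, x1), q_statistic(y, x2)
q12 = interaction_q(y, x1, x2)
kind = interaction_type(q1, q2, q12)
```

The toy data form an exclusive-or pattern, which is the textbook case of nonlinear enhancement: each factor alone has q = 0, while the combination explains the attribute completely.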

3) Risk detector
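The risk detector tests whether there is a significant difference between the mean attribute values of two subregions by using the t statistic:

$$t = \frac{\overline{Y}_{h=1} - \overline{Y}_{h=2}}{\left[ \dfrac{\mathrm{Var}(Y_{h=1})}{n_{h=1}} + \dfrac{\mathrm{Var}(Y_{h=2})}{n_{h=2}} \right]^{1/2}}$$

where $\overline{Y}_h$ stands for the mean value of the attributes in subregion h, which, here, represents the incidence of COVID-19; $n_h$ stands for the number of samples in subregion h; and Var stands for the variance. The t statistic approximately obeys Student's t distribution, and the degrees of freedom are given by the Welch-Satterthwaite approximation:

$$df = \frac{\left[ \dfrac{\mathrm{Var}(Y_{h=1})}{n_{h=1}} + \dfrac{\mathrm{Var}(Y_{h=2})}{n_{h=2}} \right]^2}{\dfrac{1}{n_{h=1} - 1} \left[ \dfrac{\mathrm{Var}(Y_{h=1})}{n_{h=1}} \right]^2 + \dfrac{1}{n_{h=2} - 1} \left[ \dfrac{\mathrm{Var}(Y_{h=2})}{n_{h=2}} \right]^2}$$

The null hypothesis is $H_0: \overline{Y}_{h=1} = \overline{Y}_{h=2}$; if $H_0$ is rejected at significance level $\alpha$, there is a significant difference between the mean attribute values of the two subregions.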
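A comparison of two subregion means of this kind can be sketched as a two-sample t test with unequal variances. The sketch assumes sample variances (division by n - 1) and the Welch-Satterthwaite degrees of freedom; the function names and the toy incidence values are hypothetical.

```python
import math


def sample_variance(values):
    """Unbiased sample variance (divides by n - 1)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def risk_detector(region_1, region_2):
    """Return the t statistic and approximate degrees of freedom for two subregions."""
    n1, n2 = len(region_1), len(region_2)
    mean_1 = sum(region_1) / n1
    mean_2 = sum(region_2) / n2
    s1 = sample_variance(region_1) / n1  # squared standard error of mean 1
    s2 = sample_variance(region_2) / n2  # squared standard error of mean 2
    t = (mean_1 - mean_2) / math.sqrt(s1 + s2)
    df = (s1 + s2) ** 2 / (s1 ** 2 / (n1 - 1) + s2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical incidence values for two subregions.
t, df = risk_detector([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 6.0, 8.0, 10.0])
```

The resulting t value would then be compared with the critical value of Student's t distribution at the chosen significance level and the computed degrees of freedom.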

4) Ecological detector
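The ecological detector compares whether the two impact factors $X_1$ and $X_2$ have significantly different effects on the spatial distribution of attribute Y, which is measured by the F statistic:

$$F = \frac{N_{X_1} (N_{X_2} - 1)\, SSW_{X_1}}{N_{X_2} (N_{X_1} - 1)\, SSW_{X_2}}$$

$$SSW_{X_1} = \sum_{h=1}^{L_1} N_h \sigma_h^2, \quad SSW_{X_2} = \sum_{h=1}^{L_2} N_h \sigma_h^2$$

where $N_{X_1}$ and $N_{X_2}$ represent the sample sizes of risk factors $X_1$ and $X_2$, respectively; $SSW_{X_1}$ and $SSW_{X_2}$ represent the sum of the intralayer variances in the layers formed by $X_1$ and $X_2$, respectively; and $L_1$ and $L_2$ represent the number of levels of risk factors $X_1$ and $X_2$, respectively. The null hypothesis is $H_0: SSW_{X_1} = SSW_{X_2}$; if $H_0$ is rejected at significance level $\alpha$, the spatial distribution effects of risk factors $X_1$ and $X_2$ are significantly different.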
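The ratio of within-layer sums of squares that underlies the ecological detector can be sketched as follows. Population variances are assumed, and the function names and toy data are hypothetical.

```python
def within_sum_of_squares(y, strata):
    """SSW: sum over layers of (number of units) * (population variance in the layer)."""
    groups = {}
    for value, label in zip(y, strata):
        groups.setdefault(label, []).append(value)
    total = 0.0
    for g in groups.values():
        mean = sum(g) / len(g)
        total += sum((v - mean) ** 2 for v in g)
    return total


def ecological_f(y, strata_1, strata_2):
    """F statistic comparing the within-layer variation under two factors."""
    n1 = n2 = len(y)  # both factors are observed on the same units in this sketch
    ssw_1 = within_sum_of_squares(y, strata_1)
    ssw_2 = within_sum_of_squares(y, strata_2)
    return (n1 * (n2 - 1) * ssw_1) / (n2 * (n1 - 1) * ssw_2)


# Hypothetical toy values: factor 1 separates low from high values, factor 2 does not.
y = [1.0, 2.0, 5.0, 6.0]
f = ecological_f(y, ["a", "a", "b", "b"], ["a", "b", "a", "b"])
```

An F value far from 1 suggests that the two factors stratify the attribute very differently; significance would be judged against the F distribution at the chosen level.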

Logistic Regression Model Training
On the basis of COVID-19 data from January and February 2020 and floating population data from January, February and August 2020, COVID-19 infection areas were divided, and data sets of positive and negative samples were constructed. Since the nine spatial factors used in this study may show multicollinearity, which will cause a serious deviation in the operation results of the LR model, collinearity diagnosis of different factors should be carried out first (Saedi et al., 2020). The tolerance (TOL) and the variance inflation factor (VIF), whose product is equal to 1, are common indicators that reflect the degree of collinearity of factors. In general, when the VIF is greater than or equal to 10 (equivalently, when the TOL is less than or equal to 0.1), the factors are considered to exhibit serious multicollinearity.
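A collinearity diagnosis of this kind can be sketched by taking the VIF of each factor from the diagonal of the inverse correlation matrix and the TOL as its reciprocal. This is a minimal standard-library illustration, not the study's procedure; the function names and the toy factor columns are hypothetical, and a constant column would cause a division by zero.

```python
import math


def correlation_matrix(columns):
    """Pearson correlation matrix of a list of equally long factor columns."""
    n = len(columns[0])
    centered = []
    for col in columns:
        mean = sum(col) / n
        centered.append([v - mean for v in col])
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    k = len(columns)
    return [
        [sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
         for j in range(k)]
        for i in range(k)
    ]


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif_and_tol(columns):
    """VIF_j is the j-th diagonal entry of the inverse correlation matrix; TOL_j = 1 / VIF_j."""
    inverse = invert(correlation_matrix(columns))
    vifs = [inverse[j][j] for j in range(len(columns))]
    return vifs, [1.0 / v for v in vifs]


# Hypothetical factor columns: the second nearly duplicates the first.
vifs, tols = vif_and_tol([[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0]])
collinear = [v >= 10.0 for v in vifs]
```

With two nearly identical columns both VIF values exceed 10, so one of the two factors would be a candidate for removal before model training.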

Assessment of COVID-19 Risk
Based on the model training results, the higher the risk level is, the higher the probability of COVID-19 occurrence. Incorporating actual geographical locations, a distribution map of the risk level of COVID-19 in Guangzhou in January (Figure 7A), February (Figure 7B), and August (Figure 7C) 2020 is obtained.
The distribution map (Figure 7A) shows that the areas at high risk of a COVID-19 epidemic in January 2020 were mainly concentrated in the Yuexiu, Haizhu, Tianhe and Liwan Districts. Comparing Figure 2 and Figure 4 reveals that in January, there were large numbers of new COVID-19 patients in these regions. Guangzhou is a city with a high concentration of the floating population and permanent resident population, and it also has a relatively high distribution density of other spatial factors, such as hotels, shopping malls, and supermarkets. All of these factors increase the risk of COVID-19 outbreaks in these four regions.
The areas at high risk of a COVID-19 epidemic in February 2020 were mainly concentrated in Yuexiu District and Tianhe District. Figure 2 shows that although the Yuexiu and Tianhe Districts are relatively densely populated with permanent residents, the "home quarantine" policy not only greatly restricted the mobility and interaction of people but also reduced the transmission routes and pathways of COVID-19. The "home quarantine" policy effectively curbed the spread of the virus, bringing the cumulative number of new COVID-19 infections under control.
The areas at high risk of a COVID-19 epidemic in August 2020 were mainly concentrated in the Yuexiu, Haizhu, Liwan, Baiyun and Panyu Districts as well as external transportation hubs, including Baiyun Airport and high-speed railway stations. The COVID-19 epidemic was effectively controlled after February, and population activities and urban interactions began to return to normal starting in May 2020. However, with the large-scale mobility and interaction of the population, the risk areas of the epidemic changed from the previous low-risk areas to high-risk areas.
Comparing the high-risk distribution maps of the COVID-19 epidemic in January, February and August 2020 reveals that in general, the level of risk experienced a rapid decline and then a slow rise, with the risk reaching its lowest point in February. In addition, in terms of the cumulative number of new patients, there were basically no new local patients after February, which suggests that the "home quarantine" policy was a positive and effective means of epidemic prevention. Additionally, comparing the distribution of regions with a high risk of an epidemic in the three months above shows that in February, the areas with a high risk of an epidemic were mainly concentrated in the areas with a dense permanent population, while in January and August, these areas were mainly concentrated in areas with a dense floating population and a dense permanent population, demonstrating that controlling the flow and interaction of the population is the best means of epidemic prevention.

Prediction of COVID-19 Risk
The COVID-19 risk levels in January, February and August 2020 were reintroduced into the model to simulate and predict the COVID-19 risk distribution in February 2021. As shown in Figure 8, the distribution of COVID-19 risk in February 2021 is roughly similar to that in August 2020; that is, the high-risk areas are mainly concentrated in the Yuexiu, Haizhu, Tianhe, Liwan, Baiyun and Panyu Districts, but the epidemic risk value is higher than that in August 2020. Since these areas have always been areas where the resident population and the floating population are highly concentrated, without corresponding epidemic prevention measures, the mobility and interaction of the population will continuously promote the spread of COVID-19. Therefore, in the event of a second COVID-19 outbreak, these areas will be more likely to spread the virus. Compared with August 2020, external transportation hubs such as Baiyun Airport and the Guangzhou South Railway Station, which have been important regions for population mobility and interaction, have a significantly higher risk of an outbreak in February 2021. The permanent population of Guangzhou will not increase significantly in February 2021; however, February 12, 2021, is the Chinese Lunar New Year. Thus, the whole month falls within the Spring Festival travel season. During the 2019 Chinese Lunar New Year, the population mobility across all of China exceeded 3,000,000,000 individual trips (Zhang et al., 2020c). Therefore, during the Spring Festival travel season of 2021, the population mobility in Guangzhou is bound to reach a new peak, and large-scale population movements are likely to exacerbate the risk of COVID-19 transmission.
Comparing the predicted risk distribution of COVID-19 in February 2021 with the distributions in 2020 shows that the risk of COVID-19 is most directly related to population concentration and mobility. Therefore, the risk of COVID-19 transmission can be greatly reduced if population concentration and mobility are restricted to a certain extent.

Preliminary Accuracy Test Model
Verifying the predicted risk levels of COVID-19 is an important condition for generalizing the research results. Therefore, to test the accuracy of the COVID-19 risk assessment based on spatiotemporal geoepidemiological data, confusion matrix and ROC curve verification are used in this study (Shu et al., 2020). First, the epidemiological dataset is split into training data and validation data with the scikit-learn (sklearn) module, with the training data accounting for 70% and the validation data for 30% (Abedini et al., 2017). Then, cross-validation is conducted on the training data and validation data of the different classes, yielding three verification indexes: accuracy, precision and recall. Finally, the verification indexes obtained from the training data and validation data of the different classes are returned as an array to give the final accuracy verification results.
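The 70%/30% split described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the code used in the study; the function name `stratified_split`, the per-class (stratified) sampling, the fixed seed and the toy labels are all assumptions made for the example.

```python
import random

def stratified_split(labels, train_fraction=0.7, seed=0):
    """Return (train_idx, valid_idx), splitting each class separately.

    Stratifying by class is an assumption of this sketch; the study only
    states that 70% of the data is used for training and 30% for validation.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train_idx, valid_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(len(indices) * train_fraction)
        train_idx.extend(indices[:cut])
        valid_idx.extend(indices[cut:])
    return sorted(train_idx), sorted(valid_idx)

# Hypothetical labels: 1 = risk cell, 0 = risk-free cell.
labels = [1] * 80 + [0] * 20
train_idx, valid_idx = stratified_split(labels)
```

With scikit-learn, `sklearn.model_selection.train_test_split(X, y, test_size=0.3, stratify=y)` performs a comparable split; whether the study stratified by class is not stated.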

Verification of the Confusion Matrix
The preliminary accuracy test is a crucial step in verifying the reliability and predictive ability of the model (Kranji et al., 2019). In this study, a confusion matrix (the average value of the verification indexes obtained from different training data and validation data) is used to conduct a preliminary accuracy test of the prediction of COVID-19 risk in February 2021. The confusion matrix test results are shown in Figure 9. The preliminary accuracies of the risk areas and risk-free areas are 0.9932 and 0.8949, respectively. Both values are greater than 0.85, demonstrating that the model predicts epidemic risk with high accuracy; the accuracy for the risk-free areas is relatively low, which may be due to the smaller number of risk-free areas and samples. The precision and recall are 0.9439 and 0.8995 for the risk areas and 0.9392 and 0.8849 for the risk-free areas, respectively. In terms of precision, recall and accuracy, the LR model for COVID-19 prediction therefore performs well.
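For reference, the three verification indexes can be derived from the four cells of a binary confusion matrix. The sketch below uses the standard definitions; the function name `binary_metrics` and the toy labels are hypothetical and do not reproduce the values in Figure 9.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision and recall for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Guard against empty denominators when a class is never predicted or never present.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }

# Hypothetical labels: 1 = risk area, 0 = risk-free area.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
risk_metrics = binary_metrics(y_true, y_pred, positive=1)
risk_free_metrics = binary_metrics(y_true, y_pred, positive=0)
```

Calling the function once per class, as in the last two lines, gives the per-class precision and recall in the way they are reported for the risk and risk-free areas.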

Verification of the Receiver Operating Characteristic Curve
The area under the curve (AUC) value of the receiver operating characteristic (ROC) curve was used to comprehensively test and evaluate the predictive accuracy of the LR model (Chirisa et al., 2020). When the AUC value is greater than 0.5, the closer it is to 1, the higher the predictive accuracy of the model. Figure 10 shows that after logistic regression, the average AUC values from the cross-validation of the training samples, validation samples and all data of the different categories are 0.9934, 0.9932 and 0.9933, respectively. All of these values are higher than 0.99 and close to 1, showing that the model has high predictive accuracy and further illustrating the important role of LR models in predicting the COVID-19 risk distribution.
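To make the AUC interpretation concrete, the sketch below computes it through its rank-based definition: the probability that a randomly chosen risk sample receives a higher score than a randomly chosen risk-free sample, with ties counted as one half. The function name `roc_auc` and the toy scores are assumptions for illustration; this pairwise form is simple but quadratic in the sample size.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))

# Hypothetical predicted risk probabilities from a classifier.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(y_true, scores)  # 3 of the 4 pairs are ranked correctly
```

An AUC of 0.5 corresponds to random ranking, which is why values above 0.5 and close to 1 are read as high predictive accuracy.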

1) Risk factor detector
The factor detector results in Table 2 show that in COVID-19 risk assessment, population mobility is the most important factor determining COVID-19 infections in cities, followed by the density of the resident population. This finding is not only consistent with previous COVID-19 risk assessments and predictions but also demonstrates that the most effective way to prevent COVID-19 is to avoid the mobility and excessive agglomeration of people. The densities of public transit stations, shopping malls, and restaurants and the distance to supermarkets have similar influences, all slightly lower than that of population mobility. This indicates that reducing population mobility and interaction in population agglomeration areas, so that people are not exposed to the public environment for long periods, is a reasonable means of epidemic prevention. The factors that have the lowest impact on the risk level of COVID-19 are the distance to fever clinics and hotel density. On the one hand, anyone who tests positive for COVID-19 can be promptly transferred to a fever clinic for treatment; on the other hand, hotels mainly play a role in isolation. During an epidemic, more people choose home isolation and fewer people stay in hotels, which makes the population density of hotels very low; as a result, hotels have little influence as a spatial factor.
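The factor detector ranks factors by how much of the spatial variance in risk each one explains. A minimal sketch of the commonly used geographical detector statistic, q = 1 - (sum over strata of N_h * var_h) / (N * var), is given below; the function name `q_statistic`, the stratum labels and the toy risk values are assumptions for illustration and are not taken from Table 2.

```python
from statistics import pvariance

def q_statistic(values, strata):
    """Share of the variance of `values` explained by the stratification `strata`."""
    groups = {}
    for value, stratum in zip(values, strata):
        groups.setdefault(stratum, []).append(value)
    total = len(values) * pvariance(values)  # N * population variance
    if total == 0:
        raise ValueError("values have zero variance; q is undefined")
    within = sum(len(g) * pvariance(g) for g in groups.values())
    return 1.0 - within / total

# Hypothetical risk values for eight grid cells, stratified by a
# discretized factor such as floating-population density.
risk = [1, 2, 1, 2, 8, 9, 8, 9]
density_class = ["low"] * 4 + ["high"] * 4
q = q_statistic(risk, density_class)  # within-strata variance is small, so q is near 1
```

A q value near 1 means the factor's strata almost fully separate high-risk from low-risk cells, while a value near 0 means the factor explains little of the risk pattern.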

2) Interaction detector
The results of the interaction detector for the different factors are shown in Table 3. The risk level of the epidemic in Guangzhou is best explained by the interaction between the permanent population density and the floating population. With a Q value of 0.67, the interaction of these two indicators explains the epidemic risk to a greater degree than any single impact factor does, illustrating that epidemic prevention and control can achieve the maximum effect if the floating population and permanent resident population are effectively controlled.
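The comparison made by the interaction detector can be sketched as follows: the strata of two factors are overlaid, the q statistic of the overlay is computed, and it is compared with the two single-factor values. The category names follow the usual geographical detector convention; the helper names, the tolerance and the toy data are assumptions for illustration and do not reproduce Table 3.

```python
from statistics import pvariance

def q_statistic(values, strata):
    """Share of the variance of `values` explained by the stratification `strata`."""
    groups = {}
    for value, stratum in zip(values, strata):
        groups.setdefault(stratum, []).append(value)
    within = sum(len(g) * pvariance(g) for g in groups.values())
    return 1.0 - within / (len(values) * pvariance(values))

def interaction_type(q1, q2, q12, tol=1e-9):
    """Classify the joint effect of two factors from their q values."""
    low, high = min(q1, q2), max(q1, q2)
    if q12 < low:
        return "nonlinear weaken"
    if q12 < high:
        return "uni-factor weaken"
    if abs(q12 - (q1 + q2)) <= tol:
        return "independent"
    if q12 > q1 + q2:
        return "nonlinear enhance"
    return "bi-factor enhance"

# Hypothetical risk values with two binary factors
# (e.g. resident density class and floating-population class).
risk = [1, 2, 2, 3, 5, 6, 9, 10]
resident = [0, 0, 0, 0, 1, 1, 1, 1]
floating = [0, 0, 1, 1, 0, 0, 1, 1]
overlay = list(zip(resident, floating))  # one stratum per factor combination

q_res = q_statistic(risk, resident)
q_flo = q_statistic(risk, floating)
q_joint = q_statistic(risk, overlay)
kind = interaction_type(q_res, q_flo, q_joint)
```

In this toy case the overlay explains more variance than the two factors added together, which is the pattern the text describes for the resident and floating populations.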
The ecological detector results (Table 4) are obtained from an F test at a significance level of 0.05, where Y represents a significant difference and N represents no significant difference. In the risk distribution of a COVID-19 epidemic, the results for the densities of the permanent population and floating population are significantly different from the results for other influencing factors, indicating that the densities of the permanent population and floating population are the most important factors affecting the risk of COVID-19. Moreover, the floating population has a greater impact on the risk of COVID-19 than the permanent population. In contrast, there are no significant differences among the other factors, including the distributions of supermarkets, hotels and shopping malls. Therefore, provided that population factors are reasonably and safely controlled, other urban facilities can be distributed so as to provide basic life services for urban residents.
FIGURE 10 | ROC curve verification results (the verification results of the ROC curve include the test ROC curve, train ROC curve and total ROC curve, and the three values together determine the accuracy of the results; the closer the area under each curve is to 1, the higher the accuracy).
To rule out spurious interactions between influencing factors caused by repeated calculation in the algorithm, this study carries out a second cross-validation of the results obtained by the Interaction Detector and Ecological Detector, which shows no significant differences between the final verification result and the first one. In other words, the index calculation results of the Interaction Detector and Ecological Detector are reasonable and consistent with each other.

DISCUSSION
This study proposes a risk assessment and prediction model of COVID-19 based on spatiotemporal geographic epidemiological data, an LR model and geographic detectors. The risk levels of COVID-19 in January, February and August 2020 are obtained, and the areas at high risk of COVID-19 in February 2021 are predicted. The spatial variability and attribute associations among different influencing factors are also analyzed to identify the main factors influencing the spread of COVID-19. After the outbreak of COVID-19, geographical assessments of COVID-19 transmission risk initially focused mainly on the macro scale, including regional, national and global epidemic assessments (Chakraborty and Maity, 2020). With the growing use of epidemiological data on population mobility, the risk assessment of COVID-19 has moved to the meso and micro perspectives. That is, studies have started to explore the reasons for the spread of the epidemic from the perspective of the population mobility between communities (Ouyang et al., 2020; Yan et al., 2021). However, such studies continue to place greater emphasis on discussing the impact of population mobility on epidemic risk, and they do not objectively assess and predict the current epidemic risk from the spatiotemporal perspective. In this study, using multisource spatiotemporal geographic epidemiological data, machine learning-based simulations were conducted, taking into account the resident population, the floating population and all urban spatial factors that may affect the spread of the epidemic in geographical space. Finally, the primary and secondary factors affecting the risk of an epidemic are discussed, and the verification results show that the simulation method is quite accurate.
The areas at high risk of COVID-19 are mainly concentrated in areas with resident populations and floating populations, and this result is broadly consistent with previous studies on COVID-19 (Cokun et al., 2021). Since humans are the main carriers of COVID-19 and other infectious diseases, the mobility and interaction of the population are the most important factors contributing to the high risk of COVID-19 (Xu et al., 2020b). Compared with current studies related to epidemic risk assessment and prediction, this study focuses on the analysis of the impact of urban spatial factors on epidemic risk from the perspective of spatiotemporal geography, allowing the spread of the epidemic to be expressed in terms of geographical location, which is conducive to preventing and controlling the epidemic in the community at the micro scale.
Finally, this study leaves some areas that require further exploration. Guangzhou, China, was selected as the case for analysis in this study (Peirlinck et al., 2020). To better prevent and control the global pandemic, it is necessary to conduct further assessments and simulations of specific epidemics in cities with severe outbreaks around the world.

CONCLUSION
Risk assessment and prediction of the COVID-19 epidemic and analysis of the main influencing factors hold great practical value for the construction of urban public health safety spaces. In this study, spatiotemporal geographic epidemiological data such as Tencent-migration data and POI data as well as LR and geographical detector models are used to assess the risk of COVID-19 in Guangzhou in January, February and August 2020 and to predict the risk distribution of COVID-19 in February 2021. In addition, the main factors affecting the areas at high risk of COVID-19 are analyzed, and the following conclusions are drawn:
1) The risk of COVID-19 in 2020 mainly exhibited a downward trend and then an upward trend. Although the "home quarantine" policy implemented by the Chinese government effectively contained the spread of COVID-19 and reduced the risk of the epidemic for a short time, regional epidemic risk began to rise again as population mobility and interaction increased and as production and the activities of daily life recovered.
2) The prediction results for February 2021 show that, in addition to areas with dense population movement and interaction, the major external transport hubs in Guangzhou face a significantly increased COVID-19 risk due to the arrival of the Spring Festival travel rush. The accuracy of the risk prediction of COVID-19 is greater than 99%, which indicates that the prediction of COVID-19 is highly reliable.
3) The main factors affecting the epidemic risk level are the distribution of the floating population and resident population, and the interaction between the floating population and the resident population also explains the risk distribution of the epidemic to the greatest extent. Therefore, if population agglomeration is limited, then the rational distribution of other urban spatial factors will not have an important impact on the risk of the epidemic.
On the basis of using spatiotemporal geographic epidemiological data, the risk assessment and prediction models for COVID-19 are highly practical and accurate. This study objectively and accurately assesses and predicts areas at high risk of COVID-19, which is conducive to not only preventing and controlling a second outbreak but also providing solutions to urban public security problems for epidemic prevention agencies.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/supplementary files; further inquiries can be directed to the corresponding author.