Evaluation of the Secondary Transmission Pattern and Epidemic Prediction of COVID-19 in the Four Metropolitan Areas of China

Understanding the transmission dynamics of COVID-19 is crucial for evaluating its spread pattern, especially in metropolitan areas of China, as its spread could lead to secondary outbreaks. In addition, the experiences gained and lessons learned from China have the potential to provide evidence to support other metropolitan areas and large cities outside China with their emerging cases. We used data reported from January 24, 2020, to February 23, 2020, to fit a model of infection, estimate the likely number of infections in four high-risk metropolitan areas based on the number of cases reported, and increase the understanding of the COVID-19 spread pattern. Considering the effect of the official quarantine regulations and travel restrictions for China, which began January 23~24, 2020, we used the daily travel intensity index from the Baidu Maps app to roughly simulate the level of restrictions and estimate the proportion of the quarantined population. A group of SEIR model statistical parameters were estimated using Markov chain Monte Carlo (MCMC) methods and fitting on the basis of reported data. As a result, we estimated that the basic reproductive number, R0, was 2.91 in Beijing, 2.78 in Shanghai, 2.02 in Guangzhou, and 1.75 in Shenzhen based on the data from January 24, 2020, to February 23, 2020. In addition, we inferred the prediction results and compared the results of different levels of parameters. For example, in Beijing, the predicted peak number of cases was 467 with a peak time of March 01, 2020; however, if the city were to implement different levels (strict, moderate, or weak) of travel restrictions or regulation measures, the estimation results showed that the transmission dynamics would change and that the peak number of cases would differ by between 54% and 209%. 
We concluded that public health interventions would reduce the risk of the spread of COVID-19 and that more rigorous control and prevention measures would effectively contain its further spread. Awareness of prevention should be enhanced when businesses and social activities return to normal before the end of the epidemic. Further, the experiences gained and lessons learned from China offer the potential to provide evidence supporting other metropolitan areas and big cities with their emerging cases outside China.

INTRODUCTION
The World Health Organization (WHO) named the disease "coronavirus disease 2019" (COVID-19) and the novel virus that causes it "severe acute respiratory syndrome coronavirus 2" (SARS-CoV-2); the outbreak has attracted worldwide attention. The new coronavirus is a strain that has never been found in humans before. This virus can cause an acute respiratory disease, and common signs of infection include respiratory symptoms, fever, cough, shortness of breath, and dyspnea. In more severe cases, infection can cause pneumonia, severe acute respiratory syndrome, kidney failure, and even death (1).
According to WHO situation reports, the outbreak of COVID-19 has led to 79,407 confirmed cases worldwide and 2,622 deaths in 32 countries as of February 24, 2020, of which 64,287 were from Hubei, China. Numerous cases have been reported in other areas outside Hubei, including metropolitan areas of Beijing (n = 399) and Shanghai (n = 335) as well as other countries outside China, such as South Korea (n = 833), Japan (n = 144), and Italy (n = 124). With the continuously increasing number of cases, understanding the spread pattern of COVID-19 and monitoring spikes in the number of cases are crucial steps in providing evidence that could guide public health intervention strategies and healthcare policy making.
Several mathematical models and data analysis approaches that attempt to estimate the transmission of COVID-19 have recently been reported (2)(3)(4). The effects of public health interventions and transportation restrictions on disease transmission have also been evaluated in some studies (5,6). Some studies indicated that public intervention measures greatly reduce the final size of the epidemic and bring the turning point forward by about 24 days relative to a scenario without these measures (7). Others noted that travel restrictions would have little effect unless combined with a reduction of 50% or more in community transmission (8). A report from the Imperial College COVID-19 Response Team concluded that intensive intervention, or something equivalently effective, such as combining home isolation of suspect cases, home quarantine of those living in the same household as suspect cases, and social distancing of the elderly and others most at risk of severe disease, could reduce transmission. However, such measures would need to be maintained until a vaccine becomes available; the team also predicted that transmission would quickly rebound if interventions were relaxed, so a combination of multiple interventions is required to have a substantial impact on transmission (9). To predict the size and timing of the outbreak, researchers have published many different forecasts of when the outbreak will peak in different areas (10,11). These models are certainly useful for understanding the emerging trends of COVID-19. However, such timely analyses and forecasts face several challenges. Barriers such as the disease incubation period, asymptomatic infection, limited diagnostic testing capacity, overloaded medical staff, and complicated reporting processes can delay the confirmation of cases or cause cases to go unreported in this evolving situation.
Furthermore, the adopted models have mostly been complicated, with many preset values, assumptions, or parameter values that are likely inaccurate. Although some modeling approaches can estimate parameter values through statistical methods, they provide only a rough simulation. As a result, studies using different methods and datasets have arrived at different predictions.
To achieve a relatively objective judgment, given that this new disease and complicated situation involve many unknown factors, we used mathematical modeling methods to characterize COVID-19 transmission and used multiple datasets to ensure data reliability. Since individual data sources may be biased or incomplete, the use of multiple data sources rather than a single dataset can enable a more robust estimation of the underlying transmission dynamics (12). Therefore, we collected data from four types of sources, including released data and official daily reports from commercial technology companies, academic institutes, authorities or local healthcare commissions, and the World Health Organization, to minimize the errors caused by potentially biased single data sources. The data were obtained from the Beijing Municipal Health Commission (BMHC) (13), Shanghai Municipal Health and Family Planning Commission (SMHFPC) (14), Health Commission of Guangdong Province (15), National Bureau of Statistics of China (NBSC) (16), Baidu Migration Big Data Platform (BMBDP) (17), Center for Systems Science and Engineering (CSSE) of Johns Hopkins University (18), and WHO coronavirus disease (COVID-2019) situation reports (19). The cases detected in these four cities were all imported or secondary transmission cases, and, according to the reported data available after January 20, 2020, Chinese authorities had implemented prevention measures in these cities to contain the outbreak and prevent the disease from spreading. We therefore considered the secondary transmission pattern of COVID-19 to be different from the early spread pattern in Wuhan, where the virus was transmitted rampantly without any prevention measures.
Therefore, we collected data from January 24, 2020 (Chinese New Year's Eve) to February 23, 2020 to give an overall objective estimation of COVID-19 development in four high-risk metropolitan areas of China: Beijing, Shanghai, Guangzhou, and Shenzhen. We estimated how COVID-19 human-to-human transmission occurred in these large cities, each of which had reported a considerable number of cases. We further used these estimates to forecast the potential risks and development trends in these four metropolitan areas.

METHODS
To evaluate the COVID-19 spread pattern and estimate its transmission in four metropolitan areas, we fitted an adjusted SEIR model to the reported data. We considered only human-to-human transmission in our models.

Adjusted SEIR Model for COVID-19
The SEIR model is a deterministic metapopulation transmission model in which the population is divided into four classes: S (susceptible, people who are likely to be infected), E (exposed, people who are exposed), I (infectious, people who are infected), and R (removed, recovered and dead persons). We assumed that the epidemic risk started with infectious cases on February 3, 2020, when authorities announced that people were returning to work after the Chinese Spring Festival holiday. Therefore, we modeled a period beginning on February 3, 2020. The SEIR model state transition is shown in Figure 1. In our estimation, the entire population was initially susceptible, since COVID-19 is an emerging infectious disease to which people have no pre-existing immunity. In January (before Chinese New Year), an estimated 3.246, 2.847, 3.430, and 3.271 million people left Beijing, Shanghai, Guangzhou, and Shenzhen, respectively. We subtracted these outflows from the four cities' initial populations and assumed that these people returned after Chinese New Year, by February 17, 2020. We estimated the initial exposed population using the number of confirmed cases during the next 7 days. We assumed that the median incubation period was 5-6 days (ranging from 0 to 14 days) based on the WHO report (20).
Based on the basic SEIR model, we further considered the influence of multiple factors on the transmission pattern as the situation unfolded, including public health intervention measures, people's self-protection behaviors, the diagnosis rate, population flow, etc.
Assuming that public health interventions contributed to the control of the epidemic dynamics, we incorporated into the model a parameter that indicates changes in the population flow. According to the inflow index, outflow index, and urban daily adjusted travel intensity index from the Baidu Migration Big Data Platform for the period from January 24, 2020 to February 23, 2020, we inferred that people's activity was markedly lower than the normal level for the same period of the previous year. Furthermore, considering the Spring Festival population flow and the return to work after the holiday (officially announced as February 3, 2020), we assumed that the risk for these four metropolitan areas grew with the increase in the inflow population starting on February 3, 2020. Because the four cities enforced a 14-day quarantine policy for incoming travelers during that time, the spread was strictly contained, so an average number of introduced cases was included in the model.
FIGURE 1 | SEIR model.
We also estimated the parameter values for these cities using the MCMC method. Cases reported in the official data and other sources between January 24, 2020 and February 23, 2020 were used to fit the model. Considering the possibly complex influencing factors, we proposed an adjusted SEIR model for COVID-19 estimation, as displayed in Figure 2.
In the adjusted SEIR model, we considered the inflow of the city's population, so the total number of people was not fixed, and the population was divided into seven classes: S (susceptible, people who are likely to be infected), E (exposed, people who are exposed), I (infectious, people who are infected), R (recovered and dead persons), Sq (quarantined susceptible persons), Eq (isolated exposed persons), and Iq (isolated infected persons). The transmission dynamics are governed by a system of differential equations over these classes, in which q is the quarantined proportion of exposed individuals, β is the transmission probability per contact, c is the contact rate, which defines how many people come into contact with an infected person per day, and i is the estimated number of infected people within the inflow population each day. Quarantined contacts who were infected moved to compartment Eq at a rate of βcq, while quarantined contacts who were not infected moved to compartment Sq at a rate of (1 − β)cq. Contacts who were not quarantined, if infected, moved to compartment E at a rate of βc(1 − q). θ is the transmission capability of the latent population relative to that of the infected population. Because related work reports that the transmission capability of people in the incubation period is similar to that of diagnosed infected patients (21), we assumed that θ = 1. λ is the transition rate from the quarantined susceptible population back to the susceptible population, σ is the transition rate from the exposed to the infected population, α is the mortality rate, δI is the transition rate from the infected population to the quarantined infected population, and γI is the recovery rate of the infected population. δq is the transition rate from the quarantined exposed population to the quarantined infected population, and γH is the recovery rate of the quarantined infected population.
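The compartment flows described above can be sketched in code. The following is a minimal illustration and not the authors' implementation: the right-hand sides are inferred from the stated rates, the division by the total population N (frequency-dependent mixing) is an assumption, adding the daily inflow i directly to compartment I is an assumption, and all parameter values and initial conditions are hypothetical rather than taken from Table 1. Integration uses a plain forward Euler step for brevity.

```python
def derivatives(state, p):
    """Right-hand side of an ASSUMED seven-compartment quarantine SEIR model."""
    S, E, I, Sq, Eq, Iq, R = state
    N = sum(state)  # total population; not fixed when there is inflow
    # Daily contacts between susceptible and infectious (I plus theta * E) people.
    contacts = p["c"] * S * (I + p["theta"] * E) / N
    beta, q = p["beta"], p["q"]
    dS = -(beta + (1 - beta) * q) * contacts + p["lam"] * Sq
    dE = beta * (1 - q) * contacts - p["sigma"] * E
    dI = (p["sigma"] * E
          - (p["delta_I"] + p["alpha"] + p["gamma_I"]) * I
          + p["inflow"])  # assumption: imported infections enter I directly
    dSq = (1 - beta) * q * contacts - p["lam"] * Sq
    dEq = beta * q * contacts - p["delta_q"] * Eq
    dIq = p["delta_I"] * I + p["delta_q"] * Eq - (p["alpha"] + p["gamma_H"]) * Iq
    # R collects both recovered and dead persons, as in the class definition.
    dR = (p["alpha"] + p["gamma_I"]) * I + (p["alpha"] + p["gamma_H"]) * Iq
    return [dS, dE, dI, dSq, dEq, dIq, dR]


def simulate(state0, p, days, dt=0.1):
    """Integrate with fixed-step forward Euler; returns the state at every step."""
    state = list(state0)
    trajectory = [list(state)]
    for _ in range(int(round(days / dt))):
        d = derivatives(state, p)
        state = [x + dt * dx for x, dx in zip(state, d)]
        trajectory.append(list(state))
    return trajectory


def peak_infectious(trajectory):
    """Largest value reached by the unquarantined infectious compartment I."""
    return max(s[2] for s in trajectory)


# Hypothetical values for illustration only (not the fitted estimates).
params = {
    "c": 5.0, "beta": 0.05, "q": 0.3, "theta": 1.0,
    "lam": 1 / 14, "sigma": 1 / 5, "alpha": 1e-4,
    "delta_I": 0.3, "gamma_I": 0.1, "delta_q": 0.2, "gamma_H": 0.1,
    "inflow": 0.0,
}
state0 = [2.0e7, 50.0, 10.0, 0.0, 0.0, 0.0, 0.0]  # S, E, I, Sq, Eq, Iq, R

baseline = simulate(state0, params, days=120)
stricter = simulate(state0, dict(params, q=0.6), days=120)
print("peak I, baseline q:", round(peak_infectious(baseline)))
print("peak I, doubled q: ", round(peak_infectious(stricter)))
```

Scaling `params["c"]` or `params["q"]` by a constant factor is one way to explore stricter or weaker intervention scenarios with this sketch.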

Parameter Estimate Methods
The MCMC method is a commonly used algorithm in modern statistical computing. It provides an effective tool for fitting statistical models and is widely used in Bayesian calculations for complex statistical models (22). We used MCMC with Metropolis-Hastings (MH) sampling (23), with a normal distribution as the proposal distribution, to estimate the parameters of the modified SEIR model and obtain baseline parameter estimates. We incorporated the data collected from infectious disease reports into this statistical inference and simulated the transmission process to further fix some parameters by fitting the reported data. Using Beijing as an example, the parameter estimates and initial values of the SEIR model are listed in Table 1.
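As an illustration of MH sampling with a normal proposal, the sketch below estimates a single growth-rate parameter from synthetic daily counts. It is a toy example under stated assumptions, not the fitting procedure used in this study: the exponential-growth model, the Poisson likelihood, the flat prior on (0, 1), the synthetic data, and the function names are all hypothetical.

```python
import math
import random


def metropolis_hastings(log_post, x0, proposal_sd, n_iter, seed=0):
    """Random-walk Metropolis-Hastings with a normal proposal (one parameter)."""
    rng = random.Random(seed)
    x, lp = x0, log_post(x0)
    samples = []
    for _ in range(n_iter):
        candidate = rng.gauss(x, proposal_sd)  # symmetric normal proposal
        lp_candidate = log_post(candidate)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, lp_candidate - lp)):
            x, lp = candidate, lp_candidate
        samples.append(x)
    return samples


# Synthetic daily counts generated from 5 * exp(0.1 * t); illustration only.
days = list(range(20))
counts = [round(5 * math.exp(0.1 * t)) for t in days]


def log_posterior(r):
    """Poisson log-likelihood of the counts plus a flat prior on 0 < r < 1."""
    if not 0.0 < r < 1.0:
        return float("-inf")
    total = 0.0
    for t, y in zip(days, counts):
        log_mu = math.log(5.0) + r * t
        total += y * log_mu - math.exp(log_mu)  # constant log(y!) term dropped
    return total


samples = metropolis_hastings(log_posterior, x0=0.2, proposal_sd=0.01, n_iter=5000)
kept = samples[1000:]  # discard burn-in
posterior_mean = sum(kept) / len(kept)
print("posterior mean growth rate:", round(posterior_mean, 3))
```

In a full analysis, the log-posterior would compare the output of the transmission model with the reported cases, and several parameters would be sampled jointly.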
In addition, to simulate the contact rate for model estimation, we used urban travel index data from Baidu, a major internet company in China that hosts the popular navigation app Baidu Maps, which indirectly monitors real-time urban travel intensity and population flow. The Baidu indices of travel intensity and population flow were converted into the corresponding coefficients for the contact rate and the quarantined susceptible population. Using the Baidu index, we simulated people's activity level by comparing our observed period (under strict interventions) with the normal level in the same period of the previous year. We also considered a scenario in which people return to work (limited interventions); accordingly, we applied the coefficients (0.6c, 0.8c, c, 1.5c, 2c) to the baseline contact rate to compare the effectiveness of different interventions. Similarly, coefficients were applied to the baseline quarantine proportion (0.6q, 0.8q, q, 1.5q, 2q).
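The conversion from a travel index to scenario coefficients can be sketched as follows. The exact mapping used in this study is not given, so the ratio of mean index values is an assumption, as are the function names, the index values, the normal-period contact rate, and the capping of the quarantine proportion at 1.

```python
def activity_ratio(observed_index, baseline_index):
    """Total observed travel intensity divided by the same-period baseline total."""
    if len(observed_index) != len(baseline_index) or not observed_index:
        raise ValueError("index series must be non-empty and of equal length")
    return sum(observed_index) / sum(baseline_index)


def scenario_values(baseline, multipliers=(0.6, 0.8, 1.0, 1.5, 2.0), upper=None):
    """Scale a baseline parameter by each multiplier, optionally capping it."""
    values = [baseline * m for m in multipliers]
    if upper is not None:
        values = [min(v, upper) for v in values]
    return values


# Hypothetical daily travel intensity values (not Baidu data).
observed = [1.2, 1.1, 1.3, 1.0]   # period under strict interventions
baseline = [5.0, 4.8, 5.2, 5.0]   # same period of the previous year

ratio = activity_ratio(observed, baseline)
normal_contact_rate = 10.0        # hypothetical contacts per person per day
contact_scenarios = scenario_values(ratio * normal_contact_rate)
quarantine_scenarios = scenario_values(0.3, upper=1.0)  # a proportion cannot exceed 1
print(ratio, contact_scenarios, quarantine_scenarios)
```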

Basic Reproduction Number R0 Estimates
R0 is defined as the average number of new infections directly caused by a single case in a population in which everyone is susceptible, as is the situation at the onset of an outbreak. Because the model structure includes quarantine and isolation, we used the next generation matrix to derive a formula for the basic reproduction number after public health interventions were implemented. R0 is the principal eigenvalue of the next generation matrix, which gives the expected growth in the number of infections from one generation to the next. The parameters are defined as in the adjusted SEIR model.
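For illustration, the next generation matrix calculation can be worked through under one assumed form of the infected subsystem. This is a sketch, not the formula reported by this study: it assumes that only the unquarantined compartments E and I transmit, that the quarantined compartments Eq and Iq cause no new infections, and that S is approximately equal to the total population at the onset.

```latex
% Assumed linearized infected subsystem at the disease-free state (S \approx N):
\begin{aligned}
\frac{dE}{dt} &= \beta c (1-q)\,(I + \theta E) - \sigma E, \\
\frac{dI}{dt} &= \sigma E - (\delta_I + \alpha + \gamma_I)\, I .
\end{aligned}

% New-infection matrix F and transition matrix V for the ordering (E, I):
F = \begin{pmatrix} \beta c (1-q)\,\theta & \beta c (1-q) \\ 0 & 0 \end{pmatrix},
\qquad
V = \begin{pmatrix} \sigma & 0 \\ -\sigma & \delta_I + \alpha + \gamma_I \end{pmatrix}.

% The next generation matrix F V^{-1} has one nonzero eigenvalue:
R_0 = \rho\!\left(F V^{-1}\right)
    = \beta c (1-q) \left( \frac{\theta}{\sigma}
      + \frac{1}{\delta_I + \alpha + \gamma_I} \right).
```

Under these assumptions, R0 falls as the quarantined proportion q rises and as the contact rate c falls, which matches the qualitative effect of interventions described in this section.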

Data Characterization
To characterize the overall epidemic size and dynamics, Figure 3 shows the epidemic curve of COVID-19 cases identified in Beijing, Shanghai, Guangzhou, and Shenzhen from January 24, 2020 to February 23, 2020.

Adjusted SEIR Model Estimation
We summarized and interpreted the transmission dynamics of COVID-19 in the four metropolitan areas. The adjusted SEIR model was used to predict cases in Beijing, Shanghai, Guangzhou, and Shenzhen, and Figure 4 shows the comparison between the predicted and actual results. The results are based on an assumption of no further imported cases to these cities, since China implemented strong regulation measures during the observation period. Based on the data shown in Table 2 and Figure 5, we also found that the number of infected individuals changed with the level of public health interventions and that strict interventions could decrease the peak number of infected individuals compared with weak interventions; accordingly, we used different contact rates to reflect the different levels of interventions. The baseline contact rate was derived by the MCMC method, and the results show that reducing the contact rate either persistently decreased the peak value or delayed the peak. In addition, with strict public intervention, the number of infected individuals eventually decreased, and the peak appeared sooner than it would with weak intervention. After February 3, 2020, many people returned to these cities to resume work after the holiday, as inferred from the Baidu transportation index. We represented this risk with increased contact rates (1.5c, 2c). Accordingly, the number of infected individuals increased compared with the scenarios with a decreased contact rate (0.8c, 0.6c).
In addition, as shown in Table 3 and Figure 6, we compared the transmission dynamics under different quarantined proportions of exposed individuals, a parameter that reflects the contact tracing capability and management efforts of local governments. The results show that reducing the quarantined proportion of exposed individuals (0.8q, 0.6q) increased the peak value and delayed the peak time. Conversely, a higher quarantined proportion of exposed individuals (2q, 1.5q) lowered the peak value and brought the peak time forward.

R0 Estimation Results
We used the MCMC method to fit the model and adopted an adaptive MH algorithm to carry out the MCMC procedure. As a result, we inferred R0 = 2.91, 2.78, 2.02, and 1.75 for Beijing, Shanghai, Guangzhou, and Shenzhen, respectively.

DISCUSSION
Our analysis results strongly demonstrate that reducing secondary infections among close contacts would effectively limit human-to-human transmission, and that public health measures, such as the rapid identification of cases, tracing and following up with people who had contact with an infected person, infection prevention and control in health care settings, and the implementation of health measures for travelers, can greatly limit further spread of the disease. The documented COVID-19 reproduction numbers range from 2.0 to 4.9 (6,11,25) and are based on cases that developed during different transmission phases and in different areas. For instance, the R0 in Wuhan was clearly higher than that in other cities during the timeframe analyzed. Given the prevention measures implemented by the Chinese government and local authorities, we regarded the inferred R0 values for the four cities as reasonable and interpretable.
In this study, we aimed to monitor COVID-19 trends after cases were imported into other cities and to estimate the spread pattern by mathematical modeling, which can be helpful for evaluating the potential risk and severity of new outbreaks. The results of our study show that, for four metropolitan areas of China, the containment measures were an effective control at that time; however, it is imperative to raise awareness in the population and prevent potential outbreak risks going forward. The study has limitations. The present reported data are insufficient to understand the full epidemiological pattern of COVID-19 transmission and new potential outbreaks. For example, the estimates in this manuscript have a certain degree of uncertainty and delay due to the limitations in reporting mechanisms over the course of the natural history of the cases, the impact of potential asymptomatic cases, and some unreported cases. Some studies were conducted with the assumption that a small fraction, 20%, of cases were not reported (7), and others reported that the estimated asymptomatic proportion was 17.9% (26) or 60% (21). Evidently, such asymptomatic infectious cases are not fully captured by current testing methods. However, some studies suggested that crowdsourced data could be compiled and analyzed as a complement to officially released data, which could perhaps help improve the analysis results (27)(28)(29).
FIGURE 6 | Infected population curve with different quarantined proportion of exposed individuals for four cities.
As concluded from the WHO-China Joint Mission report (30), the COVID-19 transmission dynamics are inherently contextual, as are the dynamics for any outbreak, and people worldwide need to work together to defend against this disease. To do this, it is necessary to: (1) enhance the understanding of the evolving COVID-19 and the nature and the impact of ongoing containment measures; (2) share knowledge on the COVID-19 response and preparedness measures being implemented in countries affected by or at risk of importations of COVID-19; (3) generate recommendations for adjusting COVID-19 containment and response measures in China and internationally; and (4) establish priorities for a collaborative program of work, research, and development to address critical gaps in knowledge, responses, readiness tools, and strategies.
From our study, we concluded that the outbreak could be greatly reduced by strict public health interventions. The public intervention strategies and protection measures implemented in these four areas may help provide epidemiological guidance to governments on measures for the international cases that are rapidly emerging.

DATA AVAILABILITY STATEMENT
The datasets analyzed in this article are not publicly available because they are kept by a private affiliation (Peking Union Medical College Hospital). Requests to access the datasets should be directed to Na Hong (hongna@dchealth.com) or Yun Long (ly_icu@aliyun.com).

AUTHOR CONTRIBUTIONS
LS, NH, and XZ contributed equally. WZ, YL, and GS take responsibility for the integrity of the work as a whole, from inception to published article. NH and LS were responsible for study design and conception and drafted the manuscript. FC and LH collected and cleaned the data. JH, YM, and HJ were responsible for data modeling and analysis. WZ, GS, and XZ interpreted the results. LS, NH, and XZ drafted the manuscript. All authors revised the manuscript for important intellectual content.