The effect of interurban movements on the spatial distribution of population

Understanding how interurban movements modify the spatial distribution of the population is important for transport planning and is also a fundamental ingredient of epidemic modeling. We illustrate this with vacation trips, for all transportation modes, in China during the Lunar New Year, and compare the results for 2019 with those for 2020, when travel bans were applied to mitigate the spread of a novel coronavirus (COVID-19). We first show that interurban travel flows are broadly distributed and display large temporal and spatial fluctuations, which makes them very difficult to model. Larger flows tend to be dispersed over a larger number of origins and destinations, creating de facto hubs that can spread an epidemic on a large scale. These movements quickly induce (in about a week in this case) a very strong concentration of the population in a small set of cities. We characterize quantitatively the return to the initial distribution by defining a pendular ratio, which shows that this return is in general very slow and was even halted during the 2020 Lunar New Year because of travel restrictions. Travel restrictions obviously limit the spread of the disease between cities, but they have the counter-effect of maintaining a high concentration of the population in a small set of cities, which a priori favors intra-city spread unless individual contacts are strongly limited. These results shed some light on the statistics of interurban movements and on how they modify the national distribution of the population, a crucial ingredient for devising effective control strategies at the national level.


Introduction
The 2020 Chinese Lunar New Year period witnessed the outbreak of a novel coronavirus in Wuhan, China, which quickly spread to other countries before becoming a pandemic [1]. The proximity of this outbreak to the Chinese Spring Festival, a period of heavy travel, created particularly favorable conditions for the spread of the disease. With an increasing number of confirmed cases, more attention has been devoted to modeling the spread of COVID-19 from various angles, such as determining the value of the reproductive number [2][3][4][5][6][7] and the incubation period [8][9][10]. In general, analytical modeling plays an important role in predicting the spread and in particular allows control strategies to be tested [11], and this was verified in this case too [12][13][14][15][16][17][18][19][20][21][22]. Particularly important were the estimation of the probability of exporting the disease to other countries [19,23,24] and the assessment of how effective travel restrictions inside China were [19].
Demographic information and mobility, either in the form of data or given by transportation models (see for example the review [25]), are crucial for modeling infectious diseases [26], including COVID-19. This sort of data is also useful for transport planning [27], city livability [28], and congestion analysis and prediction. Mobility studies concern either the global scale, with movements between countries [18][19][20][21], the national scale, with movements between cities, or even movements inside cities [20][21][22]. Here we mostly focus on interurban mobility for all transport modes and, in contrast with most epidemiological studies, we do not model the spread of the disease; instead, we focus on two interrelated aspects. First, we study the statistical properties of movements between cities during a holiday period and how the population distribution is affected by these large-scale seasonal migrations. This leads us to the second aspect, namely the possible impact of these movements on the spread of the epidemic between cities. More precisely, we investigate the statistical properties of traffic flows between cities during the Chinese Spring Festival in 2020 and in 2019. These movements are essentially due to workers returning to their hometowns for the New Year holidays and must not be confused with interurban migration, in which individuals change their town of residence. Importantly, comparing the traffic flows for 2019 with those for 2020, when travel restrictions were in place, gives us an opportunity to uncover some fundamental properties of mobility, knowledge that is essential for understanding and modeling mobility at the national scale. Finally, network measures for spatio-temporal weighted networks could also provide fundamental information and deserve future attention [29,30].

Statistics of interurban flows
We first study standard statistical properties of interurban flows, obtained from migration data provided by Baidu Qianxi (see Material and Methods). This dataset enables us to monitor the traffic flows between cities. For each day d (d = 1, 2, ..., T), we extract the number of individuals N(i, j, d) going from city i to city j by any travel mode. The migration data can thus be represented as a directed, weighted network of flows between the n = 296 cities of China whose populations are also known (see Material and Methods). We collected the data for the Spring Festival of 2020 (from Jan. 1st to Feb. 12th, 2020) and, to assess the impact of travel bans, also for the corresponding period of 2019 (which, according to the Chinese lunar calendar, runs from Jan. 12th to Feb. 23rd, 2019).
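As a minimal sketch of this data structure, assuming the flows have already been extracted as (origin, destination, day, count) records with 0-based indices (a hypothetical input format, not the Baidu Qianxi schema), the network can be stored as a dense array of shape (n, n, T):

```python
import numpy as np


def build_flow_tensor(records, n_cities, n_days):
    """Store daily interurban flows as a dense array N with N[i, j, d] = flow from i to j on day d.

    records: iterable of (i, j, d, count) tuples with 0-based city and day indices
    (a hypothetical input format chosen for illustration).
    """
    N = np.zeros((n_cities, n_cities, n_days))
    for i, j, d, count in records:
        if i == j:
            continue  # keep only interurban flows: a city does not send flow to itself
        N[i, j, d] += count  # accumulate in case a pair appears more than once
    return N


# Toy example with 3 cities and 2 days (made-up numbers)
records = [(0, 1, 0, 120.0), (1, 0, 0, 30.0), (0, 2, 1, 50.0), (2, 1, 1, 10.0)]
N = build_flow_tensor(records, n_cities=3, n_days=2)
print(N.shape)     # (3, 3, 2)
print(N[0, 1, 0])  # 120.0
```

A dense array is convenient at this scale (296 × 296 × 43 days is about 3.8 million entries); a sparse representation would be preferable for much larger networks.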

Large heterogeneity of flows
We first consider the distribution of all flows of individuals N(i, j, d) for all cities i and j and all days d, shown in Figure 1A. The maximum flow is of order 10^5 while the average is of order 10^3, indicating a broad distribution. A power-law fit is consistent with this picture, with an exponent α ≈ 2.3 (Figure 1A). This heterogeneity is confirmed in Figure 1B, which shows both the average value μ_d and the standard deviation σ_d computed over all inter-city flows for each day d. For most days, the relative dispersion σ_d/μ_d is of order 5 to 10. This heterogeneity is probably due to the large diversity of the cities serving as origins or destinations of flows (see below for further analysis). An important feature that Figure 1B Figure 2B the distribution of Δ_ij. We observe that the spatial dispersion is around 8.3, while the temporal dispersion is smaller (mainly concentrated around 1). The main source of heterogeneity thus lies in the flow fluctuations between different origins and destinations, while temporal fluctuations are smaller but not negligible. These two sources of heterogeneity clearly represent a challenge for modeling these flows, especially with very simplified models. Our results indicate that the first modeling step should be to describe the spatial heterogeneity of flows, and only then to consider temporal variations.
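The two statistics above can be illustrated with a short sketch. The text does not specify how the power law was fitted; the code swaps in the standard continuous maximum-likelihood estimator α̂ = 1 + m / Σ ln(x/x_min) over the m flows above an assumed threshold x_min, and computes σ_d/μ_d over the positive flows of each day (excluding zero flows is also an assumption). The function names and the N[i, j, d] array layout are hypothetical.

```python
import numpy as np


def power_law_alpha(x, x_min):
    """Continuous maximum-likelihood estimate of alpha for p(x) ~ x**(-alpha), x >= x_min."""
    x = np.asarray(x, dtype=float)
    tail = x[x >= x_min]
    return 1.0 + tail.size / np.sum(np.log(tail / x_min))


def daily_relative_dispersion(N):
    """Return sigma_d / mu_d for each day d, computed over the positive flows N[:, :, d]."""
    ratios = []
    for d in range(N.shape[2]):
        flows = N[:, :, d]
        flows = flows[flows > 0]  # assumption: zero (unobserved) flows are excluded
        ratios.append(flows.std() / flows.mean())
    return np.array(ratios)


# Synthetic check: numpy's pareto draws Lomax samples; adding 1 and scaling by x_min
# gives a classical Pareto whose density decays as x**-(1.3 + 1), i.e. alpha = 2.3
rng = np.random.default_rng(0)
x_min = 1000.0
samples = (rng.pareto(1.3, size=100_000) + 1.0) * x_min
alpha_hat = power_law_alpha(samples, x_min)  # should be close to 2.3

# Toy day with two flows, 1 and 3: mean 2, standard deviation 1
N_toy = np.zeros((2, 2, 1))
N_toy[0, 1, 0], N_toy[1, 0, 0] = 1.0, 3.0
print(daily_relative_dispersion(N_toy))  # [0.5]
```

On real data, x_min must itself be chosen, for example by minimizing the Kolmogorov-Smirnov distance between the data and the fitted tail; that step is omitted here.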
The next natural quantities that can be computed over this network are the incoming flows N_in(i, d) and outgoing flows N_out(i, d), defined respectively by

$$N_{\mathrm{in}}(i,d)=\sum_{j}N(j,i,d),\qquad N_{\mathrm{out}}(i,d)=\sum_{j}N(i,j,d).$$

We measure in the same way as above various measures of fluctuations, either averaged over cities or over time, leading to the quantities Δ_in and Δ_out.
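With flows stored as an array N[i, j, d] (an assumed layout with 0-based indices; the function name is hypothetical), the incoming and outgoing flows are simply sums over the origin and destination axes. A minimal sketch:

```python
import numpy as np


def in_out_flows(N):
    """Incoming and outgoing flows from an array N with N[i, j, d] = flow from i to j on day d.

    Returns (N_in, N_out), both of shape (n, T):
      N_in[i, d]  = sum over j of N[j, i, d]  (sum over the origin axis)
      N_out[i, d] = sum over j of N[i, j, d]  (sum over the destination axis)
    """
    N_in = N.sum(axis=0)
    N_out = N.sum(axis=1)
    return N_in, N_out


# Toy example: 3 cities, 1 day
N = np.zeros((3, 3, 1))
N[0, 1, 0] = 5.0  # city 0 -> city 1
N[2, 1, 0] = 2.0  # city 2 -> city 1
N[1, 0, 0] = 1.0  # city 1 -> city 0
N_in, N_out = in_out_flows(N)
print(N_in[:, 0])   # [1. 7. 0.]
print(N_out[:, 0])  # [5. 1. 2.]
```

Since every trip leaves one city and enters another, the totals of N_in and N_out coincide on each day, which gives a cheap consistency check on the data.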

Structure of incoming and outgoing flows
The value of incoming or outgoing flows gives information about the volume of migrations, but not about the number of important origins or destinations. In order to characterize the dispersion over different cities, we denote by O(i, d) and D(i, d) the sets of origins of flows incoming to city i and of destinations of flows leaving city i (on day d), respectively. We then use Gini indices [30], which capture the dispersion of incoming and outgoing flows and are given by

$$G_{\mathrm{in}}(i,d)=\frac{\sum_{j\in O(i,d)}\sum_{k\in O(i,d)}\left|N(j,i,d)-N(k,i,d)\right|}{2\,|O|\sum_{j\in O(i,d)}N(j,i,d)},\qquad G_{\mathrm{out}}(i,d)=\frac{\sum_{j\in D(i,d)}\sum_{k\in D(i,d)}\left|N(i,j,d)-N(i,k,d)\right|}{2\,|D|\sum_{j\in D(i,d)}N(i,j,d)},$$

where |O| and |D| represent the number of elements of the sets O(i, d) and D(i, d). Intuitively, if the traffic to city i on day d comes almost entirely from a single origin city, the Gini index G_in(i, d) is close to 1, while if the flows from all origins are equal, G_in(i, d) is 0 (and similarly for G_out(i, d)).

Frontiers in Physics frontiersin.org
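A minimal sketch of these Gini indices follows. It assumes that O(i, d) and D(i, d) contain all other cities, so that cities exchanging no flow with i enter as zeros; with this choice a single origin gives G_in = 1 − 1/|O|, close to 1 for a few hundred cities. The array layout N[i, j, d] and the helper names are hypothetical.

```python
import numpy as np


def gini(values):
    """Gini index sum_a sum_b |x_a - x_b| / (2 m sum_a x_a) of m non-negative values."""
    x = np.asarray(values, dtype=float)
    total = x.sum()
    if x.size == 0 or total == 0:
        return 0.0  # convention chosen here: no flow means no concentration
    pairwise = np.abs(x[:, None] - x[None, :]).sum()
    return pairwise / (2.0 * x.size * total)


def incoming_gini(N, i, d):
    """G_in(i, d): dispersion of the flows arriving in city i on day d over all other cities."""
    others = np.arange(N.shape[0]) != i
    return gini(N[others, i, d])


def outgoing_gini(N, i, d):
    """G_out(i, d): dispersion of the flows leaving city i on day d over all other cities."""
    others = np.arange(N.shape[0]) != i
    return gini(N[i, others, d])


print(gini([4.0, 4.0, 4.0, 4.0]))  # 0.0: equal flows from every origin
print(gini([0.0, 0.0, 0.0, 5.0]))  # 0.75: a single origin among m = 4, i.e. 1 - 1/m
```

The pairwise form mirrors the formula directly and costs O(m²) per city and day, which is negligible for m ≈ 300.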
We plot these Gini indices computed for each city versus the traffic flows to or from this city. Figures 3A,B show that, on average, the larger the traffic flows, the more they are dispersed over a larger number of origins or destinations. In terms of epidemic control, cities with a large flow N_in and a small Gini index G_in are clearly the most critical, in the sense that many people from many different cities converge to the same place. Similarly, cities with a large N_out and a small G_out should be monitored particularly closely, since they can act as hubs spreading the disease over the inter-city network. Figures 3C,D show the top 5 critical cities: Beijing, Shanghai, Chongqing and Guangzhou for both the incoming and outgoing flows, Shenzhen for the incoming flows, and Dongguan for the outgoing flows.
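The text does not state the exact criterion used to rank critical cities. As an illustration only, the sketch below flags cities whose flow lies in an upper quantile and whose Gini index is below the median, then ranks them by flow; both thresholds and the function name are hypothetical choices, and the input values are made up.

```python
import numpy as np


def critical_cities(names, flow, gini_index, flow_quantile=0.9, k=5):
    """Cities with a large flow and a small Gini index, ranked by decreasing flow.

    'Large' means flow >= the given quantile of all flows and 'small' means a Gini index
    below the median; both thresholds are illustrative assumptions, not the paper's criterion.
    """
    flow = np.asarray(flow, dtype=float)
    gini_index = np.asarray(gini_index, dtype=float)
    mask = (flow >= np.quantile(flow, flow_quantile)) & (gini_index < np.median(gini_index))
    candidates = np.flatnonzero(mask)
    ranked = candidates[np.argsort(-flow[candidates])]  # largest flow first
    return [names[c] for c in ranked[:k]]


# Made-up values of N_in and G_in for five hypothetical cities on one day
names = ["A", "B", "C", "D", "E"]
flow = [100.0, 10.0, 90.0, 5.0, 80.0]
gini_in = [0.2, 0.9, 0.1, 0.8, 0.7]
print(critical_cities(names, flow, gini_in, flow_quantile=0.4))  # ['A', 'C']
```

City E has a large flow but a Gini index equal to the median, so it is not flagged; a different threshold choice would change the ranking.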

Statistical structure of the national population
An important effect of incoming and outgoing flows is that they change the population structure. Some cities will receive a large number of individuals while for others we expect a decrease of their population. Migration thus affects the statistical structure of the national population and in this section we will characterize this effect.

Temporal evolution of population structure
In order to characterize the disparity of the population distribution and how it varies during seasonal migrations, we consider the population of city i at day d given by

$$P(i,d)=P_0(i)+\sum_{d'=1}^{d}\left[N_{\mathrm{in}}(i,d')-N_{\mathrm{out}}(i,d')\right],$$

where P_0(i) represents the population of city i without incoming and outgoing flows. The Gini index for the city population of the whole country at day d is then given by

$$G(d)=\frac{1}{2n^2\,\bar{P}(d)}\sum_{i=1}^{n}\sum_{j=1}^{n}\left|P(i,d)-P(j,d)\right|,$$

where $\bar{P}(d)=\frac{1}{n}\sum_{i=1}^{n}P(i,d)$ is the average population of all cities at day d. Intuitively, if all people gather in one city, G is close to 1, while if people are spread evenly across all cities, G is 0. For comparison, we also define the Gini index at rest as

$$G_0=\frac{1}{2n^2\,\bar{P}_0}\sum_{i=1}^{n}\sum_{j=1}^{n}\left|P_0(i)-P_0(j)\right|.$$

This quantity captures the degree of population concentration without any traffic flows, where $\bar{P}_0=\frac{1}{n}\sum_{i=1}^{n}P_0(i)$ is the average population of all cities without any traffic flows. We show in Figure 4 the variation of the Gini coefficient when migration flows are taken into account.
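The population update and the national Gini index can be sketched as follows, assuming flows stored as N[i, j, d] with 0-based indices and the cumulative update P(i, d) = P_0(i) + Σ_{d'≤d} [N_in(i, d') − N_out(i, d')]. The function names are hypothetical and the toy data are made up.

```python
import numpy as np


def population_over_time(P0, N):
    """P(i, d) = P0(i) + cumulative sum over days d' <= d of N_in(i, d') - N_out(i, d')."""
    net = N.sum(axis=0) - N.sum(axis=1)  # N_in - N_out, shape (n, T)
    return np.asarray(P0, dtype=float)[:, None] + np.cumsum(net, axis=1)


def gini(values):
    """Gini index sum_i sum_j |x_i - x_j| / (2 n^2 mean(x)) of n positive values."""
    x = np.asarray(values, dtype=float)
    return np.abs(x[:, None] - x[None, :]).sum() / (2.0 * x.size**2 * x.mean())


def national_gini(P0, N):
    """Return the daily Gini indices G(d) and the Gini index at rest G_0."""
    P = population_over_time(P0, N)
    G = np.array([gini(P[:, d]) for d in range(P.shape[1])])
    return G, gini(P0)


# Toy example: 4 equal cities; 50 people travel 0 -> 1 on day 0 and return on day 1
P0 = [100.0, 100.0, 100.0, 100.0]
N = np.zeros((4, 4, 2))
N[0, 1, 0] = 50.0
N[1, 0, 1] = 50.0
G, G0 = national_gini(P0, N)
# After day 0 the populations are [50, 150, 100, 100], so G[0] = 600 / 3200 = 0.1875;
# after day 1 everyone is back, so G[1] = G0 = 0.0
```

The toy run reproduces in miniature the pattern of Figure 4: concentration rises while people travel and relaxes once they return.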
We plot the results for both 2019 and 2020. In both cases we observe a large increase of the Gini index in a short time (about a week): as the Lunar New Year (LNY) approaches, people travel from their workplaces back to their hometowns to reunite with their families. These reunions are concentrated in a smaller set of cities, so that the number of important cities reaches its minimum and the Gini index reaches its peak on the LNY. After the LNY (Jan. 25th), individuals return to their usual places of residence and the Gini coefficient relaxes back to its original value, but much more slowly. In 2020, the increase of the Gini index is larger and, due to travel bans, the decrease is even slower than normal. A likely reason is that, after the outbreak of COVID-19, almost all regions postponed the resumption of work and classes after the Spring Festival holiday. For example, Shanghai recommended that companies not essential to the nation should not resume work before Feb. 10th and that schools should provide online classes. At the end of the period studied, the population structure at the national level is thus still far from being back to normal. These results show that seasonal movements induce a strong concentration of individuals in a relatively small set of cities, and that travel bans tend to maintain this high concentration.

Return to "equilibrium": Pendular ratio
We observe in Figure 4 that after the LNY the Gini index decreases, indicating a return to the normal state characterized by a lower concentration of individuals. In order to characterize quantitatively this return to the original state (before the holidays), we measure the gap between the number of individuals leaving a city before the LNY and the number coming back after it. This gap defines a 'pendular ratio' R(i, d_f), where d_f is a range of days around the LNY day. If this ratio is much larger than 1, the city has a large incoming flow, while in the opposite situation, R(i, d_f) ≪ 1, a large number of individuals leave the city compared with the incoming flows. For large d_f, we expect R ≃ 1 since most individuals have come back. We divide cities into three categories according to the value of R(i, 1): if it is larger than 1.5, city i is classified as a "receiver" city; if it is less than 0.5, as an "emitter" city; and if it lies between 0.5 and 1.5, as a "transit" city. Figure 5 shows the cities of each type on the map of China. Receiver and transit cities are spread fairly homogeneously across China, whereas emitter cities are in general located in developed regions, e.g., Beijing, Shanghai and Guangzhou (Figures 5A,B). Interestingly, cities of the Hubei province (within the dashed circle in the figure) are emitter cities in 2020, essentially because travel restrictions prevented individuals from coming back to Wuhan. This important difference with 2019 appears here in the spatial structure of emitters and receivers.
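The classification rule above translates directly into code. The function name is hypothetical, and the input is assumed to be the already computed value R(i, 1), whose exact formula is not reproduced here.

```python
def classify_city(r):
    """Category of a city from its pendular ratio r = R(i, 1), with the thresholds of the text."""
    if r > 1.5:
        return "receiver"
    if r < 0.5:
        return "emitter"
    return "transit"  # 0.5 <= r <= 1.5


# Made-up ratios for three hypothetical cities
ratios = {"city_a": 2.1, "city_b": 0.3, "city_c": 1.0}
print({name: classify_city(r) for name, r in ratios.items()})
# {'city_a': 'receiver', 'city_b': 'emitter', 'city_c': 'transit'}
```

The boundary values 0.5 and 1.5 fall into the transit category, following the wording "between 0.5 and 1.5".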
We show in Figures 6A,B the pendular ratio for all cities in 2019 and 2020, highlighting five cities: Wuhan, where COVID-19 originated, and the four province-level municipalities Beijing, Tianjin, Chongqing and Shanghai. The curve corresponding to Wuhan lies below those of all other cities in Figure 6B, reflecting the success of sealing Wuhan off from all outside contact since Jan. 23rd to stop the spread of the disease.
In Figures 6C,D, we show the pendular ratio for 2019 and 2020 for the different types of cities (averaged over the cities in each category: emitter, receiver or transit). The standard deviation is small for all three groups, which lends credibility to this classification. In addition, the values of R(i, 1) are much smaller in 2020 than in 2019. In 2019, the pendular ratio of all three types of cities returns to 1, meaning that the majority of individuals who went away for the holidays came back. The situation in 2020 is very different: the pendular ratio of all types of cities converges to a value less than 1 (even less than 0.5), indicating that the majority of people who went away for the holidays had not yet come back. This result is consistent with the conclusion drawn from the Gini index (Figure 4) about a larger concentration in cities and the effect of travel bans.
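The category averages and standard deviations can be sketched as below, assuming the pendular ratios are stored as an array R[i, k] for a sequence of window sizes d_f and that each city already carries its category label (for example from its R(i, 1) value). The layout, names and numbers are hypothetical.

```python
import numpy as np


def category_mean_std(R, labels):
    """Mean and standard deviation of the pendular ratio within each category of cities.

    R: array of shape (n_cities, n_windows), R[i, k] = pendular ratio of city i for the k-th d_f.
    labels: one category name per city, e.g. "emitter", "receiver" or "transit".
    Returns a dict {category: (mean over cities, std over cities)}, arrays of length n_windows.
    """
    R = np.asarray(R, dtype=float)
    labels = np.asarray(labels)
    summary = {}
    for category in np.unique(labels):
        rows = R[labels == category]  # cities belonging to this category
        summary[str(category)] = (rows.mean(axis=0), rows.std(axis=0))
    return summary


# Made-up ratios for three cities and two window sizes d_f
R = [[2.0, 1.0], [3.0, 1.0], [0.2, 0.9]]
labels = ["receiver", "receiver", "emitter"]
summary = category_mean_std(R, labels)
```

A small spread within each group, relative to the gaps between the group means, is what supports the classification.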
Finally, we also carried out the whole analysis at the province level (Supplementary Material) and obtained results similar to those at the city level.

Discussion
Our findings thus concern four different aspects. First, the traffic flows between cities are very heterogeneous, not only spatially but also temporally. Such a large heterogeneity could be induced by the large flows observed during this particular period of the Spring Festival and also by travel bans. Similar results also hold at an aggregated level: the incoming and outgoing flows of provinces also display important heterogeneities. This heterogeneity is crucial for understanding and modeling epidemic spreading, where its importance is well known [31, 32], and more generally for most processes on networks [33]. We also quantified the dispersion of the origins and destinations of incoming and outgoing flows, showing that larger flows involve a larger variety of origins and destinations. We further showed that during the seasonal migrations of the Spring Festival, the national structure of the population changes quickly, with a larger concentration in a small set of cities. This concentration normally decays after the festivities, but travel bans slow down the return to the initial state. Restricting interurban movements is a natural way to stop the geographical spread of the disease but, on the other hand, a large concentration in cities can favor spread at the city level and increase the number of infected cases. This effect can be compensated by stronger control at the level of individual contacts, which is what was done in cities such as Wuhan. These results are in line with epidemic modeling results [21], in which travel quarantine is effective only when combined with a large reduction of intracommunity transmission.
The study presented here focuses on the particular and very important event of the Chinese Lunar New Year; it would be interesting to test these properties for other events and in other countries where a large fraction of the population moves within the country. Our results highlight the importance of mobility studies for modeling a variety of processes, in particular the spread of epidemics. Effective mitigation strategies need to take into account the changes in population structure exhibited here.

Methods

Data
We obtained the migration data for all transportation modes from Baidu Qianxi (http://qianxi.baidu.com), which is based on Baidu Location Based Services and Baidu Tianyan. It provides two datasets: a migration index, reflecting the size of the population moving into or out of a city or province, and a migration ratio, capturing the share of each origin and destination. We collected the data for the Chinese Spring Festival period of 2020 (from Jan. 1st to Feb. 12th, 2020). For comparison, we also used the migration index for the same period of 2019 (aligned according to the Chinese lunar calendar, from Jan. 12th to Feb. 23rd, 2019).
In addition to the migration data, we collected demographic data from the China Statistical Yearbook (http://www.statsdatabank.com), an annual statistical publication that comprehensively reflects the economic and social development of China. It covers key statistical data for recent years at both the city and province levels. We collected the population of 31 province-level regions and 296 city-level regions from the China Statistical Yearbook 2019, the latest edition available.

Data availability statement
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.

Author contributions
PJ and MB designed the study, QH collected the data, and JY performed calculations. PJ and MB analyzed and interpreted the data and wrote the manuscript.