Modelling exposure between populations using networks of mobility during COVID-19

The use of mobile phone call detail records and device location data for the calling patterns, movements, and social contacts of individuals, have proven to be valuable for devising models and understanding of their mobility and behaviour patterns. In this study we investigate weighted exposure networks of human daily activities in the capital region of Finland as a proxy for contacts between postal code areas during the pre-pandemic year 2019 and pandemic years 2020, 2021 and early 2022. We investigate the suitability of gravity and radiation type models for reconstructing the exposure networks based on geo-spatial and population mobility information. For this we use a mobile phone dataset of aggregated daily visits from a postal code area to cellphone grid locations, and treat it as a bipartite network to create weighted one mode projections using a weighted co-occurrence function. We fit a classical gravity model and a radiation model to the averaged weekly and yearly projection networks with geo-spatial and socioeconomic variables of the postal code areas and their populations. We also consider an extended gravity type model comprising of additional postal area information such as distance via public transportation and population density. The results show that the co-occurrence of human activities, or exposure, between postal code areas follows both the gravity and radiation type interactions, once fitted to the empirical network. The effects of the pandemic beginning in 2020 can be observed as a decrease of the overall activity as well as of the exposure of the projected networks. These effects can also be observed in the network structure as changes towards lower clustering and higher assortativity. Evaluating the parameters of the fitted models over time shows on average a shift towards a higher exposure of areas in closer proximity as well as a higher exposure towards areas with larger population. In general, the results show that the postal code level networks changed to be more proximity weighted after the pandemic began, following the government imposed non-pharmaceutical interventions, with differences based on the geo-spatial and socioeconomic structure of the areas.


Introduction
Studies on human mobility using large scale and longitudinal datasets from real world systems are useful for understanding human behaviour, designing techno-social systems, as well as forecasting different crisis scenarios like pandemics.
Alongside with improved methods for obtaining novel data from mobile phones or other digital devices, human mobility as long range migration and short range trips has been an active field of research.The use of mobile phone records data, social media data and other mobile applications has made it possible to capture large and longitudinal datasets of human locations, trajectories and behavioural patterns (1).This data has motivated the modelling of various aspects of human mobility, such as patterns of their behaviour and trajectories (2; 3), co-locations (4), agent-based systems (5; 6; 7; 8), and migration (9; 10; 11; 12; 13; 14).Networks are naturally useful for modelling and analysing mobility patterns, as the locations and trips, or entities and social ties, can be considered as nodes and edges, respectively.Networks of telecommunication and proximity have shown characteristic structural properties present in human networks, such as small-worldness and the presence of communities (15).
Over the years numerous studies have demonstrated the relationship between the mobility and distance, such that the number of trips between two locations has a negative correlation with the distance between them.Similarly, when modelling the mobility between two populations, the population sizes have been shown to correlate with the number of trips between the said locations.(16; 17; 13) These gravity type models, inspired by Newton's law of gravity, have also been applied to communications (16; 17), social connections and trade (18).Another commonly used model for describing human mobility is the radiation model, which considers the number of opportunities as the distance variable such that the number of opportunities within the radius of the distance between two locations denominates the probability of an action to another point (19).Such models have been utilized to describe the use of of services, job seeking and migrating.It should also be noted that there are numerous other external variables affecting the human mobility in addition to the commonly used population sizes and distances, such as available means and cost of transportation or government imposed restrictions (20; 21).
With the emergence of the SARS-CoV-2 pandemic in Finland in early 2020, the authorities imposed non-pharmaceutical interventions (NPIs) in order to contain the spread of the virus and keep the demand for healthcare and hospitalisations on a sustainable level (22).The global pandemic sparked large interest in studying different models of epidemic spreading as well as strategies regarding the restrictions and the related efficiency (23).These NPIs included such methods as social distancing (24; 25), contact tracing (26) as well as restrictions like school closings and remote work guidelines (23; 27), which affect the overall daily behaviour of people, should they comply with them.In addition to the restrictions imposed by the government, the self-imposed social distancing of health-aware people had an effect on their mobility as well as on the spread of the virus.As stated in the study by Chang and coauthors (28), the models for the spread of the epidemic will need to take into account the effects of changes in mobility.These changes, caused by restrictions or general awareness, have been studied at the level of countries (29; 30; 31; 32) and at a more microscopic level such as cities and grids (33; 34; 35; 28; 36).Worth noticing is that the dynamics of epidemic spreading have been shown to be more complex than just the contacts caused by activity (37) and the estimated epidemiological effects of different NPIs have been shown to be model dependent (38).
In this study we investigate the human behaviour before and during the Covid-19 pandemic in Finland using the daily activities as a proxy for estimating the possible contacts between the populations living in different postal code areas.The overarching objective of this study is to use modelling to show large scale changes in exposure induced by human mobility during the pandemic using postal area level networks.Our first research objective is to use the aggregated daily activity data for creating models of possible exposure-networks (or contact networks).The second objective is to quantify the changes in this system during the first two years of the pandemic in comparison to the preceding prepandemic year 2019.Finally, we aim to show a relationship between the government imposed NPIs and the exposure networks, using data from (39).
While many other studies have used the activity-type aggregated data from companies such as SafeGraph, CueBiq, Google or Apple (see (30; 29; 31; 34)) for various countries and regions, we seek to obtain new insight to the relationship of activity and the NPIs by using postal code and cell grid level data on a daily time resolution.This contains more micro level activity than the community level reports used in some studies, but less information than the individual mobility traces used in some other studies.The data we use is obtained from Telia, a telecom company operating in Finland, and it contains aggregated daily activity data of mobile phone users.Similar data has been used in other studies such as (40; 41; 19; 36), but not in the same type of modeling paradigm.The methodology of using gravity and radiation type models for predicting the movement of people between two areas with varying population, number and diversity of services (42), wealth or similarity (43) has been investigated with different datasets of CDR or GPS locations in the form of origin-destination networks.Our study considers a bidirectional relationship between postal code areas, where the populations in these areas do not require visiting the other's geographical area to form an edge.Thus, the bidirectional edges represent the interaction of exposure in a broader setting than directed trips between an origin area and a destination area.
The rest of the manuscript is divided into four sections, Methods, Results, Discussion, and Conclusions.First, we present the daily activity data along with the pre-processing steps done to make the data into sets of networks for periods of time.From these networks we construct two types of models, a gravity-type model and a radiation-type model, by using postal code area level data.In the Results section we present the findings on the fitting of the two classical models and the changes in the resulting networks over the course of years 2019 to 2021.In the Discussion section consider the implications of our findings and discuss potential future research.In the Conclusions section we present overall concluding remarks.

Material and Methods
In this section, we describe the empirical dataset and explain the construction of the weighted bipartite projections, the measures used to investigate the temporal changes in the network and finally the proposed model for predicting the likelihood of social connections between areas using geospatial and socioeconomic information.

Data
The data used in this study is obtained from Telia, a telecommunications company operating in Finland, Sweden and Denmark.This data, called "Crowd Insights", contains aggregated human activities for each day in the format of home and grid location matrix.Each mobile phone user's home location is marked with its postal code area, where the user stayed longest before 9 am, i.e. the place the user woke up in the morning.The activity of each user is recorded with the binary (0 or 1) if the user spent at least 20 minutes during the day in that particular grid location.This data is then anonymised and aggregated for each postal code area, thus generating a matrix where during one day a postal code area has an amount of visits or mobile activities to each grid cell.The grid was defined by the telecom areas, ranging in size from 500 metres by 500 metres to 2 kilometres by 2 kilometres depending on the density of users in the area.In order to protect the privacy of less populated areas, activities with less than 5 people are redacted from the dataset.The dataset contains the activities during the periods 1.1.2019-23.9.2019 and 1.2.2020-31.3.2022, with some days being removed due to low signal quality.For reducing the possible inaccuracy caused by the incomplete yearly sets and the seasonal trends therein, we create uniformly sized subsets where each date is contained in the set of each year.By filtering these subsets, we obtain 219 dates for each year from 2019 to 2021.These subsets are referenced as yearly sets in this manuscript.
In this study we consider a subset of the data by choosing only the activities that are both stemming from and occurring within the capital area of Finland, meaning the municipalities of Helsinki, Vantaa, Espoo and Kauniainen.In other words, an accepted activity is performed by a person waking up in the chosen geographical area and performing the activity in the grids of the area.Thus, we consider the capital area as a separate subsystem from the rest of the country, even though the data contains people moving in and out of the system.Postal code areas in the chosen geographical area are highly populated, but also smaller in size than the average postal code areas in the country.
Filtering the data results in 167 postal code areas as home areas and 1444 grids as the areas for performing mobile activities.This data can be presented as a bipartite network, where for each day we consider the home areas and grid areas as separate classes of nodes and the links are the number of people who have performed the mobile activity.In the scope of this paper, we are investigating the difference in human behavior during the years 2019 to 2022 by constructing networks of the postal areas and analyzing it at the graph level using various time windows for the activity aggregation.The aim is to use the commonalities between activities of various areas as a proxy for contact as well as functional activities such as work and thus form a network of connected postal areas.
In addition to the activity data, we use socioeconomic data on the same postal code area level as variables in the model.The data is obtained from Statistics Finland (44) and is openly available.In this study we are not considering the changes in the socioeconomic values for each postal code area as the changes can be considered to be minor during the timespan of the study and the reporting of postal code areas changes between the years in the Statistics Finland data.As the final source of information we use the dataset by Hale et al. (39) for obtaining a numerical index on the level of mobility and other activity related restrictions in Finland during the pandemic.
For the analysis of the daily contact networks between the postal code areas we calculate a set of graph theoretical properties with the intention of observing changes and quantify some of the characteristics of the networks.Daily population activities follow a weekly schedule as well as public holidays and holiday periods, resulting in relatively high variance throughout the year.We address this by analyzing the network properties on daily networks as moving averages as well as networks aggregated as averages of 7-day periods or as years.The dataset contains empty gaps, namely the time between 24.09.2019 and 31.01.2020, which we address by comparing only the common dates during each year when doing comparisons between the years.Otherwise these empty periods of time are visualized as discontinuities.
For measuring the changes in the network and reinforcing the reasoning behind the models, we calculate the average clustering coefficient, assortativity of degree correlations and weighted degree distributions.Then we compare the resulting weighted networks to the distances and population sizes used in Eq. 2 and Eq. 3 without fitting the data beforehand.For the calculation of the average clustering coefficient and the degree assortativity we utilize the NetworkX library (45) in Python 3.7.

Models
In this section we describe first the process of constructing the networks of exposure from human mobility using the above described empirical data and then fitting two types of models to simulate the interactions between postal code areas, i.e. the nodes of the network.We use the edges of these projected networks for fitting a gravity type model and a radiation type model using postal code level population data.The changes in the structure of these networks over time are analysed using standard graph theoretical measures.

Bipartite network projection
The temporal activity data forms a bipartite network in the form of an adjacency matrix A of size 167 × 1444, where each postal area (row) has a number of activities in each geolocated grid (column).From this matrix we calculate a weighted one mode projection by defining the edge weight function W ij for the postal code areas i and j as the sum of products over each grid cell: where w g a (t) is the number of activities, i.e. people visiting the grid cell g from the postal code area a at time t and the denominator is the sum of all activities from the corresponding postal code area.The multiplication ensures that both postal code areas, i and j, have activity in the particular grid cell during that day.The weight W ij is bounded between 0 and 1, where 0 would indicate no co-occurring activities and value 1 would indicate full co-occurrence to a single grid cell, e.g as depicted in Fig. 1.A small value indicates a low probability for picking a random activity from each postal code area such that both of the activities were performed in the same grid cell.This way we obtain a weighted projection between the postal code areas that can be used as a proxy for contacts between the postal code areas.An example of the dynamics of the function 1 is depicted in Fig. 1.Using this method for the duration of the empirical data we obtain S = 1031 daily networks, in which the nodes are the only permanent structure.

Modelling networks of exposure
The main research goal of this paper is to create a characteristic model for predicting and understanding the resulting bipartite network projections (Eq. 1) that represent the contacts between the postal code areas due to the co-occurrence of individual mobile activities or mobilities.For this we use two classical models: a gravity-type model and a radiationtype model (17; 19; 8; 5).In case of the gravity-type model, we consider the exposure between populations in two postal code areas, f g (i, j) = W ij , to be to dependent increasingly on the size of the two populations and decayingly on the distance between them, as follows where C is a constant, B is the parameter specific exponent, p i and p j are the population variables of the postal code areas i and j, and d ij is the "crow-flight" distance between the centroids of the land areas (excluding waters) of these postal areas, measured in meters.This is an assumption since the population is not evenly distributed and the real world distances can be longer due to the transportation infrastructure.As in literature we consider our gravity-type model symmetrical, i.e. the exposure or the individual mobility is the same from i to j as from j to i.
The second model we evaluate with the empirical networks (Eq.1)) is based on the radiation-type models, which in general are formulated as directed interactions.However, for the problem of this study we consider the interactions symmetrical as in the gravity-type model introduced above.The model is based on the notion of number of possible opportunities within the radius equal to the distance between two areas centered on the source area.Here the radiation model for exposure, f r (i, j) = W ij , is constructed using the following modified form of the standard model: where r di,j is the sum of population variable within the radius d i,j from the area i excluding the populations in areas i and j, and B is the parameter used for fitting the model to the data.The population variable under radius, r di,j , was obtained by calculating the fraction of the area under the radius d i,j for each postal code area and multiplying each population variable by the fraction of area covered.Here it is assumed that the population, services or workplaces are evenly distributed.We use this approximation due to lack of more detailed data.Prior to fitting the models, we filter out the postal code areas with less than 500 inhabitants, which from our empirical dataset removes only seven areas that are mostly uninhabited industrial zones.
Both the gravity and radiation type models are fitted to the weights of the edges of the empirical 7-day average networks and the yearly average networks with a linear regression utilizing ordinary least squares (OLS) for obtaining the parameter values.To use a linear regression model we change the models in logarithmic form such that the gravity type model is of the form while the radiation type model is of the form The statistics of the fitted exponents (B p , B p ,B d ,B r ) are shown in Table 1 and in Fig. 5 the values are visualized over the time period of our dataset.
After fitting and evaluating the gravity and radiation models, we investigate whether combining additional geospatial and population level variables to the gravity type model would give a better estimate of the edges of the network.These additional variables were chosen by evaluating the edges with largest error.The chosen variables are the population density, obtained from the socioeconomic Statistics Finland dataset (44), and a simple distance via public transportation (i.e.train and metro).The distance via public transportation was obtained by using the geographical coordinates of each train and metro station and their interconnections within the four municipalities.For each pair of postal code areas we calculate the distance from the centre of the area to the nearest station and then calculate the shortest path length via public transportation between the corresponding nearest stations.In the case both areas have the same nearest station, the distance is set as the distance between the two area centres.The final form of this extended model f e (i, j) is as follows where p is the population density and d is the distance via public transport.The extended model was fitted similarly to Eq. 4 and Eq. 5.

Results
In this section, we will construct the projected empirical networks, analyze them in terms of general attributes and their time-dependent structural changes as well as fit the above introduced three models to the data.In total, the dataset contains 1031 days of individual mobile activities in the original grid cells which we project to daily postal area level weighted exposure networks using the formula in Eq. 1.The aggregated networks were constructed such that each edge weight was the mean of the corresponding edge within the chosen time frame.The weekly averaged networks were constructed as 7-day consecutive bins and the yearly averaged networks are constructed using overlapping dates during the years 2019, 2020 and 2021 in the data (dates from 01.02.and 23.09.).With this binning we obtained 146 weekly networks and the yearly overlapping dates contain 219 days, which were used in the aggregated yearly networks.In Fig. 2 the total daily activity and the average edge weight of the pandemic years 2020 and 2021 are visualized in comparison to the same dates during pre-pandemic year 2019 as a moving average.Both these measures show similarities in terms of the overall difference to the pre-pandemic year.Both the seasonality and public holidays can be seen as peaks in the overall activity, whereas the same peaks are not as prominent in the average edge weight.Also worth noticing is the proportional difference, as the total activity at points exceeds the 2019 level, the average edge weight does not.Comparing the time series of the activity measure and the average edge weight to the average containment index from the NPI dataset (39) shows a negative correlation as would be expected if the population is compliant or epidemic aware.Interestingly the correlation for the average edge weight is stronger than for the total activity, as indicated by the correlation coefficients ≈ −0.76 and ≈ −0.56, respectively.The correlation coefficient between the same measures is ≈ 0.72.All correlation coefficients are shown in Table 2.The resulting aggregated yearly networks and the related weighted degree and edge weight distributions are shown in Fig. 4. As can be expected from the proportional comparisons in Fig. 2, the means of the distributions shift towards 0 during the pandemic.The highest weighted degree becomes also lower, i.e. ≈ 0.873 to ≈ 0.622 and ≈ 0.603).For clarity of visualization, the yearly networks are depicted using the four strongest edges for each node with the node width and colour representing the weight in Fig. 4. A visually noticeable difference is the decrease in the number of longer distance edges to the geographical center (central Helsinki) taking place after 2019.Utilizing the Clauset-Newman-Moore greedy modularity maximization (46) for clustering the postal code areas to communities in the aggregated yearly networks using the 95-percentile of the edges based on weight shows that the number of detected communities declines from 16 during pre-pandemic 2019 to 4 and 5 during pandemic 2020 and 2021.
Next we measure the average clustering coefficient and degree assortativity in the filtered 90-percentile graphs and perform a preliminary analysis for the models by calculating the Pearson correlation between the edge weights of the full graph and the corresponding population level variables as described in Eq. 2 and Eq. 3. The time series are shown in Fig. 3.The proportional difference in the weighted average clustering coefficient in Fig. 3A shows similar change as the average edge weight.The Pearson weighted degree assortativity, depicted in Fig. 3B, increases during the pandemic, which indicates that the edges in the filtered aggregate networks during the pandemic are more often found between postal code areas with similar weighted degree.This effect can also be seen in the network visualizations in Fig. 4, where the strongest edges are no longer between the central area and areas further from it.Also, this correlates with the notion of larger communities detected during the pandemic.The result remains uniform when replacing the edge weight with 1 in the filtered network.

Evaluating the models
For the performance evaluation of our three models we first fit them to the aggregated yearly networks with the empirical data.Then we perform the evaluation by calculating the errors using symmetric mean absolute percentage errors (SMAPE) and root mean squared error (RMSE): where n is the number of edges, W f is the predicted edge weight and W o is the observed edge weight.The results for the fitted exponents and errors are shown in Table 1.The predicted values obtained for the radiation, gravity, and extended gravity models are plotted against the observed values for the year 2019, depicted in Figures 5 (A), (B), and (C), respectively.As anticipated by the correlations to the unfitted models, the gravity model performs better than the radiation model in predicting the edge weights and the extended gravity model performs the best.Same can be seen in the models' R 2 values, which indicate the proportion of variance explained by the variables.All the three models have R 2 values over ≥ 0.5 in 2019 with interestingly an increase during the pandemic years.The difference in proportional error (SMAPE) between the three models remains consistent in all three aggregated yearly networks.The values of RMSE show that in the pre-pandemic year 2019 the radiation model shows the lowest error, but in the subsequent years the error shows an increasing trend for all three models.The error metrics over time for the weekly networks are shown in Fig. S2 The fits of the models in Fig. 5   After fitting and analyzing the yearly aggregated networks, we investigate the full duration of our dataset with the weekly aggregated networks.The fitted exponents over the the course of the dataset are shown in Fig. 5.The gravity model over time shows the two exponents B d and B p behaving similarly, while their relative difference changes.Similar shape can be seen in the case of radiation model but with more extreme relative changes.In the case of the extended gravity model the two additional variables show a negative relationship to exponents of the basic gravity model.The point in time when the exponent B p becomes positive can be seen to occur at the beginning of the pandemic.Comparing the fitted exponents to the average edge weight over time shows high correlations as can be expected.Consequently, most of the exponents share a high correlation to the NPI restriction index.The outliers in terms of the correlation strength is the added variable d in the extended gravity model.These correlations are listed in Table 2.In order to investigate the effects of particular types of NPIs, we fit the 8 sub-indices related to the restriction stringency in the OxCGRT dataset (39) to each of the fitted exponents (B p , B d , B p and B d) of our extended gravity model ( 6) using a similar OLS regression as before.The predicted values of each of these exponents are shown in Fig. 8 and the related coefficients in Table 3.The results show varying coefficients and p-values for each different NPI sub-index depending on the exponent they were fitted to.Across the four models fitted extended model's exponents, the stringency subindex Restrictions on gatherings" has a constantly low p-value and standard error.In case of school and workplace closures, restrictions on internal movement and cancellation of public events also show significant p-values (< 0.05).In addition, the closures on public transport have an effect on B d, as expected.
Finally, comparing the predicted networks to the corresponding observed networks reveals the lacking aspect of gravity and radiation type models in the context of networks.The previously discussed errors and model fits indicate that the models capture the aspect of exposure between postal code areas.This can also be seen when comparing the average edge weights of the predicted networks and observed ones, as the differences between the two values are of the order of 10 −14 .However, a difference arises when comparing the average weighted clustering coefficient and the average weighted degree assortativity coefficient.The model produces networks with higher assortativity and clustering than observed in the empirical networks.The points are visualized also in equally sized bins, depicting the mean (dot), the 25th percentile (box) and the 95th percentile (line connected to box).The colour of the box is green if the 95th percentile contains the perfect fit to the observed data (straight line).The rest of models fitted to aggregated yearly networks are shown Fig. 6 in Supplementary Material.(D-F) The fitted model exponents for the whole duration of the data with 7-day aggregated networks.The vertical line denotes the missing data between 24th September 2019 and 31st January 2020.

Discussion
In Finland the SARS-CoV-2 pandemic started in March 2020 having a visible effect on the activities of the population like their mobility, due to the government-imposed restrictions and self-imposed social distancing.This can be seen in Figure 2 as a reduction in the overall activity in the bipartite networks and as a significant reduction in the average edge   1: The fitted coefficients for the yearly average networks.The means are obtained by fitting the models for the full yearly averaged networks with postal code areas of population less than 500 being removed from the network.The number in brackets is the standard error for the coefficient.SMAPE and RMSE measure the error of the fitted model and lower number indicates a better fit.weight in the projected exposure networks during the pandemic.Notably, we can see that higher level of activity does not explicitly mean higher exposure between the postal code areas as implied by the Spearman correlation between the two values being ≈ 0.621 with p-value < 0.01.Comparing the overall activity and edge weight to the NPIs in the form of an index, we see that the average edge weight has a higher correlation to the NPI stringency index than the activity has (−0.824 and −0.668).High correlation to the NPI stringency index could be seen as high compliance to the restrictions as well as general awareness of the public, resulting in lesser exposure.Moreover, the lesser amount of fluctuations, seasonal or other, in the average edge weight and the related fitted exponents of the models hint that the underlying behaviour remains stable during "normal pre-pandemic times but changes once the restrictions are imposed.The changes in the network attributes such as assortativity depict the reorganization where instead of having more exposure to hubs of high weighted degree, the exposure is distributed more evenly across the postal code areas, as the assortativity shifts towards positive values.Similar trend can be seen in the weighted clustering coefficient, shifting to lower values when the pandemic sets on.
The proposed gravity and radiation type models are shown to be sufficient in estimating the exposure between postal code areas, with the gravity type model performing better than the radiation type model.As can be expected, the prediction error decreases once the gravity model is fitted with additional variables of population density, p, and public transport distance d i,j .In addition to changing exponent weights, the error increases the more restricted human mobility is (see Fig. 7).While the error values can be seen to increase, also the R 2 values show an increase, which we suspect being caused by a higher number of extreme outliers.Fitting the models over the full duration of our data shows that the exponents (Fig. 5) correlate with the average edge weight as well as the NPI index.These changes in the exponents can be interpreted as shifting weight between distance and population size.Interestingly, in the extended gravity model the exponents for population size and population density show an anti-phase like behaviour.In both gravity type models, the weight for longer distances decreases (see Table 1), which we also confirmed manually from the empirical data.The distances larger than the nearest vicinity show a comprehensive reduction and the longest distances remain at the low level.Of this finding the government imposed restrictions and guidelines such as social distancing and remote work could explain the lower weight for long distance edges as people eligible for remote working would not be commuting.Temporary closures of restaurants and other services can also have a similar effect as the number of trips decreases (47).The results in both projected networks and in the fitted model exponents showed a high correlation with the general NPI stringency index, and when fitting the series of exponents of the extended gravity model to the subcategories of restrictions we see that certain restrictions affect certain exponents more than others.The coarse model in itself provides an estimate where the extended gravity model's exponents would be given a certain set of restrictions.As mentioned before, using such exponents for the model yields a good estimate for the mean edge weight, but overestimates the network properties when compared to the observed networks.
The mobile phone location based dataset used in this study has two limitations that could cause some inaccuracies.Firstly, the data defines the home location of each person as the postal code area in which the person stayed longest before 9 am, i.e. the place the person woke up in the morning.This definition could misplace some groups of people like those staying in hotels, which could cause fluctuations in the number of activities per capita especially in the pre-pandemic times.Secondly, the additional protection of individual privacy by filtering activities with less than five individuals from a postal code area creates inaccuracies in the projected networks, which is why in fitting the models we filter out the postal code areas with population less than 500.This filtering removes seven postal code areas, including three industrial areas with population sizes of < 200, two sparsely populated areas with populations < 270, a hospital and a military area.
The design of this study considered an area-wise limited system of people living in 4 municipalities in capital yet most populated area of Finland.The limitation can cause inaccuracies with the radiation model in areas located in the outskirts of the investigated area, as the variable of people living within the radius (r di,j ) is not considering the populations outside of the system.For the radiation model, the values for r di,j were also calculated such that the distribution of population was assumed to be even across the postal code area.Another assumption was made for the population size data, as we only consider the numbers recorded in 2019.The changes between the beginning and the end of our dataset were not large enough to significantly change the results of the model.Due to this, we are not considering things such as influx of new inhabitants or outflow of people to other municipalities.A potential limitation and inaccuracy arises from calculation of distances between postal code areas.The assumption of having the geo-spatial center of each postal code area the point of calculation disregards the population density and distribution within the area, which results in some level of error in distances especially between postal code areas sharing borders.Also, in reality the distances between postal code areas are not the same as the calculated crow-flight distances, in which case a more realistic measure could be the travel time.Considering the travel time between locations would most likely improve the model, but such data would add to the model complexity as the travel time can vary due to e.g.road closures or constructions and public transportation changes.Moreover, an average travel time weighted with the means of travel available to the local population could improve the results.
In the future we aim to conduct further analysis on the empirical dataset in terms of the time series of exposure and the socioeconomic aspects of each postal code area.Also, a detailed analysis of the NPI effects is warranted (47).Intuitively, the method of constructing a weighted projection network based on data of daily visits to a set of locations and using the resulting links as a proxy for contact could be used in modelling spreading processes between the postal code areas with compartmental SEIRS-type models (48; 49).

Conclusions
Mobile phone based datasets have been shown to provide novel insights into human mobility and their social interactions.In this paper we have investigated the use of aggregated mobile phone location data as a source for gaining insight into the exposure between populations living in different postal code areas using co-occurring visits to mobile phone grid locations as a proxy.People visiting the same cellphone grid areas form a bipartite network, which we project using the Eq. 1 resulting in a fully connected network with edge weights between 0 and 1.The resulting empirical networks show the changes in exposure stemming from co-occurring activities between pre-pandemic and pandemic times shifting towards lower clustering, higher assortativity and lower weight for edges with longer distances between populations and to higher populated areas.These changes along with high correlation with the NPI-indices reflect the adherence to government imposed restrictions, remote work as well as self-imposed social distancing of aware individuals.We showed that the weights of the exposure can be modelled using gravity and radiation type models with population variables and distances between the postal code areas.The simplistic models yield promising results at the level of average edge weights, but are found to be lacking in reproducing the assortativity and clustering of the empirical networks.Lastly, we conducted a brief investigation on modelling exposure for the whole duration of our dataset using the different NPI sub-categories for calculating the exponents of our best performing gravity model, showing a varying degree of weight for the NPI sub-categories depending on the gravity model exponent.

Figure 1 :
Figure 1: Two examples of the weighted exposure network construction for three example postal code areas, red, blue and green.The first example consists of only one grid area and the second has 2 grid areas.Both examples assume no other activities.Each person represents one activity from the respective postal code area to the grid area connected by an arrow.The 3 nodes represent the corresponding projection with edge weights illustrated next to the edges and calculated using Eq. 1.

Figure 2 :
Figure 2: Proportional percentage difference of the total activity in the empirical bipartite networks (A) and of the average edge weight in projected exposure-networks (B) between the overlapping dates for the years 2020 and 2021 to the pre-pandemic year 2019 as 7-day moving averages.

Figure 3 : 7 -
Figure 3: 7-day moving averages of measures of the projected networks during the yearly overlapping dates in the dataset: (A) Average weighted clustering coefficient as proportional difference to the same time in the year 2019, (B) weighted degree assortativity correlation coefficient.The networks in both (A) and (B) are filtered by edge weight to have the 90-percentile of edges based on edge weight.
illustrate the error.All three models overestimate the weakest edges and underestimate the strongest edges with varying magnitude, the radiation model having the largest error.While the absolute values between fitted exponents of the same model are not comparable due to the different units in the variables, the proportional differences show changes in the aggregated networks.The exponents of gravity model indicates a larger significance on the exponent for population B p after 2019 as it increases in proportion to the distance exponent B d .The distance exponent shows the previously discussed weakening of longer distance edges by increasing with each year.The fitted exponents of the radiation model show a similar change with the increasing weight of B p .The exponent for population within radius, B r , increases but not in proportion to B p .When including the population density in the extended gravity model, the fitted population size exponent becomes negative in 2019 and increases during the pandemic while B p

Figure 4 :
Figure 4: The yearly projected networks with the mean edge weights for the years 2019 (A-C), 2020 (D-F) and 2021 (G-I).In sub-figures A, D and G the network is shown as 90th percentile of the edges filtered by edge weight.The width and colour depict the weight of the edge and the node size depicts the weighted degree.The histograms in sub-figures B, E and H depict the weighted degree distribution of each yearly network.The histogram has 20 bins and the red vertical line depicts the mean.The rightmost histogram depicts the distribution of edge weights.

Figure 5 :
Figure 5: Fitting of the aggregated networks to the 3 models.(A-C) The predicted model values plotted against the observed empirical values for the network of 2019.Each point denotes an edge in the network and the straight line indicates the perfect fit.Points closer to the line have smaller error and points above the line are overestimates and vice versa.The points are visualized also in equally sized bins, depicting the mean (dot), the 25th percentile (box) and the 95th percentile (line connected to box).The colour of the box is green if the 95th percentile contains the perfect fit to the observed data (straight line).The rest of models fitted to aggregated yearly networks are shown Fig. 6 in Supplementary Material.(D-F) The fitted model exponents for the whole duration of the data with 7-day aggregated networks.The vertical line denotes the missing data between 24th September 2019 and 31st January 2020.

Figure 7 :
Figure 7: Error metrics for the gravity, radiation and extended gravity models over the duration of the data set, fitted on the weekly networks.(A) RMSE and (B) SMAPE.Lower values indicate a smaller error.The vertical line denotes the empty period in the data between 24th September 2019 and 31st January 2020.

Figure 8 :
Figure 8: The predicted exponents of the extended gravity model from an OLS regression fitted on the 8 indices of non-pharmaceutical interventions.The dashed lines depict the prediction's 95% confidence interval.(A) is for population size exponent B p , (B) is population density B p, (C) is for distance via public transport B d and (D) is for distance B d .

Table 2 :
Correlation coefficients between weekly networks, NPI indices and fitted model parameters.