Spatiotemporal large-scale networks shaped by air mass movements

The movement of atmospheric air masses can be seen as a continuous and generally complex flow of gases and particles hovering over our planet. It can however be locally simplified by considering three-dimensional trajectories of air masses connecting distant areas of the globe during a given period of time. In this paper, we present a mathematical framework to construct spatial and spatiotemporal networks where the nodes are the subsets of a partition of a geographical area and the links between these nodes are inferred from sampled trajectories of air masses passing over and across the nodes. We propose different estimators of link intensities relying on different bio-physical hypotheses and covering adjustable time periods. This approach leads to a new class of spatiotemporal networks characterized by adjacency matrices giving, e.g., the probability of connection between distant areas during a chosen period of time. To illustrate the effectiveness of this approach, we applied it to characterize tropospheric connectivity in two real geographical contexts: the watersheds of the French region Provence-Alpes-C\^ote d'Azur and the coastline of the Mediterranean Sea. The analysis of the constructed networks allowed identifying a marked seasonal pattern in air mass movements in the two study areas. The networks constructed from air mass trajectories can be used to investigate issues, e.g., in aerobiology and epidemiology of airborne plant pathogens. Similar networks could be estimated from other types of trajectories, such as animal trajectories, to characterize connectivity between different components of the landscape where the animals live.


Introduction
Atmospheric air masses are volumes of air with a defined temperature and water vapor content that have long been known to rule fundamental atmospheric phenomena like weather and air currents. Their composition is mostly inert gases, but both organic and inorganic particles have been found to linger in high-altitude air as a consequence of the constant interaction of air masses with the earth's surface below Email addresses: maria.choufany@inra.fr (Maria Choufany), davide.martinetti@inra.fr (Davide Martinetti), rachid.senoussi@inra.fr (Rachid Senoussi), cindy.morris@inra.fr (Cindy E. Morris), samuel.soubeyrand@inra.fr (Samuel Soubeyrand) them. A non-exhaustive list includes gases and minerals like wildfire smoke, radioactive material, dust, sand, volcanic ash and sea salt, but also living organisms such as pollen, fungal spores, bacteria, virus and small insects. Despite the relative sparse density of these particles with respect to the volume of an air mass, their presence and transportation across the planet has proven to have strong effects on many phenomena impacting human health and safety (pollen (Mahura et al., 2007;Šauliene and Veriankaite, 2006;Bogawski et al., 2019), dust concentrations (Khaniabadi et al., 2017;Aciego et al., 2017), nuclear byproducts (Moroz et al., 2010;Rolph et al., 2014), human, animal and plant epidemics (Leyronas et al., 2018;Wang et al., 2010;Aylor, 1990;Mundt et al., 2009;Sadyś et al., 2014;Hiraoka et al., 2017), air pollution (Liu et al., 2018b,a;Talbi et al., 2018), and rainfall (Chen and Luo, 2018;Armon et al., 2018;Rabinowitz et al., 2019)).
The rise in the number of publications on these subjects suggests a growing interest of the scientific community on the effects of air-mass movements on the biosphere, that has surely been boosted by recent available developments, such as the Hybrid Single-Particle Lagrangian Integrated Trajectory model (HYSPLIT, Stein et al. (2015)), allowing reconstruction of actual air-mass movements at rather fine geographical and temporal scales and with a global cover.
The vast majority of studies focused on isolated events, such as dust storms or peaks of air pollutants, that are rather concentrated in time (from few hours to few weeks) and/or space (just a few locations such as cities). Nonetheless, the movement of air masses is expected to have impacts on a broader spatiotemporal scale, as reviewed in recent studies (Leyronas et al., 2018;Margosian et al., 2009). The purpose of the present paper is then to propose a mathematical framework for studying air-mass movements on large spatiotemporal scales, under the hypothesis that these movements can create stable and recurrent connections between distant portions of a territory. The very nature of these connections will be further specified throughout the manuscript, but as a general rule we will consider that any pair of points (or areas) in space can have a certain degree of connection, regardless of their geographic distance, provided that there are recurrent air-mass trajectories that connect the two points (or areas). The direction and strength of these connections will be estimated by looking at the trajectories linking every pair of points/areas and weighting them according to appropriate measures. In this perspective, it seems natural to resort to graph and network theory, since the formalism of nodes and edges provides an adequate environment for describing complex connections and can further be used to deepen into the topology of the constructed networks in order to infer interesting properties of the graphs, such as the presence of hubs.
From a generic statistical point of view, we aim to (i) estimate the weighted and directed edges of a graph using a sample of trajectories of individuals traveling through the space formed by the nodes of the graph, and (ii) characterize the estimated graph based on relevant statistics.
In the following sections, we first introduce the definitions and properties that will allow us to describe and then estimate connections between points/areas in space via spatiotemporal trajectories. Then, we propose several types of measures to model diverse types of connections. The expected output consists of a spatiotemporal graph describing the network of links induced by trajectories. It's worth noting that our approach is meant to infer connectivity induced by air-mass movements and it is readily applicable to HYSPLIT-type data, but we have maintained a sufficient level of generality to be applied to other phenomena, provided that trajectory data are available (e.g. animal trajectories). Finally, we apply our method to two case studies concerning the coastline of the Mediterranean sea and the French region of Provence-Alpes-Côte d'Azur. The two case studies have different spatiotemporal granularities and they will be used to provide examples of application of the proposed methodology.

Framework for the definition of trajectory-based networks
In this section we show how a set of trajectories evolving within space during a finite time interval can be used to construct pertinent spatiotemporal networks. We first recall some basic definitions related to networks (Section 2.1) and then propose a statistical methodology to infer the network structure from a data set of trajectories (Section 2).

Network theory
Network theory (a.k.a. graph theory) is a mathematical formalism introduced by Leonhard Euler to describe the famous Königsberg bridge problem (Newman, 2003;Strogatz, 2001;West et al., 1996). The two basic components of a network are a set of nodes linked by a set of edges. Nodes can represent a variety of things, such as persons, regions, computers, neurons, etc., while edges are used to describe the connections between those nodes. Formally, a network G " pV, Eq is defined as a set of nodes (or vertices) V " tv 1 , v 2 , . . . , v N u connected by a set of edges E " te ij u i,jPt1,...,N u . A natural way of representing a network is given by means of a NˆN square matrix M , usually referred to as an adjacency matrix, whose term pi, jq, M ij , is non-zero an edge exists between i and j. By convention, adjacency matrices are defined to have an empty diagonal (i.e. M ii " 0, i P t1, . . . , N u), meaning that nodes cannot be self connected. If M is symmetrical (i.e. M ij " M ji , i, j P t1, . . . , N u), then the network is said to be undirected, and directed otherwise. If M ij P t0, 1u, the network is said to be binary, meaning that an edge between two nodes i and j either exists or does not. Otherwise, if M ij P R, the network is said to be weighted, meaning that the edge between nodes i and j are more or less connected.
In this paper, a network is said to be spatial (Barthélemy, 2014) when nodes correspond to geographic locations, while we use the term temporal (Holme and Saramäki, 2012) to refer to networks where edge values can change over time. Finally, we will use the term spatiotemporal network to refer to network that are simultaneously spatial and temporal, under the constraint that nodes cannot change position, neither appear nor disappear over time. The networks considered in this paper also fall into the rather generic definition of spatiotemporal networks. If the spatial qualifier means that the nodes of the networks represent fixed geographical locations, the temporal qualifier is more complex. Indeed, temporal networks are generally divided in the literature into two main classes, namely contact graphs or interval graphs (Holme and Saramäki, 2012). The former type refers to networks where edges represents instantaneous contacts between nodes (Figure 1(a)), while in the second type edges are active over time intervals instead of instants of time (Figure 1(b)). In this paper we propose a new definition of spatiotemporal networks where nodes correspond to disjoint regions of the space and edges are computed as a function of the flow of trajectories linking these nodes (Figure 1(c)), as it will be explained in the rest of the current section.

Flows and trajectory segments
We consider a function Φ : RˆRˆΩ Ñ Ω, usually called flow on the spatial domain Ω of R d , satisfying the following properties: where s, t, t 1 P R and x P Ω. For fixed t and s, the flow Φpt, s,¨q is a spatial transformation. For fixed x and varying s or t, the function gives a forward or backward trajectory of a particle over Ω between times t^s " infpt, sq and t _ s " suppt, sq. If s ď t, y " Φpt, s, xq gives the future location at time t of the particle presently located at x at time s. Contrarily, if s ě t, y " Φpt, s, xq gives the location at past time t that was occupied by the particle located at x at present time s. Φpt, s, .q is assumed to be a bijective mapping meaning that particles following distinct trajectories cannot be at the same location at the same time.
In general, a flow is defined with respect to a possibly time-dependent vector field F over RˆΩ, as the solution u : R Ñ Ω of an ordinary differential equation (see e.g. Hamilton's equations in classical mechanics) with specified initial condition at a specified time s: where F is continuous and Lipschitzian over RˆΩ. In the setting introduced above, Φpt, s, xq " uptq with Φps, s, xq=u(s)=x. The solution u represents the trajectory of the particle located at x at time s.
Varying the initial condition in System (2), i.e. varying s and x, leads to consider pieces of trajectories of all particles which dynamics are governed by the vector field F . In this article, the vector field F will not be made explicit, but we will consider samples of trajectory segments (defined below) for constructing trajectory-based networks.
Definition 2.1. The trajectory segment associated to the flow Φ over the time interval ∆ ts " rt^s, t _ ss, s, t P R, for a particle located at x P Ω at time s is defined as follows: Γpt, s, xq " tpt 1 , Φpt 1 , s, xqq : t 1 P ∆ ts u.
If s ă t (resp. s ą t), Γpt, s, xq is a forward (resp. backward) trajectory segment. In this article, we are mainly interested in backward trajectories, but the framework presented here encompasses forward trajectories as well.
Example 2.1. The notions of flow and trajectory segment can be adapted to cope with air mass trajectories over the Earth surface. In this case, the spatial domain Ω representing the Earth surface is the sphere S 2 in R 3 . If in addition, air masses are characterized by altitude and temperature evolving in space and time, then Ω " S 2ˆR`ˆR , where R`(resp. R) is the domain of the altitude (resp. temperature) coordinate.
Example 2.2. Animal movements and behaviour activities can also be represented with the notions of flows and trajectory segments, providing, for instance, the animal locations and the covariate value indicating whether animals are feeding or not. In this case, Ω " R 2ˆt 0, 1u, where 1 stands for 'the animal is feeding' and 0 otherwise. The use of a binary variable for describing the feeding activity may require the use of stochastic processes or generalized functions undergoing dynamic analog to the System (2) for constructing the flow if it is defined with respect to a vector field F .

Pointwise and integrated connectivities
Trajectory-based networks are grounded on the notion of connectivity used as a quantitative, directed measurement of edges between graph nodes. In this aim, we first define the pointwise connectivity as a measure (or submeasure), in the mathematical sense, of the connectivity between a subset A and a point x of Ω induced by the trajectory segments Γpt, s, xq of a particle located at x at time s. Then, we use the pointwise connectivity to define the integrated connectivity between two subsets A and B of Ω over a temporal domain ∆ of R (∆ can be the union of disjoint intervals).  Figure 2. Most examples are particularly relevant when Ω is a simple geographic domain and when Φ defines movements of individuals (e.g., air masses, animals or particles) within Ω.
Example 2.3. The contact-based pointwise connectivity is defined by: where A ts " ∆ tsˆA and 1 denotes the indicator function. Ψ C pA | t, s, xq indicates whether or not the particle whose movement in Ω is governed by Φp¨, s, xq hit A during the time interval ∆ ts . Note that for disjoint sets A and A 1 of BpΩq.

Remark 1.
This example based on the simple contact between sets can be considered as too strict from a statistical and measure-theory perspective since the length or the duration of a contact may be null.
Instead, a positive constraint on contact length for example can be used to define another version of the contact-based pointwise connectivity: Equation (4) could then be replaced by where LpA ts X Γpt, s, xqq denotes the length of the curve Γ within A. The length operator L will be made explicit in Example 2.5.
Example 2.4. The duration-based pointwise connectivity is defined by: to measure the duration spent by the particle in A during ∆ ts .
Example 2.5. The length-based pointwise connectivity is defined by: where ∇ v Φpv, s, xq stands for the gradient of the flow Φ with respect to the time variable v and ||¨|| denotes the Euclidean norm. Ψ L pA | t, s, xq measures the distance travelled within A by the particle during ∆ ts .
Example 2.6. The pointwise connectivity based on local volume is defined by: where detpJ x Φ pv, sqq is the determinant of the Jacobian matrix (with respect to x) of the spatial transformation Φpv, s,¨q. The absolute value | detpJ x Φ pv, sq| of the Jacobian determinant at x gives the ratio by which the function Φpv, s,¨q expands/shrinks infinitesimal volumes around location x into infinitesimal volumes around location Φpv, s, xq.
In other words, Ψ V pA | t, s, xq assesses how particle density increases or decreases from x to A along the time interval ∆ ts . Intuitively, if n particles are initially in A and if the infinitesimal volume around any of these particles tends to shrink from A to x, then one expects a high concentration of particles in a fixed volume around x and, therefore, a high connectivity from A to x. Conversely, if the infinitesimal volume around a particle tends to expand from A to x, then one expects a lower concentration of particles in the same fixed volume around x and, therefore, a lower connectivity from A to x.
More sophisticated specifications of the pointwise connectivity can be proposed by incorporating spatio-temporal covariates in its formulation, like in the following examples.
Example 2.7. Let G denote a time-varying vector field defined over RˆΩ. The pointwise connectivity based on the external vector field G is defined by: where ă ∇ v Φpv, s, xq, Gpv, Φpv, s, xqq ą is the scalar product between the gradient with respect to the time variable v of the flow Φ and the vector field G. Larger the average collinearity in A between the instantaneous movement of the particle and the simultaneous direction of the vector field G, higher the connectivity between A and x. For instance, if Φ gives the movement of air masses and G provides the intensity and the direction of a continuous release of specific particles, then the connectivity will be high (resp. low) if the movement of the air in A and the movement of particles released in A are approximately collinear (resp. orthogonal).
Example 2.8. Let Z andZ be positive real valued spatio-temporal functions defined over RˆΩ. The pointwise connectivity based on Z andZ is defined by: This form of pointwise connectivity may represent, for example, (i) the negative effect of the altitude of the air mass when it is above A on the recruitment of specific particles from the ground, and (ii) the positive effect of rainfall at ps, xq on the deposition of particles from the air mass to the ground (see Figure 3). Thus, lower the average altitude of the air mass above A and more intense the rainfall at ps, xq, larger the contribution to the connectivity between A and x. This is expressed in Equation (8) as follows: (i)Z is defined as the binary function indicating whether or not the altitude of the air mass (located at x at time s) is lower than a threshold h when it is located at Φpv, s, xq at time v; (ii) Z is a function of the local rainfall intensity at ps, xq.
Remark 2. If in Example 2.8, the altitude of the air mass is incorporated as the third coordinate of Φ and A is a 3D-domain vertically limited by the threshold value h, then, Equation (8) is simplify reduced to Equation (5).
Remark 3. Example 2.8 could be generalized by considering a measure, say µ, over R, to handle the potential contribution of discrete-time events to the pointwise connectivity: Remark 4. In the same vein, Example 2.8 can also be modified by adding within the integral the term ||∇ v Φpv, s, xq|| arising in Equation (6) where δ P R and ν is a measure on RˆΩ.

Trajectory-based network
Definition 2.4. A trajectory-based network generated by Ψ p2q ν,δ (given by Equation (10)) over the temporal domain T Ă R, is a graph whose nodes A i , i " t1, .., Iu, are disjoint sets of Ω in BpΩq and whose directed edges are weighted by integrated connectivities M ij " Ψ p2q ν,δ pA iˆAj | T q, 1 ď i, j ď I and i ‰ j.
Definition 2.4 corresponds to a spatial trajectory-based network evaluated over the fixed temporal domain T . It can be extended in different ways to obtain spatiotemporal analogs. For example, if T 1 , . . . , T K denote K disjoint but successive time intervals with equal lengths, then the sequence of trajectory-based networks generated by Ψ p2q ν,δ p¨ˆ¨| T k q, k " 1, . . . , K, forms a spatiotemporal trajectorybased network that can be analyzed to assess how connectivities across space are changing with time. This is one of the issues considered in Section 4.2. Ψ C pA | t, s, xq " 1 for the two other curves; Ψ D pA | t, s, xq is larger for the middle curve than for the right one, whereas Ψ L pA | t, s, xq is larger for the right curve than for the middle one. In this illustration, the passage of the particle in the elliptic spatial domain A contributes to the pointwise connectivity (red part of the curve in panel (a)) only when the particle is at an altitude lower than a threshold value (grey part of the curve in panel (b)).

Estimation of integrated connectivities
In practice, the integral defining the integrated connectivities between subsets of Ω (Definition 2.3) cannot be analytically computed in general, but can be estimated from a sample of trajectories. For instance, the integrated connectivity Ψ p2q ν,δ pBˆA | T q can be estimated by its empirical counterpart obtained by importance sampling, say p Ψ p2q ν,δ pBˆA | T q: where s 1 , . . . , s N P T and b 1 , . . . , b N 1 P B are times and locations, respectively, randomly drawn under the measure ν restricted to TˆB, and ΨpA | s k`δ , s k , b l q is the pointwise connectivity associated to the trajectory of the particle located at b l at time s k and observed over ∆ s k`δ ,s k " rs k^sk`δ , s k _ s k`δ s.
If ν is constant, other classical numerical approaches can be applied to approximate the integral, such as an hybrid approach in which the mid-point rule is applied in time and a regular point process is used in space. In such a case, the integrated connectivity estimator is also given by Equation (11).
Example 3.1. Using Equation (11), the contact-based pointwise connectivity in Example 2.3 is estimated by: where A s k`δ ,s k " ∆ s k`δ ,s kˆA . Thus, p Ψ p2q C,δ pBˆA | T q is simply the proportion of sampled trajectories intersecting A. (11), the duration-based pointwise connectivity in Example 2.4 is estimated simply by the average duration of the intersections between the sampled trajectories and A. Example 3.3. Using Equation (11), the length-based pointwise connectivity in Example 2.5 is estimated simply by the average length of the intersections between the sampled trajectories and A.

Applications
In this section, we applied our general framework to the flow of air mass movements. Indeed, these movements compiled over years were used to characterize climatic patterns (Hondula et al., 2010) and to describe the transport of pollutants (Pérez et al., 2015). We show now how to deploy our approach for constructing air-mass movement networks in two real geographical contexts, namely the coastline of the Mediterranean Sea and the French region of Provence-Alpes-Côte d'Azur. These two examples have been chosen in order to prove the flexibility of our approach to different situations and geographical scales.

Case study regions and network construction
The first study region corresponds to the coast of the Mediterranean Sea, ranging approximately 1,600 km from north to south and 3,860 km from east to west. The temperate climate of the chosen region is strongly influenced by the presence of the Mediterranean Sea, with mild winters, hot summers and relatively scarce precipitations events. The landscape is characterized by coastal vegetation, typically shrubs and pines, and densely populated areas with intensive crop production of wheat, barley, vegetables and fruits, especially olive, grapes and citrus. In this paper, we characterize recurrent movements of air masses through the Mediterranean region by defining a grid with mesh size 74 km covering the coastline from 5 km up to 250 km inland from the coast, including the four largest islands (namely Sicily, Sardinia, Cyprus and Corsica). Thus, we divide the region into 604 cells, where the centroids of the cells will be used as arrival locations of air-mass trajectories and will correspond to the nodes of the constructed network.
The second study region corresponds to the French region of Provence-Alpes-Côte d'Azur (PACA, hereafter), located in the south-eastern part of France and characterized by a rather complex landscape formed by a densely-populated coastline, agricultural lands (high-value-crops with fruit and olive orchards, vineyards, vegetable cultivation and horticulture), and natural mostly-alpines areas. The choice of this particular region is justified in the context of a research project aimed at assessing the potential longdistance dissemination of phytopathogenic bacterial populations that are known to be transported by air currents. The bacteria of interest (e.g., Pseudomonas syringae) can be lifted in to the air from a source location and then be passively transported by air masses until they are deposited back to land onto a different, far away sink location. Since the life cycles of the considered species of bacteria are strongly linked to the water cycle (Morris et al., 2008), we naturally partitioned the study area in a way that fit this assumption. Hence, we considered the 294 watersheds of the PACA region to define the sites that will later constitute the nodes of the constructed network. Since watersheds have irregular shapes and varying sizes, we selected a certain number of arrival locations per watershed (between 1 and 10 and proportionally to the watershed area) in order to cover the watersheds consistently according to the relative importance of their size and estimate the integrated connectivities. In total, a set of 833 arrival locations for air-mass trajectories was generated.
Once the arrival points for the two study regions have been established, we turned to the computation of air-mass trajectories arriving at the prescribed locations using the Hybrid Single-Particle Lagrangian Integrated Trajectory model (HYSPLIT, (Stein et al., 2015)). The HYSPLIT model can be fed with meteorological data from the Global Data Assimilation System files with a 0.5-degree spatial resolution (GDAS 1 ) and was tuned by us to return 48-hours backward air-mass trajectories arriving at the pre-  Leyronas et al. (2018) In this work we will consider networks corresponding to three temporal contexts: (i) the spatial networks obtained when T is the entire period 2011-2017, (ii) the yearly spatiotemporal networks formed by the seven spatial networks obtained when T 1 encompasses the year 2011, T 2 encompasses 2012 and so on, and (iii) monthly spatiotemporal networks formed by the twelve spatial networks obtained when T 1 represents every January from 2011 to 2017, T 2 every February from 2011 to 2017, and so on.

Network analysis
The constructed networks are directed and weighted by contact-based connectivities generated by air mass trajectories. They are inherently complex by the sheer amount of spatial and temporal information that they encompass. Hence, there is no easy way of representing the results either graphically or numerically, without compromising the original complexity of the networks. While a comprehensive physical study of the spatiotemporal properties of these networks goes beyond the scope of the paper, we explore the estimated trajectory-based networks by looking at some generic properties through the 1 https://www.ncdc.noaa.gov/data-access/model-data/model-datasets/global-data-assimilation-system-gdas following indices: • Diameter: the longest of all possible shortest paths between any two pair of nodes.
• Density: the ratio between the sum of all edge weights and the number of all possible edges (Liu et al., 2009).
• Transitivity (also known as clustering): the equivalent definition of density, but applied to triplets of nodes instead of pairs of nodes (Opsahl and Panzarasa, 2009).
• Shortest path: characterized by the average and standard deviation of the computed shortest path between any possible pair of different nodes (Newman, 2001).
• Small worldness: refers to the property of a network of being highly clustered and having relatively short shortest paths. It is computed as the ratio between the normalized clustering and the normalized average shortest path distance (Li et al., 2007;Colon-Perez et al., 2016).
• Scale-free property: The degree of a node in terms of the total number of edges entering and exiting from it, and for directed networks it can be decomposed in the incoming and outgoing degree, respectively. The degree distribution is the empirical distribution of the degree of a network and it said to be scale free when it approximately follows a power law distribution, i.e. P pkq " k p´αq , where P pkq represents the probability of a node having degree equal to k (Barabási and Bonabeau, 2003;Barabási and Albert, 1999). Some authors impose that the α parameter of the power law distribution has to fall within the interval r2, 3s (Barabási et al., 2016). Thus, a network is scale free when most of its nodes have low degree, while the probability of having nodes with very high degree is not negligible (fat right tail of the distribution). Nodes with very high degree play a crucial role in dynamics conditional on networks and are often referred as hubs (Liu et al., 2011).
• Degree correlation: in directed networks, it accounts for the correlation between the incoming and the outgoing degree of a node. Networks with positive (resp. negative) degree correlation foster (resp. hamper) epidemic spread (Pautasso et al., 2010).  Figure   4. In order to highlight those edges that represent strong connections, we depicted them with darker shades of color, while we did not draw the connections that had a weight of less than 0.3. It can be seen that the strongest connections tend to link nodes that are geographically close, but nonetheless moderate connections also exist between rather distant nodes. This is confirmed by small values of the average shortest path distance 8.20ˆ10´4 for the Mediterranean and 2.57ˆ10´4 for PACA, and high values of the transitivity index (0.74 for the Mediterranean region and 0.99 for PACA), as shown in the first lines of Tables 1 and 2. An interesting difference between the two networks is that the one for PACA has a very negative degree correlation (´0.85), meaning that nodes having a high incoming degree will have low outgoing degree, and vice versa. On the other hand, for the Mediterranean network, the value of the degree correlation is moderately positive (0.31), meaning that nodes having high a incoming degree tend to have also high a outgoing degree.

Mediterranean region
Diam Dens Trans S P (mean) S P (sd) S W S F D C  Qualitatively, the indices provided in Tables 1 and 2 are overall more variable for the monthly spatiotemporal trajectory-based networks than for the yearly ones. Thus, focusing in what follows the monthly network, we investigate possible seasonal patterns by using the complete-linkage hierarchical clustering method (Ferreira and Hitchcock, 2009). We applied the clustering using the Euclidean distance over the 8-dimensional space formed by the 8 indices provided in Tables 1 and 2. For the Mediterranean region, the dendogram in Figure 5

Discussion
We presented a framework for estimating and characterizing spatial and spatio-temporal networks generated by trajectory data. The development of this framework was motivated by the study of networks resulting from the movement of air masses sampled over long time periods and large spatial scales. Thus, in the application, we investigated the tropospheric connectivities across the Mediterranean basin and the French region PACA, and their variations through years and months. Our approach could be applied to diverse phenomena, from which trajectories can be observed. For instance, one could estimate networks generated by the movement of animals on the landscape scale based on animal trajectories observed with GPS devices (Bastille-Rousseau et al., 2018). This would allow the characterization of connectivity between different landscape components. Sampled trajectories of humans, sampled transports of specific goods (such as plant material) and sampled trajectories of knowledge in social communities (that cannot be exhaustively observed) could also be used to estimate networks in other applied settings.
In Section 2.3, we proposed diverse measures of connectivity with different underlying (physical or biological) interpretations. Thus, the analyst can adapt the connectivity measure to the mechanistic processes he investigates. In the application section, we only used the contact-based connectivity. Comparisons of contact-based, length-based and duration-based connectivities, not shown in this manuscript, led to little variations for the two case studies considered in this article. However, the use of covariates such as local rainfall and air-mass altitude for defining connectivities, as proposed in Section 2.3, is expected to potentially impact the inferred networks and deserves to be explored. This perspective would be particularly relevant in the context of aerobiology: e.g., the airborne transport of organic particles, such as bacteria and fungal spores, can be influenced by rainfall favoring the deposition of these particles (Morris et al., 2017).
In statistics, we are not only interested in point estimation, but also in the assessment of estimation uncertainties. In this paper, we however, focused on connectivity estimation, even if quantifying the estimation variance could have been useful for more rigorously investigating temporal variation in connectivities. Formally, the connectivity measures that we defined are integrals. Hence, results on integral numerical approximations (e.g., midpoint, trapezoidal or Monte Carlo integration) can be exploited to assess errors or variances of the connectivity estimates (Davis and Rabinowitz, 2007;Caflisch, 1998;Geweke, 1996). However, for this assessment, one should ideally take into account dependencies between connectivity estimates for different pairs of nodes, which is not trivial. Further in-depth methodological developments are required to tackle this issue.
To more finely estimate connectivity, and its uncertainty, one could also take into account, if relevant, the uncertainty about the trajectories themselves. For example, when observed trajectories are smoothed versions of actual trajectories (as it is likely the case for air-mass trajectories calculated with HYSPLIT) or when the trajectories are partially observed and rather erratic, (i) a probabilistic model grounded on, for instance, a stochastic differential equation, could be used to reconstruct probable trajectories and (ii) the connectivity would be estimated from these reconstructed trajectories. Obviously, step (ii) should incorporate the uncertainty about the trajectory reconstruction impacted by an eventual preliminary step consisting in estimating the parameters of the above-mentioned probabilistic model.
Concerning the application treated in this article, we observed distinct seasonal patterns in the temporal variation of the networks covering the Mediterranean coastline and PACA. In the former case, the networks corresponding to the four clusters shown in Figure 5(b-e) exhibit different spatial patterns of hubs (in terms of location and size) and different trends in the main connectivity directions. In the latter case, the differences between the three networks identified with the clustering approach are mostly related to connectivity amplitude. It would be interesting to explore whether this observation made at two very different spatial scales and resolutions generally holds by studying regions of size similar to PACA all along the Mediterranean coastline.
In the long-term context of our applied research projects connected to aerobiology, the construction and exploration of networks generated by air-mass movements are a way to unravel epidemiological dynamics (and the resulting genetic patterns) of microbial pathogens disseminated at long distance via air movements in the troposphere (see Leyronas et al., 2018, for a proof of concept). Indeed, even if the pathogen is not explicitly taken into account by the framework proposed in this article, the description of connectivities that it offers provides us a proxy of airborne pathogen movements over long temporal terms and large spatial scales. This proxy is a mean to understand pathogen transportation and to anticipate its long distance dissemination. Specifically, network indices such as those calculated in this article can be associated with particular epidemiological properties such as the probability of long-distance transport of pathogens (Moslonka-Lefebvre et al., 2011;Jeger et al., 2007;Pautasso and Jeger, 2014). For instance, for plant pathogens, recent studies (Nicolaisen et al., 2017;Bowers et al., 2013;Aho et al., 2019) showed that airborne populations of bacteria and fungi are rather constant across the years, while higher diversity can be observed in different seasons. This statement resonates with our analyses where we observed clear seasonal signals in the estimated monthly spatiotemporal networks in Section 4.3 whereas the yearly signals were less obvious.
Finally, the networks estimated using our approach could be a basis for developing epidemiological models (explicitly handling the pathogen) incorporating long-distance dissemination conditional on recurrent air-mass movements. Such models could be exploited to set up surveillance strategies for early warning and epidemic anticipation in order to help reduce the impacts of airborne pathogens on human health, agricultural production and ecosystem functioning (Mundt et al., 2009).