A Framework for Characterizing the Multilateral and Directional Interaction Relationships Between PM Pollution at City Scale: A Case Study of 29 Cities in East China, South Korea and Japan

Transboundary particulate matter (PM) pollution has become an increasingly significant public health issue around the world due to its impacts on human health. However, transboundary PM pollution is difficult to address because it usually travels across multiple urban jurisdictional boundaries, with transport directions that vary over time. This makes it hard for urban managers to determine who is potentially polluting whose air and how PM pollution in adjacent cities interacts. This study proposes a statistical analysis framework for characterizing directional interaction relationships between PM pollution in cities. Compared with chemical transport models (CTMs) and chemical composition analysis, the proposed framework requires less data and less time, is easy to implement, and can reveal directional interaction relationships between PM pollution in multiple cities quickly and at low computational cost. To demonstrate the framework, this study applied it to analyze the interaction relationships between PM2.5 pollution in 29 cities in East China, South Korea and Japan using one year of hourly PM2.5 measurement data from 2018. The results show that the framework is able to reveal significant multilateral and directional interaction relationships between PM2.5 pollution in the 29 cities in Northeast Asia, and that PM2.5 pollution in China, South Korea and Japan is linked, with mutual interaction relationships. This study further evaluated the framework's validity by comparing the analysis results against wind vector data, back trajectory data, and results from existing literature that adopted CTMs to study the interaction relationships between PM pollution in Northeast Asia. The comparisons show that the results produced by the framework are consistent with all three. The proposed framework provides an alternative for exploring transport pathways and patterns of transboundary PM pollution between cities when CTMs and chemical composition analysis would be too demanding or impossible to implement.

INTRODUCTION
Transboundary particulate matter (PM) pollution has become an increasingly significant public health issue around the world (1–3). On one hand, transboundary PM pollution has severe negative impacts on human health due to its associations with respiratory diseases, cardiovascular diseases, birth defects, etc. (4–7), raising considerable public concern over health and public pressure on government authorities to take action. On the other hand, transboundary PM pollution is difficult to address, as it usually travels across multiple urban jurisdictional boundaries, with transport directions that vary over time, driven by synoptic air movement and complex meteorological conditions (1). In other words, PM pollution in a city may interact with PM pollution in adjacent cities. This makes it hard for urban managers in the city to determine who is potentially polluting whose air, how PM pollution in adjacent cities interacts, and with whom they should cooperate in tackling transboundary PM pollution.
The interaction relationships between PM pollution in adjacent cities therefore need to be examined before air pollution mitigation plans and policy measures are adopted. Such an investigation allows a better understanding of the transport pathways and patterns of transboundary PM pollution, which has important implications for air pollution exposome research. Moreover, it can help inform the formulation of cross-jurisdictional mitigation plans and policy measures. To this end, scholars have developed various approaches for examining the interaction relationships between PM pollution at the city scale (8–10). Generally, these approaches can be divided into two groups. One group comprises mechanistic modeling approaches, which use the mechanisms of the physical and chemical processes of air pollution over time and space to examine the relationships between air pollution in different cities. The other comprises statistical modeling approaches, which use statistical methods to explore the relationships without considering the detailed mechanisms.
A typical example of the mechanistic modeling approaches is the Eulerian chemical transport model (CTM). A CTM incorporates a variety of physical schemes and chemical mechanisms to describe the physical and chemical processes of air pollutants over time, including pollutant emission, transport, chemical transformation, and deposition. Implementations of CTMs include the Weather Research and Forecasting model coupled with Chemistry (WRF-Chem) (8) and GEOS-Chem (9). These CTMs usually use a Eulerian grid based on a fixed longitude/latitude coordinate system to describe space and location. To simulate these processes, CTMs usually require a large amount of spatial data. Essential inputs include emission inventory data to drive the process of pollutant emission, and meteorological data (e.g., atmospheric pressure, temperature, wind speed and direction) to simulate changing meteorological conditions and drive pollutant transport (11, 12). Because CTMs capture the physical and chemical processes of air pollutants over time and space, they can directly link air pollution in the source city to air pollution in the receptor city, thus providing a quantitative and causal explanation of the directional interaction relationships between air pollution in multiple cities.
Although CTMs are able to provide high-quality characterization of interaction relationships between air pollution in multiple cities, they are cumbersome, demanding, and sometimes impossible to implement. First, CTMs require a large amount of high-quality spatial data, which is usually not available in underdeveloped areas. Even in developed regions, the quality of the data heavily affects the accuracy of the results and may cause large uncertainties (6, 11, 13–16). Second, running the models is costly in both money and time. CTMs are usually run on expensive high-performance computing clusters, and a typical CTM run usually takes months to complete. Moreover, running CTMs is technically complex and difficult (1). Generally, only experts trained in numerical simulation in atmospheric science are capable of configuring, debugging and running a CTM.
Statistical modeling approaches do not rely on the detailed mechanisms of the physical and chemical processes of air pollutants. Instead, this group of approaches usually infers the interaction relationships based on assumptions, and so can only indicate association (or correlation), not causation. Chemical composition analysis, for example, is based on the assumption that the chemical composition of air pollutants differs from place to place, so that it serves as a unique signature of each place. The more similar the chemical composition of air pollutants in one city is to that in another city, the more likely it is that the air pollution in the two cities is associated (17). For example, the ratio of two lead isotopes (²⁰⁶Pb/²⁰⁷Pb) was used to infer the interaction relationship between Pb deposition in Singapore and other countries in Southeast Asia (18). On its own, chemical composition analysis is not able to determine the direction of the interaction relationship. However, with the help of radiometric dating used in paleoenvironmental studies (19), the age of samples can be estimated, which establishes the chronology and thereby the direction of the interaction relationship.
Chemical composition analysis is a relatively convincing statistical tool for inferring interaction relationships between air pollution in cities. However, the assumption on the uniqueness of the chemical composition of air pollutants across different cities is doubtful because the chemical composition of air pollutants in a place changes over time (20). Moreover, this method requires laborious sample collection processes in various sampling points. The accuracy of chemical composition analysis depends on the number of samples. Furthermore, the spatial extent of chemical composition analysis is strongly limited by the locations of the sampling sites (1).
In summary, existing methods are able to examine the interaction relationships between PM pollution in multiple cities, but are strongly limited by data availability and by their costs in time and money. In cities and regions where no data are available, or where financial resources are limited, the methods mentioned above are impossible or too expensive to implement. In fact, most developing countries and under-developed areas, such as those in South Asia and Central Asia, do not have emission inventory data customized for running CTMs at the local level (most existing emission inventories are developed at a very coarse spatial resolution for global or continental-scale simulations) (17, 21–24). Nor have these areas conducted large-scale sampling campaigns for chemical composition analyses (17). This is because developing such customized emission inventories and conducting large-scale sampling campaigns require persistent financial input and the collaboration of hundreds of scientists. A simple and easy-to-implement method that requires less data, less time and less labor to examine the interaction relationships between PM pollution in multiple cities would therefore be useful.
This research aims to provide an alternative statistical analysis framework for characterizing directional interaction relationships between PM pollution in multiple cities when CTMs and chemical composition analysis would be too demanding or impossible to implement. This analysis framework integrates the cross-correlation function with the Granger causality test. It requires only PM measurement data, which can be obtained from existing air quality monitoring networks or low-cost air quality sensors. It is easy to implement and can reveal directional interaction relationships between PM pollution in multiple cities quickly and at low computational cost.
The following section introduces the analysis framework in detail. A case study of 29 cities in East China, South Korea and Japan is then carried out to illustrate the framework. These 29 cities in Northeast Asia were selected as the study area for two reasons: cities in Northeast Asia have been suffering from severe transboundary PM pollution for decades, raising considerable public concern; and CTMs have been used to simulate transboundary PM pollution in Northeast Asia, so their simulation results can be used to verify the results produced by the framework proposed in this study. The concluding section summarizes the advantages of the framework and its limitations.

THE FRAMEWORK FOR CHARACTERIZING THE MULTILATERAL AND DIRECTIONAL INTERACTION RELATIONSHIPS BETWEEN PM POLLUTION AT CITY SCALE
The proposed framework consists of two steps. The first step is to compute the strengths of potential interaction relationships between PM pollution in adjacent cities using the cross-correlation function. The second step is to determine the directions of the interaction relationships using Granger causality tests. Each step is introduced in detail below.

The Cross-Correlation Function
The strength of a potential interaction relationship between PM pollution in two adjacent cities is measured using the time lag-adjusted Pearson correlation coefficient, which is calculated using the cross-correlation method (25–27).
First, the Pearson correlation coefficients between the two PM concentration time series in each pair of cities are calculated at successive time lags. This is described by Equation (1), where P(τ) is the Pearson correlation coefficient between the two PM concentration time series at a specific time lag τ, and X_1 and X_2 are the two PM concentration time series in the pair of cities. The range of the time lag τ is set according to the synoptic cycles of the study area. For example, in Northeast Asia, the synoptic meteorological system usually influences the air quality in the region on a weekly basis (28), so the time lag τ in Equation (1) is set to vary between the past 7 days (−168 h) and the subsequent 7 days (+168 h).
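The lagged coefficient P(τ) described above can be written out explicitly. The expression below is a sketch under stated assumptions rather than a definitive form: it assumes that the lag τ is applied to the second series, that the sums run over the hours t for which both X_1(t) and X_2(t + τ) are available, and that the overbars (notation introduced here for illustration) denote sample means over those same hours.

```latex
P(\tau) =
\frac{\sum_{t}\bigl(X_1(t)-\bar{X}_1\bigr)\bigl(X_2(t+\tau)-\bar{X}_2\bigr)}
     {\sqrt{\sum_{t}\bigl(X_1(t)-\bar{X}_1\bigr)^{2}}\;
      \sqrt{\sum_{t}\bigl(X_2(t+\tau)-\bar{X}_2\bigr)^{2}}},
\qquad \tau \in [-168,\ +168]\ \text{h}
```

Under this assumed convention, a peak at positive τ means that X_2 trails X_1; applying the lag to X_1 instead simply flips the sign of τ.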
Then, the maximum Pearson correlation coefficient among all the coefficients calculated in the first step is identified, as described in Equation (2):
$$P_{\max} = \max_{\tau} P(\tau) \tag{2}$$
P_max is the maximum correlation coefficient, which indicates the strength of the interaction relationship between PM pollution in the pair of cities. Two tests of significance are performed to ensure the results are statistically significant. The first is the significance test of the correlation coefficient, which tests whether the calculated Pearson correlation coefficient is significantly different from zero. The second is the significance test of the difference between two correlation coefficients using Fisher's r-to-z transformation (29, 30), which examines whether the maximum correlation coefficient (P_max) is significantly larger than the correlation coefficient without the time lag (P(0)). If P_max is significantly larger than P(0), the observed difference between the two coefficients is unlikely to be due to random chance.
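The lag scan and the two significance tests above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It assumes that the lag is applied to the second series, that both series are gap-free and equally long, and that both tests use the large-sample normal approximation of Fisher's transformation with the two coefficients treated as if they came from independent samples. All function names are hypothetical and the data are synthetic.

```python
import math
import random
from statistics import NormalDist


def pearson(a, b):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def lagged_pearson(x1, x2, lag):
    """Return (P(lag), n): correlation of x1[t] with x2[t + lag] on the overlap.

    ASSUMPTION: the lag is applied to x2; both series are gap-free, equally long.
    """
    n = len(x1)
    if lag >= 0:
        a, b = x1[:n - lag], x2[lag:]
    else:
        a, b = x1[-lag:], x2[:n + lag]
    return pearson(a, b), len(a)


def strongest_lag(x1, x2, max_lag=168):
    """Scan lags in [-max_lag, +max_lag] hours; return (T_delay, P_max, n)."""
    scan = {lag: lagged_pearson(x1, x2, lag) for lag in range(-max_lag, max_lag + 1)}
    t_delay = max(scan, key=lambda lag: scan[lag][0])
    p_max, n = scan[t_delay]
    return t_delay, p_max, n


def fisher_tests(p_max, n_max, p_zero, n_zero):
    """Return (p_nonzero, p_larger) based on Fisher's r-to-z transformation.

    ASSUMPTION: large-sample normal approximation, with the two coefficients
    treated as if they came from independent samples.
    """
    normal = NormalDist()
    z_max, z_zero = math.atanh(p_max), math.atanh(p_zero)
    # Test 1 (two-sided): is P_max different from zero?
    p_nonzero = 2 * (1 - normal.cdf(abs(z_max) * math.sqrt(n_max - 3)))
    # Test 2 (one-sided): is P_max larger than P(0)?
    z_diff = (z_max - z_zero) / math.sqrt(1 / (n_max - 3) + 1 / (n_zero - 3))
    p_larger = 1 - normal.cdf(z_diff)
    return p_nonzero, p_larger


# Synthetic demonstration (made-up data): x2 repeats x1 twelve hours later, plus noise.
random.seed(0)
signal = [random.gauss(0, 1) for _ in range(512)]
x1 = signal[12:]
x2 = [s + random.gauss(0, 0.3) for s in signal[:-12]]
t_delay, p_max, n_max = strongest_lag(x1, x2, max_lag=24)
p_zero, n_zero = lagged_pearson(x1, x2, 0)
p_nonzero, p_larger = fisher_tests(p_max, n_max, p_zero, n_zero)
print(t_delay, round(p_max, 3), p_nonzero < 0.05, p_larger < 0.05)
```

Hourly PM series are strongly autocorrelated, so the effective sample size is smaller than the number of overlapping hours; the p-values from this sketch should be read as optimistic.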

The Granger Causality Test
After the strength of the interaction relationship between PM pollution in the pair of cities is calculated using the cross-correlation function and the significance of the interaction relationship is confirmed by statistical tests in the first step, the next step is to determine the direction of the interaction relationship between PM pollution in the pair of cities.
First, the time lag that generates the maximum correlation coefficient P_max is identified, as shown in Equation (3):
$$T_{\mathrm{delay}} = \arg\max_{\tau} P(\tau) \tag{3}$$
The sign of T_delay shows the potential temporal order of the two PM concentration time series in the pair of cities, which suggests the potential direction of the interaction relationship. Then, Granger causality tests are applied to confirm the potential direction. The Granger causality test is a statistical hypothesis test for inferring causal influences between variables based on temporal precedence (31, 32). Its rationale is as follows: given an autoregressive model that predicts the future values of the PM concentration in a city (X_2) from the past values of X_2, if adding the lagged values of the PM concentration in the other city (X_1) improves the prediction of X_2, then X_1 is said to Granger-cause X_2. In this way, we can confirm the direction of influence between X_1 and X_2. The advantage of the Granger causality test over correlation analysis is that it can remove spurious correlations between the PM time series and therefore reduce the risk of reporting false associations with wrong directions (32). Although Granger causality cannot directly reflect the real physical causal chains, it provides relatively convincing statistical evidence for inferring causality without requiring additional data.
Figure 1 illustrates the calculation of the time lag-adjusted Pearson correlation coefficient using fine particulate matter (PM2.5) measurement data in two cities of Northeast Asia. As shown in Figure 1A, the two PM2.5 concentration time series in Weihai, China and Seoul, South Korea are best aligned when Weihai's PM2.5 time series is shifted later by 12 h. The Pearson correlation coefficient calculated at this 12 h shift is the maximum among all the coefficients calculated at varying time lags (Figure 1B). Statistical tests confirm the statistical significance of the correlation coefficient. The time lag that generates the maximum correlation coefficient is then identified as −12 h. Lastly, the Granger causality test confirms that the direction of influence in the interaction relationship between PM2.5 pollution in Weihai and Seoul is from Weihai to Seoul, suggesting that PM2.5 pollution in Weihai may have had an impact on PM2.5 pollution in Seoul during January 2018.
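The Granger rationale described above, comparing an autoregressive model of X_2 with and without lagged values of X_1, can be sketched as an F-test on two least-squares fits. This is an illustrative sketch only: the function names, the fixed lag order and the synthetic series are assumptions, and the text does not state which software or lag order was used in the study.

```python
import numpy as np


def residual_sum_of_squares(design, target):
    """Fit ordinary least squares and return the residual sum of squares."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    return float(resid @ resid)


def granger_f(cause, effect, order):
    """F statistic for H0: lags 1..order of `cause` do not help predict `effect`.

    Returns (f_stat, order, dof); under H0 the statistic follows F(order, dof).
    """
    cause = np.asarray(cause, dtype=float)
    effect = np.asarray(effect, dtype=float)
    n = len(effect)
    target = effect[order:]
    # Column k holds the series delayed by k hours, aligned with `target`.
    own_lags = np.column_stack([effect[order - k:n - k] for k in range(1, order + 1)])
    other_lags = np.column_stack([cause[order - k:n - k] for k in range(1, order + 1)])
    intercept = np.ones((n - order, 1))
    restricted = np.hstack([intercept, own_lags])       # past of `effect` only
    unrestricted = np.hstack([restricted, other_lags])  # plus past of `cause`
    rss_r = residual_sum_of_squares(restricted, target)
    rss_u = residual_sum_of_squares(unrestricted, target)
    dof = len(target) - unrestricted.shape[1]
    f_stat = ((rss_r - rss_u) / order) / (rss_u / dof)
    return f_stat, order, dof


# Synthetic demonstration (made-up data): x2 depends on x1 two hours earlier.
rng = np.random.default_rng(0)
x1 = rng.normal(size=1000)
x2 = np.zeros(1000)
for t in range(2, 1000):
    x2[t] = 0.5 * x2[t - 1] + 0.8 * x1[t - 2] + rng.normal()
f_forward, _, _ = granger_f(x1, x2, order=3)  # does x1 Granger-cause x2?
f_reverse, _, _ = granger_f(x2, x1, order=3)  # does x2 Granger-cause x1?
print(f_forward > f_reverse)
```

A p-value follows from the upper tail of the F(order, dof) distribution, for example with `scipy.stats.f.sf(f_stat, order, dof)` where SciPy is available. Running the test in both directions, as in the demonstration, is one way to check that the direction suggested by the sign of T_delay is the only one supported.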

A CASE STUDY OF 29 CITIES IN EAST CHINA, SOUTH KOREA AND JAPAN

Data
This study collected a full year of hourly PM2.5 measurement data for 2018 from the environmental monitoring agencies in China, South Korea and Japan, covering 29 major cities in Northeast Asia with populations of over 1 million. The 29 cities include 14 cities in East China (Beijing, Tianjin, Dalian, Shenyang, Tonghua, Baishan, Yanbianzhou, Weihai, Qingdao, Rizhao, Lianyungang, Yancheng, Nantong and Shanghai), 5 cities in South Korea (Seoul, Daejeon, Daegu, Gwangju and Busan) and 10 cities in Japan (Tokyo, Niigata, Sendai, Shizuoka, Nagoya, Osaka, Okayama, Hiroshima, Fukuoka and Kumamoto). Figure 2 shows the study area.
This study conducted a comprehensive data quality check to remove problematic data points, including implausible zeros, duplicated data records and missing measurements. Extremely high hourly PM2.5 measurements (>1,000 µg/m³) were considered outliers and removed. After the data quality check, the hourly PM2.5 measurements at all monitoring stations in each city were averaged to generate an hourly PM2.5 time series for that city. As China Standard Time (UTC+08:00) is 1 h behind Japan Standard Time and Korean Standard Time (UTC+09:00), the timestamps of all PM2.5 time series in South Korea and Japan were converted to China Standard Time for the convenience of data analysis.
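The quality check, station averaging and time-zone alignment described above could be sketched as follows. The record layout, function names and station names are hypothetical. The sketch also assumes that every non-positive reading counts as implausible and that a duplicate is a repeated (station, timestamp) pair; the text does not define either rule precisely.

```python
from collections import defaultdict
from datetime import datetime, timedelta

OUTLIER_UG_M3 = 1000.0  # outlier threshold stated in the text


def clean(records):
    """Yield (station, timestamp, value) rows that pass the quality check.

    ASSUMPTION: every non-positive reading is treated as implausible, and a
    duplicate is a repeated (station, timestamp) pair; the first copy is kept.
    """
    seen = set()
    for station, timestamp, value in records:
        if value is None or value <= 0 or value > OUTLIER_UG_M3:
            continue
        if (station, timestamp) in seen:
            continue
        seen.add((station, timestamp))
        yield station, timestamp, value


def city_hourly_series(records, utc_offset_hours):
    """Average all stations per hour, with timestamps expressed in UTC+08:00."""
    shift = timedelta(hours=8 - utc_offset_hours)
    by_hour = defaultdict(list)
    for _station, timestamp, value in clean(records):
        by_hour[timestamp + shift].append(value)
    return {hour: sum(vals) / len(vals) for hour, vals in sorted(by_hour.items())}


# Made-up records for a city on UTC+09:00 (station names are hypothetical).
records = [
    ("station_a", datetime(2018, 1, 1, 10), 20.0),
    ("station_b", datetime(2018, 1, 1, 10), 40.0),
    ("station_a", datetime(2018, 1, 1, 10), 20.0),    # duplicated record
    ("station_a", datetime(2018, 1, 1, 11), 0.0),     # implausible zero
    ("station_b", datetime(2018, 1, 1, 11), 1500.0),  # outlier
    ("station_a", datetime(2018, 1, 1, 12), None),    # missing measurement
    ("station_b", datetime(2018, 1, 1, 12), 30.0),
]
series = city_hourly_series(records, utc_offset_hours=9)
for hour, value in series.items():
    print(hour.isoformat(), value)
```

Hours in which every station fails the check simply drop out of the series, so a later step that assumes a regular hourly grid has to handle those gaps explicitly.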
In addition, this study used a weather reanalysis dataset developed by NASA to draw wind vector maps at 50 m above the surface for Northeast Asia in 2018. The dataset is produced from atmospheric, land, and ocean observations from satellites, aircraft, and ships in NASA's Modern-Era Retrospective analysis for Research and Applications, version 2 (MERRA-2) project (33).

Results
To show the temporal variation of the interaction relationships between PM2.5 pollution in the 29 cities in East China, South Korea and Japan, this study performed the analysis using the framework on a monthly basis.
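Running the framework month by month first requires cutting each city's hourly series into calendar months. A minimal sketch is given below; the function name and the dictionary layout are assumptions made for illustration, and the series is made up.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def split_by_month(series):
    """Group a {timestamp: value} hourly series into {(year, month): [values]}."""
    months = defaultdict(list)
    for timestamp in sorted(series):
        months[(timestamp.year, timestamp.month)].append(series[timestamp])
    return dict(months)


# Made-up hourly series covering January and February 2018.
start = datetime(2018, 1, 1)
series = {start + timedelta(hours=h): float(h % 24) for h in range((31 + 28) * 24)}
monthly = split_by_month(series)
print({month: len(values) for month, values in monthly.items()})
```

If each month were analysed in isolation (an assumption; the text does not say how lags are handled at month boundaries), a 31-day month would provide 744 hourly values, and the overlap available at the extreme lags of ±168 h would shrink to 576 h.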
The analysis results for January, April, July and October 2018 were calculated and visualized as maps (see Figure 3). Each line connecting two cities in Figure 3 indicates a significant interaction relationship between the PM2.5 pollution in the two cities. The colors of the lines indicate the strength of the interaction relationship. The arrows on the lines show the temporal order of the corresponding PM2.5 time series, which suggests the prevailing transport directions of air parcels.
The visualizations in Figure 3 show that the interaction relationships between the PM2.5 time series of pairs of cities in East China, South Korea and Japan are significant and strong in all four seasons. In January, April and October, the PM2.5 time series in Japan lagged behind those in South Korea, which in turn lagged behind those in East China; this suggests that PM2.5 pollution probably flows from China to South Korea to Japan. Conversely, in July, PM2.5 pollution probably flows from Japan to South Korea to China. In summary, the results demonstrate strong and significant multilateral and directional interactions between PM2.5 pollution in cities in Northeast Asia.
The results on the multilateral and directional interaction relationships between PM 2.5 pollution in the 29 cities show how

EVALUATION
This study used three data sets to verify the results produced by the framework on the interaction relationships between PM2.5 pollution in the 29 cities in Northeast Asia. The first two data sets are the wind vector data and the calculated trajectories of air mass movement. The third data set was extracted from existing literature that adopted CTMs to quantify the relationships.

Evaluation Using the Wind Vector Data and Back Trajectories
Figure 4 shows four monthly-averaged wind vector maps for January, April, July and October 2018 at 50 m above the surface in Northeast Asia, drawn using the MERRA-2 weather reanalysis dataset developed by NASA (33). As shown in Figures 3 and 4, the directions of the interaction relationships between PM2.5 pollution in the 29 cities match the wind vectors very well.
In addition, this study calculated backward trajectories of the air masses reaching Seoul, South Korea, for 2018 using NOAA's Hybrid Single Particle Lagrangian Integrated Trajectory (HYSPLIT, version 4) model driven by the global NOAA-NCEP/NCAR reanalysis meteorological data. The HYSPLIT model is able to trace air parcels' paths back in time and space and indicate where the air parcels have been before they reach the receptor site (34). Each trajectory had a run time of 96 h with 3 h time intervals. The Python package Matplotlib and the R package openair (http://www.openair-project.org/) developed by Carslaw and Ropkins (35) were used to visualize the back trajectories produced by the HYSPLIT trajectory model. Figure 5 shows the 96-h back trajectories centered on Seoul, South Korea in January, April, July and October 2018. As Figure 5 shows, in January, April and October 2018, the air parcels traveled from China to South Korea, while in July 2018, the air parcels traveled from Japan to South Korea. It can be further inferred from Figure 5 that the air parcels continued to travel from South Korea to Japan in January, April and October 2018, and from South Korea to China in July 2018. The back trajectory simulations and wind vectors are thus consistent with the directional interactions between PM2.5 pollution in the 29 cities of Northeast Asia identified using the framework proposed in this study. Figures 4 and 5 also suggest that the seasonal pattern of the interaction relationships between PM2.5 pollution in the 29 cities of East China, South Korea and Japan is probably driven by the atmospheric circulation, particularly the westerlies and the East Asian monsoon, which usually brings south-easterly winds in summer and north-westerly winds in winter (36).

Evaluation Using the Data Produced by CTMs in the Literature
As discussed in the Introduction, CTMs are able to produce high-quality results on the interaction relationships between air pollution in multiple cities. A handful of modeling studies using CTMs have been carried out to study the interaction relationships between PM pollution in China, South Korea and Japan (37–41). Although these studies analyze the interaction relationships and report results at the national/regional scale, their results can still be used to verify the analysis results in this study. Table 1 lists studies that quantified the interaction relationships between PM pollution in China, South Korea and Japan.
In addition, the study by Kajino et al. (39) quantified the interaction relationships of total nitrate deposition in March, July, and December 2006, which enables a seasonal comparison. As shown in Figure 6, from March to July, the contribution of South Korea to the total nitrate deposition in China increased, but the contribution of South Korea to Japan decreased; from July to December, the contribution of South Korea to the total nitrate deposition in China decreased, but the contribution of South Korea to Japan increased. The directions of influence in the results produced by the CTM are thus consistent with the directions identified by the framework in this study.

ADVANTAGES AND LIMITATIONS OF THE FRAMEWORK
The proposed framework has several advantages over CTMs and chemical composition analysis in examining the interaction relationships between PM pollution in cities. As shown in Table 2, the framework requires much less data than CTMs. The framework needs only the PM measurement data of the cities of interest, which can be obtained from existing air quality monitoring networks or low-cost air quality sensors. CTMs, however, require not only the PM measurement data of the cities to evaluate and calibrate the model, but also emission inventory and meteorological data to drive the simulation of pollutant emission, transport, chemical transformation, and deposition.
The framework also takes much less time to implement and is easy to execute. Applying the framework to PM measurement data usually takes a few hours, whereas executing a CTM usually takes months. For example, consider using a CTM such as the GEOS-Chem model to simulate PM pollution in East Asia. The simulations are carried out in a nested domain at a horizontal resolution of 1/2° latitude by 2/3° longitude over East Asia. The nested domain is embedded in a global chemical transport simulation at a horizontal resolution of 4° latitude by 5° longitude, which provides initial and boundary conditions for the nested domain. On a computer with 128 GB of memory and two 16-core Intel Xeon processors (Intel Xeon Gold 5218 CPU), the simulations need approximately two months to complete. As for chemical composition analysis, it usually takes several months or even years to collect a sufficient number of samples. Moreover, once the samples are collected, they have to be analyzed using expensive devices to obtain the detailed chemical compositions.
In summary, compared with CTMs and chemical composition analysis, the proposed framework in this study provides a simple but valid and easy-to-implement method, that requires less data and less time, to examine the interaction relationships between PM pollution in multiple cities. The framework provides an alternative for exploring the transportation pathways and potential source areas when CTMs and chemical composition analysis would be too demanding or impossible to implement.
The proposed framework has two limitations. The first is that it cannot establish causal interaction relationships between PM pollution in multiple cities the way CTMs can (Table 2). As the case study shows, the framework reveals significant multilateral and directional interaction relationships between PM2.5 pollution in the 29 cities in Northeast Asia, indicating that PM2.5 pollution in China, South Korea and Japan interacted. However, these directional associations can only suggest probable causal relationships, in which PM2.5 pollution in one city causes PM2.5 pollution in another; they cannot establish with certainty that such causal linkages exist. The second limitation is that the framework is not able to quantify the interaction relationships between PM pollution in cities as CTMs can (Table 2). In other words, the framework can answer whether there is a significant interaction relationship between PM pollution in two cities and what the direction of influence is, but it cannot answer to what extent the PM pollution in one city affects that in the other, or how much PM pollution is transported to the other city.

DATA AVAILABILITY STATEMENT
Publicly available datasets were analyzed in this study. The hourly PM2.5 measurement data can be found on the websites of the environmental monitoring agencies in China (https://air.cnemc.cn:18007/), South Korea (http://www.airkorea.or.kr/) and Japan (http://soramame.taiki.go.jp/). The weather reanalysis dataset used to draw wind vector maps in this study is available from the NASA Modern-Era Retrospective analysis for Research and Applications version 2 (MERRA-2) website (https://gmao.gsfc.nasa.gov/reanalysis/MERRA-2/data_access/).