Improving Source Apportionment of Urban Aerosol Using Multi-Isotopic Fingerprints (MIF) and Positive Matrix Factorization (PMF): Cross-Validation and New Insights

Urban air pollution is a matter of concern due to its health hazards and the continuous population growth exposed to it at different urban areas worldwide. Nowadays, more than 55% of the world population live in urban areas. One of the main challenges to guide pollution control policies is related to pollutant source assessment. In this line, U.S. Environmental Protection Agency's Positive Matrix Factorization (EPA-PMF) has been extensively employed worldwide as a reference model for quantification of source contributions. However, EPA-PMF presents issues associated to source identification and discrimination due to the collinearities among the source tracers. Multi-Isotopic Fingerprints (MIF) have demonstrated good resolution for source discrimination, since urban sources are characterized by specific isotopic signatures. Source quantification based on total aerosol mass is the main limitation of MIF. This study reports strategies for PMF and MIF combination to improve source identification/discrimination and its quantification in urban areas. We have three main findings: (1) cross-validation of PMF source identification based on Pb and Zn isotopic fingerprints, (2) source apportionment in the MIF model for total PM mass, and (3) new insights into potential Zn isotopic signatures of biomass burning and secondary aerosol. We support future studies on the improvement of isotopic fingerprints database of sources based on diverse elements or compounds to boost advances of MIF model applications in atmospheric sciences.


INTRODUCTION
Air pollution is a major environmental threat to public health and climate change, which results in millions of premature deaths each year and affects the radiative balance and cloud formation process in the atmosphere (World Health Organization, 2017). Large amounts of particulate matter (PM) emitted globally come from urban areas mainly contributed by the transport sector, industrial processes, and domestic fuel burning (Karagulian et al., 2015). Recent findings have evidenced direct association between concentrations of fine particles (PM 2.5 ) and mortality (Anenberg et al., 2019). Knowledge of the source origin and contribution to human exposure are essential to targeting effective strategies for air quality management (Thunis et al., 2019).
Multivariate receptor modeling (e.g., PCA, PMF2, ME-2, and EPA-PMF) has been extensively employed to assess source apportionment investigations in environmental science (Reff et al., 2007). Positive Matrix Factorization (PMF) was developed by Tapper (1993, 1994), and most recently, a freely available graphical interface was developed by the U.S. Environmental Protection Agency (EPA-PMF). The PMF model decomposes a matrix of samples in two matrices (factor contributions and profiles) in order to quantify the contributions of specific pollutant sources (Paatero et al., 2005;Brown et al., 2015). Despite important improvements of PMF in relation to PCA models, some limitations of these models, like source identification, are implied in receptor multivariate analysis based on elemental concentrations (Figure 1).
Isotope fingerprints show great potential to source tracing in the environment and has recently been used for many purposes (Aebischer et al., 2015;Araújo et al., 2016Araújo et al., , 2018Araújo et al., , 2019Gelly et al., 2019;Nazarpour et al., 2019). The main advantage in the combination of stable (C, O, Zn, Cu, and Fe) and radiogenic (Pb, Sr, and Nd) isotopes is found on the improvement of the source discrimination resolution. Heavy radiogenic isotopes being produced by radioactive decay should be unaffected by mass fractionation in comparison to stable isotopes (Blum and Erel, 2003;Vallero, 2014). Therefore, radiogenic isotopes present specific signatures for each source and have been applied as powerful fingerprints (Komárek et al., 2008). On the other hand, isotopic fractionation of stable isotopes, caused by physicalchemical reactions, must be characterized in order to trace pollutant dispersion in the environment (Wiederhold, 2015).
Isotopes of only one element can present some limitations to distinguish among many different sources in complex systems (Cheng and Hu, 2010). This drawback has been outlined by a combination of isotopes of two or more elements (Figure 1), named here as the Multi-Isotopic Fingerprints (MIF) approach, which has improved source identification and discrimination of PM in urban areas and other sites (Widory et al., 2004(Widory et al., , 2010Cloquet et al., 2006;Dolgopolova et al., 2006;Guéguen et al., 2012;Sherman et al., 2015;Ochoa González et al., 2016;Dong et al., 2017;Souto-Oliveira et al., 2018, 2019Schleicher et al., 2020). However, source contributions are quantified for those specific elements only, being few representatives of total PM mass, considering low elemental mass of Pb, Cu, and Zn (<1%) compared to total PM mass.
Taking into account the current gaps of multivariate receptor modeling and MIF (Figure 1), our study has two main goals. The first goal is to assess isotopic fingerprints as an alternative to validate source identification in the EPA-PMF model. The second goal is to improve the estimation of source contribution in the MIF model based on total PM mass. In order to examine strategies to achieve these aims, we performed EPA-PMF analysis, assessing uncertainties with bootstrap and displacement parameters and employing the elemental concentrations and Pb and Zn isotopic signatures measured in PM 2.5 samples in São Paulo City. The Metropolitan Area of São Paulo (MASP) is the biggest urban area in South America and is one of the top 10 megacities in the world. Air pollution at MASP has been well-described by several previous studies (Andrade et al., 2017 and Supplementary Table 1) and has some similarities when compared to other megacities and unique characteristics such as the high contribution of bio-fuels (gasohol and bio-diesel) in the transport sector (Nogueira et al., 2015).

Elemental and Black Carbon Concentrations
The PM 2.5 was sampled in winter of 2013 on the rooftop (∼12 m) of the Astronomy, Geophysics, and Atmospheric Sciences Institute at University of São Paulo campus (USP, 23.56 • S, 46.73 • W), which is located in the west side of the urbanized area of São Paulo City. The sampling site is situated in the western side of the city and is located approximately 1 km northeast and 4 km north of the two main roads with intense traffic (Marginal Tietê and Marginal Pinheiros, respectively). Each sample was collected during 12h intervals in polycarbonate membranes (Whatman R ) under constant air flow (16 L min −1 ). Other details of sampling procedures are described in Souto-Oliveira et al. (2018). The PM 2.5 mass concentration, deposited in polycarbonate filters (47 mm diameter), was determined gravimetrically using a microbalance (Mettler-Toledo, Columbus, OH, USA) with 1 µg of readability, which presents <1% of aerosol mass. Elemental concentrations were determined by an Energy Dispersive X-ray Fluorescence Spectrometer (EDX 700; Shimadzu Corporation, Analytical Instruments Division, Tokyo, Japan). The spectrometer operates at 5-50 kV and 1-1,000 µA, using a low-power Rh-target tube, and the elemental characteristic X-ray radiation emitted from the aerosol sample is detected with a Si(Li) detector. The spectra obtained were processed and quantified with the Win-QXAS program, available from the International Atomic Energy Agency (https://www-legacy.iaea.org/OurWork/ST/NA/NAAL/ pci/ins/xrf/pciXRFdown.php).
The BC analysis was performed by optical reflectance with a smoke stain reflectometer (model 43D; Diffusion Systems Ltd, London, UK) and detailed previously by Souto-Oliveira et al. (2016) and Hetem and Andrade (2016). The concentrations of CO and SO 2 regulated gases, measured by the Environmental Agency of São Paulo State, were used to support source attribution to PMF factors [details on the equipment and data treatment are described at Environmental Agency of São Paulo (CETESB), 2020].

PMF and Multi-Isotopic System Analysis
We used a correlation matrix plot (CM) with Pearson's correlation and a principal component analysis (PCA) in order to analyze the relationships within the dataset. The non-significant correlations (p < 0.01) were excluded. The PCA was calculated using the varimax rotation. The number of components was limited to five due to the eigenvalue of the sixth being <1. Both analyses were implemented in R language using the psych (Revelle, 2020) and corrplot (Wei and Simko, 2017) packages.
The source apportionment analysis was performed using the PMF-EPA model (version 5.0) provided by EPA. Several runs with different numbers of factors, species, and categorization were fulfilled in order to obtain the best adjustment and physical validity of the model. The goodness of fit was evaluated by minimization of Q (true, robust, and expected) and a scaled residual between −2 and +2, following the recommendations of Reff et al. (2007). In addition, uncertainties estimation of PMF solutions was assessed via bootstrapping (Brown et al., 2015). All tests to model optimization are described in the Supplementary Information.
The source apportionment results estimated via MIF based on Pb and Zn isotopic signatures and concentrations were accounted for during PMF runs for improvements of MIF source attribution estimations as well, as discussed in section Results and Discussion. All the Pb and Zn isotopic signatures and MIF model results were previously reported by Souto-Oliveira et al. (2018). The Pb and Zn isotopic compositions are denoted as 206 Pb/ 207 Pb ratios and δ 66 Zn JMC values. The JMC (Lyon, batch 3-0749L) is known as the international standard and has been extensively used as reference to report Zn isotopic compositions.

RESULTS AND DISCUSSION
Correlations and Multivariate (PCA and PMF) Analysis CM and multivariate (PCA and PMF) statistical analysis were employed to assess the database (Supplementary Table 2) regarding the investigation of elemental profiles for each factor and support for source attribution. Another new methodology involved examining strategies and relevance of including Pb and Zn isotopic data on PCA and PMF analysis, considering that any studies have already explored this approach, to the best of the authors' knowledge. The raw dataset (Supplementary Table 2) was assessed with CM (Supplementary Figure 1) in order to examine main correlations between species and support PCA and PMF species profiles. Strong correlations (R > 0.7) were found (Supplementary Figure 1) among crustal elements (Al, Si, Fe, Ti, Ca, and K), whereas PM 2.5 was correlated with BC. Two other groups with strong correlations, among species, were Pb, Cu, Zn, and Cl and P, S, and Mn. The Pb and Zn isotopic compositions and Cr did not show any positive correlation with other species, in line with PCA and PMF, which presented better adjustments after removal of these species (Supplementary Tables 3, 4).
PCA showed best adjustments with four factors, taking into account best-explained variance (Supplementary Table 3). In the PMF analysis, change in the number of factors showed a slight decrease in Q/Qexpec when changing from three to four factors and a great increase when changing from four to five factors. However, two important sources (road dust and biomass burning) remain combined in the same factor (Supplementary Figure 2), when only four factors were considered. The solution with five factors was chosen (Figure 2) in order to split these sources. In addition, all factors were mapped more than 85%, in the solution with five factors, which means that bootstrap uncertainties can be interpreted and the number of factors is appropriated (Supplementary Table 4). Bootstrap is an important parameter to detect random errors and, to a lesser extent, effects of rotational ambiguity . In this line, displacement consists of uncertainties estimation related to data uncertainties (data noise) and rotational ambiguity, which is focused on the lower %dQ and no factor changes (swaps) (Brown et al., 2015). Despite the main changes in factor or species during assessment of PMF settings, %dQ remains <0.1% and any swaps occurred (Supplementary Table 4), ensuring the absence of rotational ambiguity in our PMF results.
CM (Supplementary Figure 1), PCA (Supplementary Table 3 Table 4) results agreed on the lack of correlation among Pb and Zn isotope composition. To address this issue, we can consider that multivariate tools are based on linear correlations among species and variance of these species in the samples. On the other hand, MIF is not based on the variance, but instead on specific isotopic signatures of each source, such that discrete ranges of isotopic scales identify/discriminate between different sources. An additional explanation is based on non-linear correlation observed in the Pb and Zn elemental concentration vs. isotopic composition ( 206 Pb/ 207 Pb ratios and δ 66 Zn values) distributions . Therefore, the distinct scientific basis of PCA/PMF and MIF models could explain those insignificant correlations. An alternative approach to fix this limitation is further discussed in Section MIF Improvement and PMF-MIF Combined Source Apportionment.

), and PMF (Supplementary
Source attribution of PMF factors (Figure 2) was supported by previous works, which reported species profiles and source apportionment in São Paulo City (Supplementary Table 1). These previous studies provide important background to upcoming source apportionment assessments based on receptor modeling. In addition, PMF-species profiles (Figure 2) are in accordance with CM-species groups (Supplementary Figure 1), ensuring robustness of species profiles obtained in our analysis.
Factor 1 presented higher contributions of crustal elements (Al, Si, Ti, Fe, Ca, and K), linked to road dust, and lower contributions of BC, Zn, Pb, S, and Mn species, associated with road traffic. This species profile is in line with CM (Supplementary Figure 1) and PCA (Supplementary Table 3) analysis as well as studies performed in other cities and current road tunnels characterization carried in the São Paulo City (Amato et al., 2011;Gunawardana et al., 2012;Hetem and Andrade, 2016;Jithin and Srimuruganandam, 2020;Nory et al., 2021). It is important to mention that urban road dust is associated with road traffic, which combines contributions of vehicular non-exhaust emissions (tire wear and brakes abrasion) and road pavement/furniture (Thorpe and Harrison, 2008). This factor, named here as road dust/traffic, accounted for 16% of the PM 2.5 .
Factor 2 presented higher contributions of S and P, in line with CM (Supplementary Figure 1), and minor contributions of BC, Mn, Fe, Zn, and Pb (Figure 2). An important PM 2.5 source related to S and P, in sulfate form, is the particle formation process, which generates secondary aerosol in the atmosphere. Many inorganic (sulfate and nitrate) and organic compounds (primary and secondary) as well as physical-chemical conditions (temperature, solar radiation, and oxidants) are involved in this process (Holmes, 2007;Rönkkö and Timonen, 2019). In urban areas, a pool of anthropogenic activities (fossil fuel combustion, biomass, wood and waste burning, cooking, and industrial emissions) generate inorganic and volatile organic compounds, found to be important raw materials for secondary aerosol formation (Gordon et al., 2014;Kanellopoulos et al., 2021). Therefore, secondary aerosol combines contributions of different sources but presents specific formation process and characteristic species profile. Previous findings have showed higher concentrations of S and P in fossil fuels (oil, gasoline, and diesel) and minor concentrations of metals (Fe, Zn, Mn) (Jones et al., 2014;ACEA, 2018). The association of S and P with oil burning (Supplementary Table 1) and an empirical S vs. P correlation, linked to fuels, have been reported in São Paulo City (Brito et al., 2013;Marien, 2018).
Industries are present inside and outside MASP with two main industrial areas (Cubatão and Maua). Factor 3 presented species profile related to Pb, Cu, Zn, and Br elements (Figure 2), aligned with CM (Supplementary Figure 1), and showed a contribution of 7%, in line with earlier estimations (Supplementary Table 1). Cubatão is in the coastal region, about 48 km southeast of São Paulo City (sampling site), and ranked as one of the biggest Latin American industrial areas. Interestingly, chloride, a tracer of sea salt and industrial activities, had also contributed to factor 3. Aerosol transportation from the Cubatão industrial area to São Paulo City has been well-documented using Pb isotopes and will be further discussed in Section Validation of PMF Source Identification by Isotopic Fingerprints (Gioia et al., 2017;Souto-Oliveira et al., 2018).
Factor 4 accounted for 8% of PM 2.5 mass and presented strong association with CO, Cl, and P with minor contributions of Br and BC. Some works have demonstrated the linkage of those species with biomass burning (Bond et al., 2013;Calvo et al., 2013;Tian et al., 2016;Andreae, 2019). Recent studies have characterized and estimated contributions of biomass and wood burning to air pollution in São Paulo, coming from pizzerias (in the City), rural activities (São Paulo State), and forest burning in the north region of Brazil (Amazonas and Pantanal) (Pereira et al., 2016;Andrade et al., 2017;Ribeiro et al., 2018;Lima et al., 2020). Also, biomass burning transport to São Paulo City was previously demonstrated (Souto-Oliveira et al., 2016; Vara-Vela et al., 2018). PM 2.5 , CO, and BC species have been related to vehicular exhaust  Environmental Agency of São Paulo (CETESB), 2020]. These species were retained in factor 5 (Figure 2) and grouped in CM (Supplementary Figure 1). Factor 5 accounted for 50% of PM 2.5 mass, in accordance with intense vehicular traffic, regularly found in the São Paulo City, as well as previous source apportionment estimation carried out since 1994 (Supplementary Table 1). Linkage of this factor with vehicular exhaust was confirmed by Zn isotopes signatures, as discussed in Section Validation of PMF Source Identification by Isotopic Fingerprints.

Validation of PMF Source Identification by Isotopic Fingerprints
The Zn isotope signatures vs. PMF factor regression was employed to examine isotopic fingerprint applicability as an alternative to cross-validate the PMF source attribution. PMF factors related to road dust/traffic (non-exhaust) showed significant (p < 0.05) positive correlation (R = 0.63) with δ 66 Zn JMC values ( Figure 3A). This observation is in line with heavier Zn isotopes (δ 66 Zn JMC > 0.00‰) found in the road dust and vehicular non-exhaust samples (Dong et al., 2017;Souto-Oliveira et al., 2018;Nory et al., 2021). PMF factors attributed to vehicular exhaust showed negative correlation with δ 66 Zn JMC values (Figure 3B), in accordance with preliminary lighter Zn isotopes signatures (δ 66 Zn JMC > −0.60 and <0.00‰) characterized in the vehicular exhaust from São Paulo City (Gioia et al., 2008).
The Zn isotopic signatures of vehicular exhaust and industrial emissions must be extensively assessed in future works, considering different vehicle combustion features, such as fuel compositions and motor types, and industrial processes. However, here we can assume the occurrence of Zn isotopic fractionation during vehicular combustion, taking into account previous works that reported isotopic fractionation during the industrial combustion process, described below, as well as the negative linear trend found here between δ 66 Zn JMC and PMF vehicular exhaust contributions (Mattielli et al., 2009;Borrok et al., 2010).
To the best of the authors' knowledge, until now, the Zn isotopic signatures of biomass burning and secondary aerosol were not characterized. However, here we have observed positive correlations between δ 66 Zn values and PMF factors, associated with biomass burning, suggesting PM emissions enriched in heavier Zn isotopes in this source ( Figure 3E). Aligned with this evidence, industrial combustion of coal and tire-derived fuels has evidenced that the PM produced (solid phase) are enriched by heavier Zn isotopes, with higher δ 66 Zn values, while gas phase is enriched with lighter isotopes, showing lower δ 66 Zn values (Borrok et al., 2010;Ochoa Gonzalez and Weiss, 2015). These works support the positive correlation between biomass burning and δ 66 Zn values found here.
In relation to secondary aerosol, negative correlation was observed between PMF factor contributions and δ 66 Zn values ( Figure 3D). It is important to consider that secondary aerosol is formed by inorganic and organic compounds found in vapor/liquid phase in the atmosphere via photochemistry reactions (Holmes, 2007). Considering that lighter Zn isotopes are enriched in vapor phase during the combustion process (Borrok et al., 2010), we suppose that vapor phase with lower δ 66 Zn JMC values contributes to secondary aerosol formation. However, it is imperative experimental investigations t o confirm these evidences, considering the large heterogeneity of biomass burning and complexity of secondary aerosol composition.

MIF Improvement and PMF-MIF Combined Source Apportionment
One important limitation of the MIF model is related to quantification of source contribution, considering that these estimations are based on few elements. Therefore, source contributions calculated by the MIF model may be under-or overestimated in relation to total PM mass. In order to fix this limitation, MIF vs. PMF linear regression (Figure 4) was employed to recalculate the original MIF results, denoted as MIForig . Vehicular source contributions accounted for by MIF and PMF (F1-road dust/traffic and F5-vehicular traffic exhaust) showed a significant (p < 0.05) correlation (R = 0.64) ( Figure 4A). Also, industrial emissions of both models showed a positive correlation ( Figure 4B). These results endorse the strategy of validating source attribution of PMF factors by MIF.
The comparison of source contribution percentages obtained by PMF, MIF-orig, and those recalculated, denoted as MIF-PMF, is illustrated in Figure 4C. The vehicular contribution was very similar between PMF (72%), MIF-orig (66%), and MIF-PMF (75%). Industrial contribution modeled by MIF-orig (36%) was four times overestimated in comparison to PMF (7%). However, it must be remembered that MIF-orig is an estimation of industrial contribution to concentrations of Pb and Zn, which are mainly associated to that source. On the other hand, industrial contribution (6%) estimated by MIF-PMF was remarkably similar to PMF results, endorsing the validity of MIF-PMF correction strategy to report contribution for total PM mass.
The combination of PMF and MIF showed relevant improvements to source identification and discrimination in both models (Figure 5). Non-exhaust and exhaust vehicular emission, retained in PMF factors 1 and 5, respectively, showed the main (66%) contribution to PM 2.5 in São Paulo City. Vehicular exhaust was characterized by less radiogenic Pb isotopes and lighter Zn isotopes combined with BC, CO, and PM 2.5 species (Gioia et al., 2008(Gioia et al., , 2017Souto-Oliveira et al., 2018). Road dust and road traffic/non-exhaust were characterized by heavier Zn isotopic signatures and crustal (Al, Si, Ti, Fe, and Ca) and BC, Zn, and Pb species profiles, respectively (Dong et al., 2017;Souto-Oliveira et al., 2018;Nory et al., 2021). Biomass burning showed a positive correlation with δ 66 Zn values (Figure 3E), suggesting that this source is enriched with heavier Zn isotopes, which is supported by coal and tire-fuel combustion studies (Borrok et al., 2010;Ochoa Gonzalez and Weiss, 2015).
Industrial emissions were earlier characterized by lighter Zn isotopes (δ 66 Zn JMC < −0.60‰) and Pb isotopes, which could differentiate between the Cubatão industrial area ( 206 Pb/ 207 Pb > 1.20) and other industries ( 206 Pb/ 207 Pb < 1.19) (Mattielli et al., 2009;Yin et al., 2015;Gioia et al., 2017;Souto-Oliveira et al., 2018). Here, Zn isotopic signatures showed negative correlation with PMF factor 3, associated with industrial emissions (Figure 3D), confirming source attribution of this PMF factor. In line with this, the MIF industry showed a significant and positive correlation with PMF factor contributions, which reinforces source attribution of PMF factor analysis. This source accounted for 6% of PM 2.5 mass, and the contribution of Cubatão (3%) and other industries (3%) could be estimated based on the MIF model ( Figure 5). It is important to mention that the MIF model assimilates source discrimination resolution features of Pb and Zn isotopes  and therefore can differentiate between industrial emissions of specific areas (Cubatão and other industries), whereas in PMF, these sources were observed in the same factor. Secondary aerosol source accounted for 16% of PM 2.5 and was associated with S, P, and Mn species here and in earlier works (Supplementary Table 1). The PMF factor contributions of this source showed negative correlation with Zn isotopes, indicating that lighter Zn could be attributed to secondary aerosol and gas phase of anthropogenic sources.

CONCLUSION
The PMF and MIF solutions were combined to provide a step forward to assess source identification and apportionment of urban aerosol. Here, three main advances could be reached: (1) cross-validation of source identification based on two different tools (PMF elemental profiles and MIF isotopic FIGURE 5 | Schematic source identification and apportionment using Multi-Isotopic Fingerprints (MIF) and species profiles obtained by EPA-PMF model for PM 2.5 from São Paulo City. Species profiles from EPA-PMF could be cross-validated by isotopic fingerprints. Industrial source contributions were differentiated using Pb and Zn isotopic signatures. Positive linear regression between biomass burning vs. Zn isotopes suggested that particles enriched with heavier Zn signatures. Negative linear regression between secondary aerosol vs. Zn isotopic signatures suggested particles enriched with lighter Zn. fingerprints), (2) improvement of source contribution in the MIF model to total PM 2.5 mass, and (3) new insights into the investigation of Zn signatures in atmospheric processes and pollutant sources.
Cross-validation of PMF source identification by isotopic fingerprints seems to pose a significant way to improve the accuracy of source apportionment investigation. However, it is very important to advance the isotopic fingerprints and species profile characterization of the main sources. This approach is essential to develop a large source database in order to support robust MIF and PMF identification/discrimination. The strategy to recalculate the MIF model results to PM mass, based on linear regression with PMF solutions, is a valid method to be explored in future studies. On the other hand, the MIF model must be improved by inclusion of additional elements (Fe, S, Cu, C, and O) and also by elemental ratios, which could be a great alternative to fix over-or underestimations in relation to PM mass. The potential enrichment of heavier Zn isotopes in biomass burning particles and lighter Zn isotopes in secondary aerosol particles, suggested here, must be further investigated in future works in order to improve knowledge of isotopic fractionation in the source dispersion and atmospheric processes.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material, further inquiries can be directed to the corresponding author/s.

AUTHOR CONTRIBUTIONS
CS-O manuscript write, figures creation, MIF, and PMF analysis. LK CM/PCA analysis and manuscript reviewer. MA and MB manuscript reviewer and funds acquisition. All authors contributed to the article and approved the submitted version.