Human Population Density is a Poor Predictor of Debris in the Environment

There have been a variety of attempts to model and quantify the amount of land-based waste entering the world’s oceans, most of which rely heavily on global estimates of population density as the key driving factor. Using empirical data collected in seven different countries/territories (China, Kenya, South Africa, South Korea, Sri Lanka, Taiwan and Vietnam), we assessed a variety of different factors that may drive plastic leakage to the environment. These factors included both globally available GIS data and observations made at a site level. While the driving factors that appear in the best models varied from country to country, it is clear from our analyses that population density is not the best predictor of plastic leakage to the environment. Factors such as land use, infrastructure and socio-economics, as well as local site-level variables (e.g., visible humans, vegetation height, site type), were more strongly correlated with plastic in the environment than was population density. This work highlights the importance of gathering empirical data and establishing regular monitoring programs, not only to form accurate estimates of land-based waste entering the ocean, but also to be able to evaluate the effectiveness of land-based interventions.


INTRODUCTION
The impacts of marine plastic pollution on wildlife, human health, and the economy are well documented (Gall and Thompson, 2015; Beaumont et al., 2019) and are likely to continue to increase as global plastic production rises (Geyer et al., 2017). Because an estimated 80% of marine plastic pollution has land-based origins (Derraik, 2002), the most efficient way to address the problem is by stopping plastic waste leakage from land to the sea. Plastics typically enter the ocean from land as mismanaged waste transported via rivers or wind (Kershaw and Rochman, 2015), though local human deposition in coastal areas also contributes (Hardesty et al., 2016). While debris on land is ubiquitous and has been reported from the most remote to the most densely populated corners of the earth, it is not equally distributed (Barnes et al., 2009; Martins et al., 2020; Napper et al., 2020).
Many studies have investigated debris at local or regional scales (e.g., Wessel et al., 2019; Miladinova et al., 2020; Vidyasakar et al., 2020). These studies are predominately carried out along the coastal margin (Serra-Gonçalves et al., 2019), though studies along rivers and at river outlets are becoming more common (e.g., Battulga et al., 2019; Cordova and Nurhati, 2019; Van Calcar and Van Emmerik, 2019). However, these empirical studies are, by necessity, restricted to a limited area, so in order to understand debris distribution on a broader scale, modeling and predictions are critical. For the most part, modeling studies use globally available data sources as proxies for the amount of mismanaged waste entering the environment (but see Lebreton et al., 2017).
Jambeck et al. (2015) used global data sets to predict mismanaged waste and calculated that an estimated 4.8 to 12.7 million MT of plastic entered the ocean in 2010. They hypothesized that population size and the quality of waste management systems in a country were the most important predictors of the amount of debris lost to the marine environment. Lebreton et al. (2017) similarly relied predominately on global estimates of population density and mismanaged plastic waste, but additionally factored in runoff to estimate that between 1.15 and 2.41 million MT of plastic is transported to the ocean via rivers each year. This research also relied on published empirical studies to calibrate the models. Models of floating plastic distribution in the ocean used coastal population density to seed the models (e.g., Van Sebille et al., 2012; Van Sebille, 2014), with, in some instances, the addition of impervious surface area (Lebreton et al., 2012) and mismanaged waste (Van Sebille et al., 2015).
More recent studies have acknowledged that mismanaged waste varies not only with population density, but also with factors such as socio-economic status (Borrelle et al., 2020; Lau et al., 2020). Gross Domestic Product (GDP) is positively correlated with reported per capita waste generation, but negatively correlated with the proportion of mismanaged waste, and these relationships can vary between rural and urban areas (Lebreton and Andrady, 2019).
If we are to make accurate predictions, it is critical to test the foundational assumptions that are being made when modeling waste leakage. To date, studies have predominately used global population density estimates without the addition of empirical data, and many of these studies have presumed that population density is an adequate proxy for debris leakage (e.g., Van Sebille et al., 2012). To test these assumptions, we gathered empirical data on debris in the environment in seven countries/territories (hereafter referred to as countries for simplicity): mainland China, Kenya, South Africa, South Korea, Sri Lanka, Taiwan, and Vietnam. We asked three key questions: 1) What drives the distribution of debris in inland areas? 2) How similar (or different) are these drivers among the seven countries studied? 3) Do models based on population density accurately represent debris observed in the local environment?
To address these questions we assessed a number of potential drivers, including land use, survey type, infrastructure, environmental and socio-economic factors, local population density, and site level information such as steepness and vegetation height (Table 1).

METHODS
What Drives the Distribution of Debris in Inland Areas?
This research was undertaken as part of the CSIRO global plastics losses project (https://research.csiro.au/marinedebris/projects/globalplasticsleakageproject/), which is aimed at understanding the amount of plastic that is lost from land to the marine environment. The goal is to use empirical data to quantify and better understand debris leakage rates globally, based on locally collected data across an array of countries. We selected countries based on a combination of factors, including the country's ranking in estimated mismanaged waste generated annually (per Jambeck et al., 2015). Between 2017 and 2019 we worked with local partners in each country to select an urban area within a major watershed. The urban areas selected were Shanghai, China; Mombasa, Kenya; Cape Town, South Africa; Yeongsan, South Korea; Negombo/Colombo, Sri Lanka; Kaohsiung, Taiwan; and Haiphong, Vietnam. Inland survey sites were then chosen within a 200 km radius of the central point (with the exception of Shanghai, China, where sites were chosen within 100 km due to excessively long travel time between sites). Sites were selected using a stratified random sampling design, taking into account a variety of environmental and socio-economic factors [population density, distance to infrastructure (roads and rail), distance to coast and river, proxies for socio-economic status, and land use]. For each country we pre-selected approximately 40 inland sites, but due to accessibility constraints and the variability in capacity of our local partners, the total number of sites surveyed varied between 23 and 47 (Table 2).
At each site we conducted between three and six transects of 25 m² each, distributed in proportion to the site uses present within 200 m of the central site point (e.g., walkways, natural vegetation, roadways, etc.). Transects were usually 12.5 m × 2 m, except in the case of roadsides, where they were 25 m × 1 m to ensure the safety of the participants. Observers walked the length of the transect and categorized any anthropogenic debris within the transect that could be seen from standing height. Debris was placed into one of 84 categories and labeled as either fragment or whole (for a complete methodology, see Schuyler et al., 2018b). At each transect, data were also collected on local conditions that could influence the amount of debris found, including the number of people visible at the site, the steepness of the land, the height of the vegetation, substrate color (dark/light), and percent of bare ground in the transect. Both the survey methods and the local variables considered were selected based on previously published studies conducted at large scales (Hardesty et al., 2017a; Hardesty et al., 2017c).
Because one of the goals of this project is to estimate the amount of debris leakage on a global scale, we identified covariates for which global datasets exist and that might influence debris levels on a larger scale. In our previous work, land use, infrastructure, and socio-economic factors were among the most important influencers of debris levels (Hardesty et al., 2016). We identified the following environmental and socio-economic variables at each site: population density within 1 km², total value of the built environment (rural, urban, and total) identified from the United Nations Global Exposure dataset (GAR15) (UNISDR, 2015), distance to the coast, distance to the nearest rail line, road, and river, mean nightlights within 1 km², and land use.
We wanted to incorporate a globally available socio-economic GIS layer at the finest resolution possible in our analysis, so we explored two potential options. While most socio-economic indicators are national, the GAR15 dataset, developed for assessing economic risk from disasters, is one of the only socio-economic datasets with near-global coverage and sub-national resolution. GAR15 includes several indicators, including the value of the urban environment, the value of the rural environment, and the total value of assets in a given area, all of which we included in our analyses. Our second option was to use the relationship between lighting at night and population density. In general, the higher the population density, the more nightlights would be expected in a given area. However, in areas with higher income or resources, we would assume a disproportionately higher level of lights than would be predicted by population density alone. Therefore, we used the residual deviation around the linear relationship of nightlights regressed on population density as a second proxy for socio-economic status.
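The residual-based proxy described above can be illustrated with a minimal sketch. It assumes a simple ordinary least-squares line fitted to untransformed values; the function names and the numbers are hypothetical and are not taken from the study, and any transformation applied to either variable in the actual analysis is not reproduced here.

```python
from statistics import mean


def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def nightlight_residuals(pop_density, nightlights):
    """Residuals of nightlights regressed on population density.

    A positive residual means a site is brighter than its population
    density alone would predict (interpreted as higher socio-economic status).
    """
    intercept, slope = ols_fit(pop_density, nightlights)
    return [nl - (intercept + slope * pd)
            for pd, nl in zip(pop_density, nightlights)]


# Made-up illustrative values (people per km2, arbitrary brightness units).
pop_density = [100.0, 500.0, 1000.0, 2000.0, 4000.0]
nightlights = [2.0, 9.0, 12.0, 30.0, 45.0]
residuals = nightlight_residuals(pop_density, nightlights)
```

By construction, these residuals sum to zero and are uncorrelated with population density, which is what allows them to carry information that population density does not.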
We combined the data from all seven countries and used model selection on generalized additive models (GAMs) with a Tweedie distribution (mgcv package) in the R statistical environment to find the models with the lowest AIC score (Burnham and Anderson, 2002; Wood, 2011; Bartoń, 2018; R Core Team, 2018). We chose GAMs so that we could experiment with smooths of different factors, though ultimately we settled on parametric terms to be able to predict debris outside of our study area. We used a Tweedie distribution because debris is measured as count data, and the distribution gives the model the flexibility to range between Poisson and gamma. Because there were a number of factors that could potentially influence the debris in the environment, we used dredge (MuMIn package) to determine which factors explained the greatest variability in the data. To avoid collinearity, we restricted the analysis so that no two variables with a correlation coefficient greater than 0.7 could be included in the same model. We also restricted dredge from including both nightlights and population density in the same model, as nightlights could, to some extent, act as a proxy for population.
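The collinearity restriction can be sketched as follows. This is a plain-Python illustration of the idea, not the MuMIn implementation: it enumerates every subset of candidate terms and drops any subset containing a forbidden pair. The covariate names and values are hypothetical, and the use of the Pearson coefficient is an assumption, since the text does not name the correlation measure.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def forbidden_pairs(covariates, threshold=0.7, extra=()):
    """Pairs that may not share a model: |r| > threshold, plus explicit pairs."""
    pairs = {frozenset(p) for p in extra}
    for a, b in combinations(covariates, 2):
        if abs(pearson(covariates[a], covariates[b])) > threshold:
            pairs.add(frozenset((a, b)))
    return pairs


def candidate_models(names, forbidden):
    """All subsets of terms that contain no forbidden pair."""
    models = []
    for k in range(len(names) + 1):
        for subset in combinations(names, k):
            if not any(pair <= set(subset) for pair in forbidden):
                models.append(subset)
    return models


# Made-up site-level values for four hypothetical covariates.
covariates = {
    "pop_density": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "nightlights": [1.1, 2.3, 2.9, 4.2, 4.8, 6.1],
    "dist_coast": [5.0, 1.0, 4.0, 2.0, 6.0, 3.0],
    "slope": [3.0, 1.0, 2.0, 3.0, 1.0, 2.0],
}
# The nightlights/population pair is excluded explicitly, as in the text.
forbidden = forbidden_pairs(covariates, extra=[("pop_density", "nightlights")])
models = candidate_models(list(covariates), forbidden)
```

Each surviving subset would then be fitted and scored; only the enumeration step is shown here.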
The dredge process yielded a range of models which were within 2 AICc points of the best model. Because these models are within the 95% confidence set around the best model in terms of AIC model selection, we used model averaging techniques (Table 3). To determine which factors best explained the variability in the averaged model, we calculated the effect size by multiplying the median value of the factor (assuming 1 for categorical variables) by the coefficient from the model (Figures 1, 2). We also calculated the variable importance score, which represents the proportion of the total models in which each term appears. For example, if land use appeared in 8 out of the 10 models within 2 AIC points, it would receive a variable importance score of 0.8. The variable importance indicates how consistently a given term is included in the models (Table 4).
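The two summary statistics defined above reduce to simple arithmetic, sketched here under stated assumptions. The model list, term names, and AICc values are invented for illustration; only the definitions (models within 2 AICc points of the best, effect size as median times coefficient, importance as the proportion of retained models containing a term) follow the text.

```python
from statistics import median


def top_models(models, max_delta=2.0):
    """Keep models whose AICc is within max_delta of the lowest AICc."""
    best = min(m["aicc"] for m in models)
    return [m for m in models if m["aicc"] - best <= max_delta]


def variable_importance(models):
    """Proportion of the given models in which each term appears."""
    terms = sorted({t for m in models for t in m["terms"]})
    return {t: sum(t in m["terms"] for m in models) / len(models)
            for t in terms}


def effect_size(coefficient, values=None):
    """Coefficient times the median covariate value (1 for categorical terms)."""
    typical = 1.0 if values is None else median(values)
    return coefficient * typical


# Hypothetical candidate models; the last one falls outside the 2-point window.
candidates = [
    {"terms": {"land_use", "visible_humans"}, "aicc": 100.0},
    {"terms": {"land_use", "dist_coast"}, "aicc": 100.8},
    {"terms": {"land_use", "visible_humans", "dist_coast"}, "aicc": 101.5},
    {"terms": {"visible_humans"}, "aicc": 101.9},
    {"terms": {"dist_coast"}, "aicc": 104.0},
]
top = top_models(candidates)
importance = variable_importance(top)
```

Because the effect size scales each coefficient by a typical covariate value, terms measured in very different units (for example, kilometers to the coast versus people visible at a site) can be compared on a common footing.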

How Similar (or Different) are the Drivers Between Countries?
For each country individually, we used the same analyses as above to identify the covariates that best described the variability in debris, with the same restrictions as above (Figure 2).

Do Models Based on Population Density Accurately Predict Debris?
To determine whether population density is an accurate proxy for debris, we ran a GAM using total debris counts as the response variable and population density (within 1 km²) as the predictor variable. We compared the deviance explained and AIC with the null model and with our full model (Table 3).
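Deviance explained is the proportional reduction in deviance relative to an intercept-only (null) model. The sketch below shows that calculation under a simplifying assumption: it uses the Poisson deviance, one end of the Tweedie range, rather than the Tweedie deviance used in the study. The function names and the counts are hypothetical.

```python
from math import log


def poisson_deviance(observed, fitted):
    """Poisson deviance: 2 * sum(y * log(y / mu) - (y - mu)), with 0*log(0) = 0."""
    deviance = 0.0
    for y, mu in zip(observed, fitted):
        log_term = y * log(y / mu) if y > 0 else 0.0
        deviance += 2.0 * (log_term - (y - mu))
    return deviance


def deviance_explained(observed, fitted):
    """1 - D_model / D_null, where the null model predicts the overall mean."""
    null_mu = sum(observed) / len(observed)
    null_deviance = poisson_deviance(observed, [null_mu] * len(observed))
    return 1.0 - poisson_deviance(observed, fitted) / null_deviance


# Made-up debris counts on four transects and fitted values from some model.
counts = [0, 3, 5, 12]
fitted = [1.0, 3.0, 5.0, 11.0]
explained = deviance_explained(counts, fitted)
```

A value near 0 means the predictor does little better than the overall mean, and a value near 1 means the fitted values reproduce the observations almost exactly.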

RESULTS
The total number of items on each transect varied from 0 to 242. South Korea had the lowest average debris density overall (0.4 items/m²), while South Africa had the highest (2.05 items/m²), based on inland surveys (Table 2).

What Drives the Distribution of Debris in Inland Areas?
For the combined models, significant terms included visible humans (positively correlated), land use (forested and dense settlements lower than urban settlements), survey type (disused significantly greater than agriculture), distance to river (positively correlated), distance to rail and coast (negatively correlated), and country (Figure 1). All of these terms appeared in all models, with a resulting variable importance score of 1.0 (Table 4). Population/nightlight residuals appeared in 90% of all models, while the other terms appeared in fewer than half of the models. It is worth noting, though, that either nightlights or population density did appear in 62% of the models.
How Similar (or Different) are the Drivers Between Countries?
Drivers were not completely consistent among countries. The best models for each country individually varied both in the terms included, as well as the directions of those terms (e.g., whether they were positively or negatively correlated with observed debris densities) (Figure 2; Table 4). Two terms appeared in all models: visible humans (positive correlation in all countries except Sri Lanka), and distance to the coast (negative correlation in all countries except mainland China and Kenya). A further six terms occurred in all but one country: slope (all but Vietnam), distance to the nearest rail (all but mainland China), light/population residuals (all but Kenya), total built value of the rural environment (all but Sri Lanka), mean nightlights (all but Taiwan) and distance to the nearest road (all but Kenya). For individual country models, significant terms included visible humans (South Africa, Kenya), slope of land (South Korea), vegetation height (Kenya), substrate color (South Africa), distance to the coast (Sri Lanka, Kenya), distance to the nearest rail (Vietnam, Sri Lanka), distance to the nearest road (Sri Lanka), total built value (rural) (Vietnam), landuse (Sri Lanka, Kenya), and distance to the nearest river (Vietnam) (Figure 1). Variable importance scores were similarly diverse, with different terms appearing more frequently in different countries (Table 4).
FIGURE 2 | Effect size plots for China, Kenya, South Africa, South Korea, Sri Lanka, Taiwan, and Vietnam. Color represents the p-value significance level, and the lines are the standard error for each term. Triangles denote a positive coefficient for a given factor, whereas circles denote a negative coefficient. The effect size is calculated as the median value of the factor times its coefficient.
Frontiers in Environmental Science | www.frontiersin.org May 2021 | Volume 9 | Article 583454

Do Models Based on Population Density Accurately Predict Debris?
The relationship between population density alone and total debris was significant and positive (p < 0.001), but the deviance explained was only 1.25%. By contrast, the deviance explained by the 23 full models contributing to the model averaging was between 35.9% and 36.9%.

DISCUSSION
To date, most studies of plastic leakage rates consist either of surveys conducted predominately at coastal or beach locations in a single region or country (e.g., Hardesty et al., 2017a; Schöneich-Argent et al., 2019), or rely on globally available proxy data to model predicted debris on a global or regional scale, without incorporating empirical data (e.g., Jambeck et al., 2015; Lebreton and Andrady, 2019; Borrelle et al., 2020; Lau et al., 2020). Here we combine the two approaches, using survey data to test the utility of a variety of local and global proxy layers.

What Drives the Distribution of Debris in Inland Areas?
Studies of debris on the open ocean and along coastlines have found that physical factors such as currents, waves, wind and tides have an important effect on the distribution and accumulation of debris (Olivelli et al., 2020; Van Sebille et al., 2020), but the drivers of inland debris distribution are less well understood.
When the data were pooled, land use (a globally measured covariate), survey type (a locally determined covariate), and country explained a significant amount of the variability in the data. The significant influence of survey type was driven in large part by the elevated levels of litter found in disused areas. Both land use and survey type were also found to be significant in studies conducted in the United States and Australia (Hardesty et al., 2017b; Hardesty et al., 2017c). Other factors that contributed to the patterns observed included distance to coast, distance to the nearest rail line, and distance to river. However, their effect sizes were considerably lower than those of land use, survey type, and country.
Previous work has shown that socio-economic status is one of the most influential factors in predicting debris, with higher socio-economic indicators associated with reduced debris loads (Schuyler et al., 2018a). This is likely due to a combination of influences including income, education, infrastructure, access to social structures, and behavioral norms (Ajzen, 1991). For the combined seven-country model, three socio-economic indicators appeared among the best models: light/population residuals, the built value (urban), and the built value (all) (Figure 1). While none were statistically significant, they all contributed to explaining the variability in the data (and were thus included in the best models). The values for all three indicators trended toward a negative relationship with debris density, indicating that the higher the socio-economic status of an area, the lower its debris load. This finding reflects the negative correlation between GDP and the proportion of mismanaged waste that has been reported in other work (Lebreton and Andrady, 2019). While richer countries tend to generate more waste per capita, they also tend to have better waste management systems, which ultimately results in a lower proportion of mismanaged waste. Here we show that the trend reported at a country-wide scale also holds at a sub-national level.
How Similar (or Different) are the Drivers Between Countries?
Overall, the individual models were quite good at explaining the variability in the debris data, with deviance explained values of up to 72% (Table 3). These models generally incorporated factors measured at the site level, such as local land use, vegetation height, survey type, and substrate color. We found high heterogeneity between countries, both in the magnitude of debris and in the most relevant drivers of the patterns observed. In fact, in the combined model, country is one of the strongest predictors of the total amount of debris reported. The differences observed between the countries remained present even after accounting for the driving factors measured, and may be a result of other underlying factors including political/social differences, legislation, and geography. This reveals another challenge in predicting debris leakage rates on a global level. Each country demonstrated different baseline quantities of debris, with substantial variability observed among survey sites, both within and among countries. This demonstrates the importance of establishing baselines and monitoring programs on a local and regional scale, rather than relying solely on large-scale, global model-based predictions.

Do Models Based on Population Density Accurately Predict Debris?
One goal of both land- and ocean-based debris research is to understand and quantify the distribution of debris, which can inform efforts both to prevent waste from leaking and to remove litter that has already arrived in the environment. Because it is impossible to sample ubiquitously, researchers rely on globally available data sets to provide proxy measures for the amount of waste or leakage in a given area. Loss rates are often based on population density (e.g., Van Sebille et al., 2012) and, increasingly, the proportion of mismanaged waste (e.g., Lebreton and Andrady, 2019), though occasionally factors such as runoff and artificial barriers may be incorporated into estimates (e.g., Lebreton et al., 2017). These predictions assume that debris leakage rates are proportional to population density, though there is little empirical evidence to support this hypothesis. In the United States, research showed that while land-based debris did increase with population where population densities are low, this relationship did not hold at higher population densities (Ribic et al., 2010). Cities, even in less developed countries, can leverage economies of scale, and may have better systems for managing waste. Thus, the relationship between population and mismanaged waste is not necessarily linear.
In our modeling of empirical data from seven different countries, neither local population density nor a proxy for population (nightlights) was among the most critical factors explaining the variability in the data. Many of the top models did not include either term, and their effect was never significant, whether looking at individual countries or at all countries combined. Moreover, population density was negatively correlated with debris in two of the five individual models in which it appears, and nightlights were negatively correlated with debris density in four of the six countries. In our model regressing total debris across all survey sites on population density alone, the relationship was significant and positive, but the deviance explained was only 1.25%.
The sampling sites in individual countries are quite wide ranging, both geographically and across the suite of social and environmental factors, land use and human activities (e.g., incorporating urban and rural sites). If population density is not a critical factor at this scale, it is unlikely that the pattern will be reversed at international or continental scales.
The results of this work indicate that it is critical to develop a more nuanced approach for estimating debris levels if we are to develop accurate predictive models. Debris densities are extremely heterogeneous, and vary depending on a range of factors, including broad-scale characteristics such as land use, finer-scale details such as survey type, socio-economic patterns, existing infrastructure and environmental factors. The underlying drivers of debris distribution are complex and difficult to capture accurately. What is clear, though, is that in all of the models, population density alone did not adequately explain the observed debris distribution. Relying on population density as a primary (or sole) proxy, as has been done previously, will lead to an inaccurate characterization of debris distributions, and potentially to flawed policy responses.
We ranked the seven countries surveyed according to total debris load, based on empirical data collected by our teams. We compared this ranking to the per capita rank presented in Jambeck et al. (2015). Our ranking is based on the country coefficient in the model with all countries, and therefore accounts for population and the other debris drivers in the model. We found very little similarity in rank between our empirical estimates and those reported by Jambeck et al. (2015) (Table 5). This is likely due to a combination of factors. First, our analyses included local variables, as well as additional global-scale variables that the Jambeck paper did not incorporate. Second, our analyses were based on empirical data. Finally, our surveys took place at a city or watershed scale rather than at a country level. The differences between the relative rankings only serve to highlight the importance of accurate models based on empirical data, so that limited resources for addressing the problems of litter and mismanaged waste can be most effectively deployed.
Many studies that empirically quantify debris leakage are conducted along beaches and coastlines (Serra-Gonçalves et al., 2019). However, it is of critical importance to also measure inland debris if we are to fully understand loss rates to the environment. Debris from inland areas is transported to the sea via rivers, along roadways and by wind transport. Measuring debris in non-coastal areas helps to contextualize the factors that influence where debris originates in the environment, before it moves along various pathways, potentially arriving in the coastal or marine environment. Because of the high heterogeneity of inland areas, and the idiosyncratic nature of waste generation, it is crucial to design sampling that takes into account the inherent variability not only in the physical landscape, but also in the suite of factors that influence debris distribution.

Summary/Conclusion
Efforts to remove debris from the environment, or to prevent it from entering, would be facilitated by a better understanding of the variability in its distribution and of the factors that affect debris density. The models presented here can also be used to derive large-scale predictions of debris hotspots based not only on global data layers, but also on empirical data. These predictions could inform local and regional waste management policies and decisions on waste infrastructure. The results of this study demonstrate that the environmental context (e.g., land use, site type) is critical in understanding and predicting the amount of debris in the environment. Importantly, population density is not the driving force behind debris distribution, and there is significant variability in the drivers of inland debris across countries. It is of critical importance to establish monitoring programs to understand the baseline levels of debris, not only in order to have accurate estimates of ocean debris inputs, but also so that the effectiveness of land-based interventions can be assessed.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

AUTHOR CONTRIBUTIONS
QS, BH, and CW developed the methods. All authors contributed to data collection and logistics. QS and CW developed the analytical techniques. TL prepared GIS covariates. QS wrote the manuscript, with editing by BH, CW, RR, and JH.

FUNDING
This work has been funded by CSIRO Oceans and Atmosphere, Oak Family Foundation, Schmidt Marine Technology, the PM Angell Foundation and Earthwatch Institute Australia.

ACKNOWLEDGMENTS
We would like to extend our deepest gratitude to the tireless efforts of volunteers, students, and staff members who helped to collect data in the field. Also thank you to the two reviewers for their constructive comments.