Assessing the Complex Links Between Soils and Human Health: An Area of Pressing Need

Investigating the links between soils and human health provides a wealth of opportunities and challenges in the coming years for soil scientists as well as those working in other related and relevant fields. Many aspects of soils and human health have been investigated and reported on in the scientific literature. One of the earliest recognized and most investigated links is the role of soils in producing adequate quantities of nutritious foods, something that requires appropriate biological, chemical, and physical properties in agricultural soils (1, 2). Other areas of work have included soil contamination by trace metals (3), organic chemicals (4, 5), radioactive materials, including radon (3), microplastics (6), and nanoparticles (7); the effects of interactions between soil and water on human health (8); and how soil organisms influence human health (9, 10). Many of the pharmaceuticals used in modern medicine have their origin in soils, from antibiotics to cancer treatments and antacids (11, 12). There has also been interest in topics such as communicating the importance of soil to human health to the public and creating connections between soil scientists and other professionals to work on the links between soils and human health (13, 14). The idea that there are links between soils and human health is not new (15–17), but despite this long history of recognition, true scientific investigation of soil and human health links did not come until much later, with a large portion of such work being anecdotal even at the end of the twentieth century (18). Therefore, there is a pressing need to enhance the scientific study and understanding of these links. However, soil and human health connections are extremely complex, a situation that makes the traditional scientific approach of isolating a variable and seeing how it influences the system less than ideal for the investigation of these links. Improving on this situation is a major challenge in the area of soils and human health as we move forward.

COMPLEX LINKS
A classic case study demonstrates the complex links between soils and human health. Itai-itai disease is a form of osteomalacia whose name is Japanese for "it hurts, it hurts!", a phrase often repeated by its victims in the disease's final stages (19). The symptoms of itai-itai include weak and brittle bones that break with increasing frequency as the disease progresses, a waddling gait due to bone deformities, anemia, and renal failure, problems that ultimately lead to the victim's death (20). In some of the more extreme cases, one patient lost 30 cm of height due to fractures in the vertebrae and another suffered 28 fractures in their ribs alone, with additional fractures in other bones (19). Cases of itai-itai were recognized in several regions of Japan in the early twentieth century, with the most serious of these occurring in the Toyama district (19). Itai-itai is linked to excessive cadmium exposure. In the case of Toyama, the exposure was linked to zinc extraction at the Kamioka mine, and most particularly to a change in the method used to separate zinc from the ore, implemented in 1909, that released treatment waste into the Jinzu River (19). Rice paddies were irrigated with cadmium-contaminated water from the river, and cadmium accumulated in the rice and in the people who consumed it. The first reports of itai-itai disease symptoms were made in about 1912 (20), but it was the 1960s before the Japanese government determined that itai-itai was caused by chronic cadmium poisoning (19). The maximum releases of cadmium from the mine occurred in 1970, and in 1971 itai-itai victims won a lawsuit against the Mitsui Mining and Smelting Co. that led to reparations for the victims; environmental remediation measures have also been undertaken (19).
At first glance the itai-itai example looks like a straightforward case of trace metal release into the environment poisoning the Toyama district population. However, the residents of Shipham, England, live with soils that have cadmium levels about 30 times higher than those found in the soils of the Toyama district without suffering the adverse health effects seen in Japan. The links between soils and human health in the case of cadmium must therefore depend on more than the level of cadmium in the soil. Ideas advanced to explain this discrepancy include cadmium being more bioavailable in the anaerobic rice paddy soils than it is in the aerobic English soils and differences in pH between the soils creating different cadmium availability (21). Other differences may be due not to the soils but to diet. Compared to the English diet, the diet of the Japanese victims was relatively deficient in iron and zinc, a situation that led to cadmium retention in the Japanese population (22). What the comparison of the Shipham and Japanese cases makes clear is that the relationship between soils and itai-itai disease is much more complex than just the level of cadmium in the soil.
As with itai-itai disease, we have little knowledge of the complex reactions that take place between soils and the various organic chemicals and chemical mixes that we apply during, for example, modern agricultural operations (4). We also have little understanding of the ecology of soil pathogens, something that depends on the complex and interwoven relationships in the soil environment (23). There is still considerable disagreement within the research community regarding whether geophagy is a net positive or negative to human health, under what conditions, and through which mechanisms (24), again because of the complex interactions. Therefore, it is critical that we find ways to adequately study and account for these complex interactions that influence human health through the soil environment.

APPROACHES TO RESEARCH AND ANALYSIS
Recognition that the soil-human health link is more complex than just the level of a chemical in the soil is not new. For example, several indices have been developed to relate soil contamination to human health risk, some of which are listed in Table 1. However, making significant advances in our understanding of the links between soils and human health will require approaches that can account for multiple variables and that have not traditionally been used within soil science.
The field of epidemiology studies the distributions and determinants of health issues, including environmental exposures (27), meaning there is significant overlap with the soil and human health work being conducted by soil scientists and related professionals. This also means there is opportunity for those interested in the links between soils and human health to learn from the approaches and techniques of epidemiology. Several approaches that have been utilized in recent epidemiology research may also have application in soils and human health studies but have not been widely used in soil science. These include causal diagrams, marginal structural models, and propensity score methods (28). Machine learning, as a part of artificial intelligence, is another technique being used in epidemiology (28) that has also received an increasing amount of attention in soil science (29), though not necessarily in studies of soil and human health.
Causal diagrams have been used by epidemiologists to help identify variables that need to be measured and controlled so that unconfounded effect estimates can be found (30) (Figure 1). The fundamental idea is that in studies where people have prolonged exposure to the pollutant being studied, a risk factor for the outcome is often also a determinant of subsequent exposure. A soil science example is as follows. A farm worker is exposed to a pollutant in the soils of their workplace. If that farm worker retires, their last day of employment at the polluted farm is a determinant of future exposure (exposure ceases once they leave the farm) and an independent risk factor for death due to the "healthy worker" effect (those who work are typically healthier than those who do not). To address these intertwined effects, causal diagrams use a graphical approach to study mortality issues where exposures happen over long time periods (32). This graphical simplicity makes causal diagrams easy to understand and use.
Marginal structural models are a class of causal models that use the total study group, both those who have been exposed (e.g., to a pollutant or pathogen) and those who have not, as the standard to arrive at a nonparametric standardization (33). This is important in studies where there are confounding factors that are time-dependent and are also affected by previous treatments (32).
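The farm-worker example can be written down as a small directed graph to show how a causal diagram flags a variable that needs to be measured and controlled. The sketch below is only illustrative: the node names and arrows are assumptions made for this example, and the check it performs (looking for common causes of exposure and outcome) is a simplification of the formal criteria epidemiologists apply to causal diagrams.

```python
from collections import defaultdict

# Hypothetical causal diagram for the farm-worker example.
# Each (cause, effect) pair is one arrow; node names are illustrative only.
EDGES = [
    ("prior_health", "employment_status"),   # healthier people stay employed
    ("prior_health", "death"),               # health affects mortality directly
    ("employment_status", "soil_exposure"),  # leaving the farm ends exposure
    ("soil_exposure", "death"),              # the effect under study
]


def parents_of(edges):
    """Map each node to the set of its direct causes."""
    parents = defaultdict(set)
    for cause, effect in edges:
        parents[effect].add(cause)
    return parents


def ancestors(node, parents, blocked=frozenset()):
    """Return all direct and indirect causes of `node`, never entering `blocked` nodes."""
    found = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in found and parent not in blocked:
                found.add(parent)
                stack.append(parent)
    return found


def common_causes(edges, exposure, outcome):
    """Nodes that cause the exposure and also cause the outcome by a path avoiding the exposure."""
    parents = parents_of(edges)
    causes_of_exposure = ancestors(exposure, parents)
    causes_of_outcome = ancestors(outcome, parents, blocked={exposure})
    return causes_of_exposure & causes_of_outcome


print(common_causes(EDGES, "soil_exposure", "death"))
```

In this assumed diagram the only common cause is the worker's prior health, which is the variable behind the "healthy worker" effect; employment status influences death only through exposure, so it is not flagged.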
Propensity score methods combine propensity score matching with measurement error regression models to address unmeasured confounding factors (34) (Figure 2). For example, suppose one wanted to study the health effects of geophagy. Because it might be considered unethical to assign individuals to consume soil, given the possible negative health effects, this requires an observational study in which the researcher compares those who practice geophagy of their own volition with those who do not. This could introduce bias from confounding factors, because some people (e.g., women, particularly pregnant women; children; and those with a nutrient deficiency or a food toxicity problem) are more likely than the general population to practice geophagy.

TABLE 1 | Indices developed to relate soil contamination to human health risk (25, 26).
Index | Basic measurement being made
Biogeochemical index | Compares the content of a given trace element in the O horizon to the content of the same element in the A horizon.
Contamination degree | The sum of the contamination factors for a given sample.
Contamination factor | The ratio between the content of a given metal and its background concentration.
Enrichment factor | Measures the impact of anthropogenic activity on soil trace element concentrations. It does this by comparing the ratio of the concentration of an element of interest to the concentration of a metal with a low variability of occurrence to the ratio of the background levels of the same two elements.
Geoaccumulation index | Evaluates soil contamination by a single trace element by calculating the log of its concentration divided by 1.5 times its background concentration.
Modified degree of contamination | Contamination degree divided by the number of analyzed elements.
Nemerow pollution index | Calculated for O and A horizons; assesses the overall degree of soil contamination considering all trace elements that have been analyzed.
Single pollution index | Used to determine which trace element is the highest threat in a given soil environment; calculated as the concentration of a trace element in the soil divided by its background value. Similar to contamination factor.
Sum of pollution index | The sum of all single pollution indices calculated for a given soil. Similar to contamination degree.
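Several of the indices in Table 1 are simple enough to compute directly, which helps show how they relate to one another. The following is a minimal sketch based only on the verbal definitions in the table. The function names and the concentrations are hypothetical, and the base-2 logarithm in the geoaccumulation index is an assumption (the table says only "the log"; base 2 is the form commonly used for this index).

```python
import math


def contamination_factor(concentration, background):
    """Content of a metal divided by its background concentration."""
    return concentration / background


def contamination_degree(sample, background):
    """Sum of the contamination factors for every element in the sample."""
    return sum(contamination_factor(sample[el], background[el]) for el in sample)


def modified_degree_of_contamination(sample, background):
    """Contamination degree divided by the number of analyzed elements."""
    return contamination_degree(sample, background) / len(sample)


def geoaccumulation_index(concentration, background):
    """Log of concentration over 1.5 times background (base 2 assumed here)."""
    return math.log2(concentration / (1.5 * background))


def enrichment_factor(conc_element, conc_reference, bg_element, bg_reference):
    """Sample ratio of element to reference metal, relative to the same ratio in background."""
    return (conc_element / conc_reference) / (bg_element / bg_reference)


# Hypothetical concentrations in mg/kg, invented for illustration only.
sample = {"Cd": 3.0, "Zn": 150.0}
background = {"Cd": 0.5, "Zn": 50.0}

for element in sample:
    cf = contamination_factor(sample[element], background[element])
    igeo = geoaccumulation_index(sample[element], background[element])
    print(f"{element}: contamination factor = {cf:.1f}, geoaccumulation index = {igeo:.1f}")

print("Contamination degree:", contamination_degree(sample, background))
print("Modified degree of contamination:", modified_degree_of_contamination(sample, background))
```

Note that every one of these indices is a function of soil concentrations alone; none includes bioavailability, soil redox conditions, pH, or diet, the factors invoked above to explain the different outcomes in Toyama and Shipham.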
Propensity score methods seek to control for such biases. Within agricultural science, propensity score methods have been used to create valid comparison groups in a study of whether clinics (agronomic extension workshops) made farmers more knowledgeable about pests and diseases that negatively affect their crop yields, a food security issue that relates to human health. Specifically, propensity score methods allowed the researchers to increase the likelihood that differences between the two study groups were due to the training received in the clinics rather than some other uncontrolled factor (age, education level, farm size, etc.) (35).
Machine learning uses algorithms that "learn" how to process data to make predictions or reach decisions that the computer was not explicitly programmed to make. This ability opens a plethora of possibilities regarding the evaluation of complex relationships, such as those found in studies of soil and human health. However, machine learning is not widely used in soil and human health studies, for two reasons. The first is that common machine learning algorithms [e.g., artificial neural networks (ANN)] have difficulty detecting the correct data patterns related to soil health studies. ANN requires a large sample size, which is a limitation in many agricultural studies. The second reason is that soil scientists are interested in using interpretable machine learning algorithms, and a large majority of these algorithms are black boxes. Khaledian and Miller (36) discussed the selection of appropriate machine learning algorithms for soil mapping based on the purpose of the mapping and the nature of the data. For example, decision tree learning approaches, e.g., Cubist (37), can provide promising and intelligible results with small sample sizes.
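Returning to the geophagy example used to introduce propensity score methods, the core matching idea can be sketched in a few lines. This is a deliberately simplified illustration and not the full method described in (34): it uses one binary covariate, estimates the propensity score as the share of geophagy practitioners within each covariate stratum instead of fitting a regression model, and omits the measurement error regression step. The records and variable names are invented.

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical records: (pregnant, practices_geophagy, health_score).
# All values are invented for illustration; they are not study data.
RECORDS = [
    (1, 1, 9.0), (1, 1, 9.0), (1, 1, 9.0), (1, 0, 10.0),
    (0, 1, 11.0), (0, 0, 12.0), (0, 0, 12.0), (0, 0, 12.0),
]


def naive_difference(records):
    """Mean outcome of practitioners minus mean outcome of non-practitioners."""
    treated = [y for _, t, y in records if t == 1]
    control = [y for _, t, y in records if t == 0]
    return fmean(treated) - fmean(control)


def propensity_scores(records):
    """Estimate P(geophagy | covariate) as the treated share in each stratum."""
    counts = defaultdict(lambda: [0, 0])  # covariate -> [n_treated, n_total]
    for covariate, treated, _ in records:
        counts[covariate][0] += treated
        counts[covariate][1] += 1
    return {cov: n_treated / n_total for cov, (n_treated, n_total) in counts.items()}


def matched_difference(records):
    """Compare each practitioner with the non-practitioners whose score is nearest."""
    scores = propensity_scores(records)
    controls = [(scores[c], y) for c, t, y in records if t == 0]
    differences = []
    for covariate, treated, outcome in records:
        if treated != 1:
            continue
        score = scores[covariate]
        nearest = min(abs(score - s) for s, _ in controls)
        matches = [y for s, y in controls if abs(score - s) == nearest]
        differences.append(outcome - fmean(matches))
    return fmean(differences)


print("Naive difference:  ", naive_difference(RECORDS))
print("Matched difference:", matched_difference(RECORDS))
```

In these invented data, pregnancy is associated both with practicing geophagy and with a lower health score, so the naive comparison overstates the difference between the two groups, while comparing individuals with similar propensity scores removes the part of the difference that is due to pregnancy.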
FIGURE 2 | Flow charts for traditional statistical (straight from variable selection to results) and propensity score approaches. The traditional approach limits the number of variables that can be fed into the final model, while propensity score methods simplify the final model, which allows more covariates in the first step. The example given shows a potential set-up to investigate coccidioidomycosis (Valley Fever), a disease caused by a soil-borne fungus that is common in the western USA.

Google Scholar searches were used to provide some quantitative comparisons regarding the use of the approaches discussed above in soil science and epidemiological studies. It was expected that more epidemiology papers are published than soil science papers, so the first goal was to establish the expected ratio between them. A basic search for "epidemiology" anywhere in the article from 2011 to 2021 yielded 2,030,000 results; the same search for "soil science" yielded 342,000. Soil science papers over that period therefore numbered only about 16.8% as many as epidemiology papers. A Google Scholar search for "soil science" and "causal diagram" returned 80 results, while the same search substituting "epidemiology" for "soil science" returned about 2,800 results, with soil science papers being published at only about 2.9% the rate of epidemiology papers. A Google Scholar search for "marginal structural model" and "soil science" did not return any results, and "marginal structural model" and "soil" returned only 34 results, while a search that substituted "epidemiology" for "soil science" returned about 3,320 results. Therefore, the more specific "soil science" publications were published at 0% the rate of the epidemiology papers, while the less specific "soil" publications were published at 1.0% the rate of the epidemiology papers. A Google Scholar search for "propensity score methods" and "soil science" returned only 26 results, while the same search substituting "epidemiology" for "soil science" returned about 13,100 results, with soil science papers being published at only 0.2% the rate of epidemiology papers. A Google Scholar search for "soil science" and "machine learning" returned about 8,380 results, but searching "soil," "human health," and "machine learning" returned only 45 results (about 0.5% of all soil science papers that include machine learning). Therefore, soil scientists are showing interest in machine learning techniques, but that interest is not yet being extensively applied to soil and human health connections. Searching "epidemiology" and "machine learning" returned about 158,000 results, showing that machine learning papers in soil science are published at only 5.3% the rate of machine learning papers in epidemiology. In every case the rate falls well below the 16.8% expected from overall publication volume. Clearly these techniques to account for confounding factors are more widely used in epidemiology than in soil science, even though confounding factors are an issue in soil and human health (and other soil science) research.
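The comparisons above are all simple ratios of result counts, and they can be recomputed in a few lines to check them against the overall baseline. The sketch below uses only the approximate counts quoted in the text; the dictionary layout and labels are chosen here for illustration.

```python
# Approximate Google Scholar result counts quoted in the text (2011 to 2021).
# Each entry: (soil science count, epidemiology count).
BASELINE = (342_000, 2_030_000)
SEARCHES = {
    "causal diagram": (80, 2_800),
    "marginal structural model ('soil science')": (0, 3_320),
    "marginal structural model ('soil')": (34, 3_320),
    "propensity score methods": (26, 13_100),
    "machine learning": (8_380, 158_000),
}


def percent(soil_count, epidemiology_count):
    """Soil science results as a percentage of epidemiology results, to one decimal."""
    return round(100 * soil_count / epidemiology_count, 1)


baseline_rate = percent(*BASELINE)
print(f"Baseline (all papers): {baseline_rate}%")

for term, (soil_count, epi_count) in SEARCHES.items():
    rate = percent(soil_count, epi_count)
    status = "below" if rate < baseline_rate else "at or above"
    print(f"{term}: {rate}% ({status} baseline)")
```

Every technique-specific rate comes out below the baseline, which is the basis for the conclusion that these techniques are underrepresented in soil science relative to its overall publication volume.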

CONCLUDING THOUGHTS
Using selected analysis techniques commonly utilized in epidemiological studies to account for confounding factors may help shed light on complex soil and human health issues, such as the example of different outcomes given high soil cadmium levels in Toyama and Shipham. However, Google Scholar searches show these techniques are not currently being widely used to investigate soil and human health relationships. The extensive use of these techniques in epidemiology studies, which include links between human health and various environmental factors, indicates these same techniques have promise to improve understanding of soil and human health relationships.