Environmental pressures, tumor characteristics, and death rate in a female breast cancer cohort: a seven-years Bayesian survival analysis using cancer registry data from a contaminated area in Italy

Introduction In Taranto, Southern Italy, adverse impacts on the environment and human health due to industrial installations have been studied. In the literature, few associations have been reported between environmental factors and breast cancer mortality in women. The aim of this study was to investigate the relationships between residence in areas with high environmental pressures, female breast cancer characteristics, and death rate. Methods Data from the Taranto Cancer Registry were used, including all women with invasive breast cancer diagnosed between 01 January 2015 and 31 December 2020 and with follow-up to 31 December 2021. Bayesian mixed effects logistic and Cox regression models were fitted with the approach of integrated nested Laplace approximation, adjusting for patients and disease characteristics. Results A total of 10,445 person-years were observed. Variables associated with higher death rate were residence in the contaminated site of national interest (SIN) (HR 1.22, 95% CrI 1.01–1.48), pathological/clinical stage III (HR 2.77, 95% CrI 1.93–3.97) and IV (HR 17.05, 95% CrI 11.94–24.34), histological grade 3 (HR 2.50, 95% CrI 1.20–5.23), Ki-67 proliferation index of 21–50% (HR 1.42, 95% CrI 1.10–1.83) and > 50% (HR 1.81, 95% CrI 1.29–2.55), and bilateral localization (HR 1.65, 95% CrI 1.01–2.68). Variables associated with lower death rate were estrogen and/or progesterone receptor positivity (HR 0.61, 95% CrI 0.45–0.81) and HER2/neu oncogene positivity (HR 0.59, 95% CrI 0.44–0.79). Discussion The findings confirmed the independent prognostic values of different female breast cancer characteristics. Even after adjusting for patients and disease characteristics, residence in the SIN of Taranto appeared to be associated with an increased death rate.


Introduction
Female breast cancer is a leading cause of cancer incidence and mortality worldwide, with a global estimate of 2.3 million incident cases, 7.8 million prevalent cases, and 685,000 deaths in 2020.It is a global phenomenon affecting individuals of any age after puberty.Among all types of cancer, it causes the most disability-adjusted life years (DALYs) among women (1,2).
Early cancer detection (screening) and treatment (surgery, radiation, and medical therapy) have proven to be effective, achieving survival probabilities up to 90% or higher (1).Specifically, the treatment of breast cancer is based on the tumor's biological subtyping.Numerous disease characteristics, such as TNM staging, histological grading, proliferation activity (Ki-67%), estrogen receptor (ER), progesterone receptor (PR) and HER2/neu positivity, topography, and bilaterality, have been researched as potential prognostic indicators or as targets for specific therapies (1,(3)(4)(5)(6)(7).Tumor, node, and metastases (TNM) classification is a system used to describe the amount and spread of cancer in a patient's body: T describes the size of the tumor and any spread of cancer into nearby tissues, N describes the spread of cancer to nearby lymph nodes, and M describes metastasis, i.e., the spread of cancer to other parts of the body.TNM combinations are grouped into five less-detailed stages, from 0 (carcinoma in situ: abnormal cells are present but have not spread to nearby tissues) to I-II-III (invasive cancer: the higher the number, the larger the tumor and the more it has spread into nearby tissues) to IV (invasive, metastatic cancer: cancer has spread to distant parts of the body) (3, 8-10).In addition to TNM staging, histologic grading from 1 to 3 is an important predictor of disease outcome in breast cancer patients, with a higher tumor grade (lower differentiation) being associated with poorer prognosis (differentiation describes how much a tumor resembles the normal tissue from which it arose) (3,5,11).Related to grading is the expression of Ki-67, a nuclear protein present during the late G1, S, G2, and M phases of the cell cycle.It has been demonstrated that high Ki-67 expression (%), which reflects the proliferation activity of tumor cells, is associated with a higher risk of relapse and worse survival in breast cancer patients (3,5).Hormonal receptor status is another important prognostic factor in breast cancer patients.Estrogen receptor-positive (ER+) and progesterone receptorpositive (PR+) tumors are likely to respond to endocrine therapies such as tamoxifen or aromatase inhibitors.Hormone therapy reduces the chance of recurrence by nearly half.Moreover, breast cancers may independently overexpress a molecule called the HER2/neu oncogene and these HER2/neu + tumors are amenable to treatment with targeted biological agents such as trastuzumab, with improvement of survival (1,3,4).Moreover, associations were reported between cancer topography, laterality, and death rate.Specifically, better survival was associated with greater tumor-nipple distance in old patients, while an increased death rate was reported for synchronous bilateral breast cancer patients (6,7).
With regard to socio-economic and environmental factors, a shorter breast cancer-specific survival in women from disadvantaged neighborhoods has been reported (12).Evidence from a meta-analysis indicated that patients residing in rural areas were more likely to be diagnosed with more advanced breast cancer compared with patients from urban areas (13).Moreover, associations were reported between air pollutants and mortality in women with breast cancer (14,15).
In some areas of the province of Taranto, a coastal city in the Apulia Region, Southern Italy, various industrial installations and polluting sources (a steel plant, an oil refinery, urban discharges, harbor activities, and the ship-yard of the Italian Navy) have been operating in close proximity to the resident population for decades with well-known and extensively studied adverse impacts on the environment and human health (16)(17)(18)(19)(20)(21)(22)(23)(24)(25)(26)(27)(28)(29)(30).With regard to environmental, feed, and food impacts, it is of particular importance that the area shows contamination of these matrices by metals and persistent organic pollutants, specifically dioxins and PCBs.Moreover, some of these substances have been detected in human biological samples (17,(20)(21)(22)(23)(24)(25).As far as human health effects are concerned, evidence has been produced after studying the populations who resided in the contaminated site of national interest (SIN) of Taranto.In particular, cohort studies have reported an increased risk for different types of cancer incidence, including breast cancer incidence in women (16, 27).Some studies have also noted an increased risk for all-cause hospitalization; for circulatory, respiratory, digestive, and urinary diseases hospitalization; and for different types of cancer hospitalization, including breast cancer hospitalization in women (16,29,30).Different studies have also indicated an increased risk for all-cause mortality; for circulatory and digestive diseases mortality; and for some types of cancer mortality, with no significant evidence for breast cancer mortality in women (16,19,26,29,30).
To summarize, associations have been reported between the aforementioned factors and breast cancer mortality in women.The aim of this study was to investigate the relationships between residence in areas with high environmental pressures, female breast cancer characteristics, and death rate.

Study area and baseline epidemiological data
The study area is the province of Taranto, which consists of 29 municipalities with a total resident population of 559,892 inhabitants on 1 January 2022 (31).The SIN of Taranto consists of two municipalities, Taranto (the provincial capital) and Statte, with 189,461 and 13,136 inhabitants, respectively, on 1 January 2022 (29)(30)(31).The study area with municipalities and SIN is shown in Figure 1.The map was created with QGIS version 3.28.4.
From 2015 to 2019, Taranto Province recorded 2,446 female breast cancer (ICD10 codes C50.0 to C50.9) cases, with a directly standardized rate of 147.6 cases per 100,000 inhabitants and a median age of 61 years.In the same period, 63% of patients with breast cancer requiring hospitalization were admitted to a hospital in the Taranto Province, 25% to an extra-provincial hospital in the Apulia Region, and 12% to an extra-regional hospital.Between 2013 and 2017, the relative standardized 5-year female breast cancer survival was 85.6 (95% CI 83.1-87.7)(27).
From 2013 to 2017, in the SIN, 216 female breast cancer deaths were recorded, with a standardized mortality ratio (reference: Apulia Region) of 99 (90% CI 89-111) (29).From 2015 to 2019, in the SIN, 979 female breast cancer cases were recorded, with a directly standardized rate of 155.6 cases per 100,000 inhabitants and a

Data source and cancer cohort
Data collected from the Taranto Cancer Registry of the Italian Association of Cancer Registries (AIRTUM) were used, including all women with invasive breast cancer (ICD10 codes C50.0 to C50.9) diagnosed between 1 January 2015 and 31 December 2020 who resided in Taranto Province at the time of diagnosis.The follow-up period considered for this study was until 31 December 2021.Death certificate only cases (n = 29), cases registered based on an autopsy report (n = 1), and patients under 30 years (n = 12) were excluded.As a general rule, baseline patients and disease characteristics refer to the time of diagnosis.Mortality data (all-cause mortality) relative to the follow-up period (2015-2021) were retrieved from the Taranto Province's Causes of Death and Health Registries.Patients with no mortality follow-up information due to extra-provincial transfer before 31 December 2021 (right-censoring, loss-to-follow-up) contributed to the person-time until the date of transfer (n = 7).
In the cross-sectional study, the studied outcomes were each of the tumor characteristics (prevalence), and the studied exposure was residence in areas with high environmental pressures.The aim of this step was to assess possible associations between environmental factors and each of the tumor characteristics.In the longitudinal study, the studied outcome was all-cause death (incidence), and the studied exposures were residence in areas with high environmental pressures and tumor characteristics.The aim of this step was to assess possible associations between environmental factors, tumor characteristics, and death.Adjustment variables recorded at the time of diagnosis were age class (30-39, 40-49, 50-59, 60-69, 70-79, and ≥ 80 years), year, month, patient ID, municipality of residence, and tumor morphology (ICDO3M classification).Adjustment for the patient's ID and municipality of residence was provided to account for the heterogeneity related to possible unobserved individual or ecological level variables (e.g., genetics, heredity, tobacco use, alcohol consumption, deprivation index, and access to health services).

Statistical analysis
Data analysis was performed using R version 4.2.3.Bayesian inference was performed with package INLA version 22.12.16.Complex models could be fitted with the Bayesian approach, including generalized linear models and survival analyses.The possible non-independence and heterogeneity of observations could be taken into account by fitting mixed models with both fixed and random effects.While traditional survival analysis relies on parameter estimation based on partial likelihood, Bayesian approaches for time-to-event data allow us to use the full likelihood to estimate all unknown elements in the model.Bayesian generalized linear models comprise Bayesian logistic regression for binary response data.However, the computation of the posterior and the other quantities of interest in these complex models is usually much more difficult than frequentist calculations.The Integrated Nested Laplace Approximation (INLA) is a deterministic method for Bayesian calculations that applies to a wide class of models called Latent Gaussian Models.INLA provides fast and accurate approximations to the posterior marginals through a clever use of Laplace approximations and advanced numerical methods taking computational advantage of sparse matrices.In most cases, INLA is both faster and more accurate than other methods for Bayesian computation (32)(33)(34)(35)(36).
The cross-sectional study analyzed the associations between residence in areas with high environmental pressures and tumor characteristics using a series of mixed effects binary logistic regressions.Pathological/clinical staging (TNM III-IV; TNM IV), histological grading (grade 3), proliferation index (Ki-67 > 20%), hormonal receptors status (ER+ and/or PR+), epidermal receptor status (HER2/neu+), topography (C50.0-1), and laterality (bilateral) were considered outcome measures binary variables.For each regression model, records with missing values for the analyzed outcome were excluded.Residence in areas with high environmental pressures was included as a fixed effect binary variable.Age class and year were included as fixed effects multinominal variables.Month was included as a cubic b-spline with 12 degrees of freedom.Patient ID and municipality of residence were included as random effects multinominal variables (random intercepts).Bayesian binary logistic regression models were fitted with the INLA approach for latent Gaussian models, computing odds ratios (OR) and 95% credible intervals (CrI).An independent and identically distributed random distribution was chosen for patient ID and municipality of residence (33)(34)(35).
The longitudinal study analyzed the associations between residence in areas with high environmental pressures, tumor characteristics, and death, using a mixed effects Cox proportional hazard regression.Time axis was the difference in days between the day of cancer diagnosis and the last day of follow-up (event or right censoring).All-cause death was considered as the outcome measure binary variable (event).The proportional hazard assumption was verified through the analysis of plotted survival curves between the different levels of the variables.Residence in areas with high environmental pressures, pathological/clinical staging (TNM I, II, III, IV), histological grading (grade 1, 2, 3), proliferation index (Ki-67 ≤ 20%, 21-50, >50%), hormonal receptors status (ER-and PR-, ER+ and/or PR+), epidermal receptor status (HER2/neu-, HER2/ neu+), topography (C50.0-1,C50.2-5, C50.6-8), and laterality (right, left, and bilateral) were included as fixed effects binary or multinominal variables.An "NA" (not available) category was created for the records with missing values for the analyzed exposures.Age class and year were included as fixed effects multinominal variables.Month was included as a cubic b-spline with 12 degrees of freedom.Patient ID, municipality of residence, and tumor morphology were included as random effects multinominal variables (random intercepts).Bayesian Cox regression models were fitted with the INLA approach for latent Gaussian models, computing hazard ratios (HR) and 95% credible intervals (CrI).An independent and identically distributed random distribution was chosen for patient ID, municipality of residence, and tumor morphology, while a random walk model of order two was chosen for the baseline hazard function (32)(33)(34)(35)(36).
Generalized Variance Inflation Factors (GVIF) were calculated to test the presence of multicollinearity in the data.Sensitivity analyses were performed by examining the extent to which the results were affected by changes in methods, models, variables, influential observations, and inclusion/exclusion criteria.Different combinations of included patients and variables were tested, and for the included variables, different collapsed categories, as well as changes in the type of estimated effects (fixed or random), were also tested.The models were iteratively refitted by excluding from the dataset each age class, year, and month one at a time.

Results
Baseline patients and disease characteristics are shown in Table 1.A total of 10,445 person-years were observed, 4,068 for residents in SIN and 997 for deceased patients, with a median (IQR) age of 61.0 (50.0,72.0)years.
The results of the mixed effects Bayesian binary logistic regression models are reported in Table 2. Adjusting for baseline age class, year, month, patient ID, and municipality of residence, the fixed effect variable residence in SIN did not appear to be clearly associated with the prevalence of the investigated tumor characteristics.However, the lower limit of the 95% credible interval for TNM IV is quite close to one (OR 1.29, 95% CrI 0.96-1.73).
Survival probabilities conditional on each analyzed variable and unconditional on other variables are shown in Figure 2. The curves suggested unconditional associations between survival probability and residence in SIN, pathological/clinical staging, histological grading, proliferation index, hormonal receptors status, epidermal receptor status, topography, and laterality.The results of the mixed effects Bayesian Cox proportional hazard regression model are reported in Table 3. Mutually adjusting and adjusting for baseline age class, year, month, patient ID, municipality of residence, and tumor morphology, the fixed effects variables associated with a higher death rate were residence in SIN (HR 1.22, 95% CrI

Discussion
The results of the present study confirmed that TNM staging, histological grading, proliferation index, estrogen and/or progesterone positivity, HER2/neu positivity, and bilaterality are independent prognostic factors in breast cancer patients.In our cohort, only topography did not seem to be independently associated with the analyzed outcome.Of interest was the finding during the follow-up period of an overall average negative association between HER2/neu positivity and death rate in our cohort, although it may be non-constant over time.This association could be explained by the development and implementation of HER2/neu targeting treatments.In fact, patients with HER2/neu + tumors are amenable to treatment with targeted biological agents such as trastuzumab.For these reasons, the presence of this marker could be assumed as a proxy of treatment with these drugs, which is information not directly available in the cancer registry.Given this assumption, our findings could be supported by the scientific literature about the improvement of survival in the treated patients (1, 3).Another interesting result was the negative prognostic value of the presence of missing data for some variables ("NA" category) in our cohort.This could be partly explained by the fact that patients without information on some variables (e.g., grading) could correspond to poor prognosis patients who, due to a severe condition at the time of diagnosis, were unable to undergo further interventions or investigations (37).Regarding the impacts of environmental pressures on tumor characteristics, this study found no clear association between living in the SIN and the prevalence of the prognostic factors mentioned above.Conversely, the most important finding seems to be the association between residence in SIN and increased all-cause death rate.This result was observed independently of all other factors analyzed, as the HR was adjusted for the other variables included in the Bayesian mixed effects regression model.In summary, this suggests that the patients in the studied cohort that resided in the SIN have an adjusted excess relative risk for all-cause mortality of 22% (95% CrI 1-48%) compared to the residents in the other municipalities of the province.
The specific environmental factors that may be associated with this excess mortality could be related to some documented anthropic pressures in the SIN: harbor, discharge, oil refinery, and steel plant (29,30).Several pieces of evidence in the study area have shown the contamination of environmental, feed, and food matrices (e.g., Frontiers in Public Health 08 frontiersin.orgmussels and eggs) with metals and persistent organic pollutants, such as dioxins and PCBs, in relation to potential foodborne exposure.Some of these substances or their metabolites/markers have also been detected in human biological samples (17,(20)(21)(22)(23)(24)(25).With regard to air pollution, studies have documented air pollution originating from the industrial area (e.g., particulate matter, sulfur dioxide, and polycyclic aromatic hydrocarbons) and the impacts on human health (16, 18, 19, 26, 28-30, 38).Our findings confirm previous knowledge regarding the increased risk of all-cause mortality reported for women who resided in the SIN of Taranto (16,19,26,29,30).However, they also suggest that the excess relative risk in the cohort analyzed may be greater.In fact, recent epidemiological studies on all the female population residing in the area have shown an excess relative risk for all-cause mortality of 7% (90% CI 5-9%) in the SIN of Taranto compared to the Apulia Region in the years 2013-17 (29).Although we are considering different periods and methods, and the credible interval of our study fully contains the aforementioned estimate with its confidence interval, we could not discount the possibility that the excess mortality risk in the SIN of Taranto might be higher in the cohort of women with breast cancer.Therefore, our findings suggest that frail female breast cancer patients may be more vulnerable to the risks associated with a disadvantaged or polluted external environment.This could be also consistent with the findings reported in the few studies that analyzed the association between socio-economic-environmental pressures and the prognosis of female breast cancer (12,14,15).
These results raise possible ethical questions, if confirmed.As a matter of fact, several epidemiological studies have reported an increased risk for breast cancer incidence and hospitalization and for all-cause mortality in the entire female population that resided in the SIN of Taranto (16,19,26,27,29,30).According to the present study, this excess relative risk for all-cause mortality might be higher in the cohort of female breast cancer patients.
However, since this study used an ecological variable (i.e., residence in the SIN of Taranto) to ascertain exposure to environmental pressures, this approach is potentially prone to ecological fallacy.In addition, there is a lack of specificity in the exposure assessment as the specific chemical pollutants could be varied and come from different sources in the studied region (37).Another limitation of the study could be the lack of information about genetics and hereditary.These data are unfortunately not available in cancer registries or health records.However, part of the influence of these factors may have been indirectly captured in the analysis using mixed models with random effects, which take into account the heterogeneity between patients and areas.
Nonetheless, regardless of these limitations, all the other evidence already available on Taranto gives a high a priori plausibility of the association between residence in SIN and increased mortality detected in this study, also considering that this association persists despite adjustment for all other measured patients and disease characteristics.Moreover, in addition to the well-known pressures of a strictly environmental nature, the two municipalities of Taranto and Statte present relatively high municipality-level deprivation indexes.This is a regionally referenced deprivation index that used the individual data of the general population and housing census of 2011.For the calculation of the index, five conditions were chosen by the authors to best describe the multidimensional concept of social and material deprivation: low level of education, being unemployed, living on rent, living on rent, and living in a single-parent family.The index was calculated as the sum of standardized indicators and is also available and categorized into quintiles (39).Although including the municipality of residence as a random effect in the regression models provided some ecological-level adjustment for the deprivation index, it can also be useful to consider the value of the index itself for descriptive purposes when interpreting the results.Anyhow, particular attention should be paid to the interpretation of this index both due to its ecological-level indicator nature and since the latest available index relates to the 2011 census (39).Therefore, the available deprivation index cannot be guaranteed to accurately indicate individual-level deprivation during the years covered by the present study.
However, it is important to consider that socio-economic status, deprivation, and inequalities could not only exert an effect on lifestyle harmful habits (e.g., cigarette smoking), health conditions, and mortality but also potentially affect the utilization of health services (39,40).Furthermore, in this regard, the SIN corresponds almost completely to the provincial capital Taranto, which could potentially influence access to health services at a territorial and hospital level, and in terms of regional and extra-regional mobility.This is linked to another limitation of the study, which is the lack of available data about the diagnostic-therapeutic pathways followed by these patients, including information regarding their access to breast cancer screening services.On the other hand, the fact that both SIN and extra-SIN municipalities belong to the same Local Health Authority, and so consequentially the entire studied cohort could virtually access the same healthcare and screening services, did not lead us to suppose that there could be relevant biases in relation to these aspects (37).
To summarize, as mentioned previously, the lack of information about individual-level environmental exposures, genetics and hereditary factors, socio-cultural indicators, harmful habits, and utilization of healthcare services could represent a limitation of the present study.However, from another perspective, the same elements could also be considered starting points for what can be done in the future.Specifically, it would be interesting to update and expand upon this epidemiologic study by recovering further individual-level data about genetics and hereditary factors (BRCA1, BRCA2, PTEN, and TP53 mutations) (41), environmental exposures (distance from polluting sources, exposure to airborne pollutants through dispersion models, and biomonitoring), socioeconomic factors (updated indicators of deprivation at individual or census-tract level), access to secondary prevention programs (screening path through mammography, echography, and genetic tests), and diagnostic-therapeutic-surgical paths (timing, place, and type of interventions).
In conclusion, the results confirmed the independent prognostic values of different female breast cancer characteristics.Despite the limitations discussed above, even after adjusting for patients and disease characteristics, in the cohort of women with invasive breast cancer, residence in the SIN of Taranto appeared to be associated with an increased death rate.

FIGURE 2
FIGURE 2Survival probabilities in the female breast cancer cohort, conditional on each analyzed variable and unconditional on other variables.Province of Taranto, 2015-2020, follow-up to 31/12/2021.Time: days of follow-up.Outcome (incidence): all-cause death.

TABLE 1
Baseline patients and disease characteristics and follow-up survival status in the female breast cancer cohort by residence in SIN and survival status.

TABLE 2
Results of the mixed effects Bayesian INLA binary logistic regression models in the female breast cancer cohort, adjusted for baseline age class, year, month, patient ID, and municipality of residence.

TABLE 3
Results of the mixed effects Bayesian INLA Cox proportional hazard regression model in the female breast cancer cohort, mutually adjusted and adjusted for baseline age class, year, month, patient ID, municipality of residence, and tumor morphology.Province of Taranto, 2015-2020, follow-up to 31/12/2021.Time: days of follow-up.Outcome (incidence): all-cause death.N: incident cases and non-cases.n: incident cases.