
ORIGINAL RESEARCH article

Front. Public Health, 25 June 2025

Sec. Environmental Health and Exposome

Volume 13 - 2025 | https://doi.org/10.3389/fpubh.2025.1600729

Identifying emphysema risk using brominated flame retardants exposure: a machine learning predictive model based on the SHAP methodology

Qihang Xie1, Haoran Qu1, Jianfeng Li1, Rui Zeng1, Wenhao Li1, Rui Ouyang1, Chengxiang Zhang2, Siyu Xie3,4* and Ming Du1*
  • 1Department of Cardiothoracic Surgery, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
  • 2Department of General Medicine, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
  • 3Department of Child Development, United Graduate School of Child Development, Osaka University, Suita, Osaka, Japan
  • 4United Graduate School of Child Development, The University of Osaka 2-2 Yamadaoka, Suita, Osaka, Japan

Background: Emphysema is a major contributor to lung disease progression and is associated with significant health risks, including exacerbations, mortality, and lung cancer. While environmental exposures, such as brominated flame retardants (BFRs), have been suggested as risk factors, their role in emphysema prediction has been largely overlooked. This study aimed to develop a machine learning (ML) model to predict emphysema risk incorporating BFRs exposure data and demographic characteristics.

Methods: Using data from the NHANES (2005–2016) dataset, 8,205 participants were included in the study. The participants were divided into a training set (70%) and a testing set (30%). Eight machine learning algorithms, including lightGBM, MLP, DT, KNN, RF, SVM, Enet, and XGBoost, were applied to build and evaluate the model. Demographic data and BFRs exposure levels were used as predictors. SHAP and Partial Dependence Plots (PDP) were used for model interpretability analysis.

Results: The MLP model showed the best performance with an AUC of 0.83. Age and PBB153 were identified as the most influential predictors. SHAP analysis revealed that higher exposure to BFRs, particularly PBB153, was strongly associated with increased emphysema risk. The WQS model further confirmed the positive relationship between BFRs exposure and emphysema.

Conclusion: This study demonstrates the significant predictive value of BFR exposure in emphysema risk assessment and highlights the importance of incorporating environmental factors into disease prediction models. The findings provide new insights for integrating BFRs into personalized health risk assessments and public health interventions.

1 Introduction

Emphysema, a chronic respiratory disorder defined by alveolar destruction, is a significant phenotype of COPD (1). Patients exhibiting more severe emphysema experience accelerated decline in lung function, body mass index, and fat-free mass index, accompanied by increased exacerbations, hospitalizations, and mortality rates (2). The efficacy of current treatment regimens is diminished in patients with emphysema associated with COPD (1). Emphysema serves as a substantial predictor of lung cancer risk and overall health outcomes. Studies have demonstrated that individuals with radiographic emphysema exhibit nearly double the incidence of lung cancer compared to those without (3). Given the elevated risk of exacerbations, mortality, and lung cancer associated with emphysema (4), its early identification through screening and advanced prediction methods could significantly improve patient outcomes and guide preventive interventions.

In previous studies, blood-based emphysema predictive models exhibited two notable limitations. First, these models had relatively small sample sizes, which can compromise their ability to accurately predict outcomes. Second, they were limited in scope, as they tested only one “omic” modality at a time (5–7). Other studies have used transcriptomic and proteomic features in combination with clinical features to evaluate the role of multiomics modeling in predicting emphysema. Ultimately, the best-performing predictive model (clinical + CBC + protein model) included predictors of clinical variables (age, sex, ethnicity, BMI, smoking), CBC (proportion of neutrophils, lymphocytes, platelets, monocytes, and eosinophils), and protein. In the clinical + CBC + gene + protein model, the top 10 predictors were ranked by absolute β coefficient, including BMI, sRAGE, PSMP protein and MIR124-1HG gene (8). There are also CT image-based models for predicting the progression of emphysema (9). However, none of these models considered the influence of environmental factors on emphysema.

Brominated flame retardants (BFRs) are utilized extensively in various industrial sectors, including plastics, textiles, electronics, and building materials, with the primary objective of mitigating the risk of fire hazards (10, 11). However, as additive compounds, some BFRs, including PBDEs and TBBPA, are prone to environmental release during production and use. The presence of these chemicals has been detected in various environmental media, including water, soil, dust, and even human biological fluids such as blood and breast milk (11, 12). BFRs are persistent in the environment and can accumulate in living organisms over time.

As prior studies have demonstrated, BFRs and their metabolites have the potential to induce a number of deleterious effects on bodily functions, including nephrotoxicity, hepatotoxicity, reproductive and developmental toxicity, neurotoxicity, and carcinogenic effects, which can ultimately result in severe adverse health consequences (13). More specifically, BFRs have been detected in the respiratory tracts of both animals and humans, where they have been shown to affect bronchial epithelial cells by inhibiting cell viability, activating apoptosis, inducing DNA damage, and promoting inflammatory and oxidative stress responses (14–17). These changes in the respiratory tract are significant, as they are known to play a key role in the development of emphysema (18, 19). Therefore, this study combined BFRs with demographic characteristics to construct a machine learning prediction model and determined the predictive value of BFRs exposure for emphysema.

2 Methods

2.1 Study population

The National Health and Nutrition Examination Survey (NHANES) is a comprehensive interdisciplinary research initiative spearheaded by the Centers for Disease Control and Prevention (CDC). The primary objective of NHANES is the collection, analysis, and publication of data concerning health, nutrition, and environmental exposures in the United States. Since its inception in the 1960s, it has been conducted on an annual basis and encompasses all age groups within the United States. In the present study, participants from NHANES 2005 to 2016 with available data on BFRs were included. Participants with missing data on BFRs, emphysema diagnosis, or covariates were excluded from the study. The inclusion and exclusion criteria utilized in this study are illustrated in Figure 1. To evaluate the potential impact of selection bias due to missing BFR data, we compared the weighted baseline characteristics between included and excluded participants.


Figure 1. Flow chart for model development and validation.

2.2 Assessment of BFR

In the NHANES database, concentrations of polybrominated diphenyl ethers (PBDEs) in serum were assessed employing a two-phase protocol encompassing automated liquid–liquid extraction and subsequent sample purification (NHANES, 2019). Serum concentrations of BFRs were measured as parent compounds. Metabolites were not included unless specifically reported in the dataset, and we restricted our analysis to parent compounds to ensure consistency. In this study, we exclusively focused on PBB-153 and nine PBDEs with a detection rate greater than 50% (20). Specifically, these include 2,4,4′-tribromodiphenyl ether (BDE-28), 2,2′,4,4′-tetrabromodiphenyl ether (BDE-47), 2,2′,3,4,4′-pentabromodiphenyl ether (BDE-85), 2,2′,4,4′,5-pentabromodiphenyl ether (BDE-99), 2,2′,4,4′,6-pentabromodiphenyl ether (BDE-100), 2,2′,4,4′,5,5′-hexabromodiphenyl ether (BDE-153), 2,2′,4,4′,5,6′-hexabromodiphenyl ether (BDE-154), 2,2′,3,4,4′,5′,6-heptabromodiphenyl ether (BDE-183), decabromodiphenyl ether (BDE-209), and 2,2′,4,4′,5,5′-hexabromobiphenyl (PBB-153). For values below the limit of quantification (LOQ), NHANES substitutes these with LOQ divided by the square root of 2 (LOQ/√2), in line with standard imputation practices. We applied this substitution consistently across all BFR congeners to maintain comparability and avoid data loss.
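The LOQ/√2 substitution described above can be illustrated with a minimal Python sketch. The function name, the example concentrations, and the LOQ value are hypothetical and chosen only for illustration; they are not taken from NHANES or from the study.

```python
import math


def substitute_below_loq(value, loq):
    """Return LOQ / sqrt(2) for a measurement below the LOQ, else the value itself."""
    if value < loq:
        return loq / math.sqrt(2)
    return value


# Hypothetical serum concentrations and a hypothetical LOQ (arbitrary units).
loq = 2.0
raw = [0.5, 2.0, 7.3]
filled = [substitute_below_loq(v, loq) for v in raw]
```

Only the first value falls below the hypothetical LOQ, so it alone is replaced; values at or above the LOQ pass through unchanged.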

2.3 Definition of emphysema

The emphysema status of the participants was determined according to the variable MCQ160G in the questionnaire data of NHANES 2005–2016. Individuals who responded in the affirmative to the question “Ever told you had emphysema” were classified as patients with emphysema.

2.4 Covariates

Sociodemographic covariates were selected on the basis of previous research: age, gender, race, educational level, poverty income ratio (PIR), smoking and drinking status, and body mass index (BMI) (21, 22). Race was categorized as Mexican American, Other Hispanic, Non-Hispanic White, Non-Hispanic Black, or Other Race (including multi-racial). Education level was categorized as less than high school, high school, or college or above. PIR measured socioeconomic status as the ratio of household income to the poverty line. Drinking status was coded as never, mild/moderate, heavy, or former drinker. Smoking status was categorized as never, former, or current smoker.

2.5 Statistical analysis

Baseline characteristics were first compared between the training and test datasets in the NHANES. Then, within both the training and test datasets, we compared the baseline characteristics between the emphysema and non-emphysema groups. Continuous variables were presented as median (IQR) or mean (SD), and categorical variables as absolute numbers with associated percentages. The demographic characteristics of subjects with different emphysema statuses were evaluated using the chi-square test and t-test. Serum BFR concentrations were either Ln-transformed to approximate a normal distribution (continuous variables) or divided into quartiles (Q1, Q2, Q3, and Q4) to form categorical variables. To justify this transformation, histograms comparing the distributions of BFR variables before and after Ln transformation are provided in Supplementary Figure 1. Pearson’s correlation was used to assess the relationships among the concentrations of the ten BFRs. Principal component analysis (PCA) was employed to examine differences in exposure profiles among subjects and to identify the underlying structure of variance. The Mann–Whitney U test was then used to compare PC1 scores between the emphysema and non-emphysema groups.
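As a rough illustration of the two exposure codings described above (natural-log transformation and quartile categories), the following Python sketch uses only the standard library. The concentrations are hypothetical, and the "inclusive" cut-point method is an assumption, since the text does not state how quartile boundaries were computed.

```python
import math
import statistics


def ln_transform(values):
    """Natural-log transform of strictly positive concentrations."""
    return [math.log(v) for v in values]


def quartile_labels(values):
    """Label each value Q1..Q4 using the sample quartile cut points."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    labels = []
    for v in values:
        if v <= q1:
            labels.append("Q1")
        elif v <= q2:
            labels.append("Q2")
        elif v <= q3:
            labels.append("Q3")
        else:
            labels.append("Q4")
    return labels


# Hypothetical serum concentrations (arbitrary units).
conc = [3.0, 1.0, 8.0, 5.0, 2.0, 7.0, 4.0, 6.0]
ln_conc = ln_transform(conc)
labels = quartile_labels(conc)
```

Because the logarithm is monotone, assigning quartiles on the raw or on the Ln-transformed scale yields the same categories.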

To assess the association between BFRs and the prevalence of emphysema, we employed univariate and multivariate logistic regression models. Odds ratios (OR) and the corresponding 95% confidence intervals (CI) were employed to identify trends in the associations. The regression models were structured as follows: model 1 was not adjusted for any variable, while model 2 was adjusted for age, gender, race, education level, PIR, smoking status, drinking status, and BMI. To address the risk of false positives from testing 10 BFRs across two models and four quartiles, we applied False Discovery Rate (FDR) correction. To evaluate the potential impact of unmeasured confounders on the relationship between BFRs exposure and emphysema risk, we calculated the E-value using an online calculator (23, 24). The E-value offers a quantitative assessment of the strength of association that an unobserved confounder would need to have in order to fully nullify the observed relationship (25). As part of the sensitivity analyses, we excluded participants who had been diagnosed with emphysema within 2 years prior to the NHANES survey interview. This exclusion was applied to both logistic regression and machine learning models to minimize reverse causality and assess the robustness of the observed associations between BFR exposure and emphysema risk.
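For readers unfamiliar with the E-value, the sketch below implements the commonly cited closed form for a risk ratio, E = RR + sqrt(RR × (RR − 1)). It assumes that an odds ratio can stand in for the risk ratio when the outcome is rare; the text does not state which variant the online calculator applied, so this is an illustration rather than the authors' computation. The function names are hypothetical.

```python
import math


def e_value(ratio):
    """E-value for a risk ratio; ratios below 1 are inverted first."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    if ratio < 1:
        ratio = 1 / ratio
    return ratio + math.sqrt(ratio * (ratio - 1))


def e_value_for_ci(lower, upper):
    """E-value for the confidence limit closest to the null (1 if the CI spans 1)."""
    if lower <= 1 <= upper:
        return 1.0
    limit = lower if lower > 1 else upper
    return e_value(limit)


# Hypothetical ratio of 2.0: an unmeasured confounder would need associations
# of about 3.41 with both exposure and outcome to explain it away.
example = e_value(2.0)
```

A ratio of exactly 1 gives an E-value of 1, meaning no confounding is needed to explain a null association.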

We performed a Weighted Quantile Sum (WQS) analysis to evaluate both the collective and individual effects of BFRs on the prevalence of emphysema by calculating a weighted linear index and assigning appropriate weights. Bootstrapping with 1,000 iterations was applied to construct WQS indices in both positive and negative directions. When the WQS index showed statistical significance, the corresponding weights were analyzed to determine the relative contribution of each BFR within the index to emphysema prevalence. The dataset was randomly partitioned, with 40% allocated to the training set and the remaining 60% designated as the validation set.
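The weighted linear index at the core of WQS regression can be sketched as follows: each exposure is scored by its quantile (0 to 3 for quartiles), and the index is the weighted sum of those scores, with non-negative weights that sum to 1. The weights and scores below are hypothetical, and the sketch omits the bootstrap weight estimation and the subsequent logistic regression step.

```python
def wqs_index(quantile_scores, weights):
    """Weighted quantile sum for one participant.

    quantile_scores: exposure name -> quantile score (0..3 for quartiles)
    weights: exposure name -> non-negative weight; weights must sum to 1
    """
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[name] * quantile_scores[name] for name in weights)


# Hypothetical weights and quartile scores for three congeners.
weights = {"PBB153": 0.5, "BDE28": 0.25, "BDE153": 0.25}
scores = {"PBB153": 3, "BDE28": 1, "BDE153": 0}
index = wqs_index(scores, weights)
```

In a full analysis the index would then enter a logistic model alongside the covariates, and the estimated weights would indicate each congener's relative contribution.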

2.6 Model development and comparison

All model development procedures were conducted within the tidymodels framework in R. The dataset was first randomly split into training (70%) and testing (30%) sets. To reduce the adverse impact of high-dimensional data on model performance, feature selection was performed on the training set using the Boruta algorithm. This random forest–based wrapper method identifies important predictors through iterative comparison with shadow features and is considered more stable than conventional filtering techniques (26). Although drinking and smoking were not selected by Boruta, they were retained in the model due to their well-established association with emphysema, as reported in previous studies (27, 28). Sensitivity analyses excluding these two variables were also conducted to test the robustness of findings.

After variable selection, the Synthetic Minority Oversampling Technique (SMOTE) was applied to the training data to address class imbalance between emphysema and non-emphysema participants. We then trained eight machine learning algorithms: Light Gradient Boosting Machine (LightGBM), Multi-Layer Perceptron (MLP), Decision Tree (DT), K-Nearest Neighbors (KNN), Random Forest (RF), Support Vector Machine (SVM), Elastic Net (ENet), and Extreme Gradient Boosting (XGBoost). Each model underwent hyperparameter tuning through ten-fold cross-validation within the training set to optimize performance. The full grid search space and final selected hyperparameters are detailed in eMethods. The Area Under the Receiver Operating Characteristic Curve (AUROC) was used to assess the predictive accuracy of the models during validation, with the goal of comparing the models based on their best performance. An AUROC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect discrimination, with higher values indicating better predictive capability. In addition to AUROC, several other performance metrics, including F1 score, precision, accuracy, recall, sensitivity, specificity, and the Matthews correlation coefficient (MCC), were also calculated to provide a comprehensive assessment of model effectiveness. To formally compare model discrimination, pairwise DeLong tests were conducted across classifiers, with False Discovery Rate (FDR) correction applied to account for multiple comparisons. To address potential optimism introduced by internal SMOTE application, we performed 100 bootstrap resampling iterations on the training dataset. Apparent AUCs, optimism estimates, and bias-corrected AUCs were calculated across iterations to provide a more realistic assessment of model performance. Model calibration was evaluated using the Brier score, calibration intercept, and calibration slope. To improve the reliability of predicted probabilities, Platt scaling was applied, and calibration metrics were compared before and after adjustment. Performance variability and calibration quality were visualized using bootstrap AUC distribution plots and calibration curves, enabling a comprehensive assessment of both discrimination and probability estimation.
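Three of the metrics named above (AUROC, MCC, and the Brier score) can be computed from first principles, as in the following standard-library Python sketch. The labels, predicted probabilities, and the 0.5 classification threshold are hypothetical; the study itself used the tidymodels framework, whose implementations are not reproduced here.

```python
import math


def auroc(y_true, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mcc(y_true, y_pred):
    """Matthews correlation coefficient from binary labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def brier_score(y_true, probs):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)


# Hypothetical outcomes and predicted probabilities.
y = [0, 0, 1, 1]
prob = [0.1, 0.4, 0.35, 0.8]
pred = [1 if p >= 0.5 else 0 for p in prob]  # assumed 0.5 threshold

auc_value = auroc(y, prob)
mcc_value = mcc(y, pred)
brier_value = brier_score(y, prob)
```

AUROC depends only on the ranking of the scores, whereas the Brier score also penalizes poorly calibrated probabilities, which is why the two are reported separately.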

2.7 Model interpretation

Interpretability is defined as the process of elucidating how machine learning (ML) models produce results. The opacity intrinsic to ML models frequently hinders their effective utilization in clinical contexts, prompting comprehensive investigation into enhancing their interpretability (29, 30). In this study, we sought to integrate an interpretable method to ascertain the importance of features and the relationships between brominated flame retardant (BFR) variables and the risk of emphysema. An in-depth evaluation was conducted to ascertain the key features that exert a substantial influence on the risk of emphysema development. This evaluation utilized two approaches: SHapley Additive exPlanations (SHAP) and partial dependence plots (PDPs) (31, 32). The present study analyses non-linear relationships with PDPs, thereby enabling the identification of relationships between emphysema and its associated predictors. Specifically, one-way PDPs can elucidate the relationship between emphysema and a specific variable (33).
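The idea behind SHAP can be illustrated with an exact Shapley computation on a toy model. The sketch below averages each feature's marginal contribution over all feature orderings, replacing "absent" features with a single baseline value; this is a simplification of SHAP, which averages over a background dataset. The toy risk score, feature names, and values are hypothetical and unrelated to the fitted MLP.

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Exact Shapley values: mean marginal contribution over all feature orderings."""
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)       # start with every feature at its baseline
        previous = predict(current)
        for name in order:
            current[name] = instance[name]   # switch one feature to its actual value
            value = predict(current)
            totals[name] += value - previous
            previous = value
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_model(x):
    """Hypothetical risk score with a smoking-by-exposure interaction."""
    return 0.02 * x["age"] + 0.1 * x["ln_pbb153"] + 0.05 * x["smoker"] * x["ln_pbb153"]


instance = {"age": 60, "ln_pbb153": 2.0, "smoker": 1}
baseline = {"age": 40, "ln_pbb153": 1.0, "smoker": 0}
phi = shapley_values(toy_model, instance, baseline)
```

The attributions sum to the difference between the prediction for the instance and the prediction at the baseline, which is the additivity property that force plots visualize.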

In this study, we adhered to the guidelines set forth in the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) to maintain transparency and methodological rigor throughout the development and validation of our predictive model. No weighted data were applied, as demographic factors were adjusted for in the analysis (34). All statistical analyses were performed using R statistical software version 4.4.3 and Python 3.11. p < 0.05 was considered statistically significant.

3 Results

3.1 Population characteristics

In the NHANES database from 2005 to 2016, a total of 60,936 participants were initially included. Those with missing data on BFRs (n = 48,563), emphysema (n = 2,504), or covariate data (n = 1,664) were excluded from the study. Finally, our study included 8,205 participants, as shown in Figure 1. Table 1 presents the demographic features of the training and test datasets. Most features showed no significant differences between the training and test sets, indicating a relatively balanced distribution of data. However, the distribution of emphysema demonstrated a statistically significant difference (p = 0.015), with a slightly higher prevalence of emphysema observed in the test set compared to the training set. Table 2 presents the differences in baseline characteristics between the emphysema and non-emphysema groups within the training and test datasets. In the training dataset, there were 100 cases of emphysema, accounting for 1.75% of the total; in the test dataset, there were 63 cases of emphysema, accounting for 2.56%. Significant differences were observed between the emphysema and control groups in terms of age, race, education level, PIR, drinking status, and smoking status (all p < 0.05). In terms of BFRs, LBCBB1 and LBCBR9 in the training set showed significant differences between the emphysema and non-emphysema groups (p = 0.003 and p < 0.001), while in the test set, LBCBB1 and LBCBR2 exhibited statistically significant differences (p = 0.023 and p = 0.016). Pearson correlation analysis identified significant positive correlations among several BFRs (Supplementary Figure 2). Specifically, BDE28, BDE47, BDE85, BDE99, BDE100, and BDE154 demonstrated strong intercorrelations. These results indicate the possibility of shared exposure sources or similar environmental behaviors among PBDEs, potentially amplifying their cumulative effect on emphysema risk. 
PCA revealed the relationships between individual exposome factors and emphysema, focusing on the first two principal components (Supplementary Figure 3). The Mann–Whitney U test indicated a significant difference in PC1 scores between the emphysema and control groups (p = 0.008). As shown in Supplementary Table 1, participants with available BFR measurements differed significantly from those without in several characteristics. Individuals included in the BFR analysis were older (mean age: 47.2 vs. 34.9 years, p < 0.001), had higher education levels, and a greater proportion reported former or current smoking and alcohol consumption. They also exhibited higher BMI and PIR values (all p < 0.001). No significant difference was observed in gender distribution (p = 0.26). These differences highlight potential selection bias and were considered in the interpretation of the findings.


Table 1. Baseline characteristics of study participants in the training and test cohorts.


Table 2. Baseline characteristics of participants by emphysema status within the training and test datasets.

3.2 BFRs exposure and emphysema risk in the logistic regression model

Table 3 demonstrates that the ln-transformed PBB153 was significantly associated with an increased prevalence of emphysema. In Model I, without adjusting for covariates, the OR was 1.80 (95% CI: 1.58–2.05, p < 0.001). Similarly, in Model II, after adjusting for covariates, the OR was 1.32 (95% CI: 1.09–1.60, p = 0.005). Furthermore, a higher risk of emphysema was observed with increasing quartiles of PBB153 exposure. Specifically, individuals in the highest quartile (Q4) had a 4.8-fold higher risk of emphysema compared to those in the lowest quartile (Q1) in Model II (OR = 4.80, 95% CI: 1.59–14.52). A significant dose–response relationship between PBB153 and emphysema was identified (P for trend < 0.001). Similarly, BDE28 exhibited a comparable trend in Model I, with a significant dose–response relationship (P for trend = 0.004). Additionally, in Model I, BDE28, BDE47, BDE85, BDE99, and BDE153 were significantly positively associated with emphysema (p < 0.05). In the third quartile (Q3), BDE154 and BDE183 were associated with an increased risk of emphysema by 59 and 69%, respectively (BDE154: OR = 1.59, 95% CI: 1.01–2.51; BDE183: OR = 1.69, 95% CI: 1.05–2.72; all p < 0.05). After applying FDR correction, key associations remained statistically significant. Specifically, LnPBB153 (overall and Q4), LnBDE28 (overall and Q4), and LnBDE153 (overall) were significantly associated with emphysema (FDR-adjusted p < 0.05). Associations for other congeners, including LnBDE47 and LnBDE85, showed attenuated significance after correction but maintained consistent effect estimates. In the sensitivity analysis excluding participants diagnosed with emphysema within 2 years prior to the survey, the logistic regression results remained largely consistent with the primary analysis. Notably, the association between LnPBB153 (Q4) and emphysema remained statistically significant (OR = 4.45, 95% CI: 1.24–15.94, p = 0.022). 
Additional associations, such as those involving LnBDE153 and LnBDE209, showed effect estimates in the same direction as the main analysis (Supplementary Table 2).
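The FDR correction mentioned above can be sketched with the Benjamini-Hochberg step-up procedure. The text does not name the specific FDR method, so the choice of Benjamini-Hochberg is an assumption, and the p-values below are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values from four tests.
raw_p = [0.01, 0.04, 0.03, 0.20]
adj_p = benjamini_hochberg(raw_p)
```

An association is declared significant at a 5% FDR when its adjusted p-value is below 0.05; in this hypothetical example only the first test qualifies.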


Table 3. Multivariate logistic regression analysis of Ln-transformed BFRs for the prevalence of emphysema.

3.3 BFRs exposure and emphysema risk in WQS model

We utilized the WQS model to evaluate the association between the combined effects of BFRs and the prevalence of emphysema. As shown in Supplementary Table 3, the WQS index demonstrated a positive association between BFR exposure and emphysema prevalence (Model I: OR = 2.28, 95% CI: 1.80–2.89, p < 0.001; Model II: OR = 1.51, 95% CI: 1.11–2.06, p = 0.008). Supplementary Figure 4 illustrates that, among the BFRs, PBB153 was assigned the highest weight (0.62) in the positive direction, indicating its substantial contribution to emphysema risk after adjusting for all covariates. In contrast, the WQS regression in the negative direction did not reveal any significant association between BFR exposure and emphysema prevalence (Model I: OR = 0.93, 95% CI: 0.75–1.16, p = 0.522; Model II: OR = 0.86, 95% CI: 0.65–1.14, p = 0.291).
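The odds ratios and intervals reported here relate to the underlying log-odds coefficient through OR = exp(β) and CI = exp(β ± z·SE). The sketch below shows that relationship and how a coefficient and standard error can be backed out of a reported interval. It assumes Wald-type intervals with z = 1.96, which the text does not confirm; the function names are hypothetical, and the only values taken from the text are the reported Model II limits (1.11 and 2.06).

```python
import math

Z_95 = 1.96  # normal quantile for a two-sided 95% interval


def odds_ratio_with_ci(beta, se, z=Z_95):
    """Odds ratio and Wald interval from a log-odds coefficient and its SE."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def log_scale_summary(lower, upper, z=Z_95):
    """Back out the log-odds estimate and SE from a reported Wald interval."""
    beta = (math.log(lower) + math.log(upper)) / 2
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se


# Reported Model II WQS interval from the text: 1.11 to 2.06.
beta, se = log_scale_summary(1.11, 2.06)
point, lower, upper = odds_ratio_with_ci(beta, se)
```

Under these assumptions the recovered point estimate agrees with the reported OR of 1.51 to two decimal places, a quick internal consistency check on a reported interval.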

3.4 Model variable selection

Subsequently, this study identified 15 potentially significant predictor variables (highlighted as green modules in Figure 2) using the Boruta algorithm with shadow features. These selected variables, including age, gender, race, education level, PIR, BMI, PBB153, BDE28, BDE47, BDE85, BDE99, BDE100, BDE153, BDE154, BDE183, and BDE209, were utilized to train and develop the machine learning model. In addition, because previous studies have shown that smoking and alcohol consumption are important risk factors for emphysema, we included smoking and drinking as predictors in the model (27, 28).


Figure 2. Image of Boruta method for selecting ML model variables.

3.5 Model development and performance comparison

Figure 3A presents the ROC curves for the test set across eight machine learning models: lightGBM, MLP, DT, KNN, RF, SVM, Enet, and XGBoost. Notably, the Enet and MLP models demonstrated the highest AUC performance (AUC = 0.83), significantly outperforming the other six models. Among these, the MLP model was selected for further analysis due to its superior performance across additional evaluation metrics. Figure 3B displays the ROC curves for both the training and test sets of the MLP model. Consequently, the interpretability analysis of the best-performing model, MLP, is prioritized in this study. Figure 4 illustrates the comparative performance of the various machine learning models on the training set (A) and the test set (B). Pairwise DeLong tests with FDR correction revealed no significant differences in AUC among RF, XGBoost, and LightGBM (adjusted p > 0.28). All ensemble models showed significantly higher AUCs compared to the Decision Tree classifier (adjusted p < 0.05) (Supplementary Figure 5). To address potential optimism due to internal SMOTE application, 100 bootstrap iterations were performed. The apparent AUC was 0.938, and the bias-corrected AUC was 0.878, indicating an estimated optimism of 0.060. Calibration analysis revealed initial miscalibration, with a Brier score of 0.136, calibration intercept of −3.116, and slope of 1.389. After applying Platt scaling, calibration substantially improved: Brier score decreased to 0.024, intercept adjusted to 0.000, and slope to 1.000, indicating near-perfect calibration. These results support both the discriminative ability and the probability accuracy of the final calibrated model. Detailed calibration metrics and plots are presented in Supplementary Table 4 and Supplementary Figure 6.
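The relationship between the apparent AUC, the optimism, and the bias-corrected AUC can be sketched as below. The sketch assumes the usual bootstrap optimism scheme, in which each iteration contributes the difference between a model's AUC on its own bootstrap sample and its AUC on the original data; the text does not spell out the exact procedure, and the per-iteration AUCs are hypothetical. Only the apparent AUC (0.938) is taken from the text.

```python
def optimism_corrected_auc(apparent_auc, boot_sample_aucs, boot_original_aucs):
    """Return (optimism, corrected AUC) from per-iteration bootstrap AUCs.

    boot_sample_aucs: AUC of each bootstrap-fitted model on its own bootstrap sample
    boot_original_aucs: AUC of the same model evaluated on the original data
    """
    diffs = [b - o for b, o in zip(boot_sample_aucs, boot_original_aucs)]
    optimism = sum(diffs) / len(diffs)
    return optimism, apparent_auc - optimism


# Hypothetical per-iteration AUCs (two iterations shown instead of 100).
optimism, corrected = optimism_corrected_auc(
    apparent_auc=0.938,
    boot_sample_aucs=[0.95, 0.93],
    boot_original_aucs=[0.89, 0.87],
)
```

With a mean optimism of 0.060, the corrected value is 0.938 − 0.060 = 0.878, matching the arithmetic of the reported figures.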


Figure 3. ROC curves of ML models. (A) ROC curves of the test sets of 8 ML models. (B) ROC curves of the training and test sets of MLP.


Figure 4. Performance comparison of various machine learning models on the training (A) and test (B) sets across multiple evaluation metrics. The heatmaps illustrate the performance of lightGBM, MLP, DT, KNN, RF, SVM, Enet, and XGBoost models. Each cell corresponds to the value of a specific evaluation metric, including accuracy, balanced accuracy, F1 score, J-index, kappa, Matthews correlation coefficient (MCC), positive predictive value (PPV), negative predictive value (NPV), precision, recall, ROC AUC, sensitivity (sens), and specificity (spec). Higher values are represented by blue, while lower values are indicated by white, providing a visual representation of model performance in both training and test sets.

In the sensitivity analysis, drinking and smoking variables were excluded, and models were constructed, yielding a maximum AUROC of 0.77 for both the Enet and KNN models. However, the performance of all models was inferior compared to those that included drinking and smoking as predictors (Supplementary Figure 7). By excluding participants diagnosed within 2 years prior to the survey, the performance of machine learning models remained stable. Among all classifiers, MLP again achieved the highest discrimination with an ROC-AUC of 0.83 (Supplementary Figure 8), similar to that observed in the primary analysis.

3.6 Model interpretation

SHAP analysis was conducted to evaluate the contribution and importance of each variable in the MLP model’s predictions, as illustrated in Figures 5A,B. The analysis consistently highlighted age as the most significant variable, exhibiting the highest SHAP value and serving as a critical risk factor for emphysema. Among the BFR components, PBB153 was the most important, ranking fifth among all variables; consistent with the logistic regression results, both analyses indicated a harmful association with emphysema. Current smoking had the second-highest SHAP value, indicating that current smokers are at greater risk of emphysema, which is also consistent with the improvement in model performance after the smoking variable was added.


Figure 5. SHAP diagram of MLP model. (A) SHAP value ranking of the variables in the model. (B) SHAP beeswarm plot of the MLP model.

Figure 6 presents personalized feature attributions for two representative patients, one with and one without emphysema. The prediction begins from the base value (bias), which represents the average prediction across the training dataset (35). Each feature’s contribution is depicted as an arrow, indicating whether it decreases (negative value) or increases (positive value) the probability of the outcome. The arrows are sorted by their impact on the prediction, with colors representing positive (red) or negative (blue) contributions. The length of each arrow corresponds to the SHAP value for the respective feature. For the patient with emphysema, high levels of BDE28 (1.76), BDE85 (1.39), and PBB153 (4.15), together with current smoking, were major contributors to the elevated risk, counteracted by high PIR (5) and age (48) (Figure 6A). In contrast, for the patient without emphysema, relatively high levels of BDE209 (1.81) and BDE47 (3.22), along with low levels of BDE28 (0.4) and being a current smoker, increased the risk. However, low levels of PBB153 (0.88) and BDE85 (0.3) and a younger age (30) reduced the probability of emphysema (Figure 6B).


Figure 6. Force plots for 2 patients with and without emphysema.

The PDPs provided a broader understanding of the model’s predictions, highlighting the relationships between emphysema and its predictors, as illustrated in Supplementary Figure 9. The PDP analysis revealed that older age and current smoking status were associated with an increased predicted risk of emphysema. Regarding BFRs, the analysis indicated an upward trend in the predicted probability of emphysema with higher levels of PBB153 and BDE85.
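A one-way partial dependence curve is obtained by fixing one feature at each value of a grid for every participant, predicting, and averaging the predictions. The following Python sketch shows that computation on a toy model; the model, the feature names, and the data rows are hypothetical and do not represent the fitted MLP.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction over all rows with `feature` forced to each grid value."""
    curve = []
    for value in grid:
        preds = [predict({**row, feature: value}) for row in rows]
        curve.append(sum(preds) / len(preds))
    return curve


def toy_model(x):
    """Hypothetical risk score, linear in age and Ln-transformed PBB153."""
    return 0.01 * x["age"] + 0.1 * x["ln_pbb153"]


# Hypothetical participants.
rows = [
    {"age": 30, "ln_pbb153": 1.0},
    {"age": 50, "ln_pbb153": 3.0},
]
grid = [0.0, 2.0, 4.0]
curve = partial_dependence(toy_model, rows, "ln_pbb153", grid)
```

An upward-sloping curve, as in this toy example, corresponds to the kind of trend described for PBB153 and BDE85, although a PDP reflects the model's average behavior rather than a causal effect.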

4 Discussion

Emphysema, with its rapidly increasing prevalence, has placed a significant burden on individual health and well-being. Environmental chemicals, such as BFRs, which function as endocrine disruptors, have been proposed as overlooked risk factors for COPD (11, 36). This study aimed to investigate the associations between BFR exposure and emphysema and to evaluate the potential predictive value of BFRs for emphysema risk.

Human exposure to various environmental chemicals in the real world is an unavoidable reality. Previous studies have established a significant association between BFR exposure and COPD (36), offering a novel perspective on incorporating BFRs into predictive models for emphysema. However, despite numerous studies on associations, predictive modeling studies that include BFRs as key variables remain limited. Identifying potential environmental biomarkers is crucial for developing high-resolution classifiers for emphysema.

In this study, we developed an ML model using data from the NHANES study (2005–2016) to predict the risk of emphysema in the U.S. population. The model incorporated basic demographic variables and BFR composition data as predictors. Multivariate logistic regression analysis identified significant associations between emphysema risk and several BFRs, including PBB153, BDE28, BDE47, BDE85, BDE99, and BDE153. Additionally, BDE154 and BDE183 were found to be associated with emphysema risk at certain concentrations. WQS analysis further revealed that co-exposure to BFR mixtures was significantly associated with an increased risk of emphysema. Among the eight ML models evaluated, the MLP model demonstrated the best predictive performance, achieving an AUC value of 0.83 after cross-validation, indicating good discrimination between participants with and without emphysema. To further evaluate the robustness of these associations, we conducted a sensitivity analysis by excluding participants who had been diagnosed with emphysema within 2 years of the survey. This adjustment aimed to reduce potential recall bias and mitigate concerns regarding reverse causality. Notably, the results of both logistic regression and machine learning models remained consistent after this exclusion, reinforcing the temporal plausibility and stability of our primary findings. Moreover, given the low prevalence of emphysema and the use of SMOTE for internal oversampling, we conducted optimism-corrected validation using bootstrap analysis. The bias-corrected AUC remained high, indicating strong discriminative ability. Initial probability calibration was poor, but substantially improved after Platt scaling, demonstrating the final model’s clinical applicability in providing reliable risk estimates. SHAP interpretability analysis based on the MLP model highlighted age and PBB153 as the most influential variables, with age contributing the most to the risk of developing emphysema. These findings were corroborated by PDP analysis.
Overall, our results suggest that integrating basic demographic information with environmental BFR exposure data has significant potential for enhancing disease risk prediction in future applications.
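
The Platt scaling step mentioned above can be illustrated with a minimal sketch. The raw probabilities and labels are invented to mimic a model that overestimates risk when the outcome is rare, as can happen after oversampling; they are not study results. The sigmoid is fitted here on the logit of the raw probability by plain gradient descent, which is an assumption made for simplicity and may differ from the implementation used in the study.

```python
from math import exp, log


def sigmoid(t):
    return 1.0 / (1.0 + exp(-t))


def logit(p):
    return log(p / (1.0 - p))


def log_loss(probs, labels):
    """Mean negative log-likelihood of binary labels under the given probabilities."""
    return -sum(y * log(p) + (1 - y) * log(1 - p) for p, y in zip(probs, labels)) / len(labels)


def fit_platt(raw_probs, labels, lr=0.1, steps=5000):
    """Fit calibrated = sigmoid(a * logit(raw) + b) by gradient descent on log loss."""
    z = [logit(p) for p in raw_probs]
    a, b = 1.0, 0.0  # start from the identity mapping
    n = len(labels)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for zi, yi in zip(z, labels):
            err = sigmoid(a * zi + b) - yi
            grad_a += err * zi
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


# Hypothetical overconfident predictions with a 20% event rate.
raw = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3, 0.2, 0.1]
labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]

a, b = fit_platt(raw, labels)
calibrated = [sigmoid(a * logit(p) + b) for p in raw]

prevalence = sum(labels) / len(labels)
mean_raw = sum(raw) / len(raw)
mean_cal = sum(calibrated) / len(calibrated)
loss_before = log_loss(raw, labels)
loss_after = log_loss(calibrated, labels)

print(f"observed event rate:        {prevalence:.3f}")
print(f"mean predicted (raw):       {mean_raw:.3f}")
print(f"mean predicted (calibrated): {mean_cal:.3f}")
print(f"log loss before / after:    {loss_before:.3f} / {loss_after:.3f}")
```

In practice the calibration map would be fitted on data not used to train the model, so that the calibration itself is not optimistic.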

Previous studies have shown that BFRs are associated with decreased lung function and the development of COPD (36, 37). At the cellular level, BFRs can induce oxidative stress, inflammation, and apoptosis in lung epithelial cells through caspase-dependent mitochondrial pathways (38). BFRs, particularly polybrominated diphenyl ethers (PBDEs), impair the integrity of airway epithelium by decreasing tight junction resistance, reducing zonula occludens-1 expression, and altering mucus production and rheology (39). These effects contribute to barrier dysfunction and increased inflammatory responses in the lungs. Additionally, BFRs have been linked to cardiovascular toxicity and pro-atherosclerotic mechanisms, which may indirectly impact respiratory health (40).

Association and mechanistic studies collectively indicate that the BFRs identified in this study play a significant role in distinguishing emphysema cases. Previous research has largely focused on identifying novel biomarkers or imaging-based predictors for emphysema, often overlooking the potential predictive value of environmental exposures (8). To address this gap, we developed machine learning models to evaluate whether BFRs can reliably predict the risk of emphysema. Given the challenges in accurately understanding ML methodologies and visually interpreting their results, we employed SHAP and PDP analyses in the MLP model to enhance both interpretability and impact. Considering that diseases closely linked to environmental exposures, such as respiratory and cardiovascular diseases, account for approximately one-fourth of all global diseases according to the World Health Organization (WHO), integrating BFRs into predictive models is a meaningful step. This approach highlights the importance of environmental factors in advancing risk prediction and improving public health strategies.

Our findings offer a novel perspective for researchers in the field of environment and health, contributing to personalized and accurate emphysema risk predictions for individuals at high risk of BFR exposure. However, this study has several limitations. First, emphysema diagnoses were based on self-reported data from questionnaires, which may introduce bias due to recall errors, potentially affecting the accuracy of the risk prediction model. Although we conducted a sensitivity analysis excluding participants diagnosed within 2 years prior to the survey, emphysema could not be reliably defined based on spirometry data in NHANES, as physician-confirmed diagnoses derived from FEV1/FVC measurements were unavailable. Second, serum measurements of BFR concentrations may not fully capture cumulative exposure or tissue-specific levels, as BFRs are known to accumulate in various organs and tissues. Additionally, reliance on a single measurement of BFR levels may not accurately represent long-term exposure patterns. Moreover, within-person variability in serum BFR concentrations may further limit the precision of exposure estimation. Third, more than 80% of the original NHANES sample was excluded due to missing BFR data, as these chemicals were measured only in a one-third subsample. While this missingness was by design, our comparison of weighted characteristics between included and excluded participants revealed systematic differences, suggesting potential selection bias. Fourth, important confounders such as pack-years of smoking, occupational exposure to dust and fumes, passive smoke exposure, and alpha-1-antitrypsin deficiency were not available in the NHANES dataset. Although we calculated E-values indicating the robustness of our findings to potential unmeasured confounding, residual confounding cannot be entirely ruled out. Lastly, given that NHANES is a cross-sectional study, further validation of our prediction model in independent cohort studies is necessary. 
The lack of imaging data or genetic biomarkers in NHANES also limited our ability to assess whether incorporating BFRs enhances the predictive power of traditional models based on these data types.
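
The E-value used to gauge unmeasured confounding follows a closed-form expression from VanderWeele and Ding (24): for a risk ratio RR of at least 1, E = RR + sqrt(RR × (RR − 1)), with the reciprocal taken first when RR is below 1. The sketch below applies that formula to hypothetical estimates; the numbers are not results from this study, and treating an odds ratio as a risk ratio is only a reasonable approximation when the outcome is rare.

```python
from math import sqrt


def e_value(rr):
    """E-value for a risk ratio: RR + sqrt(RR * (RR - 1)) after mapping RR < 1 to 1/RR."""
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    if rr < 1:
        rr = 1.0 / rr
    return rr + sqrt(rr * (rr - 1.0))


def e_value_for_ci(point, lower, upper):
    """E-value for the confidence limit closest to the null; 1.0 if the interval includes 1."""
    if lower <= 1.0 <= upper:
        return 1.0
    limit = lower if point > 1.0 else upper
    return e_value(limit)


# Hypothetical estimate: ratio 2.0 with 95% CI 1.2 to 3.3.
print(f"E-value (point estimate): {e_value(2.0):.2f}")
print(f"E-value (confidence limit): {e_value_for_ci(2.0, 1.2, 3.3):.2f}")
```

An E-value of about 3.41 for the point estimate would mean that an unmeasured confounder needs associations of at least that strength with both exposure and outcome to fully explain away the observed ratio.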

5 Conclusion

To the best of our knowledge, this is the first study to develop an ML model incorporating BFR exposure data to predict emphysema risk. Using BFR data from NHANES, we constructed and identified the optimal MLP model, which was further interpreted through SHAP and PDP analyses. The model demonstrated excellent predictive accuracy, with PBB153 and age emerging as the most influential variables in the prediction. This study underscores the significant role of BFR exposure in emphysema risk and paves the way for novel approaches to disease prediction, emphasizing the importance of environmental factors in advancing public health research and interventions.

Data availability statement

Publicly available datasets were analyzed in this study. These data can be found in the National Health and Nutrition Examination Survey (NHANES): https://wwwn.cdc.gov/nchs/nhanes/.

Ethics statement

The studies involving humans were approved by Research Ethics Review Board of the National Center for Health Statistics (NCHS). The studies were conducted in accordance with the local legislation and institutional requirements. The participants provided their written informed consent to participate in this study.

Author contributions

QX: Writing – original draft, Project administration. HQ: Conceptualization, Writing – original draft. JL: Data curation, Writing – original draft. RZ: Software, Writing – original draft. WL: Methodology, Writing – original draft. RO: Resources, Writing – original draft. CZ: Data curation, Writing – original draft. SX: Validation, Writing – review & editing. MD: Writing – original draft, Writing – review & editing.

Funding

The author(s) declare that no financial support was received for the research and/or publication of this article.

Acknowledgments

We would like to express our gratitude to the participants and researchers who participated in the National Health and Nutrition Examination Survey.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The authors declare that no Gen AI was used in the creation of this manuscript.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpubh.2025.1600729/full#supplementary-material

References

1. Janssen, R, Piscaer, I, Franssen, FME, and Wouters, EFM. Emphysema: looking beyond alpha-1 antitrypsin deficiency. Expert Rev Respir Med. (2019) 13:381–97. doi: 10.1080/17476348.2019.1580575

2. Celli, BR, Locantore, N, Tal-Singer, R, Riley, J, Miller, B, Vestbo, J, et al. Emphysema and extrapulmonary tissue loss in COPD: a multi-organ loss of tissue phenotype. Eur Respir J. (2018) 51:1702146. doi: 10.1183/13993003.02146-2017

3. Hunsaker, AR. Emphysema as a predictor of lung Cancer: implications for lung Cancer screening. Radiology. (2022) 304:331–2. doi: 10.1148/radiol.220697

4. Grenier, PA. Deep learning assessment of emphysema progression at CT predicts outcomes. Radiology. (2022) 304:680–2. doi: 10.1148/radiol.220627

5. Keene, JD, Jacobson, S, Kechris, K, Kinney, GL, Foreman, MG, Doerschuk, CM, et al. Biomarkers predictive of exacerbations in the SPIROMICS and COPDGene cohorts. Am J Respir Crit Care Med. (2017) 195:473–81. doi: 10.1164/rccm.201607-1330OC

6. Zemans, RL, Jacobson, S, Keene, J, Kechris, K, Miller, BE, Tal-Singer, R, et al. Multiple biomarkers predict disease severity, progression and mortality in COPD. Respir Res. (2017) 18:117. doi: 10.1186/s12931-017-0597-7

7. Zhang, Y-H, Hoopmann, MR, Castaldi, PJ, Simonsen, KA, Midha, MK, Cho, MH, et al. Lung proteomic biomarkers associated with chronic obstructive pulmonary disease. Am J Physiol Lung Cell Mol Physiol. (2021) 321:L1119–30. doi: 10.1152/ajplung.00198.2021

8. Suryadevara, R, Gregory, A, Masoomi, A, Xu, Z, Berman, S, Yun, JH, et al. Blood transcriptomics-based machine learning prediction of emphysema in smokers. Chest. (2021) 160:A1841–2. doi: 10.1016/j.chest.2021.07.1653

9. Ko, FWS, and Hui, DSC. Ground-glass opacities on computed tomography of the thorax to predict progression of emphysema: are we there yet? Am J Respir Crit Care Med. (2024) 210:1392–4. doi: 10.1164/rccm.202405-1066ED

10. Dobslaw, D, Woiski, C, Kiel, M, Kuch, B, and Breuer, J. Plant uptake, translocation and metabolism of PBDEs in plants of food and feed industry: a review. Rev Environ Sci Biotechnol. (2021) 20:75–142. doi: 10.1007/s11157-020-09557-7

11. Feiteiro, J, Mariana, M, and Cairrão, E. Health toxicity effects of brominated flame retardants: from environmental to human exposure. Environ Pollut. (2021) 285:117475. doi: 10.1016/j.envpol.2021.117475

12. Huang, L-W, Lin, H-Y, Lee, M-S, Chiou, W-Y, Chen, L-C, Chi, C-L, et al. Radiotherapy with combined techniques of volumetric-modulated arc therapy (VMAT) and simultaneously intra-tumor inner escalated boost (SIEB) successfully managing a patient with locoregionally advanced maxillary sinus cancer: a case report. Ther Radiol Oncol. (2020) 4:11. doi: 10.21037/tro-19-46

13. Wang, G, Xu, X, and Li, Y. Distribution, transformation and biological effects of polybrominated diphenyl ethers and their derivatives in soil: a review. Res Environ Sci. (2021) 34:755–65. doi: 10.13198/j.issn.1001-6929.2020.08.13

14. Darnerud, PO, and Risberg, S. Tissue localisation of tetra- and pentabromodiphenyl ether congeners (BDE-47, −85 and −99) in perinatal and adult C57BL mice. Chemosphere. (2006) 62:485–93. doi: 10.1016/j.chemosphere.2005.04.004

15. Feng, Y, Hu, Q, Meng, G, Wu, X, Zeng, W, Zhang, X, et al. Simulating long-term occupational exposure to decabrominated diphenyl ether using C57BL/6 mice: biodistribution and pathology. Chemosphere. (2015) 128:118–24. doi: 10.1016/j.chemosphere.2015.01.012

16. Wei, W, Ramalho, O, and Mandin, C. Modeling the bioaccessibility of inhaled semivolatile organic compounds in the human respiratory tract. Int J Hyg Environ Health. (2020) 224:113436. doi: 10.1016/j.ijheh.2019.113436

17. Zhang, Y, Mao, P, Li, G, Hu, J, Yu, Y, and An, T. Delineation of 3D dose-time-toxicity in human pulmonary epithelial Beas-2B cells induced by decabromodiphenyl ether (BDE209). Environ Pollut. (2018) 243:661–9. doi: 10.1016/j.envpol.2018.09.047

18. Albano, GD, Gagliardo, RP, Montalbano, AM, and Profita, M. Overview of the mechanisms of oxidative stress: impact in inflammation of the airway diseases. Antioxidants (Basel). (2022) 11:2237. doi: 10.3390/antiox11112237

19. Zeng, H, Li, T, He, X, Cai, S, Luo, H, Chen, P, et al. Oxidative stress mediates the apoptosis and epigenetic modification of the Bcl-2 promoter via DNMT1 in a cigarette smoke-induced emphysema model. Respir Res. (2020) 21:229. doi: 10.1186/s12931-020-01495-w

20. Luo, K, Zhang, R, Aimuzi, R, Wang, Y, Nian, M, and Zhang, J. Exposure to organophosphate esters and metabolic syndrome in adults. Environ Int. (2020) 143:105941. doi: 10.1016/j.envint.2020.105941

21. Chen, X, Hu, GH, He, B, Cao, Z, He, JF, Luo, HL, et al. Effect of brominated flame retardants exposure on liver function and the risk of non-alcoholic fatty liver disease in the US population. Ecotoxicol Environ Saf. (2024) 273:116142. doi: 10.1016/j.ecoenv.2024.116142

22. Lv, J, Li, SY, Kong, XM, Zhao, Y, Li, XY, Guo, H, et al. Associations between exposure to brominated flame retardants and periodontitis in U.S. adults. Chemosphere. (2024) 364:143181. doi: 10.1016/j.chemosphere.2024.143181

23. Mathur, MB, Ding, P, Riddell, CA, and VanderWeele, TJ. Web site and R package for computing E-values. Epidemiology. (2018) 29:e45–e47. doi: 10.1097/EDE.0000000000000864

24. VanderWeele, TJ, and Ding, P. Sensitivity analysis in observational research: introducing the E-value. Ann Intern Med. (2017) 167:268–74. doi: 10.7326/M16-2607

25. Haneuse, S, VanderWeele, TJ, and Arterburn, D. Using the E-value to assess the potential effect of unmeasured confounding in observational studies. JAMA. (2019) 321:602–3. doi: 10.1001/jama.2018.21554

26. Speiser, JL, Miller, ME, Tooze, J, and Ip, E. A comparison of random forest variable selection methods for classification prediction modeling. Expert Syst Appl. (2019) 134:93–101. doi: 10.1016/j.eswa.2019.05.028

27. Morse, D, and Rosas, IO. Tobacco smoke-induced lung fibrosis and emphysema. Annu Rev Physiol. (2014) 76:493–513. doi: 10.1146/annurev-physiol-021113-170411

28. Yeboah-Kordieh, Y, Bene-Alhasan, Y, Mensah, S, Patel, M, and Malik, A. Association between alcohol use and chronic obstructive pulmonary disease in all of us research program. Chest. (2023) 164:A5013. doi: 10.1016/j.chest.2023.07.3245

29. Hunter, DJ, and Holmes, C. Where medical statistics meets artificial intelligence. N Engl J Med. (2023) 389:1211–9. doi: 10.1056/NEJMra2212850

30. Rudin, C. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat Mach Intell. (2019) 1:206–15. doi: 10.1038/s42256-019-0048-x

31. Jiang, S, Liang, Y, Shi, S, Wu, C, and Shi, Z. Improving predictions and understanding of primary and ultimate biodegradation rates with machine learning models. Sci Total Environ. (2023) 904:166623. doi: 10.1016/j.scitotenv.2023.166623

32. Li, W, Huang, G, Tang, N, Lu, P, Jiang, L, Lv, J, et al. Effects of heavy metal exposure on hypertension: a machine learning modeling approach. Chemosphere. (2023) 337:139435. doi: 10.1016/j.chemosphere.2023.139435

33. Xiao, H, Liang, X, Li, H, Chen, X, and Li, Y. Trends in the prevalence of osteoporosis and effects of heavy metal exposure using interpretable machine learning. Ecotoxicol Environ Saf. (2024) 286:117238. doi: 10.1016/j.ecoenv.2024.117238

34. Zhang, Y, Dong, T, Hu, W, Wang, X, Xu, B, Lin, Z, et al. Association between exposure to a mixture of phenols, pesticides, and phthalates and obesity: comparison of three statistical models. Environ Int. (2019) 123:325–36. doi: 10.1016/j.envint.2018.11.076

35. Fahmy, AS, Csecs, I, Arafati, A, Assana, S, Yankama, TT, Al-Otaibi, T, et al. An explainable machine learning approach reveals prognostic significance of right ventricular dysfunction in nonischemic cardiomyopathy. JACC Cardiovasc Imaging. (2022) 15:766–79. doi: 10.1016/j.jcmg.2021.11.029

36. Han, L, and Wang, Q. Associations of brominated flame retardants exposure with chronic obstructive pulmonary disease: a US population-based cross-sectional analysis. Front Public Health. (2023) 11:1138811. doi: 10.3389/fpubh.2023.1138811

37. Mao, H, Lin, T, Huang, S, Xie, Z, Jin, S, Shen, X, et al. The impact of brominated flame retardants (BFRs) on pulmonary function in US adults: a cross-sectional study based on NHANES (2007-2012). Sci Rep. (2024) 14:6486. doi: 10.1038/s41598-024-57302-9

38. Yu, X, Yin, H, Peng, H, Lu, G, Liu, Z, and Dang, Z. OPFRs and BFRs induced A549 cell apoptosis by caspase-dependent mitochondrial pathway. Chemosphere. (2019) 221:693–702. doi: 10.1016/j.chemosphere.2019.01.074

39. Albano, GD, Moscato, M, Montalbano, AM, Anzalone, G, Gagliardo, R, Bonanno, A, et al. Can PBDEs affect the pathophysiologic complex of epithelium in lung diseases? Chemosphere. (2020) 241:125087. doi: 10.1016/j.chemosphere.2019.125087

40. Wu, H-D, Yang, LW, Deng, DY, Jiang, RN, Song, ZK, and Zhou, LT. The effects of brominated flame retardants (BFRs) on pro-atherosclerosis mechanisms. Ecotoxicol Environ Saf. (2023) 262:115325. doi: 10.1016/j.ecoenv.2023.115325

Keywords: machine learning, SHAP, environmental exposure, brominated flame retardants, emphysema

Citation: Xie Q, Qu H, Li J, Zeng R, Li W, Ouyang R, Zhang C, Xie S and Du M (2025) Identifying emphysema risk using brominated flame retardants exposure: a machine learning predictive model based on the SHAP methodology. Front. Public Health. 13:1600729. doi: 10.3389/fpubh.2025.1600729

Received: 26 March 2025; Accepted: 12 June 2025;
Published: 25 June 2025.

Edited by:

Mohiuddin Md. Taimur Khan, Washington State University Tri-Cities, United States

Reviewed by:

Keith Dana Thomsen, Washington River Protection Solutions, United States
Xuehai Wang, Karolinska Institutet (KI), Sweden

Copyright © 2025 Xie, Qu, Li, Zeng, Li, Ouyang, Zhang, Xie and Du. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Ming Du, ZG9jeGllOTIzQGdtYWlsLmNvbQ==; Siyu Xie, eHN5ZG9jQDEyNi5jb20=

†These authors share senior authorship
