
ORIGINAL RESEARCH article

Front. Public Health, 10 July 2025

Sec. Environmental Health and Exposome

Volume 13 - 2025 | https://doi.org/10.3389/fpubh.2025.1602566

This article is part of the Research Topic: New Environmental Pollutants, Aging, and Age-Related Diseases

Developing an interpretable machine learning predictive model of chronic obstructive pulmonary disease by serum PFAS concentration


Xiaomei Shao1*, Ling Zhang2, Yuting Wang1, Youmei Ying1, Xueqin Chen3*
  • 1Nanjing Jiangbei Hospital, Affiliated Nanjing Jiangbei Hospital of Xinglin College, Nantong University, Jiangsu, China
  • 2Huai'an No. 3 People's Hospital, Huaian Second Clinical College of Xuzhou Medical University, Jiangsu, China
  • 3The Affiliated Taizhou People's Hospital of Nanjing Medical University, Taizhou School of Clinical Medicine, Nanjing Medical University, Taizhou, Jiangsu, China

Background: Chronic obstructive pulmonary disease (COPD) is a leading cause of morbidity and mortality worldwide, with limited early detection strategies. While previous studies have examined the relationship between per- and polyfluoroalkyl substances (PFAS) and COPD, limited research has applied interpretable machine learning (ML) techniques to this association.

Methods: We investigated the association between PFAS exposure and COPD risk in 4,450 National Health and Nutrition Examination Survey (NHANES) participants from 2013 to 2018. After excluding missing covariates and extreme PFAS values and applying K-nearest neighbors (KNN) imputation, nine ML models, including CatBoost, were built and evaluated using metrics like accuracy, area under the curve (AUC), sensitivity, and specificity. The best-performing model was further analyzed using partial dependence plots (PDP) and SHapley additive exPlanations (SHAP) analysis. To enhance clinical applicability, the final model was deployed as a publicly accessible web-based risk calculator.

Results: CatBoost emerged as the best model, achieving an accuracy of 84%, AUC of 0.89, sensitivity of 81%, and specificity of 84%. PDP revealed that higher perfluorooctane sulfonic acid (PFOS) and perfluoroundecanoic acid (PFUA) levels were associated with reduced COPD risk, whereas perfluorooctanoic acid (PFOA) and 2-(N-Methyl-perfluorooctane sulfonamido) acetic acid (MPAH) showed positive associations with COPD. Perfluorononanoic acid (PFNA), perfluorodecanoic acid (PFDE), and perfluorohexane sulfonic acid (PFHxS) demonstrated mixed or non-linear effects. SHAP analysis provided insights into individual predictions and overall variable contributions, clarifying the complex PFAS-COPD relationship. The deployed web-based calculator enables interactive prediction and risk interpretation, supporting potential public health applications.

Conclusion: CatBoost identified PFOS and PFUA as protective factors against COPD, while PFOA and MPAH increased risk of COPD. These findings emphasize the need for stricter PFAS regulation and highlight the potential of machine learning in guiding prevention strategies.

Introduction

Global burden and trends of COPD

Chronic obstructive pulmonary disease (COPD) is a major global health issue, affecting an estimated 328 million people worldwide (1–3). While smoking is the leading cause, other factors such as biomass fuel exposure, occupational hazards, and air pollution also contribute significantly, especially in low- and middle-income countries (2, 4). Despite its high prevalence, 70–80% of COPD cases remain undiagnosed due to the challenges in early detection (5–7).

Machine learning in disease prediction

Machine learning (ML) has emerged as a transformative tool for COPD screening and risk assessment by analyzing complex, multi-dimensional healthcare data (8–10). For instance, Lin et al. (11) developed a machine learning-based decision system using gradient boosting classifiers (CatBoost, LightGBM, and XGBoost), which achieved an area under the curve (AUC) of 99.85% in identifying high-risk COPD groups. Similarly, Wang et al. (12) created a COPD risk screening model using logistic regression and generalized additive models, with an AUC exceeding 0.8, showing strong predictive performance. Zeng et al. (13) developed an ML model using data from over 43,000 COPD patients, achieving an AUC of 0.866 for predicting severe exacerbations within 1 year, outperforming previous models. These studies highlight the potential of ML to improve COPD screening, enhance diagnostic accuracy, and support more effective interventions.

Environmental exposures and respiratory health

Global environmental pollution exposure is widespread, with 91% of the world's population living in areas exceeding WHO safety guidelines for pollutants like PM2.5 and ozone (14, 15). Environmental conditions are linked to 24% of all deaths globally, with air pollution alone causing 400,000 premature deaths annually in Europe and reducing average life expectancy by 1 year (16, 17). Niu et al. (18) found that particulate matter exposure increased COPD exacerbation risk, particularly in younger and severe COPD patients. Yan et al. (19) demonstrated that higher blood cadmium and lead levels were associated with increased COPD risk, while anthocyanidin intake above 11.56 mg/day reduced cadmium-related COPD risk by 27%. Madani et al. (20) showed that volatile organic compounds from local sources significantly increased respiratory disease-related emergency room visits, with ethylbenzene having the greatest impact on asthma and COPD. Environmental pollutants pose significant respiratory health risks globally, with effects varying by pollutant type and population vulnerability.

PFAS exposure: background and health impacts

Per- and polyfluoroalkyl substances (PFAS) are a group of synthetic chemicals widely used in industrial and consumer products due to their exceptional chemical stability, water resistance, and heat resistance (21–23). However, their persistence in the environment and bioaccumulation in human tissues have raised significant public health concerns (24–26). PFAS exposure has been linked to various adverse health outcomes, including metabolic disorders (27), liver damage (28), immune dysfunction (29), and respiratory diseases, such as asthma (30) and reduced lung function (31, 32). Recent studies have also explored the relationship between PFAS and COPD. For instance, Wang et al. analyzed data from the National Health and Nutrition Examination Survey (NHANES) 2007–2018 and found that perfluorooctanoic acid (PFOA) and perfluorononanoic acid (PFNA) exposure significantly increased COPD risk, particularly in males, with a J-shaped dose-response relationship (33, 34). Their study further identified serum albumin as a mediator in the association between PFOA and COPD, with a mediation proportion of 17.94%, suggesting potential pathways involving oxidative stress and chronic inflammation (34). Despite these advancements, research on PFAS and COPD remains limited, and few studies have applied ML approaches to investigate this relationship or develop predictive models.

Rationale for model interpretability in public health

Despite emerging evidence linking PFAS exposure to COPD, current research remains limited in both scope and methodology (33, 34). Most existing studies rely on conventional statistical models, which may not fully capture the complex, non-linear relationships between PFAS and COPD risk, nor do they provide individualized risk estimation (35, 36). Moreover, few have explored the use of machine learning to enhance predictive performance or model interpretability in this domain. To address these gaps, our study aims to systematically evaluate the relationship between PFAS exposure and COPD risk using advanced ML approaches. By leveraging nationally representative data from the 2013–2018 NHANES, we developed interpretable ML models to predict individual COPD risk, focusing on performance metrics such as AUC, sensitivity, and specificity. We further applied SHAP and partial dependence analyses to uncover both global and personalized insights into how specific PFAS contribute to COPD risk. Finally, to support real-world application, we translated our findings into an accessible online risk calculator, facilitating early screening and informing prevention strategies in public health practice.

Methods

Study population

The National Health and Nutrition Examination Survey (NHANES) is a program conducted by the CDC to study the health and nutrition of people living in the United States (34). For this study, we used data from three NHANES cycles (2013–2018), which included 29,400 participants. After excluding individuals with missing covariates or serum PFAS concentration data, 4,844 participants remained. Missing values, present in < 20% of the data, were addressed using the K-nearest neighbors (KNN) imputation method. To ensure robust results, we further excluded extreme PFAS values below the 1st percentile and above the 99th percentile (37), leaving a final sample of 4,450 participants, as shown in Figure 1. All participants provided written informed consent, and the study was approved by the National Center for Health Statistics Research Ethics Review Board.
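The two preprocessing steps described above (KNN imputation, then exclusion of extreme PFAS values) can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' pipeline: the number of neighbors, the array layout, and the rule that a participant is dropped if any PFAS value falls outside the 1st to 99th percentile range are all assumptions.

```python
import numpy as np
from sklearn.impute import KNNImputer

rng = np.random.default_rng(0)

# Synthetic stand-in for serum PFAS concentrations (rows = participants).
pfas = rng.lognormal(mean=0.5, sigma=0.8, size=(500, 3))
pfas[rng.random(pfas.shape) < 0.05] = np.nan  # remove about 5% of values

# Fill each missing value from the most similar participants.
# n_neighbors=5 is the scikit-learn default, not a value reported in the article.
imputed = KNNImputer(n_neighbors=5).fit_transform(pfas)

# Assumed rule: keep a row only if every PFAS value lies within the
# 1st-99th percentile range of its column.
low, high = np.percentile(imputed, [1, 99], axis=0)
keep = np.all((imputed >= low) & (imputed <= high), axis=1)
trimmed = imputed[keep]

print(f"{imputed.shape[0]} rows before trimming, {trimmed.shape[0]} after")
```

Trimming after imputation follows the order in which the text lists the steps. The reverse order would yield different percentile cut-offs.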

Figure 1
Flowchart detailing the process of selecting participants and developing a web-based risk calculator from the NHANES database. Initially, 29,400 participants were considered from 2013 to 2018. After applying exclusion criteria for missing data on PFAS and KNN imputation, 4,844 participants remained. Further exclusion of extreme PFAS values resulted in 4,450 participants for analysis. The dataset was divided into a training set of 3,560 and a test set of 890. Machine learning models were constructed and evaluated for metrics like accuracy, sensitivity, specificity. Model explanation included partial dependence and SHAP analysis, leading to a web-based risk calculator development.

Figure 1. Study workflow for PFAS exposure and COPD risk analysis. From 29,400 NHANES participants (2013–2018), 4,450 were included after data preprocessing. The dataset was split into training (n = 3,560) and test (n = 890) sets. Nine machine learning (ML) models were trained using the covariates and PFAS biomarkers as predictors. The best-performing model (CatBoost) was further analyzed using partial dependence plots (PDP) and SHapley Additive exPlanations (SHAP).

Serum PFAS

The PFAS analyzed were 2-(N-Methyl-perfluorooctane sulfonamido) acetic acid (MPAH), perfluorodecanoic acid (PFDE), perfluorohexane sulfonic acid (PFHxS), perfluorononanoic acid (PFNA), perfluorooctane sulfonic acid (PFOS), perfluorooctanoic acid (PFOA), and perfluoroundecanoic acid (PFUA). Total concentrations of PFOS and PFOA were calculated by combining their isomers: linear (n-PFOA) and branched (Sb-PFOA) for PFOA, and linear (n-PFOS) and monomethyl branched (Sm-PFOS) for PFOS. Pearson correlation coefficients were used to evaluate relationships among the seven PFAS.
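As a rough illustration of the isomer summation and the Pearson correlation step, the sketch below uses synthetic concentrations. The variable names are hypothetical and do not correspond to NHANES column codes.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Hypothetical isomer concentrations in ng/mL (synthetic values).
n_pfos = rng.lognormal(1.0, 0.5, n)    # linear PFOS
sm_pfos = rng.lognormal(0.2, 0.5, n)   # monomethyl branched PFOS
n_pfoa = rng.lognormal(0.3, 0.5, n)    # linear PFOA
sb_pfoa = rng.lognormal(-2.0, 0.5, n)  # branched PFOA

# Total concentration is the sum of the measured isomers.
total_pfos = n_pfos + sm_pfos
total_pfoa = n_pfoa + sb_pfoa

# Pearson correlation matrix; each row of the stacked array is one variable.
corr = np.corrcoef(np.vstack([total_pfos, total_pfoa]))
print(corr.round(2))
```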

Covariates

This study included age, gender, race, education level, marital status, body mass index (BMI), family income, and smoking status as covariates. Race was divided into five categories: Mexican American, other Hispanic, non-Hispanic White, non-Hispanic Black, and other. Education was grouped into two levels: high school or less, and more than high school. Marital status options were married, widowed, divorced, separated, never married, and living with a partner. Family income was calculated as a ratio of family income to poverty guidelines, with any value above 5 recorded as 5. Smoking status was defined as having smoked at least 100 cigarettes over a lifetime. To assess multicollinearity among covariates, we calculated the variance inflation factor (VIF). Variables with a VIF < 10 were retained for model construction, consistent with prior methodological recommendations for avoiding instability in multivariate models (38).
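The VIF screen can be written directly from its definition, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing covariate j on all the others. The sketch below is a NumPy-only version on synthetic data. The article does not say which implementation was used, so the function name and the data are illustrative assumptions.

```python
import numpy as np

def variance_inflation_factors(X):
    """Return VIF_j = 1 / (1 - R_j^2) for each column j of X."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        y = X[:, j]
        # Regress column j on an intercept plus all remaining columns.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ coef
        r2 = 1.0 - resid.var() / y.var()
        vifs[j] = 1.0 / (1.0 - r2)
    return vifs

rng = np.random.default_rng(2)
a = rng.normal(size=300)
b = rng.normal(size=300)
c = a + 0.1 * rng.normal(size=300)  # nearly a copy of a, so collinear with it

vifs = variance_inflation_factors(np.column_stack([a, b, c]))
retained = vifs < 10  # retention threshold stated in the text
print(vifs.round(1), retained)
```

In this toy example the independent column passes the threshold, while the two near-duplicate columns exceed it.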

ML model construction and evaluation

The ML models were built using 15 variables, comprising 10 continuous variables (age, family income, BMI, and seven PFAS biomarkers: MPAH, PFDE, PFHxS, PFNA, PFOA, PFOS, and PFUA) and five categorical variables (gender, race, education level, marital status, and smoking status). Continuous variables were standardized using StandardScaler from scikit-learn to ensure zero mean and unit variance. Categorical variables were encoded as integers without additional transformation. The dataset was randomly split into training (80%, n = 3,560) and testing (20%, n = 890) sets using stratified sampling to maintain the proportion of COPD cases in both sets.
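The split and scaling steps might look like the sketch below, which uses synthetic data with roughly 4% positives. Fitting the scaler on the training portion only is an assumption made here to avoid information leaking from the test set. The article does not state whether scaling was fitted before or after the split, and the random seed is arbitrary.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
n = 1000
X_cont = rng.normal(50, 15, size=(n, 3))   # stand-ins for continuous predictors
X_cat = rng.integers(0, 2, size=(n, 2))    # integer-coded categorical predictors
y = (rng.random(n) < 0.04).astype(int)     # imbalanced outcome

X = np.hstack([X_cont, X_cat])

# Stratified 80/20 split keeps the case proportion similar in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42
)

# Standardize only the continuous columns (the first three here).
scaler = StandardScaler().fit(X_train[:, :3])
X_train[:, :3] = scaler.transform(X_train[:, :3])
X_test[:, :3] = scaler.transform(X_test[:, :3])

print(y_train.mean().round(3), y_test.mean().round(3))
```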

Nine machine learning algorithms were implemented using Python 3.9.19 and scikit-learn 1.3.0: random forest (RF), support vector machine (SVM), decision tree (DT), K-nearest neighbors (KNN), multilayer perceptron (MLP), voting classifier (VC), light gradient boosting machine (LightGBM), CatBoost, and Extreme Gradient Boosting (XGBoost). These models were chosen based on their demonstrated performance in prior studies involving clinical or environmental health prediction tasks (39, 40).

Hyperparameter tuning was performed using grid search, with the optimized parameters provided in Supplementary Table S1. The workflow of the study is shown in Figure 1. Model performance was evaluated using metrics such as the receiver operating characteristic (ROC) curve, area under the curve (AUC), accuracy, sensitivity (recall), specificity, false-positive rate (FPR), false-negative rate (FNR), positive predictive value (PPV), negative predictive value (NPV), and F1 score. These metrics are widely used in medical machine learning studies to assess both discriminatory power and classification balance, especially under imbalanced conditions (39, 41).
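All of the threshold-based metrics listed above follow from the four cells of a confusion matrix. The sketch below computes them from hypothetical counts; these are not the article's results and were chosen only to show how class imbalance affects PPV and F1.

```python
def classification_metrics(tp, fp, tn, fn):
    """Derive common binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # recall, true-positive rate
    specificity = tn / (tn + fp)   # true-negative rate
    ppv = tp / (tp + fp)           # precision
    npv = tn / (tn + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "fpr": 1 - specificity,
        "fnr": 1 - sensitivity,
        "ppv": ppv,
        "npv": npv,
        "f1": 2 * ppv * sensitivity / (ppv + sensitivity),
    }

# Hypothetical counts: 40 cases among 1,000 people.
m = classification_metrics(tp=30, fp=150, tn=810, fn=10)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

With these counts, accuracy is 0.84 and sensitivity is 0.75, yet PPV is only about 0.17. This is why PPV, NPV, and F1 are reported alongside accuracy when cases are rare.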

ML model interpretation

To analyze the impact of individual PFAS on COPD risk, partial dependence plots (PDPs) were created using the sklearn.inspection module with a grid resolution of 50 points. These plots demonstrate how a specific feature influences the model's predictions while holding other variables constant. Using the trained CatBoost model, the relationship between selected features and COPD risk was calculated and visualized. The trends were smoothed using B-spline interpolation (scipy.interpolate.splrep with smoothing parameter s = 30) to enhance readability, and individual variability was highlighted through sample-specific curves. Additionally, rug plots were included to show the distribution of feature values, providing a deeper understanding of their range within the dataset.
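The partial dependence calculation and the spline smoothing can be sketched as below. Several details are assumptions: a scikit-learn gradient boosting classifier stands in for the trained CatBoost model, the data are synthetic, the dependence is computed by hand rather than through sklearn.inspection, and the grid spans the full observed range of the feature. Only the grid resolution of 50 and the smoothing parameter s = 30 come from the text.

```python
import numpy as np
from scipy.interpolate import splev, splrep
from sklearn.ensemble import GradientBoostingClassifier

def partial_dependence_curve(model, X, feature, grid_resolution=50):
    """Average predicted probability as one feature sweeps a grid of values."""
    grid = np.linspace(X[:, feature].min(), X[:, feature].max(), grid_resolution)
    averages = np.empty(grid_resolution)
    for i, value in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, feature] = value  # set every row to this value, keep other columns
        averages[i] = model.predict_proba(X_mod)[:, 1].mean()
    return grid, averages

rng = np.random.default_rng(4)
X = rng.normal(size=(400, 3))
y = (X[:, 0] + 0.5 * rng.normal(size=400) > 0).astype(int)

# Stand-in model; the article used CatBoost.
model = GradientBoostingClassifier(random_state=0).fit(X, y)

grid, pd_avg = partial_dependence_curve(model, X, feature=0)

# B-spline smoothing of the averaged curve; s=30 is the value given in the text.
tck = splrep(grid, pd_avg, s=30)
smoothed = splev(grid, tck)
```

How strongly a given value of s smooths the curve depends on the scale of the y-values, so the same setting can behave differently on probabilities than on log-odds.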

SHapley Additive exPlanations (SHAP) analysis was applied to understand how individual features influenced the predictions made by the trained CatBoost model (42). The SHAP values, calculated using “TreeExplainer,” provided a breakdown of each feature's contribution to the model output. A combined visualization was created, consisting of a dot plot to display the distribution and direction of feature impacts and a bar plot to rank features by their average contribution. This dual representation provided a clear view of the importance and variability of each feature, offering valuable insights into the factors driving COPD risk predictions. All analysis code and data are made publicly available at https://huggingface.co/spaces/MLML202512/COPD/tree/main for reproducibility.
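To make the idea behind SHAP values concrete, the sketch below computes exact Shapley values by enumerating feature subsets for a toy three-feature risk score. This is not TreeExplainer, which uses a tree-specific algorithm to avoid the exponential cost of enumeration. The toy model, its coefficients, and the single reference point used as a baseline are all illustrative assumptions.

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x relative to a single baseline point."""
    n = len(x)

    def value(subset):
        # Features in `subset` take the instance value, the rest the baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

def toy_model(z):
    """Hypothetical risk score with an age-by-smoking interaction."""
    age, smoke, pfoa = z
    return 0.02 * age + 0.5 * smoke + 0.1 * pfoa + 0.01 * age * smoke

x = [65.0, 1.0, 2.0]          # instance to explain
baseline = [50.0, 0.0, 1.5]   # reference point
phi = shapley_values(toy_model, x, baseline)
print([round(v, 3) for v in phi])
```

The contributions sum to the difference between the prediction for the instance and the prediction for the baseline, which is the additive property that waterfall plots display.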

Web-based risk calculator development

To translate the trained machine learning model into a user-friendly application, we developed an interactive web-based COPD risk calculator using the Gradio framework (https://www.gradio.app/). The calculator was built based on the final CatBoost model, which was trained using selected demographic, socioeconomic, lifestyle, and PFAS biomarker variables. Only the numeric features were standardized using StandardScaler, consistent with the model training pipeline, while categorical variables were kept in their original format as encoded integers. The interface allows users to input raw values for 15 features, including five categorical (gender, race, education level, marital status, and smoking) and 10 numeric variables (age, family income, BMI, and seven PFAS biomarkers: MPAH, PFDE, PFHxS, PFNA, PFOA, PFOS, and PFUA). Upon input, the backend applies the same preprocessing pipeline and uses the trained CatBoost model to generate a binary prediction (COPD or Healthy), a probability score, and a qualitative risk level categorized as low, medium, or high.
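The final step, turning a predicted probability into the three outputs shown to the user, could be sketched as follows. The article does not report the decision threshold or the cut-offs separating low, medium, and high risk, so the values 0.5, 0.3, and 0.7 below are purely hypothetical placeholders, as is the function name.

```python
def summarize_prediction(probability, decision_threshold=0.5,
                         low_cut=0.3, high_cut=0.7):
    """Map a predicted COPD probability to a label and a qualitative risk level.

    All three cut-offs are hypothetical; the article does not state them.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    label = "COPD" if probability >= decision_threshold else "Healthy"
    if probability < low_cut:
        level = "low"
    elif probability < high_cut:
        level = "medium"
    else:
        level = "high"
    return {"prediction": label, "probability": probability, "risk_level": level}

result = summarize_prediction(0.0446)
print(result)
```

Under these assumed cut-offs, a probability of 4.46% maps to a "Healthy" prediction at low risk.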

Statistical analysis

Continuous variables were reported as means with standard deviations (SD), and categorical variables as counts with percentages. T-tests and chi-square tests were used to compare PFAS levels and demographics between COPD and non-COPD groups. Analyses were performed using Python (3.9.19) and R (4.4.0), with p-value < 0.05 considered significant (43).
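A group comparison of this kind can be sketched with SciPy as shown below. The data are synthetic. The use of Welch's unequal-variance t-test is an assumption, since the article does not specify the t-test variant.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

# Synthetic continuous variable (for example age) in two groups.
group_copd = rng.normal(65, 11, 60)
group_ctrl = rng.normal(49, 17, 600)

# Welch's t-test (equal_var=False) is an assumption, not the article's stated choice.
t_stat, p_cont = stats.ttest_ind(group_copd, group_ctrl, equal_var=False)

# Synthetic 2x2 table: rows = COPD / non-COPD, columns = two category levels.
table = np.array([[30, 30],
                  [200, 400]])
chi2, p_cat, dof, expected = stats.chi2_contingency(table)

print(f"t-test p = {p_cont:.3g}, chi-square p = {p_cat:.3g}, dof = {dof}")
```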

Results

Baseline characteristics

Among 4,450 participants, as shown in Table 1, 180 (4.0%) had COPD. Participants with COPD were older (64.6 ± 11.5 vs. 49.0 ± 17.6 years, p-value < 0.001) and more likely to be non-Hispanic White (61.7 vs. 36.5%, p-value < 0.001) or have a lower education level (60.0 vs. 43.7%, p-value < 0.001). Marital status also differed, with more widowed individuals in the COPD group (15.6% vs. 6.8%, p-value < 0.001). While smoking prevalence was lower in the COPD group (13.9 vs. 59.7%, p-value < 0.001), this may reflect smoking cessation after diagnosis or survivor bias. PFAS analysis showed higher levels of MPAH (p-value < 0.001), lower PFDE (p-value = 0.004), and lower PFUA (p-value = 0.006) in the COPD group, with no significant differences for PFHxS, PFNA, PFOA, or PFOS.


Table 1. Demographic and clinical features of the participants.

Serum PFAS concentrations showed significant changes from 2013 to 2018 (p-value < 0.001), as shown in Table 2. PFHxS, PFNA, PFOA, and PFOS levels declined over time, with PFOS dropping from 6.91 ng/ml in 2013–2014 to 6.22 ng/ml in 2017–2018, and PFOA from 2.23 to 1.62 ng/ml. MPAH, PFDE, and PFUA levels remained relatively stable. These trends suggest reduced PFAS exposure, likely due to regulatory measures and shifts in industrial practices. The Pearson correlation analysis showed strong relationships between PFUA and PFDE (r = 0.74) and between PFOS and PFNA (r = 0.62), while MPAH exhibited weak correlations with other PFAS (Supplementary Figure S1). These results suggest shared sources or pathways for certain PFAS.


Table 2. Serum concentration of PFAS from 2013 to 2018.

ML models construction and evaluation

Nine ML models, including RF, SVM, DT, KNN, MLP, VC, LGB, CB, and XGB, were constructed and evaluated to predict COPD risk. Performance metrics such as AUC, accuracy, sensitivity, and specificity were used to assess the models, as shown in Table 3. Among these, CatBoost emerged as the best-performing model, achieving the highest accuracy (84%), AUC (0.89), sensitivity (81%), and specificity (84%). The ROC curves in Figure 2 further confirmed the robust performance of CatBoost, showing minimal overfitting and consistent AUC values between training and testing datasets. In contrast, other models like KNN exhibited significant overfitting, with a large performance gap between training (AUC = 0.92) and testing (AUC = 0.69). Given its superior performance, CatBoost was selected as the final model for further analysis.
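The AUC used to compare training and test performance can be read as the probability that a randomly chosen case receives a higher score than a randomly chosen non-case. The sketch below computes it that way on a four-person toy example, then computes the train-test gap for KNN from the two AUC values quoted above. The function name and toy scores are illustrative.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)  # 3 of 4 pairs are ranked correctly

# Train-test AUC gap for KNN, using the values reported in the text.
knn_gap = 0.92 - 0.69
print(auc, round(knn_gap, 2))
```

A large positive gap, such as the 0.23 seen for KNN, indicates that the model discriminates much better on the data it was fitted to than on held-out data.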


Table 3. Discrimination characteristics among nine ML models.

Figure 2
Nine subplots display ROC curves for various classifiers, showing train and test AUC values. Random forest, support vector machine, decision tree, k-nearest neighbors, multi-layer perceptron, voting classifier, LightGBM, CatBoost, and XGBoosting are compared. Train AUC values range from 0.76 to 1.00, and test AUC values range from 0.69 to 0.89. Each plot includes true positive rate versus false positive rate.

Figure 2. ROC curves of nine ML models for COPD prediction. ROC curves (A–I) illustrate the model performance on both training and test sets using covariates including age, sex, BMI, smoking status, family income, and seven PFAS biomarkers. The nine models include: (A) Random Forest (RF), (B) Support Vector Machine (SVM), (C) Decision Tree (DT), (D) K-nearest neighbors (KNN), (E) Multi-Layer Perceptron (MLP), (F) Voting Classifier (VC), (G) LightGBM (LGB), (H) CatBoost (CB), and (I) XGBoost (XGB). CatBoost achieved the highest test AUC of 0.89.

ML models interpretation

To investigate the relationship between specific PFAS exposure and COPD risk, we performed partial dependence analysis in the trained CatBoost model (Figure 3). The results revealed varying, non-linear associations for different PFAS. COPD risk decreased with higher levels of PFOS and PFUA, suggesting a potential protective effect, while PFOA and MPAH showed a positive association, with risk increasing at higher concentrations. PFNA exhibited a U-shaped relationship, indicating increased risk at both low and high levels, while moderate levels were associated with lower risk. PFDE demonstrated a decreasing trend in risk at moderate levels, followed by an increase at higher concentrations. PFHxS showed a fluctuating pattern without a clear monotonic trend. These findings highlighted the complex influence of PFAS on COPD risk, suggesting that different PFAS may affect the disease through distinct mechanisms.

Figure 3
Seven partial dependence plots labeled A to G display relationships between various chemical compounds and their influences. Each plot shows a line with a shaded confidence interval indicating variability. Compounds include PFOS, PFUA, PFOA, MPAH, PFNA, PFDE, and PFHxS, with individual variations in trend shapes and overlap.

Figure 3. Partial Dependence Plots (PDP) for PFAS and COPD risk. PDPs for selected PFAS predictors—PFOS, PFUA, PFOA, MPAH, PFNA, PFDE, and PFHxS (A–G)—illustrate the marginal effect of each feature on predicted COPD risk, while holding other covariates constant. For each panel, shaded bands indicate 95% confidence intervals and rug plots show the distribution of data points. Adjusted covariates include demographic and behavioral variables such as age, sex, BMI, smoking status, and income level.

To further interpret the contributions of individual features to COPD risk, SHAP analysis was performed. Figure 4A shows a waterfall plot of the feature contributions to a single prediction. Smoking status had the largest positive contribution to COPD risk, followed by PFNA and MPAH. Conversely, family income and PFUA were associated with reduced risk. Figure 4B presents a summary plot of SHAP values across the entire dataset, ranking features by their overall importance. Age was the most significant contributor to COPD risk, with older age associated with higher risk. Among PFAS, PFUA, PFHxS, and PFOS demonstrated negative contributions, indicating that lower levels of these PFAS were linked to higher COPD risk. Conversely, MPAH and PFOA showed positive contributions, meaning that higher levels were associated with increased risk. PFNA and PFDE exhibited a mixed effect, with both low and high levels contributing differently to the risk. In the summary plot, color encodes the feature value (red for high, blue for low), and the sign of the SHAP value shows whether that value raises or lowers predicted COPD risk, which together indicate the direction of each PFAS's impact.

Figure 4
Two panels illustrate the impact of various features using SHAP values. Panel A shows a waterfall plot with features such as smoking, family income, and race impacting predictions, with smoking having the most significant influence. Panel B displays a summary plot with violin distributions for each feature, revealing age and smoking's high importance. Red indicates higher feature values, while blue shows lower. The feature contributions range from negative to positive, affecting the model's output.

Figure 4. SHapley Additive exPlanations (SHAP) analysis for COPD risk prediction. (A) Waterfall plot showing the contribution of top features (e.g., smoke, family income, MPAH, PFNA, PFUA) to an individual prediction. Positive (red) values increase risk, while negative (blue) values reduce risk. (B) Summary plot displaying mean SHAP values for all features across the dataset, ranked by importance. Age and smoke are the strongest predictors, with PFAS (PFUA, PFOS, PFOA, MPAH, and PFNA) showing varied directional impacts on COPD risk. The color gradient represents feature values, with red indicating high values and blue low values.

Web-based risk calculator

To enhance accessibility and clinical applicability, we implemented a web-based COPD risk calculator using the Gradio framework. This interactive tool integrates the trained CatBoost model and allows users to input raw demographic, lifestyle, and PFAS biomarker data through a browser interface (Figure 5). The calculator automatically standardizes numeric features in the backend and provides real-time predictions, including binary classification (COPD or Healthy), probability of risk, and a qualitative risk level (low, medium, or high). The web-based calculator serves as a user-friendly prototype for personalized risk assessment and may assist clinicians or public health professionals in early identification and stratification of COPD risk, particularly in PFAS-exposed populations (https://huggingface.co/spaces/MLML202512/COPD).

Figure 5
COPD Risk Prediction Calculator form with fields for demographic information such as age (40), gender (male), race, education, marital status, and family income. Health indicators include BMI (25) and smoking status. Biomarkers listed are MPAH, PFDE, PFHxS, PFNA, PFOA, PFOS, and PFUA with respective values. The prediction result indicates “Healthy” with a 4.46% risk probability, categorized as low risk.

Figure 5. The web-based COPD risk prediction calculator. This calculator, developed using the Gradio framework, integrates the trained CatBoost model. Users input values for age, sex, BMI, smoking, income, and serum PFAS levels. The tool applies the same standardization and feature scaling as in model training, and outputs a COPD risk probability, risk category (Low/Medium/High), and binary prediction (COPD or Healthy). It is accessible at: https://huggingface.co/spaces/MLML202512/COPD.

Discussion

Summary of main findings and model performance

To our knowledge, this study is the first to use interpretable ML techniques to investigate the association between PFAS exposure and COPD risk, utilizing data from the US NHANES (2013–2018). Among the nine ML models tested, CatBoost emerged as the best performer, achieving an accuracy of 84%, AUC of 0.89, sensitivity of 81%, and specificity of 84%, making it the optimal choice for predicting COPD risk. To provide deeper insights, feature importance analysis, partial dependence plots, and SHAP analysis were conducted to evaluate how individual PFAS and other factors influence COPD risk. These findings underscore the importance of regulating PFAS exposure to mitigate health risks and demonstrate the potential of interpretable ML methods to identify high-risk populations, guiding targeted interventions and improving public health outcomes.

PFAS as key predictors of COPD risk: consistency with prior studies

Previous research highlights that PFOA and PFNA are strongly associated with increased COPD risk, particularly among males, with a nonlinear, J-shaped dose-response relationship for PFOA exposure (34). Similarly, Pan et al. demonstrated significant associations between serum levels of PFOS and PFOA and increased COPD risk, noting differential impacts based on sex, age, and smoking status, and indicating protective roles of moderate-intensity physical activity in mitigating PFAS-related COPD risk (33). Our study aligned with these findings, as the CatBoost model identified PFAS, particularly MPAH and PFOA, as significant predictors of COPD risk. Notably, our study identified PFOS and PFUA as potentially protective against COPD risk, differing from findings reported by Wang et al. (34) and Pan et al. (33), who found positive associations for PFOS. These discrepancies may result from variations in demographic characteristics, exposure measurement methodologies, or different adjustments for confounding variables across studies. While previous literature suggests that PFAS may influence COPD development through inflammation and oxidative stress pathways (44, 45), the specific biological roles of individual PFAS compounds like PFOS and PFUA remain complex and heterogeneous. Thus, further longitudinal and mechanistic studies are needed to clarify these differences and establish causality. Moreover, SHAP analysis in our study highlighted the notable contribution of PFAS to COPD risk, alongside demographic and socioeconomic factors. These results are consistent with the hypothesis that PFAS may influence COPD development through inflammation and oxidative stress (44, 45), and they support calls for stricter PFAS regulation and continued study of PFAS effects on respiratory health.

Biological mechanisms underlying PFAS–COPD associations

The observed relationships between PFAS levels and COPD risk in our study can be explained by underlying biological mechanisms, including inflammation (44), oxidative stress (45), and PFAS interactions with albumin and lung tissues (46). For PFOS and PFUA, the protective association at higher concentrations may reflect their ability to stabilize pulmonary surfactants and reduce oxidative stress (47). Albumin, known to bind PFOS and PFUA, could facilitate their targeted delivery to lung tissues (46), while moderate and lower levels might help maintain epithelial integrity (48) and mitigate inflammation (49), key drivers of COPD progression. In contrast, PFOA and MPAH were positively associated with COPD risk at higher concentrations, which aligned with their known pro-inflammatory and oxidative effects (50). PFOA has been shown to activate the NLRP3 inflammasome and increase cytokine production, including IL-6 and TNF-α, leading to sustained inflammation in lung tissues (51). MPAH may exert similar effects by disrupting epithelial barriers and exacerbating oxidative stress (52), contributing to airway damage and disease progression. These findings highlighted the role of chronic inflammation and oxidative damage as central mechanisms linking higher PFOA and MPAH levels to increased COPD risk.

Non-linear effects of PFNA, PFDE, and PFHxS

The U-shaped relationship observed with PFNA and the mixed pattern with PFDE reflected their dual roles in COPD risk. At moderate concentrations, PFNA and PFDE may exhibit stabilizing effects on lung tissues, potentially reducing inflammation and oxidative stress. However, at very low or high concentrations, these PFAS may disrupt immune homeostasis and amplify inflammatory responses, leading to increased COPD risk (34). The fluctuating trend for PFHxS likely stems from its complex interplay with inflammatory and antioxidant pathways, which may vary depending on individual susceptibility and exposure levels (34). These findings point to nuanced, concentration-dependent effects of PFAS on COPD risk and to the need for further mechanistic and toxicological studies to clarify how different PFAS contribute to it. Experimental research is also needed to determine whether certain PFAS exhibit synergistic or antagonistic effects, particularly in cases of mixed exposure. Understanding these interactions will be critical for developing targeted strategies to mitigate the health impacts of PFAS exposure and for informing regulatory policies aimed at reducing risks associated with these persistent environmental pollutants.

Study limitations

This study has several limitations. First, NHANES uses a complex multi-stage stratified sampling design, so findings from our analytic sample may not fully represent the entire U.S. population. Second, although our machine learning models showed strong predictive performance, they have not been externally validated on independent datasets, which is essential for assessing model stability and generalizability. Third, COPD status in NHANES was based on self-reported questionnaire data rather than spirometry or clinical diagnosis, which may introduce recall bias or disease misclassification. Smoking status was also self-reported and may be underreported, particularly among certain demographic groups. Fourth, although we adjusted for several known covariates, potential confounders such as physical activity, dietary factors, occupational exposures, and access to healthcare services were not available in our dataset. These variables could influence both PFAS exposure and COPD risk and may have biased the observed associations. Fifth, PFAS concentrations were measured at a single time point, which may not accurately reflect long-term or cumulative exposure. Given the chronic nature of COPD, longer-term exposure assessment would better support causal inference. Sixth, excluding participants with missing data may have introduced sampling bias, and the lack of detailed healthcare records (such as medication history, comorbidities, or imaging findings) limited our ability to characterize disease severity or differentiate COPD subtypes. Seventh, we did not formally compare models with and without PFAS variables, which limits assessment of their specific contribution to COPD risk prediction. Finally, cultural and regional differences in environmental exposure, healthcare access, and disease awareness may limit the generalizability of these findings to other populations or countries. These limitations underscore the need for longitudinal studies incorporating detailed clinical records, long-term exposure measurements, and more comprehensive confounder adjustment to validate and extend our findings.
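One way the comparison of models with and without PFAS variables could be approached is an ablation analysis: fit the same model on the full feature set and on the feature set with the PFAS columns removed, then compare held-out AUC. The Python sketch below illustrates this under strong assumptions. It uses synthetic data in which the outcome is constructed to depend on a single hypothetical PFAS feature, and a hand-written logistic regression rather than CatBoost. All variable names and coefficients are invented for illustration, so the size of the AUC gap says nothing about the real data.

```python
import math
import random


def sigmoid(z):
    # Clamp the input so math.exp cannot overflow for extreme values.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, epochs=200, lr=0.1):
    """Batch gradient descent for logistic regression; bias is stored last."""
    n_features = len(X[0])
    w = [0.0] * (n_features + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for row, label in zip(X, y):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + w[-1]) - label
            for j, xj in enumerate(row):
                grad[j] += err * xj
            grad[-1] += err
        w = [wi - lr * g / len(X) for wi, g in zip(w, grad)]
    return w


def predict(w, X):
    return [sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + w[-1]) for row in X]


def auc(scores, labels):
    """Probability that a random case outranks a random non-case (ties = 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def drop_last(rows):
    """Remove the last column, which holds the hypothetical PFAS feature."""
    return [row[:-1] for row in rows]


# Synthetic, standardized features: [age, smoking, pfas]; coefficients assumed.
random.seed(42)
X, y = [], []
for _ in range(1000):
    age, smoke, pfas = (random.gauss(0, 1) for _ in range(3))
    p = sigmoid(-1.0 + 0.5 * age + 0.5 * smoke + 2.0 * pfas)
    X.append([age, smoke, pfas])
    y.append(1 if random.random() < p else 0)

train_X, test_X = X[:600], X[600:]
train_y, test_y = y[:600], y[600:]

w_full = fit_logistic(train_X, train_y)
w_reduced = fit_logistic(drop_last(train_X), train_y)
auc_full = auc(predict(w_full, test_X), test_y)
auc_reduced = auc(predict(w_reduced, drop_last(test_X)), test_y)
print(f"AUC with PFAS feature:    {auc_full:.3f}")
print(f"AUC without PFAS feature: {auc_reduced:.3f}")
```

Because the synthetic outcome is built to depend on the PFAS feature, the full model should score higher here by construction. In a real analysis, the difference would need to be assessed with resampling or a formal test, since small AUC gaps can arise by chance.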

Conclusion

This study explored the relationship between PFAS exposure and COPD risk using NHANES (2013–2018) data and interpretable machine learning techniques. Among the nine models, CatBoost performed best, achieving an accuracy of 84%, an AUC of 0.89, a sensitivity of 81%, and a specificity of 84%. PDP analysis indicated that higher PFOS and PFUA levels were associated with reduced COPD risk, whereas higher PFOA and MPAH levels were associated with increased risk. PFNA, PFHxS, and PFDE showed complex, non-linear associations. SHAP analysis provided individual-level explanations of predicted risk and overall variable contributions, and an interactive web-based calculator was deployed for real-time risk assessment. To our knowledge, this is the first study to integrate interpretable ML algorithms with large-scale epidemiological data to examine concentration-dependent associations of individual PFAS compounds with COPD risk. By combining advanced modeling with user-friendly tools, our approach bridges data science and clinical application. These results support the case for regulatory action on PFAS and demonstrate how transparent ML can enhance precision risk stratification in chronic respiratory diseases, providing a scalable framework adaptable to other environmental exposures and health outcomes.
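For readers less familiar with the threshold-based metrics reported above, the short Python sketch below shows how accuracy, sensitivity, and specificity follow from a confusion matrix. The labels, predicted probabilities, and the 0.5 threshold are hypothetical values chosen for illustration and are unrelated to the study data. AUC is omitted because it is computed across all thresholds rather than at a single cut-off.

```python
def classification_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, sensitivity, and specificity at a given probability threshold."""
    tp = fp = tn = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1  # case correctly flagged
        elif predicted == 1 and label == 0:
            fp += 1  # non-case incorrectly flagged
        elif predicted == 0 and label == 0:
            tn += 1  # non-case correctly cleared
        else:
            fn += 1  # case missed
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical labels (1 = COPD) and predicted probabilities for ten people.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_prob = [0.90, 0.80, 0.60, 0.30, 0.70, 0.40, 0.30, 0.20, 0.10, 0.05]
metrics = classification_metrics(y_true, y_prob)
print(metrics)
```

With these toy inputs, 3 of 4 cases and 5 of 6 non-cases are classified correctly, so sensitivity is 0.75, specificity is about 0.83, and accuracy is 0.80.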

Data availability statement

The original contributions presented in the study are included in the article/Supplementary material; further inquiries can be directed to the corresponding authors.

Ethics statement

The studies involving humans were approved by the National Center for Health Statistics (NCHS) Research Ethics Review Board, which reviewed and approved all procedures and protocols for the National Health and Nutrition Examination Survey (NHANES). The studies were conducted in accordance with the local legislation and institutional requirements. The participants provided their written informed consent to participate in this study.

Author contributions

XS: Methodology, Conceptualization, Formal analysis, Project administration, Software, Writing – original draft, Writing – review & editing. LZ: Conceptualization, Data curation, Methodology, Software, Writing – original draft, Writing – review & editing. YW: Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing. YY: Conceptualization, Methodology, Software, Writing – original draft, Writing – review & editing. XC: Supervision, Writing – original draft, Writing – review & editing.

Funding

The author(s) declare that no financial support was received for the research and/or publication of this article.

Acknowledgments

We appreciate the contributions of all staff and participants in the U.S. National Health and Nutrition Examination Survey (NHANES).

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declare that no Gen AI was used in the creation of this manuscript.

Publisher's note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpubh.2025.1602566/full#supplementary-material

References

1. López-Campos JL, Tan W, Soriano JB. Global burden of COPD. Respirology. (2016) 21:14–23. doi: 10.1111/resp.12660

2. Mannino DM, Buist AS. Global burden of COPD: risk factors, prevalence, and future trends. Lancet. (2007) 370:765–73. doi: 10.1016/S0140-6736(07)61380-4

3. Christenson SA, Smith BM, Bafadhel M, Putcha N. Chronic obstructive pulmonary disease. Lancet. (2022) 399:2227–42. doi: 10.1016/S0140-6736(22)00470-6

4. Zhu B, Wang Y, Ming J, Chen W, Zhang L. Disease burden of COPD in China: a systematic review. Int J Chron Obstruct Pulmon Dis. (2018) 13:1353–64. doi: 10.2147/COPD.S161555

5. Fazleen A, Wilkinson T. Early COPD: current evidence for diagnosis and management. Ther Adv Respir Dis. (2020) 14:1753466620942128. doi: 10.1177/1753466620942128

6. Ruvuna L, Sood A. Epidemiology of chronic obstructive pulmonary disease. Clin Chest Med. (2020) 41:315–27. doi: 10.1016/j.ccm.2020.05.002

7. Xu J, Ji Z, Zhang P, Chen T, Xie Y, Li J. Disease burden of COPD in the Chinese population: a systematic review. Ther Adv Respir Dis. (2023) 17:17534666231218899. doi: 10.1177/17534666231218899

8. Castaldi PJ, Boueiz A, Yun J, Estepar RSJ, Ross JC, Washko G, et al. Machine learning characterization of COPD subtypes: insights from the COPDGene study. Chest. (2020) 157:1147–57. doi: 10.1016/j.chest.2019.11.039

9. Kaplan A, Cao H, FitzGerald JM, Iannotti N, Yang E, Kocks JWH, et al. Artificial intelligence/machine learning in respiratory medicine and potential role in asthma and COPD diagnosis. J Allergy Clin Immunol Pract. (2021) 9:2255–61. doi: 10.1016/j.jaip.2021.02.014

10. Shen X, Liu H. Using machine learning for early detection of chronic obstructive pulmonary disease: a narrative review. Respir Res. (2024) 25:336. doi: 10.1186/s12931-024-02960-6

11. Lin X, Lei Y, Chen J, Xing Z, Yang T, Wang Q, et al. A case-finding clinical decision support system to identify subjects with chronic obstructive pulmonary disease based on public health data. Tsinghua Sci Technol. (2023) 28:525–40. doi: 10.26599/TST.2022.9010010

12. Wang X, He H, Xu L, Chen C, Zhang J, Li N, et al. Developing and validating a chronic obstructive pulmonary disease quick screening questionnaire using statistical learning models. Chron Respir Dis. (2022) 19:14799731221116585. doi: 10.1177/14799731221116585

13. Zeng S, Arjomandi M, Tong Y, Liao ZC, Luo G. Developing a machine learning model to predict severe chronic obstructive pulmonary disease exacerbations: retrospective cohort study. J Med Internet Res. (2022) 24:e28953. doi: 10.2196/28953

14. Mocelin HT, Fischer GB, Bush A. Adverse early-life environmental exposures and their repercussions on adult respiratory health. J Pediatr. (2022) 98:S86–95. doi: 10.1016/j.jped.2021.11.005

15. Tran HM, Tsai FJ, Lee YL, Chang JH, Chang LT, Chang TY, et al. The impact of air pollution on respiratory diseases in an era of climate change: a review of the current evidence. Sci Total Environ. (2023) 898:166340. doi: 10.1016/j.scitotenv.2023.166340

16. WHO. WHO Air Pollution (2021).

17. Zhuo B, Ran S, Qian AM, Zhang J, Tabet M, Howard SW, et al. Air pollution metabolomic signatures and chronic respiratory diseases risk: a longitudinal study. Chest. (2024) 166:975–86. doi: 10.1016/j.chest.2024.06.3809

18. Niu Y, Niu H, Meng X, Zhu Y, Ren X, He R, et al. Associations between air pollution and the onset of acute exacerbations of COPD: a time-stratified case-crossover study in China. Chest. (2024) 166:998–1009. doi: 10.1016/j.chest.2024.05.030

19. Yan Z, Xu Y, Li K, Liu L. Heavy metal levels and flavonoid intakes are associated with chronic obstructive pulmonary disease: an NHANES analysis (2007-2010 to 2017-2018). BMC Public Health. (2023) 23:2335. doi: 10.1186/s12889-023-17250-x

20. Madani NA, Jones LE, Carpenter DO. Different volatile organic compounds in local point source air pollution pose distinctive elevated risks for respiratory disease-associated emergency room visits. Chemosphere. (2023) 344:140403. doi: 10.1016/j.chemosphere.2023.140403

21. Evich MG, Davis MJB, McCord JP, Acrey B, Awkerman JA, Knappe DRU, et al. Per- and polyfluoroalkyl substances in the environment. Science. (2022) 375:eabg9065. doi: 10.1126/science.abg9065

22. Cao Y, Ng C. Absorption, distribution, and toxicity of per- and polyfluoroalkyl substances (PFAS) in the brain: a review. Environ Sci Process Impacts. (2021) 23:1623–40. doi: 10.1039/D1EM00228G

23. Domingo JL, Nadal M. Human exposure to per- and polyfluoroalkyl substances (PFAS) through drinking water: a review of the recent scientific literature. Environ Res. (2019) 177:108648. doi: 10.1016/j.envres.2019.108648

24. Wen ZJ, Wei YJ, Zhang YF, Zhang YF. A review of cardiovascular effects and underlying mechanisms of legacy and emerging per- and polyfluoroalkyl substances (PFAS). Arch Toxicol. (2023) 97:1195–245. doi: 10.1007/s00204-023-03477-5

25. Fenton SE, Ducatman A, Boobis A, DeWitt JC, Lau C, Ng C, et al. Per- and polyfluoroalkyl substance toxicity and human health review: current state of knowledge and strategies for informing future research. Environ Toxicol Chem. (2021) 40:606–30. doi: 10.1002/etc.4890

26. He A, Liang Y, Li J, Zhou Z, Li F, Li Z, et al. A critical review of populations with occupational exposure to per- and polyfluoroalkyl substances: external exposome, internal exposure levels, and health effects. Environ Sci Technol. (2025) 10:10715–33. doi: 10.1021/acs.est.4c14478

27. Schlezinger JJ, Gokce N. Perfluoroalkyl/polyfluoroalkyl substances: links to cardiovascular disease risk. Circ Res. (2024) 134:1136–59. doi: 10.1161/CIRCRESAHA.124.323697

28. Costello E, Rock S, Stratakis N, Eckel SP, Walker DI, Valvi D, et al. Exposure to per- and polyfluoroalkyl substances and markers of liver injury: a systematic review and meta-analysis. Environ Health Perspect. (2022) 130:46001. doi: 10.1289/EHP10092

29. Wang LQ, Liu T, Yang S, Sun L, Zhao ZY, Li LY, et al. Perfluoroalkyl substance pollutants activate the innate immune system through the AIM2 inflammasome. Nat Commun. (2021) 12:2915. doi: 10.1038/s41467-021-23201-0

30. Wang YF, Xie B, Zou YX. Association between PFAS congeners exposure and asthma among US children in a nationally representative sample. Environ Geochem Health. (2023) 45:5981–90. doi: 10.1007/s10653-023-01614-8

31. Rafiee A, Faridi S, Sly PD, Stone L, Kennedy LP, Mahabee-Gittens EM. Asthma and decreased lung function in children exposed to perfluoroalkyl and polyfluoroalkyl substances (PFAS): an updated meta-analysis unveiling research gaps. Environ Res. (2024) 262:119827. doi: 10.1016/j.envres.2024.119827

32. Solan ME, Park JA. Per- and poly-fluoroalkyl substances (PFAS) effects on lung health: a perspective on the current literature and future recommendations. Front Toxicol. (2024) 6:1423449. doi: 10.3389/ftox.2024.1423449

33. Pan M, Zou Y, Wei G, Zhang C, Zhang K, Guo H, et al. Moderate-intensity physical activity reduces the role of serum PFAS on COPD: a cross-sectional analysis with NHANES data. PLoS ONE. (2024) 19:e0308148. doi: 10.1371/journal.pone.0308148

34. Wang Y, Zhang J, Zhang J, Hou M, Kong L, Lin X, et al. Association between per- and polyfluoroalkyl substances exposure and prevalence of chronic obstructive pulmonary disease: the mediating role of serum albumin. Sci Total Environ. (2024) 925:171742. doi: 10.1016/j.scitotenv.2024.171742

35. Li X, Li Z, Ye J, Ye W. Relationship of perfluoroalkyl chemicals with chronic obstructive pulmonary disease: a cross-sectional study. Toxicol Ind Health. (2025) 41:176–85. doi: 10.1177/07482337251315216

36. Wu LY, He WT, Zeeshan M, Zhou Y, Zhang YT, Liang LX, et al. Incidence of respiratory diseases associated with per- and polyfluoroalkyl substances (PFAS) in PM2.5: new evidence from a population-based survey of Pearl River Delta (PRD), China. J Hazard Mater. (2025) 494:138485. doi: 10.1016/j.jhazmat.2025.138485

37. Venuta A, Lloyd M, Ganji A, Xu J, Simon L, Zhang M, et al. Predicting within-city spatiotemporal variations in daily median outdoor ultrafine particle number concentrations and size in Montreal and Toronto, Canada. Environ Epidemiol. (2024) 8:e323. doi: 10.1097/EE9.0000000000000323

38. Li S, Li M, Wu J, Li Y, Han J, Song Y, et al. Developing and validating a clinlabomics-based machine-learning model for early detection of retinal detachment in patients with high myopia. J Transl Med. (2024) 22:405. doi: 10.1186/s12967-024-05131-9

39. Li W, Huang G, Tang N, Lu P, Jiang L, Lv J, et al. Effects of heavy metal exposure on hypertension: a machine learning modeling approach. Chemosphere. (2023) 337:139435. doi: 10.1016/j.chemosphere.2023.139435

40. Li X, Zhao Y, Zhang D, Kuang L, Huang H, Chen W, et al. Development of an interpretable machine learning model associated with heavy metals' exposure to identify coronary heart disease among US adults via SHAP: findings of the US NHANES from 2003 to 2018. Chemosphere. (2023) 311:137039. doi: 10.1016/j.chemosphere.2022.137039

41. Bai Q, Chen H, Gao Z, Li B, Liu S, Dong W, et al. Advanced prediction of heart failure risk in elderly diabetic and hypertensive patients using nine machine learning models and novel composite indices: insights from NHANES 2003-2016. Eur J Prev Cardiol. (2025) zwaf081. doi: 10.1093/eurjpc/zwaf081

42. Lundberg SM, Lee SI. A unified approach to interpreting model predictions. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, CA: Curran Associates Inc. (2017). p. 4768–77. doi: 10.5555/3295222.3295230

43. Zibibula Y, Tayier G, Maimaiti A, Liu T, Lu J. Machine learning approaches to identify the link between heavy metal exposure and ischemic stroke using the US NHANES data from 2003 to 2018. Front Public Health. (2024) 12:1388257. doi: 10.3389/fpubh.2024.1388257

44. Dragon J, Hoaglund M, Badireddy AR, Nielsen G, Schlezinger J, Shukla A. Perfluoroalkyl substances (PFAS) affect inflammation in lung cells and tissues. Int J Mol Sci. (2023) 24:8539. doi: 10.3390/ijms24108539

45. Omoike OE, Pack RP, Mamudu HM, Liu Y, Strasser S, Zheng S, et al. Association between per and polyfluoroalkyl substances and markers of inflammation and oxidative stress. Environ Res. (2021) 196:110361. doi: 10.1016/j.envres.2020.110361

46. Pye ES, Wallace SE, Marangoni DG, Foo ACY. Albumin proteins as delivery vehicles for PFAS contaminants into respiratory membranes. ACS Omega. (2023) 8:44036–43. doi: 10.1021/acsomega.3c06239

47. Wielsøe M, Long M, Ghisari M, Bonefeld-Jørgensen EC. Perfluoroalkylated substances (PFAS) affect oxidative stress biomarkers in vitro. Chemosphere. (2015) 129:239–45. doi: 10.1016/j.chemosphere.2014.10.014

48. Laube M, Thome UH. Albumin stimulates epithelial Na(+) transport and barrier integrity by activating the PI3K/AKT/SGK1 pathway. Int J Mol Sci. (2022) 23:8823. doi: 10.3390/ijms23158823

49. Eckart A, Struja T, Kutz A, Baumgartner A, Baumgartner T, Zurfluh S, et al. Relationship of nutritional status, inflammation, and serum albumin levels during acute illness: a prospective study. Am J Med. (2020) 133:713–22.e717. doi: 10.1016/j.amjmed.2019.10.031

50. Pierozan P, Kosnik M, Karlsson O. High-content analysis shows synergistic effects of low perfluorooctanoic acid (PFOS) and perfluorooctane sulfonic acid (PFOA) mixture concentrations on human breast epithelial cell carcinogenesis. Environ Int. (2023) 172:107746. doi: 10.1016/j.envint.2023.107746

51. Weng Z, Xu C, Zhang X, Pang L, Xu J, Liu Q, et al. Autophagy mediates perfluorooctanoic acid-induced lipid metabolism disorder and NLRP3 inflammasome activation in hepatocytes. Environ Pollut. (2020) 267:115655. doi: 10.1016/j.envpol.2020.115655

52. Siwakoti RC, Park S, Ferguson KK, Hao W, Cantonwine DE, Mukherjee B, et al. Prenatal per- and polyfluoroalkyl substances (PFAS) and maternal oxidative stress: evidence from the LIFECODES study. Chemosphere. (2024) 360:142363. doi: 10.1016/j.chemosphere.2024.142363

Keywords: chronic obstructive pulmonary disease, machine learning, partial dependence plot, SHapley additive exPlanations, environment pollution

Citation: Shao X, Zhang L, Wang Y, Ying Y and Chen X (2025) Developing an interpretable machine learning predictive model of chronic obstructive pulmonary disease by serum PFAS concentration. Front. Public Health 13:1602566. doi: 10.3389/fpubh.2025.1602566

Received: 30 March 2025; Accepted: 18 June 2025;
Published: 10 July 2025.

Edited by:

Ling Zhang, Wuhan University of Science and Technology, China

Reviewed by:

Chengyong Jia, Albert Einstein College of Medicine, United States
Najm Alsadat Madani, State University of New York, United States

Copyright © 2025 Shao, Zhang, Wang, Ying and Chen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Xiaomei Shao, MTgwNjE2OTc1MjdAMTYzLmNvbQ==; Xueqin Chen, Y3hxaGw5OTY2QDE2My5jb20=

These authors have contributed equally to this work
