ORIGINAL RESEARCH article

Front. Public Health

Sec. Environmental Health and Exposome

Volume 13 - 2025 | doi: 10.3389/fpubh.2025.1602566

This article is part of the Research Topic: New Environmental Pollutants, Aging, and Age-Related Diseases

Developing an Interpretable Machine Learning Predictive Model of Chronic obstructive pulmonary disease by Serum PFAS Concentration

Provisionally accepted
Youmei Ying1, Ling Zhang2, Yuting Wang1*, Xueqin Chen3*
  • 1Nanjing Jiangbei Hospital, Affiliated Nanjing Jiangbei Hospital of Xinglin College, Nantong University, Nanjing, China
  • 2Huai’an No. 3 People's Hospital, Huaian Second Clinical College of Xuzhou Medical University, Huaian, China
  • 3The Affiliated Taizhou People's Hospital, Nanjing Medical University, Taizhou, China

The final, formatted version of the article will be published soon.

Chronic obstructive pulmonary disease (COPD) is a leading cause of morbidity and mortality worldwide, and strategies for early detection remain limited. Although previous studies have examined the relationship between per- and polyfluoroalkyl substances (PFAS) and COPD, few have applied interpretable machine learning (ML) techniques to this association.

We investigated the association between PFAS exposure and COPD risk in 4,450 NHANES participants from 2013 to 2018. After records with missing covariates or extreme PFAS values were excluded and k-nearest neighbors (KNN) imputation was applied, nine ML models, including CatBoost, were built and evaluated on accuracy, area under the curve (AUC), sensitivity, and specificity. The best-performing model was further analyzed using Partial Dependence Plots (PDP) and SHapley Additive exPlanations (SHAP). To enhance clinical applicability, the final model was deployed as a publicly accessible web-based risk calculator.

CatBoost emerged as the best model, achieving an accuracy of 84%, an AUC of 0.89, a sensitivity of 81%, and a specificity of 84%. PDP revealed that higher PFOS and PFUA levels were associated with reduced COPD risk, whereas PFOA and MPAH showed positive associations with COPD. PFNA, PFDE, and PFHxS demonstrated mixed or non-linear effects. SHAP analysis provided insights into individual predictions and overall variable contributions, clarifying the complex PFAS-COPD relationship. The deployed web-based calculator enables interactive prediction and risk interpretation, supporting potential public health applications.

CatBoost identified PFOS and PFUA as protective factors against COPD, whereas PFOA and MPAH were associated with increased COPD risk. These findings emphasize the need for stricter PFAS regulation and highlight the potential of machine learning in guiding prevention strategies.
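For readers less familiar with the four evaluation metrics named above, the following minimal Python sketch shows how accuracy, sensitivity, specificity, and AUC can be computed from true labels and predicted probabilities. It is an illustration only, not the authors' pipeline: the function names are hypothetical, the 0.5 decision threshold is an assumption, and the labels and scores are made-up toy values rather than NHANES data.

```python
from typing import Dict, Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def classification_metrics(
    y_true: Sequence[int],
    scores: Sequence[float],
    threshold: float = 0.5,  # assumed cut-off; the abstract does not state one
) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity, and AUC for a binary outcome.
    Assumes both classes are present in y_true."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "auc": roc_auc(y_true, scores),
    }


# Toy example: 1 = COPD, 0 = no COPD; scores are hypothetical model outputs.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1, 0.05]
metrics = classification_metrics(y_true, scores)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the two kinds of metric are usually reported together.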

Keywords: chronic obstructive pulmonary disease, machine learning, partial dependence plot, Shapley additive explanations, environmental pollution

Received: 30 Mar 2025; Accepted: 18 Jun 2025.

Copyright: © 2025 Ying, Zhang, Wang and Chen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence:
Yuting Wang, Nanjing Jiangbei Hospital, Affiliated Nanjing Jiangbei Hospital of Xinglin College, Nantong University, Nanjing, China
Xueqin Chen, The Affiliated Taizhou People's Hospital, Nanjing Medical University, Taizhou, China

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.