ORIGINAL RESEARCH article

Front. Public Health

Sec. Environmental Health and Exposome

Volume 13 - 2025 | doi: 10.3389/fpubh.2025.1581717

Exploring the Relationship Between Per- and Polyfluoroalkyl Substances Exposure and Rheumatoid Arthritis Risk Using Interpretable Machine Learning

Provisionally accepted
Zhi Li1*, Xinping Xu2, Ke Zhang3,4*
  • 1Nanjing Jiangbei Hospital, Affiliated Nanjing Jiangbei Hospital of Xinglin College, Jiangsu, China
  • 2Huai’an No. 3 People's Hospital, Huaian Second Clinical College of Xuzhou Medical University, Jiangsu, China
  • 3Nanjing University of Chinese Medicine, Nanjing, China
  • 4Huai’an TCM Hospital Affiliated to Nanjing University of Chinese Medicine, Jiangsu, China

The final, formatted version of the article will be published soon.

Rheumatoid arthritis is a chronic autoimmune disease influenced by environmental exposures, including per- and polyfluoroalkyl substances (PFAS). Although previous studies have suggested links between PFAS and rheumatoid arthritis risk, none have used interpretable machine learning models for prediction. This study aimed to develop such a model to assess risk based on PFAS exposure.

We analyzed data from 11,705 participants in the National Health and Nutrition Examination Survey (2003-2018). Twelve machine learning algorithms were evaluated using metrics including area under the curve (AUC), accuracy, sensitivity, specificity, and F1 score. Key predictors were identified using SHapley Additive exPlanations (SHAP). Partial dependence plots and locally weighted scatterplot smoothing (LOWESS) curves were used to examine nonlinear associations and exposure thresholds. A web-based risk calculator was developed to enhance clinical and public health applicability.

CatBoost showed the best performance (AUC: 0.82; accuracy: 74%; F1 score: 0.62) and was selected for further interpretation. SHAP analysis identified perfluorooctane sulfonic acid (PFOS) and 2-(N-Methyl-perfluorooctane sulfonamido) acetic acid (MPAH) as major contributors to risk prediction. PFOS exhibited a U-shaped relationship, with increased risk above 15.10 ng/mL, while MPAH showed a risk transition at 0.22 ng/mL. Waterfall plots illustrated the contribution of individual exposures. The interactive web-based calculator allows users to input PFAS levels and receive personalized rheumatoid arthritis risk estimates. It is freely available on Hugging Face Spaces (https://huggingface.co/spaces/Machine199710/RA_ML).

This study demonstrates the potential of machine learning to predict rheumatoid arthritis risk based on PFAS exposure. The identified nonlinear patterns provide insights into environmental contributions to disease risk and may inform future prevention strategies.
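The evaluation metrics named in the abstract (AUC, accuracy, sensitivity, specificity, F1 score) can be illustrated with a minimal, standard-library Python sketch. It is not the authors' pipeline: the labels and scores below are toy values invented for illustration, the function names are hypothetical, and the 0.5 decision threshold is an assumption that the abstract does not state.

```python
from typing import Dict, Sequence


def classification_metrics(
    y_true: Sequence[int], y_score: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity and F1 for binary labels coded 0/1.

    The threshold of 0.5 is an illustrative assumption, not a value from the article.
    """
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # true negative rate
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the probability that a random case outscores a random non-case.

    Ties count as one half (the Mann-Whitney U formulation). This pairwise
    version is O(n_pos * n_neg) and is meant for small illustrative inputs.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = rheumatoid arthritis, 0 = no rheumatoid arthritis.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 1]
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]

toy_metrics = classification_metrics(toy_labels, toy_scores)
toy_auc = roc_auc(toy_labels, toy_scores)
print(toy_metrics, toy_auc)
```

AUC is threshold-free, whereas accuracy, sensitivity, specificity, and F1 all depend on the chosen cut-off, which is one reason a model can pair a fairly high AUC with a more modest F1 score.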

Keywords: machine learning, rheumatoid arthritis, PFAS, SHAP, environmental pollution

Received: 22 Feb 2025; Accepted: 13 May 2025.

Copyright: © 2025 Li, Xu and Zhang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence:
Zhi Li, Nanjing Jiangbei Hospital, Affiliated Nanjing Jiangbei Hospital of Xinglin College, Jiangsu, China
Ke Zhang, Nanjing University of Chinese Medicine, Nanjing, China

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.