ORIGINAL RESEARCH article

Front. Public Health

Sec. Environmental Health and Exposome

Volume 13 - 2025 | doi: 10.3389/fpubh.2025.1536509

Epidemiological association and machine learning-based prediction of lung cancer risk linked to long-term lagged satellite-derived PM2.5 in China

Provisionally accepted
Feiran Wei1*, Gaoqiang Fei2, Shijun Yang3, Huiying Wang4, Meng Zhao1, Jinyi ZHOU5, Xiaobing Shen1, Renqiang Han5*
  • 1Southeast University, Nanjing, China
  • 2Jiangsu Cancer Hospital, Nanjing Medical University, Nanjing, Jiangsu Province, China
  • 3Guangxi Meteorological Observatory, Nanning, China
  • 4Lianyungang Meteorological Bureau, Lianyungang, China
  • 5Jiangsu Provincial Center for Disease Control And Prevention, Nanjing, Jiangsu Province, China

The final, formatted version of the article will be published soon.

Objectives: This study investigated the association between long-term PM2.5 exposure and lung cancer incidence in Jiangsu Province, China. We aimed to explore the effects of historical PM2.5 exposure at different time lags and to build a prediction model using machine learning methods.

Study Design: An ecological epidemiology study.

Methods: Lung cancer incidence data from Jiangsu Province (2014-2018) were combined with satellite-derived annual PM2.5 concentration data for the previous 10 years (lag 0 to lag 9). Correlation and grey correlation analyses were performed to evaluate the lagged relationship between PM2.5 exposure and lung cancer incidence. To address multicollinearity in the data, ridge regression, support vector regression, and a back-propagation artificial neural network were employed. A combined prediction model was then constructed using the optimal weighting method.

Results: Lung cancer incidence was significantly correlated with PM2.5 concentration at different historical time points, with the strongest correlation at lag 9. The combined prediction model, which integrates multiple prediction methods, showed higher accuracy and reliability in predicting lung cancer incidence than a single model.

Conclusions: Long-term exposure to PM2.5, especially exposure with a long lag time, is closely related to lung cancer incidence. The integrated machine learning prediction model can be used as a reliable tool to assess the health risks of air pollution.
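The optimal weighting step named in the Methods can be illustrated with a minimal sketch. The abstract does not give the formulation, so the code assumes one common variant: choose weights that sum to 1 and minimize the combined model's sum of squared errors, which gives w = E^-1 1 / (1' E^-1 1), where E is the matrix of residual cross-products. The function names, the three placeholder models, and the synthetic data are all hypothetical and are not taken from the study.

```python
import numpy as np


def optimal_weights(errors):
    """Weights summing to 1 that minimize the combined sum of squared errors.

    errors: array of shape (n_samples, n_models) holding each model's
    residuals (actual - predicted). Assumed formulation, not the authors' code.
    """
    errors = np.asarray(errors, dtype=float)
    info = errors.T @ errors              # residual cross-product matrix E
    ones = np.ones(info.shape[0])
    z = np.linalg.solve(info, ones)       # E^-1 1 without forming the inverse
    return z / (ones @ z)                 # normalize so the weights sum to 1


def combine(predictions, weights):
    """Weighted sum of per-model predictions, shape (n_samples, n_models)."""
    return np.asarray(predictions, dtype=float) @ np.asarray(weights, dtype=float)


# Synthetic illustration only: three hypothetical models with different noise.
rng = np.random.default_rng(0)
actual = rng.normal(50.0, 5.0, size=40)
preds = actual[:, None] + rng.normal(0.0, [1.0, 2.0, 3.0], size=(40, 3))

w = optimal_weights(actual[:, None] - preds)
combined = combine(preds, w)

sse_each = ((actual[:, None] - preds) ** 2).sum(axis=0)
sse_combined = ((actual - combined) ** 2).sum()
print("weights:", w)
print("SSE per model:", sse_each)
print("SSE combined:", sse_combined)
```

On the data used to fit the weights, the combined error cannot exceed that of the best single model, because putting all weight on one model is itself an admissible choice. This variant allows negative weights; a version constrained to non-negative weights would need a quadratic programming solver instead of the closed form.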

Keywords: PM2.5, lung cancer, Long-term exposure, machine learning, Prediction model, Public Health

Received: 29 Nov 2024; Accepted: 28 Apr 2025.

Copyright: © 2025 Wei, Fei, Yang, Wang, Zhao, ZHOU, Shen and Han. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence:
Feiran Wei, Southeast University, Nanjing, China
Renqiang Han, Jiangsu Provincial Center for Disease Control And Prevention, Nanjing, 210028, Jiangsu Province, China

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.