
ORIGINAL RESEARCH article

Front. Immunol.

Sec. Cancer Immunity and Immunotherapy

Development and validation of an interpretable machine learning model for predicting progression-free survival after immunotherapy in patients with non-small cell lung cancer: a multicentre study

Provisionally accepted
Ya Li1,2, Jie Peng1,2*, Ji Xia1, Tianchu He3, Yong Hu4, Daobin Zhou3, Dan Zou2, Benlan Li1,2, Min Zhang5, Zhongjun Huang1, Mi Zhang6, Xian Liu1, Minfang Wang1, Hongyan Luo1, Fangyang Lu2, Chuan Zhang2, Xingxing Zhao2, ShengFa Su1*
  • 1Guizhou Medical University, Guiyang, China
  • 2The Second Affiliated Hospital of Guizhou Medical University, Kaili, China
  • 3Qiandongnan Prefecture People's Hospital, Kaili, China
  • 4The Fifth People's Hospital of Guiyang, Guiyang, China
  • 5Dujiangyan Shoujia Hospital, Chengdu, China
  • 6Panzhou People’s Hospital, Liupanshui, China

The final, formatted version of the article will be published soon.

Background: This study aimed to develop and validate an interpretable machine learning model that uses circulating tumor DNA (ctDNA) to predict progression-free survival (PFS) in patients with non-small cell lung cancer (NSCLC) receiving immunotherapy, addressing the limitations of conventional biomarkers such as PD-L1 expression and tumor mutational burden.

Methods: This multicenter study analyzed pretreatment ctDNA profiles from 441 patients with NSCLC, divided into three independent cohorts: a training set (n=303, OAK trial), a validation set (n=97, POPLAR trial), and a local test set (n=41, multicenter retrospective cohort, 2023–2024). LASSO-Cox (Least Absolute Shrinkage and Selection Operator-Cox proportional hazards) regression with 5-fold cross-validation selected 25 prognostic genomic features, which were used as inputs to an eXtreme Gradient Boosting (XGBoost) model. The model was evaluated with three approaches: (1) discrimination metrics, including the AUC with 95% confidence intervals, accuracy, sensitivity, and specificity; (2) Kaplan-Meier survival analysis with log-rank testing; and (3) SHapley Additive exPlanations (SHAP) to interpret feature importance.

Results: The model showed robust predictive performance, with AUCs of 0.82 in the training cohort, 0.79 in the validation cohort, and 0.77 in the test cohort. Key genomic predictors included TP53 mutations, which were associated with shorter PFS, and BRCA2 mutations, which were associated with longer PFS. SHAP analysis identified NOTCH1 as a novel predictive biomarker whose feature-contribution profile suggests a role in immune modulation in lung squamous cell carcinoma. Risk stratification by the model significantly separated PFS outcomes (log-rank P < 0.05). Decision curve analysis supported the model's clinical utility, showing a greater net benefit than the "treat-all" strategy.

Conclusion: This study establishes a robust, interpretable ctDNA-based machine learning model for predicting PFS in patients with NSCLC receiving immune checkpoint inhibitors. The identification of TP53, BRCA2, and NOTCH1 as biologically plausible predictive biomarkers advances understanding of immunotherapy response mechanisms and enables clinically actionable risk stratification to guide treatment decisions. Prospective multicenter validation is still needed before the model can be translated into precision oncology practice.
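The discrimination and decision-curve metrics named in the abstract can be illustrated with a minimal, dependency-free Python sketch. It assumes PFS has been dichotomized into a binary label (1 = progression by some landmark time; the abstract does not state the landmark or how the AUC was derived from survival data) and that the model outputs a predicted probability of progression. The percentile bootstrap for the 95% CI, the 0.5 threshold, the toy data and all function names (`auc`, `bootstrap_auc_ci`, `confusion_metrics`, `net_benefit`, `net_benefit_treat_all`) are hypothetical choices for illustration, not the authors' pipeline; LASSO-Cox selection, XGBoost training and SHAP are not reproduced.

```python
import random
from typing import Dict, Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC computed as the Mann-Whitney probability that a randomly
    chosen positive (label 1) outscores a randomly chosen negative
    (label 0); tied scores count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else (0.5 if p == q else 0.0)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels: Sequence[int], scores: Sequence[float],
                     n_boot: int = 2000, alpha: float = 0.05,
                     seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap (1 - alpha) confidence interval for the AUC.
    Patients are resampled with replacement; resamples that happen to
    contain only one class are discarded and redrawn."""
    n = len(labels)
    if not 0 < sum(labels) < n:
        raise ValueError("both classes must be present")
    rng = random.Random(seed)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if 0 < sum(y) < n:
            stats.append(auc(y, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int(alpha / 2 * n_boot)]
    hi = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi


def confusion_metrics(labels: Sequence[int], scores: Sequence[float],
                      threshold: float = 0.5) -> Dict[str, float]:
    """Accuracy, sensitivity and specificity when a score >= threshold
    is called high risk (predicted positive)."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def net_benefit(labels: Sequence[int], scores: Sequence[float],
                pt: float) -> float:
    """Decision-curve net benefit of acting on patients whose predicted
    probability is >= the threshold probability pt (0 < pt < 1)."""
    n = len(labels)
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= pt)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= pt)
    return tp / n - fp / n * pt / (1 - pt)


def net_benefit_treat_all(labels: Sequence[int], pt: float) -> float:
    """Net benefit of the 'treat-all' reference strategy at pt."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * pt / (1 - pt)


# Toy data invented for illustration (not study data): 1 = progression by a
# hypothetical landmark time; score = predicted probability of progression.
labels = [1, 1, 1, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.65, 0.4, 0.3, 0.2, 0.1]
print(auc(labels, scores))                     # 0.9375
print(bootstrap_auc_ci(labels, scores))        # (lower, upper) 95% CI
print(confusion_metrics(labels, scores, 0.5))  # each metric is 0.75
print(net_benefit(labels, scores, 0.5))        # 0.25
print(net_benefit_treat_all(labels, 0.5))      # 0.0
```

In decision curve analysis a strategy earns credit for true positives and is penalized for false positives weighted by the odds pt/(1 - pt); "treat-all" is the special case in which every patient is called positive, so a model "outperforms treat-all" when its curve lies above that reference across the clinically relevant thresholds. The exact pairwise AUC is quadratic in cohort size but is inexpensive for cohorts of a few hundred patients.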

Keywords: NSCLC, Immunotherapy, ctDNA, PFS, XGBoost

Received: 15 Aug 2025; Accepted: 03 Dec 2025.

Copyright: © 2025 Li, Peng, Xia, He, Hu, Zhou, Zou, Li, Zhang, Huang, Zhang, Liu, Wang, Luo, Lu, Zhang, Zhao and Su. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence:
Jie Peng
ShengFa Su

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.