ORIGINAL RESEARCH article
Front. Immunol.
Sec. Cancer Immunity and Immunotherapy
Development and validation of an interpretable machine learning model for predicting progression-free survival after immunotherapy in patients with non-small cell lung cancer: a multicentre study
Provisionally accepted
- 1Guizhou Medical University, Guiyang, China
- 2The Second Affiliated Hospital of Guizhou Medical University, Kaili, China
- 3Qiandongnan Prefecture People's Hospital, Kaili, China
- 4The Fifth People's Hospital of Guiyang, Guiyang, China
- 5Dujiangyan Shoujia Hospital, Chengdu, China
- 6Panzhou People’s Hospital, Liupanshui, China
Background: This study aimed to develop and validate an interpretable machine learning model that uses circulating tumor DNA (ctDNA) to predict progression-free survival (PFS) in patients with non-small cell lung cancer (NSCLC) undergoing immunotherapy, thereby addressing the limitations of conventional biomarkers such as PD-L1 expression and tumor mutational burden.
Methods: This multicenter study involved pretreatment ctDNA profiling of 441 patients with NSCLC, divided into three independent cohorts: a training set (n=303, OAK trial), a validation set (n=97, POPLAR trial), and a local test set (n=41, multicenter retrospective cohort, 2023–2024). Using 5-fold cross-validated LASSO-Cox (Least Absolute Shrinkage and Selection Operator-Cox Proportional Hazards) regression, 25 prognostic genomic features were identified for integration into an eXtreme Gradient Boosting (XGBoost) model. Model performance was evaluated via three approaches: (1) discrimination metrics, including AUC with 95% confidence intervals, accuracy, sensitivity, and specificity; (2) Kaplan-Meier survival analysis complemented by log-rank testing; and (3) SHapley Additive exPlanations (SHAP) for interpreting feature importance.
Results: The model exhibited robust predictive performance, with AUCs of 0.82 (training cohort), 0.79 (validation cohort), and 0.77 (test cohort). Key genomic predictors included TP53 mutations, which were associated with shorter PFS, and BRCA2 mutations, which correlated with longer PFS. SHAP analysis identified NOTCH1 as a novel predictive biomarker, whose feature contribution profile suggests a role in immune modulation in lung squamous cell carcinoma. Risk stratification significantly distinguished PFS outcomes (log-rank P < 0.05). Decision curve analysis confirmed the model's clinical utility, as it outperformed "treat-all" strategies.
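The discrimination metrics and the decision curve comparison against a "treat-all" strategy can be illustrated with a minimal sketch. It is not the authors' code. The abstract does not state how PFS was converted into a binary endpoint for the AUC, so the labels below assume a hypothetical endpoint (1 = progression by some fixed time point), and the data, the 0.5 threshold, and all function names are invented for illustration.

```python
from typing import Sequence, Tuple

# Toy data (invented): 1 = progression by a hypothetical fixed time point.
labels = [1, 1, 1, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]  # predicted risk


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def confusion(labels: Sequence[int], scores: Sequence[float],
              threshold: float) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN) when score >= threshold is called high risk."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted_high = s >= threshold
        if predicted_high and y == 1:
            tp += 1
        elif predicted_high and y == 0:
            fp += 1
        elif not predicted_high and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def sensitivity_specificity(labels: Sequence[int], scores: Sequence[float],
                            threshold: float) -> Tuple[float, float]:
    tp, fp, tn, fn = confusion(labels, scores, threshold)
    return tp / (tp + fn), tn / (tn + fp)


def net_benefit(labels: Sequence[int], scores: Sequence[float],
                threshold: float) -> float:
    """Decision-curve net benefit: TP/n - FP/n * pt / (1 - pt)."""
    tp, fp, _, _ = confusion(labels, scores, threshold)
    n = len(labels)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(labels: Sequence[int], threshold: float) -> float:
    """Net benefit when every patient is classified as high risk."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


if __name__ == "__main__":
    print("AUC:", auc(labels, scores))
    print("sens, spec:", sensitivity_specificity(labels, scores, 0.5))
    print("net benefit (model):", net_benefit(labels, scores, 0.5))
    print("net benefit (treat all):", net_benefit_treat_all(labels, 0.5))
```

A decision curve repeats the two net-benefit calculations over a range of threshold probabilities. A model shows utility at the thresholds where its curve lies above both the treat-all curve and the treat-none line at zero.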
Conclusion: This study establishes a robust, interpretable ctDNA-derived machine learning algorithm for predicting PFS in NSCLC patients receiving immune checkpoint inhibitors. The identification of TP53, BRCA2, and NOTCH1 as biologically plausible predictive biomarkers advances understanding of immunotherapy response mechanisms and enables clinically actionable risk stratification to guide therapeutic decision-making. These findings underscore the need for prospective multicenter validation to facilitate translation into precision oncology practice.
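The way risk stratification is typically checked with Kaplan-Meier curves and a log-rank test can be sketched as follows. This is an illustrative, standard-library-only implementation on invented follow-up times, not the study's analysis. The group names, the data, and the function names are assumptions, and the cutoff used to define high- and low-risk groups is not given in the abstract.

```python
import math
from typing import List, Sequence, Tuple

# Toy follow-up data (invented): time in months, event 1 = progression, 0 = censored.
high_times, high_events = [1, 2, 3, 4], [1, 1, 1, 1]
low_times, low_events = [5, 6, 7, 8], [1, 0, 1, 0]


def kaplan_meier(times: Sequence[float],
                 events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (time, estimated PFS probability) at each distinct event time."""
    curve = []
    surv = 1.0
    for t in sorted({x for x, e in zip(times, events) if e}):
        at_risk = sum(1 for x in times if x >= t)
        d = sum(1 for x, e in zip(times, events) if x == t and e)
        surv *= 1 - d / at_risk  # product-limit update
        curve.append((t, surv))
    return curve


def logrank_chi2(times_a: Sequence[float], events_a: Sequence[int],
                 times_b: Sequence[float], events_b: Sequence[int]) -> float:
    """Two-group log-rank chi-square statistic (1 degree of freedom)."""
    pairs_a = list(zip(times_a, events_a))
    pairs_b = list(zip(times_b, events_b))
    event_times = sorted({x for x, e in pairs_a + pairs_b if e})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for x, _ in pairs_a if x >= t)
        n_b = sum(1 for x, _ in pairs_b if x >= t)
        n = n_a + n_b
        d_a = sum(1 for x, e in pairs_a if x == t and e)
        d = d_a + sum(1 for x, e in pairs_b if x == t and e)
        obs_minus_exp += d_a - d * n_a / n  # observed minus expected in group A
        if n > 1:
            # Hypergeometric variance of the event count in group A.
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    return obs_minus_exp ** 2 / variance


def chi2_1df_pvalue(stat: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


if __name__ == "__main__":
    print("KM high risk:", kaplan_meier(high_times, high_events))
    print("KM low risk:", kaplan_meier(low_times, low_events))
    stat = logrank_chi2(high_times, high_events, low_times, low_events)
    print("log-rank chi2:", stat, "p:", chi2_1df_pvalue(stat))
```

In practice, a survival analysis library would normally be used instead of hand-written estimators. The sketch only makes explicit what a "log-rank P < 0.05" statement compares: observed versus expected progression events in each risk group at every event time.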
Keywords: NSCLC, Immunotherapy, ctDNA, PFS, XGBoost
Received: 15 Aug 2025; Accepted: 03 Dec 2025.
Copyright: © 2025 Li, Peng, Xia, He, Hu, Zhou, Zou, Li, Zhang, Huang, Zhang, Liu, Wang, Luo, Lu, Zhang, Zhao and Su. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence:
Jie Peng
ShengFa Su
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
