ORIGINAL RESEARCH article

Front. Oncol.

Sec. Thoracic Oncology

Volume 15 - 2025 | doi: 10.3389/fonc.2025.1633635

This article is part of the Research Topic: Tailored Strategies for Lung Cancer Diagnosis and Treatment in Special Populations

Machine Learning-Driven Prognostic Prediction Model for Composite Small Cell Lung Cancer: Identifying Risk Factors with Network Tools and Validation Using SEER Data and External Cohorts

Provisionally accepted
Fei Li1, Mengfan Zhao1, Linlin Cao2, Shuai Qie3*
  • 1School of Nursing, Hebei University, Baoding, China
  • 2Neurological Intensive Care Unit, Affiliated Hospital of Hebei University, Baoding, China
  • 3Radiation Oncology Department, Affiliated Hospital of Hebei University, Baoding, China

The final, formatted version of the article will be published soon.

Background: Lung cancer remains the leading cause of cancer-related mortality worldwide, and combined small cell lung carcinoma (C-SCLC) is a relatively uncommon yet highly aggressive subtype. Despite its clinical significance, few survival prediction models have been tailored to the clinical characteristics of C-SCLC patients, and the interpretability of existing models remains limited.

Methods: This study aimed to develop and validate an interpretable machine learning model for predicting survival outcomes in C-SCLC patients, using clinical data from the SEER database and external validation in a cohort of Chinese patients. We first used the Cox proportional hazards model for variable selection. Then, using 10-fold cross-validation and grid search for optimal parameters, we selected XGBoost as the best-performing of four candidate models. Finally, we applied the SHapley Additive exPlanations (SHAP) method to quantify the contribution of each variable to the model's predictions.

Results: We constructed the predictive model using data from 1,230 SEER patients and validated it externally with data from 154 Chinese patients. The XGBoost model demonstrated excellent performance in predicting survival at 1, 3, and 5 years, with AUC values in the external validation cohort of 0.849, 0.830, and 0.811, respectively. SHAP analysis identified N stage, T stage, radiotherapy, surgery, and gender as the key factors influencing the model's predictions. To enhance clinical utility, we developed an interpretable web-based tool that predicts a patient's 1-year survival probability.

Conclusion: The XGBoost model, integrating demographic and clinical factors of C-SCLC patients, demonstrated excellent predictive performance. Our web-based prediction tool will support the development of personalized treatment strategies and help optimize clinical decision-making.
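To illustrate the idea behind the SHAP attributions mentioned in the abstract, the following minimal sketch computes exact Shapley values for a toy risk score. Everything in it is an assumption made for illustration: the function `toy_risk`, its coefficients, the feature encodings, and the baseline patient are hypothetical and are not the published XGBoost model or its data. The `shap` library itself uses faster algorithms and typically averages over a background dataset rather than a single baseline, so this is only a conceptual demonstration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x, relative to a baseline input.

    Features outside a coalition are held at their baseline value.
    The cost grows as 2**n, so this is only practical for a few features.
    """
    names = list(x)
    n = len(names)
    values = {}
    for target in names:
        others = [name for name in names if name != target]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                without = {k: (x[k] if k in coalition else baseline[k])
                           for k in names}
                with_target = dict(without)
                with_target[target] = x[target]
                total += weight * (predict(with_target) - predict(without))
        values[target] = total
    return values


def toy_risk(p):
    """Hypothetical risk score (NOT the published model)."""
    score = 0.10 + 0.15 * p["n_stage"] + 0.10 * p["t_stage"]
    score -= 0.20 * p["surgery"]
    # Assumed interaction: radiotherapy matters more at higher N stage
    score -= 0.10 * p["radiotherapy"] * (1 + p["n_stage"])
    return score


# Hypothetical patient and reference patient (illustrative encodings only)
patient = {"n_stage": 2, "t_stage": 3, "surgery": 0, "radiotherapy": 1}
baseline = {"n_stage": 0, "t_stage": 1, "surgery": 1, "radiotherapy": 0}

phi = shapley_values(toy_risk, patient, baseline)
for name, value in phi.items():
    print(f"{name}: {value:+.3f}")
```

The attributions sum to the difference between the patient's score and the baseline score, which is the additivity property that makes SHAP plots readable as a per-variable breakdown of a single prediction.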

Keywords: SEER (Surveillance, Epidemiology, and End Results) database, C-SCLC, SHAP (SHapley Additive exPlanations), machine learning, Composite Small Cell Lung Cancer

Received: 22 May 2025; Accepted: 07 Jul 2025.

Copyright: © 2025 Li, Zhao, Cao and Qie. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Shuai Qie, Radiation Oncology Department, Affiliated Hospital of Hebei University, Baoding, China

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.