AUTHOR=Li Fei, Zhao Mengfan, Cao Linlin, Qie Shuai TITLE=Machine learning-driven prognostic prediction model for composite small cell lung cancer: identifying risk factors with network tools and validation using SEER data and external cohorts JOURNAL=Frontiers in Oncology VOLUME=Volume 15 - 2025 YEAR=2025 URL=https://www.frontiersin.org/journals/oncology/articles/10.3389/fonc.2025.1633635 DOI=10.3389/fonc.2025.1633635 ISSN=2234-943X ABSTRACT=Background: Lung cancer remains the leading cause of cancer-related mortality worldwide, and combined small cell lung carcinoma (C-SCLC) is a relatively uncommon but highly aggressive subtype of the disease. Despite its clinical significance, few survival prediction models have been tailored to the clinical characteristics of C-SCLC patients, and the interpretability of existing models remains limited. Methods: This study aimed to develop and validate an interpretable machine learning model for predicting survival outcomes in C-SCLC patients. We used clinical data from the SEER database for model development and Chinese patient cohorts for external validation. First, we used a Cox proportional hazards model for variable selection. We then tuned hyperparameters with 10-fold cross-validation and grid search, and XGBoost emerged as the best-performing of four candidate models. Finally, we applied the SHapley Additive exPlanations (SHAP) method to quantify each variable’s contribution to the model’s predictions, improving interpretability. Results: We constructed the predictive model using data from 1,230 SEER patients and validated it externally with data from 154 Chinese patients. The XGBoost model performed excellently in predicting 1-, 3-, and 5-year survival, with AUC values in the external validation cohort of 0.849, 0.830, and 0.811, respectively. SHAP analysis identified N stage, T stage, radiotherapy, surgery, and gender as the key factors driving the model’s predictions. To enhance clinical utility, we developed an interpretable web-based tool that predicts a patient’s 1-year survival probability. Conclusion: The XGBoost model, which integrates demographic and clinical factors of C-SCLC patients, demonstrated excellent predictive performance. Our web-based prediction tool will promote personalized treatment strategies and help optimize clinical decision-making.