AUTHOR=Zhang Yalong, Zhang Zunni, Wei Liuxiang, Wei Shujing TITLE=Construction and validation of nomograms combined with novel machine learning algorithms to predict early death of patients with metastatic colorectal cancer JOURNAL=Frontiers in Public Health VOLUME=Volume 10 - 2022 YEAR=2022 URL=https://www.frontiersin.org/journals/public-health/articles/10.3389/fpubh.2022.1008137 DOI=10.3389/fpubh.2022.1008137 ISSN=2296-2565 ABSTRACT=Purpose: The purpose of this study was to investigate the clinical and non-clinical characteristics that may affect the early death of patients with metastatic colorectal carcinoma (mCRC) and to develop accurate prognostic predictive models for mCRC. Method: A total of 35,639 mCRC patients diagnosed from 2010 to 2019 were obtained from the SEER database. All patients were randomly divided into a training cohort and a validation cohort in a ratio of 7:3. The X-tile software was used to identify the optimal cut-off points for age and tumor size. Univariate and multivariate logistic regression models were used to determine the independent predictors associated with overall early death and specific early death caused by mCRC. Predictive and dynamic nomograms were then constructed. Moreover, logistic regression, random forest, CatBoost, LightGBM, and XGBoost were used to establish machine learning (ML) models. Receiver operating characteristic (ROC) curves and calibration plots were obtained to estimate the accuracy of the models, and decision curve analysis (DCA) was employed to determine the clinical benefits of the ML models. Results: The optimal cut-off points were 58 and 77 years for age and 45 and 76 for tumor size. 
Fifteen independent risk factors (age, marital status, race, tumor localization, histologic type, grade, N-stage, tumor size, surgery, radiation, chemotherapy, bone metastasis, brain metastasis, liver metastasis, and lung metastasis) were significantly associated with both overall early death and cancer-specific early death of mCRC patients, and the nomograms were constructed from these factors. Among the ML models, the random forest performed best at predicting outcomes, followed by the logistic regression, CatBoost, XGBoost, and LightGBM models. The random forest also provided more clinical benefit than the other algorithms and could be used to support clinical decisions regarding overall early death and specific early death caused by mCRC. Conclusion: ML algorithms combined with nomograms may play an important role in identifying early death owing to mCRC and may help clinicians make clinical decisions and plan follow-up strategies.
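
As an illustration of the decision curve analysis (DCA) mentioned in the abstract, the following minimal Python sketch computes the standard net benefit of a model at a single threshold probability, net benefit = TP/n - (FP/n) * pt/(1 - pt), and compares it with the "treat all" strategy. It is not the authors' code: the function names `net_benefit` and `net_benefit_treat_all` are hypothetical, and the toy labels and probabilities are invented for illustration and are not taken from the study or the SEER database.

```python
from typing import Sequence


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float], threshold: float) -> float:
    """Net benefit of flagging patients whose predicted risk is >= threshold.

    Hypothetical helper for illustration; y_true holds 0/1 outcomes
    (1 = early death) and y_prob holds predicted probabilities.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    # True positives: flagged patients who did have the event.
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    # False positives: flagged patients who did not have the event.
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # Odds of the threshold weight the harm of a false positive.
    weight = threshold / (1.0 - threshold)
    return tp / n - (fp / n) * weight


def net_benefit_treat_all(y_true: Sequence[int], threshold: float) -> float:
    """Net benefit of flagging every patient, the usual DCA reference line."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy data (invented, not from the study): 3 events among 10 patients.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
toy_probs = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.2, 0.1, 0.1, 0.05]

model_nb = net_benefit(toy_labels, toy_probs, threshold=0.5)
treat_all_nb = net_benefit_treat_all(toy_labels, threshold=0.5)
print(f"model: {model_nb:.3f}, treat all: {treat_all_nb:.3f}")
```

A decision curve repeats this calculation over a range of thresholds; a model is considered clinically useful at thresholds where its net benefit exceeds both the "treat all" line and zero (the "treat none" line).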