AUTHOR=Talebi Raheleh , Celis-Morales Carlos A. , Akbari Abolfazl , Talebi Atefeh , Borumandnia Nasrin , Pourhoseingholi Mohamad Amin TITLE=Machine learning-based classifiers to predict metastasis in colorectal cancer patients JOURNAL=Frontiers in Artificial Intelligence VOLUME=Volume 7 - 2024 YEAR=2024 URL=https://www.frontiersin.org/journals/artificial-intelligence/articles/10.3389/frai.2024.1285037 DOI=10.3389/frai.2024.1285037 ISSN=2624-8212 ABSTRACT=Background: The increasing prevalence of colorectal cancer (CRC) in Iran over the past three decades has made it a key public health burden. The goal of this study was to predict metastasis in CRC patients from demographic and clinical factors using machine learning (ML) approaches. Methods: This study included 1,127 CRC patients who underwent appropriate treatment at Taleghani Hospital, a tertiary care facility. The patients were divided into training and test datasets in an 80:20 ratio. Various ML methods, including Naive Bayes (NB), Random Forest (RF), Support Vector Machine (SVM), Neural Network (NN), Decision Tree (DT), and Logistic Regression (LR), were used to predict metastasis in CRC patients. Model performance was evaluated using 5-fold cross-validation, reporting sensitivity, specificity, the area under the curve (AUC), and other indices. Results: Of the 1,127 patients, 183 (16%) experienced metastasis. In predicting metastasis, the NN and RF algorithms had the highest AUC, while SVM ranked third, in both the original and balanced datasets. On the balanced dataset, NN and RF achieved the highest AUC (100%), the highest sensitivity (100% each), and the highest accuracy (99.2% and 99.3%, respectively), followed by SVM with an AUC of 98.8%, a sensitivity of 97.5%, and an accuracy of 97%. NN and RF also showed a lower false negative rate (FNR) and false positive rate (FPR) and a higher negative predictive value (NPV). 
The results also showed that all methods performed well on the test datasets and that balancing the dataset improved the performance of most ML methods. The most important variables for predicting metastasis were, in order, tumor stage, number of involved lymph nodes, and treatment type. In a separate analysis of patients with stage I to III tumors, tumor grade, size, and stage were the most important features. Conclusion: This study indicated that NN and RF were the best ML-based approaches for predicting metastasis in CRC patients. Tumor stage and the number of involved lymph nodes were the most important features.
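
The indices named in the abstract (sensitivity, specificity, FNR, FPR, NPV, and accuracy) all derive from the four cells of a binary confusion matrix. The following is a minimal sketch of those definitions, not the authors' code. The names `ConfusionCounts` and `classification_metrics` are hypothetical, and the counts in the usage lines are made-up illustrative values, not results from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a binary confusion matrix (positive class = metastasis)."""
    tp: int  # metastatic patients predicted as metastatic
    fn: int  # metastatic patients predicted as non-metastatic
    fp: int  # non-metastatic patients predicted as metastatic
    tn: int  # non-metastatic patients predicted as non-metastatic


def classification_metrics(c: ConfusionCounts) -> dict:
    """Return the standard indices as fractions in [0, 1]."""
    positives = c.tp + c.fn            # all truly positive cases
    negatives = c.tn + c.fp            # all truly negative cases
    predicted_negative = c.tn + c.fn   # all cases predicted negative
    total = positives + negatives
    if min(positives, negatives, predicted_negative) == 0:
        raise ValueError("a metric denominator is zero; check the counts")
    return {
        "sensitivity": c.tp / positives,    # true positive rate
        "specificity": c.tn / negatives,    # true negative rate
        "fnr": c.fn / positives,            # equals 1 - sensitivity
        "fpr": c.fp / negatives,            # equals 1 - specificity
        "npv": c.tn / predicted_negative,   # negative predictive value
        "accuracy": (c.tp + c.tn) / total,
    }


# Hypothetical counts for illustration only (not taken from the paper).
example = ConfusionCounts(tp=40, fn=10, fp=5, tn=145)
metrics = classification_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because FNR and FPR are the complements of sensitivity and specificity, a model with 100% sensitivity necessarily has an FNR of 0. NPV, by contrast, depends on the class balance of the evaluated data as well as on the classifier.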